[Feat] Add vLLM-native UCM connector metrics #995
Add YAML-controlled multiproc and vLLM connector metrics consumers. Fan out a single UCM metrics core drain into Python-side consumer buffers. Expose vLLM connector metrics with vllm: prefixes and worker_rank labels.
Switch UCM dashboard queries to vLLM connector metric names. Add a KV cache hit-rate breakdown panel to the vLLM dashboard.
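The fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `FanOutBuffers`, `poll`, `take`, the record shape, and the string values of the consumer constants are all assumptions made for the example.

```python
# Hypothetical sketch: one drain of the metrics core is copied into a
# separate buffer per enabled consumer, so each consumer reads independently.
MULTIPROC_CONSUMER = "multiproc"            # assumed constant value
VLLM_CONNECTOR_CONSUMER = "vllm_connector"  # assumed constant value


class FanOutBuffers:
    """Drain the metrics core once and copy the batch to every enabled consumer."""

    def __init__(self, drain, enabled_consumers):
        # `drain` is any callable returning an iterable of (name, value) records.
        self._drain = drain
        self._buffers = {name: [] for name in enabled_consumers}

    def poll(self):
        # A single drain call; the same batch is appended to every buffer.
        batch = list(self._drain())
        for buffer in self._buffers.values():
            buffer.extend(batch)
        return len(batch)

    def take(self, consumer):
        # Hand the buffered records to one consumer and reset only its buffer.
        taken = self._buffers[consumer]
        self._buffers[consumer] = []
        return taken


# Usage: two drains, with the multiproc consumer reading after each one
# and the vLLM connector consumer reading once at the end.
batches = iter([[("load_blocks_num", 4)], [("load_blocks_num", 6)]])
buffers = FanOutBuffers(
    lambda: next(batches, []),
    [MULTIPROC_CONSUMER, VLLM_CONNECTOR_CONSUMER],
)
buffers.poll()
multiproc_first = buffers.take(MULTIPROC_CONSUMER)
buffers.poll()
multiproc_second = buffers.take(MULTIPROC_CONSUMER)
connector_all = buffers.take(VLLM_CONNECTOR_CONSUMER)
```

Keeping one buffer per consumer means a slow consumer never causes another consumer to miss records, at the cost of holding each record once per enabled consumer.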
if not config_path:
    return {}
try:
    import yaml
💡 Suggestion: load_metrics_config imports yaml inside a try/except block, but the ImportError is only logged and the function returns an empty dict, so it fails silently when PyYAML is not installed. Consider raising an exception when a YAML config path is provided but PyYAML is missing, to make the failure explicit, or document the dependency clearly.
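One way to make the failure explicit is sketched below. Only the function name `load_metrics_config` comes from the PR; the body, the error type, and the message are assumptions for illustration.

```python
import importlib


def load_metrics_config(config_path):
    """Load a YAML metrics config, failing loudly if PyYAML is unavailable."""
    if not config_path:
        # No config requested, so PyYAML is not needed at all.
        return {}
    try:
        yaml = importlib.import_module("yaml")
    except ImportError as exc:
        # A path was given, so a missing dependency is a real error
        # rather than something to log and ignore.
        raise RuntimeError(
            f"PyYAML is required to load metrics config {config_path!r}"
        ) from exc
    with open(config_path, encoding="utf-8") as handle:
        # An empty YAML file parses to None; normalize that to an empty dict.
        return yaml.safe_load(handle) or {}
```

With this shape, deployments that never set a config path keep working without PyYAML, while a configured path can no longer be silently ignored.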
if len(current_counts) < len(bucket_counts):
    current_counts.extend([0] * (len(bucket_counts) - len(current_counts)))
for index, count in enumerate(bucket_counts):
    current_counts[index] += int(count)
💡 Suggestion: int(count) truncates toward zero, so a float count that is slightly off (for example 2.9999999) is accumulated as 2, and very large floats cannot represent every integer exactly. Consider validating that count is finite and within reasonable bounds before accumulating, or using int(round(count)) to avoid truncation errors when converting floats to integers.
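A minimal sketch of the validation suggested here, assuming bucket counts arrive as ints or floats. The helper names `to_count` and `merge_bucket_counts` are hypothetical and not part of the PR.

```python
import math


def to_count(value):
    """Convert one bucket count to a non-negative int, rejecting bad input."""
    if isinstance(value, int) and not isinstance(value, bool):
        result = value
    else:
        number = float(value)
        if not math.isfinite(number):
            raise ValueError(f"bucket count is not finite: {value!r}")
        # Round to the nearest integer instead of truncating toward zero.
        result = round(number)
    if result < 0:
        raise ValueError(f"bucket count is negative: {value!r}")
    return result


def merge_bucket_counts(current_counts, bucket_counts):
    """Accumulate bucket_counts into current_counts in place and return it."""
    if len(current_counts) < len(bucket_counts):
        current_counts.extend([0] * (len(bucket_counts) - len(current_counts)))
    for index, count in enumerate(bucket_counts):
        current_counts[index] += to_count(count)
    return current_counts
```

Note that rounding fixes truncation of values such as 3.9999999, but it cannot recover integers that a float already failed to represent exactly; bounds checks are still the only guard against those.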
    getattr(value, "sum", 0.0)
)

def _empty_stats(self):
💡 Suggestion: The _histogram_tuple function returns an empty list for bucket_counts if the value doesn't match the expected formats. This could lead to silently dropped histogram data. Consider logging a warning when the histogram data format is unrecognized:
if not bucket_counts:
    logger.warning(f"Unrecognized histogram format: {type(value)}")

def reduce(self) -> dict[str, int | float]:
    return {
        "ucm_load_requests_total": self._histogram_sum("load_requests_num"),
        "ucm_load_blocks_total": self._histogram_sum("load_blocks_num"),
The reduce() method returns a hardcoded dict of specific metric names (ucm_load_requests_total, etc.), so new metrics added to the config won't be included in the reduced output. Consider making this method configurable, or iterating over all histogram metrics in self.data to provide a complete summary.
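A rough sketch of the iterate-over-everything variant follows. The class name `HistogramStats`, the layout of `self.data`, and the rule that turns `load_requests_num` into `ucm_load_requests_total` are assumptions inferred from the two names visible in the diff, not the PR's actual code.

```python
class HistogramStats:
    """Hypothetical stats holder: self.data maps metric name -> {"sum": ...}."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def _histogram_sum(self, name):
        # Missing metrics reduce to 0.0 rather than raising.
        return self.data.get(name, {}).get("sum", 0.0)

    def reduce(self, prefix="ucm_"):
        reduced = {}
        for name in sorted(self.data):
            # Assumed naming rule: drop a trailing "_num", add prefix and "_total".
            base = name[: -len("_num")] if name.endswith("_num") else name
            reduced[f"{prefix}{base}_total"] = self._histogram_sum(name)
        return reduced


stats = HistogramStats(
    {
        "load_requests_num": {"sum": 3.0},
        "load_blocks_num": {"sum": 12.0},
    }
)
summary = stats.reduce()
```

Deriving output names from the input keys keeps new metrics in the summary automatically, but it also changes the output whenever a metric is renamed, so an explicit allow-list may still be preferable for stable dashboards.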
CONSUMERS = (MULTIPROC_CONSUMER, VLLM_CONNECTOR_CONSUMER)
_DISPATCHER: "MetricsDispatcher | None" = None
_DISPATCHER_LOCK = threading.Lock()
get_metrics_dispatcher does not provide a way to reset the dispatcher. If the config changes (e.g., during hot-reload or testing), the old dispatcher will continue to be used. Consider adding a reset_metrics_dispatcher() function or passing a force_new parameter for scenarios where config needs to be reloaded:
def reset_metrics_dispatcher():
    global _DISPATCHER
    with _DISPATCHER_LOCK:
        _DISPATCHER = None
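For context, here is a self-contained sketch of how a lock-guarded singleton getter and the proposed reset could fit together. The `MetricsDispatcher` stub, the `config` argument, and the `force_new` parameter are assumptions for the example; only the module-level names and `reset_metrics_dispatcher` appear in the review.

```python
import threading


class MetricsDispatcher:
    """Stand-in for the real dispatcher; it only records its config."""

    def __init__(self, config):
        self.config = dict(config)


_DISPATCHER: "MetricsDispatcher | None" = None
_DISPATCHER_LOCK = threading.Lock()


def get_metrics_dispatcher(config=None, force_new=False):
    """Return the shared dispatcher, creating it on first use."""
    global _DISPATCHER
    with _DISPATCHER_LOCK:
        if _DISPATCHER is None or force_new:
            _DISPATCHER = MetricsDispatcher(config or {})
        return _DISPATCHER


def reset_metrics_dispatcher():
    """Drop the shared dispatcher so the next get call rebuilds it."""
    global _DISPATCHER
    with _DISPATCHER_LOCK:
        _DISPATCHER = None
```

In a test suite, calling `reset_metrics_dispatcher()` in a fixture teardown would keep one test's config from leaking into the next.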
Summary
This PR wires UCM metrics into vLLM's KV connector metrics path without changing the C++ metrics core semantics.
Changes
- multiproc and vllm_connector consumers.
- multiproc_prefix, vllm_connector_prefix, and fixed consumer switches.
- vllm:ucm_* names and model_name, engine, worker_rank labels.
- get_all_stats_and_clear() and ucm: output when multiproc is enabled.
- vllm:ucm_* metrics and add a KV cache hit-rate breakdown panel for GPU prefix cache vs external connector cache.
Why
The old UCM metrics path could only be scraped reliably from the master-side multiprocess logger in multi-node deployments. vLLM now supports KV connector metrics aggregation, so UCM connector metrics can be reported through the vLLM-native stats path and include worker-side connector observations.
Impact
- metrics_config_path remains the single configuration entry point.
- vllm: prefix required by vLLM's metrics snapshot path.
Verification
- uv run --no-project --with pytest python -m pytest --noconftest test\test_ucm_connector_metrics.py -q
- python -m py_compile ucm\metrics_config.py ucm\metrics_dispatcher.py ucm\integration\vllm\metrics.py ucm\integration\vllm\ucm_connector.py ucm\observability.py
- git diff --check
- /health and /metrics returning 200.
- /metrics exposed vllm:ucm_* series with worker_rank labels.
- vllm:prefix_cache_* and vllm:external_prefix_cache_*.