[Feat] Add vLLM-native UCM connector metrics by dante159753 · Pull Request #995 · ModelEngine-Group/unified-cache-management

dante159753 · 2026-06-05T01:11:48Z

Summary

This PR wires UCM metrics into vLLM's KV connector metrics path without changing the C++ metrics core semantics.

Changes

Add a Python-side UCM metrics dispatcher that drains the existing C++ metrics core once and fans out snapshots to multiproc and vllm_connector consumers.
Add YAML config support for multiproc_prefix, vllm_connector_prefix, and fixed consumer switches.
Expose vLLM connector Prometheus metrics with vllm:ucm_* names and model_name, engine, worker_rank labels.
Keep the existing multiprocess metrics path compatible through get_all_stats_and_clear() and ucm: output when multiproc is enabled.
Update Grafana dashboards to query vllm:ucm_* metrics and add a KV cache hit-rate breakdown panel for GPU prefix cache vs external connector cache.

Why

The old UCM metrics path could only be scraped reliably from the master-side multiprocess logger in multi-node deployments. vLLM now supports KV connector metrics aggregation, so UCM connector metrics can be reported through the vLLM-native stats path and include worker-side connector observations.

Impact

Existing metrics_config_path remains the single configuration entry point.
Deployments can keep multiprocess UCM metrics, enable vLLM connector metrics, or run both via the config consumers.
vLLM-native metrics use the vllm: prefix required by vLLM's metrics snapshot path.
Duration metrics are exported as seconds on the vLLM side while dashboards preserve the existing millisecond display where appropriate.

Verification

uv run --no-project --with pytest python -m pytest --noconftest test\test_ucm_connector_metrics.py -q
python -m py_compile ucm\metrics_config.py ucm\metrics_dispatcher.py ucm\integration\vllm\metrics.py ucm\integration\vllm\ucm_connector.py ucm\observability.py
git diff --check
Parsed all modified Grafana dashboard JSON files and checked duplicate panel ids / grid overlap.
Live verified on a2 with vLLM 0.18.0 + vLLM-Ascend 0.18.0rc1 using NPU 0:
- UCM branch installed and service started successfully with /health and /metrics returning 200.
- /metrics exposed vllm:ucm_* series with worker_rank labels.
- vLLM prefix cache counters used by the new dashboard panel were present: vllm:prefix_cache_* and vllm:external_prefix_cache_*.

Add YAML-controlled multiproc and vLLM connector metrics consumers. Fan out a single UCM metrics core drain into Python-side consumer buffers. Expose vLLM connector metrics with vllm: prefixes and worker_rank labels.

Switch UCM dashboard queries to vLLM connector metric names. Add a KV cache hit-rate breakdown panel to the vLLM dashboard.

ygwpz · 2026-06-05T06:10:36Z

+    if not config_path:
+        return {}
+    try:
+        import yaml


💡 Suggestion: The yaml import is inside a try-except block in load_metrics_config, but the ImportError is logged and returns empty dict. This means the function silently fails if PyYAML is not installed. Consider raising an exception when PyYAML is required (if a YAML config path is provided) to make the failure more explicit, or document this dependency clearly.

ygwpz · 2026-06-05T06:10:38Z

+        if len(current_counts) < len(bucket_counts):
+            current_counts.extend([0] * (len(bucket_counts) - len(current_counts)))
+        for index, count in enumerate(bucket_counts):
+            current_counts[index] += int(count)


💡 Suggestion: When count is very large, int(count) could be problematic if count is a float that represents a very large number. Consider validating that count is within reasonable bounds before accumulating, or using int(round(count)) to avoid floating-point precision issues when converting large floats to integers.

ygwpz · 2026-06-05T06:10:40Z

+            getattr(value, "sum", 0.0)
+        )
+
+    def _empty_stats(self):


💡 Suggestion: The _histogram_tuple function returns empty list for bucket_counts if the value doesn't match expected formats. This could lead to silently dropped histogram data. Consider logging a warning when histogram data format is unrecognized:

if not bucket_counts: logger.warning(f"Unrecognized histogram format: {type(value)}")

ygwpz · 2026-06-05T06:10:45Z

+    def reduce(self) -> dict[str, int | float]:
+        return {
+            "ucm_load_requests_total": self._histogram_sum("load_requests_num"),
+            "ucm_load_blocks_total": self._histogram_sum("load_blocks_num"),


⚠️ Warning: The reduce() method returns a hardcoded dict of specific metric names (ucm_load_requests_total, etc.). This means if new metrics are added to the config, they won be included in the reduced output. Consider making this method configurable or iterating over all histogram metrics in self.data to provide a complete summary.

ygwpz · 2026-06-05T06:11:31Z

+
+CONSUMERS = (MULTIPROC_CONSUMER, VLLM_CONNECTOR_CONSUMER)
+_DISPATCHER: "MetricsDispatcher | None" = None
+_DISPATCHER_LOCK = threading.Lock()


⚠️ Warning: The singleton pattern in get_metrics_dispatcher does not provide a way to reset the dispatcher. If the config changes (e.g., during hot-reload or testing), the old dispatcher will continue to be used. Consider adding a reset_metrics_dispatcher() function or passing a force_new parameter for scenarios where config needs to be reloaded:

def reset_metrics_dispatcher(): global _DISPATCHER with _DISPATCHER_LOCK: _DISPATCHER = None

dante159753 added 2 commits June 4, 2026 20:32

[Feat] Add Python-side UCM metrics dispatcher

5cdcf55

Add YAML-controlled multiproc and vLLM connector metrics consumers. Fan out a single UCM metrics core drain into Python-side consumer buffers. Expose vLLM connector metrics with vllm: prefixes and worker_rank labels.

[Feat] Update Grafana dashboards for vLLM UCM metrics

74e6f97

Switch UCM dashboard queries to vLLM connector metric names. Add a KV cache hit-rate breakdown panel to the vLLM dashboard.

dante159753 requested review from FangRun2, Infinite666, Tarrei, Wwwzff, harrisonyhq, mag1c-h, qyh111 and ygwpz as code owners June 5, 2026 01:11

dante159753 added 2 commits June 5, 2026 11:34

Add layerwise batch breakdown metrics

749d197

Merge branch 'layerwise-batch-breakdown' into ucm-vllm-connector-metrics

3e373d0

ygwpz reviewed Jun 5, 2026

View reviewed changes

[Feat] Align vLLM UCM metrics dashboards

954a4a8

dante159753 requested a review from flesher0813 as a code owner June 5, 2026 06:38

dante159753 added 2 commits June 5, 2026 15:20

[Feat] Add direct connector task breakdown metrics

052a683

[Feat] Isolate vLLM UCM Grafana dashboards

1209a79

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feat] Add vLLM-native UCM connector metrics#995

[Feat] Add vLLM-native UCM connector metrics#995
dante159753 wants to merge 7 commits into
ModelEngine-Group:developfrom
dante159753:ucm-vllm-connector-metrics

dante159753 commented Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

ygwpz Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dante159753 commented Jun 5, 2026

Summary

Changes

Why

Impact

Verification

Uh oh!

ygwpz Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

ygwpz Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants