[opt] Defer UCM KV Dump Waiting Until Request Completion #989
# Conflicts: # ucm/integration/vllm/device.py # ucm/integration/vllm/ucm_connector.py
# Conflicts: # ucm/integration/vllm/ucm_connector.py
I found one issue in the CP path:
I think this does not apply to the CP path. UCMCPConnector does not override build_connector_meta(); its MRO is UCMCPConnector -> UCMLayerWiseConnector -> UCMDirectConnector, and UCMLayerWiseConnector does not override build_connector_meta() either. Therefore CP uses UCMDirectConnector.build_connector_meta(), which already adds dump requests to the scheduler-side _async_dump_req_ids before returning metadata.
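To make the MRO argument concrete, here is a minimal sketch using simplified stand-ins for the three connector classes. The class bodies, the `dump_req_ids` parameter, and the returned dict are assumptions for illustration only; they are not the real UCM signatures. Only the inheritance chain and the `_async_dump_req_ids` name come from the comment above.

```python
# Simplified stand-ins for the real connector classes. The method body and
# its parameter are hypothetical; only the inheritance chain mirrors the PR.
class UCMDirectConnector:
    def __init__(self):
        # Scheduler-side bookkeeping set named in the review comment.
        self._async_dump_req_ids = set()

    def build_connector_meta(self, dump_req_ids):
        # Record the dump requests first, then return the metadata.
        self._async_dump_req_ids.update(dump_req_ids)
        return {"dump_req_ids": sorted(dump_req_ids)}


class UCMLayerWiseConnector(UCMDirectConnector):
    pass  # does not override build_connector_meta()


class UCMCPConnector(UCMLayerWiseConnector):
    pass  # does not override build_connector_meta()


# Names of the classes Python searches, in order, when resolving attributes.
mro_names = [cls.__name__ for cls in UCMCPConnector.__mro__]

# First class in the MRO that defines the method in its own namespace.
owner = next(
    cls for cls in UCMCPConnector.__mro__
    if "build_connector_meta" in vars(cls)
)

connector = UCMCPConnector()
meta = connector.build_connector_meta({"req-1", "req-2"})

print(mro_names)
print(owner.__name__)
print(connector._async_dump_req_ids)
```

In this sketch the call on the `UCMCPConnector` instance resolves to the `UCMDirectConnector` implementation, so the bookkeeping set is populated even though neither subclass defines the method.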
Purpose
Move UCM KV dump completion waiting out of wait_for_save() so model execution is no longer blocked by async dump tasks, while still preserving delayed block release correctness for finished requests.

This also fixes the non-HMA MultiConnector path by forwarding request_finished() from the outer UCMConnector to the actual inner connector, ensuring requests that UCM saves asynchronously are properly marked for delayed free before they are returned from get_finished().

Modifications
- Changed wait_for_save() to submit dump tasks without blocking on task completion.
- Moved the dump completion wait into get_finished() for requests that have finished and require delayed block release.
- […] wait_for_save() path because the timing is no longer accurate after deferring completion.
- Added UCMConnector.request_finished() forwarding for the non-HMA path so MultiConnector can correctly count UCM async saves.
- Merged origin/develop and resolved conflicts with the new layerwise / pipeline store metrics changes.

Test