Skip to content

[opt]Defer UCM KV Dump Waiting Until Request Completion#989

Open
qyh111 wants to merge 4 commits into
ModelEngine-Group:developfrom
qyh111:dev_qyh
Open

[opt]Defer UCM KV Dump Waiting Until Request Completion#989
qyh111 wants to merge 4 commits into
ModelEngine-Group:developfrom
qyh111:dev_qyh

Conversation

@qyh111
Copy link
Copy Markdown
Contributor

@qyh111 qyh111 commented Jun 1, 2026

Purpose

Move UCM KV dump completion waiting out of wait_for_save() so model execution is no longer blocked by async dump tasks, while still preserving delayed block release correctness for finished requests.

This also fixes the non-HMA MultiConnector path by forwarding request_finished() from the outer UCMConnector to the actual inner connector, ensuring requests that UCM saves asynchronously are properly marked for delayed free before they are returned from get_finished().

Modifications

  • Added async dump task tracking for UCM Direct and LayerWise connectors.
  • Changed wait_for_save() to submit dump tasks without blocking on task completion.
  • Moved dump completion waiting to get_finished() for requests that have finished and require delayed block release.
  • Added preemption handling to wait only for pending dump tasks related to preempted requests.
  • Added per-task event handle release after dump task completion or submission failure.
  • Removed save-duration metrics from the delayed wait_for_save() path because the timing is no longer accurate after deferring completion.
  • Updated LayerWise save flow to use the same delayed dump completion mechanism.
  • Added UCMConnector.request_finished() forwarding for the non-HMA path so MultiConnector can correctly count UCM async saves.
  • Rebased the branch on latest origin/develop and resolved conflicts with the new layerwise / pipeline store metrics changes.

Test

qyh111 added 4 commits June 1, 2026 19:28
# Conflicts:
#	ucm/integration/vllm/device.py
#	ucm/integration/vllm/ucm_connector.py
# Conflicts:
#	ucm/integration/vllm/ucm_connector.py
@qyh111 qyh111 changed the title Dev qyh delay wait for save Jun 1, 2026
@qyh111 qyh111 changed the title delay wait for save [opt]Defer UCM KV Dump Waiting Until Request Completion Jun 1, 2026
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
Comment thread ucm/integration/vllm/ucm_connector.py
@dante159753
Copy link
Copy Markdown
Contributor

dante159753 commented Jun 2, 2026

I found one issue in the CP path: UCMCPConnector.build_connector_meta() does not mirror UCMDirectConnector.build_connector_meta() by adding requests with dump_block_ids to the scheduler-side _async_dump_req_ids set. As a result, request_finished() returns False in the CP path, so vLLM may free the blocks immediately while the worker-side dump task has been deferred until get_finished(). That leaves a window where the blocks can be reused or overwritten before the dump wait completes. Please mirror the DirectConnector tracking logic in this override before returning the metadata.

@qyh111
Copy link
Copy Markdown
Contributor Author

qyh111 commented Jun 2, 2026

I found one issue in the CP path: UCMCPConnector.build_connector_meta() does not mirror UCMDirectConnector.build_connector_meta() by adding requests with dump_block_ids to the scheduler-side _async_dump_req_ids set. As a result, request_finished() returns False in the CP path, so vLLM may free the blocks immediately while the worker-side dump task has been deferred until get_finished(). That leaves a window where the blocks can be reused or overwritten before the dump wait completes. Please mirror the DirectConnector tracking logic in this override before returning the metadata.

I think this does not apply to the CP path. UCMCPConnector does not override build_connector_meta(); its MRO is UCMCPConnector -> UCMLayerWiseConnector -> UCMDirectConnector, and UCMLayerWiseConnector does not override build_connector_meta either. Therefore CP uses UCMDirectConnector.build_connector_meta(), which already adds dump requests to scheduler-side _async_dump_req_ids before returning metadata.

Comment thread ucm/integration/vllm/ucm_connector.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants