
[Feat]Wa dump blockwise#992

Open
wuhuxiao wants to merge 2 commits into ModelEngine-Group:develop from wuhuxiao:whx_pr

wuhuxiao (Contributor) commented Jun 3, 2026

Purpose

Add configurable block-wise WA KV cache dumping for the FAWA connector.

Previously, WA cache dumping persisted only the final WA tail block of each prefill chunk. This made WA external-cache reuse depend on chunk boundaries: a canonical block boundary that fell in the middle of a chunk had no WA tail persisted, so a later prefix match ending at that boundary could not reuse the WA cache. This PR adds an option to dump the WA cache block-wise, so that every canonical hash block persists its corresponding WA tail state.
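To make the chunk-boundary dependence concrete, here is a minimal sketch under stated assumptions. It models canonical block boundaries as multiples of a fixed block size (128 tokens, an illustrative value) and assumes each prefill chunk dumps either only the last boundary it completes (the old chunk-wise behavior) or every boundary it completes (block-wise). The function dump_boundaries is hypothetical and is not part of the FAWA connector.

```python
from typing import List, Sequence

def dump_boundaries(chunk_sizes: Sequence[int], block_size: int, block_wise: bool) -> List[int]:
    """Token offsets of canonical block boundaries whose WA tail is dumped.

    Hypothetical model: canonical boundaries are multiples of block_size.
    Chunk-wise mode dumps only the last boundary a chunk completes;
    block-wise mode dumps every boundary it completes.
    """
    boundaries: List[int] = []
    pos = 0
    for size in chunk_sizes:
        start, pos = pos, pos + size
        # First canonical boundary strictly after `start`, up to and including `pos`.
        first = (start // block_size + 1) * block_size
        completed = list(range(first, pos + 1, block_size))
        if not completed:
            continue  # this chunk did not complete any block
        if block_wise:
            boundaries.extend(completed)
        else:
            boundaries.append(completed[-1])
    return boundaries

# The same 600-token prompt, prefilled with two different chunkings.
for chunks in ([300, 300], [200, 400]):
    print(chunks,
          dump_boundaries(chunks, 128, block_wise=False),
          dump_boundaries(chunks, 128, block_wise=True))
# [300, 300] [256, 512] [128, 256, 384, 512]
# [200, 400] [128, 512] [128, 256, 384, 512]
```

In this model, chunk-wise dumping leaves WA tails at offsets that depend on how the prompt was chunked, and neither chunking persists a tail at token 384, so a later prefix match ending there finds no WA state. Block-wise dumping yields the same boundaries regardless of chunking.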

Modifications

  • Add a wa_dump_block_wise config option (default: true).
  • Update FAWA dispatch metadata generation to support two WA dump modes:
    • block-wise WA dump: dump WA tail rows for every canonical block boundary.
    • chunk-wise WA dump: keep the old behavior and dump only the final WA tail per chunk.
  • Keep the WA load behavior unchanged: only the final matched WA boundary is loaded.
  • Update TP dump slicing so that block-wise WA dumps follow the same TP key partitioning as FA dumps.
  • Split the configured posix_capacity_gb evenly between the FA and WA stores.
  • Document wa_dump_block_wise in examples/ucm_config_example.yaml.
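The unchanged load side can be sketched the same way. Assuming, hypothetically, that the external-cache lookup reports how many prefix tokens were matched and which block boundaries have a persisted WA tail, loading only the final matched WA boundary reduces to picking the largest dumped boundary that does not exceed the match. wa_load_boundary is an illustrative name, not the connector's API.

```python
from typing import Optional, Sequence

def wa_load_boundary(matched_tokens: int, dumped_boundaries: Sequence[int]) -> Optional[int]:
    """Return the single WA boundary to load, or None if no usable WA tail exists.

    Hypothetical model: only the WA tail at the last dumped boundary inside the
    matched prefix is loaded; tails at earlier boundaries are not needed.
    """
    usable = [b for b in dumped_boundaries if b <= matched_tokens]
    return max(usable) if usable else None

# Boundaries dumped for a 600-token prompt prefilled as chunks [300, 300]
# with 128-token blocks (illustrative values).
chunk_wise = [256, 512]
block_wise = [128, 256, 384, 512]

print(wa_load_boundary(384, chunk_wise))  # 256
print(wa_load_boundary(384, block_wise))  # 384
```

Under these assumptions, a 384-token match can resume WA state only from token 256 with chunk-wise dumps, while block-wise dumps let the whole matched prefix be reused; in both modes a single WA tail is loaded.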

Test

@wuhuxiao wuhuxiao changed the title add wa dump blockwise [feat]add wa dump blockwise Jun 3, 2026
@wuhuxiao wuhuxiao changed the title [feat]add wa dump blockwise [Feat]Wa dump blockwise Jun 3, 2026
Comment thread ucm/integration/vllm/hma_connector.py
Comment thread ucm/integration/vllm/hma_connector.py Outdated
self._role != KVConnectorRole.WORKER and dp_rank == 0
)
if config.get("posix_capacity_gb", None) is not None:
    config["posix_capacity_gb"] = int(config["posix_capacity_gb"]) // 2
Contributor

💡 Suggestion: This division of posix_capacity_gb by 2 needs documentation. Why is the capacity halved? Is it to split capacity between FA and WA stores? Consider adding a comment explaining the rationale to avoid confusion.
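One way to address this suggestion is to state the rationale next to the arithmetic. The sketch below assumes, as the PR description indicates, that the halved value is the per-store share for the FA and WA stores (each receives half of the configured budget). The helper name per_store_capacity_config and the choice to return a copy instead of mutating config are illustrative, not taken from hma_connector.py.

```python
import copy
from typing import Any, Dict

def per_store_capacity_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config whose posix_capacity_gb is the per-store share.

    Assumption: the FA store and the WA store are both built from this config,
    so giving each half keeps their combined usage within the configured budget.
    """
    store_config = copy.deepcopy(config)
    if store_config.get("posix_capacity_gb") is not None:
        # Integer division rounds down: an odd budget leaves 1 GB unallocated.
        store_config["posix_capacity_gb"] = int(store_config["posix_capacity_gb"]) // 2
    return store_config

user_config = {"posix_capacity_gb": 101}
print(per_store_capacity_config(user_config))  # {'posix_capacity_gb': 50}
print(user_config)                             # {'posix_capacity_gb': 101}
```

Returning a copy means that reusing the same config dict cannot halve the value a second time, and the user-facing value stays intact for logging.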
