Avoid staging local KV cache payloads by leblancfg · Pull Request #353 · antirez/ds4

leblancfg · 2026-06-07T23:57:58Z

Problem

Disk KV saves currently write the session payload twice for local sessions:

1. serialize the DS4 payload to a staged temp file: ds4_session_stage_payload(session, &staged, ...);
2. copy that staged payload into the final .kv.tmp... cache file: ds4_session_write_staged_payload(&staged, fp, ...);
3. rename the final temp file into place: rename(tmp, path).

Current path: session -> staged payload tmp -> final .kv.tmp -> rename.
This PR: session -> final .kv.tmp -> rename.

For long contexts, this payload can be multi-GiB. The extra staged-file copy adds latency to cold/continued/session saves and also raises peak temporary disk usage.

In this PR

When the engine can predict the payload size with ds4_session_payload_bytes(), write the payload directly into the final temporary cache file.

The cache file is still atomic from the reader's point of view: it is written as .tmp... and renamed only after the full header, text, payload and trailer are written successfully. If the direct write fails, the temp file is unlinked as before.
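For readers less familiar with the pattern, here is a minimal sketch of write-to-temp-then-rename. It is not the code in this PR: atomic_write_file() and the ".tmp.<pid>" suffix are hypothetical names for illustration, and the real save path writes header, text, payload and trailer rather than a single buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the ds4 sources: write `len` bytes to
 * `path` so that a reader sees either the previous file or the complete new
 * one, never a partially written file. Returns 0 on success, -1 on failure. */
static int atomic_write_file(const char *path, const void *data, size_t len) {
    assert(path != NULL && (data != NULL || len == 0));

    /* The temp file lives next to the target so rename() stays on one
     * filesystem. The suffix format is an assumption of this sketch. */
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp.%ld", path, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(tmp)) return -1;

    FILE *fp = fopen(tmp, "wb");
    if (!fp) return -1;

    int ok = (len == 0 || fwrite(data, 1, len, fp) == len);
    if (ok && fflush(fp) != 0) ok = 0;
    if (fclose(fp) != 0) ok = 0;

    /* Publish only after every byte was written successfully. */
    if (ok && rename(tmp, path) != 0) ok = 0;

    if (!ok) {
        unlink(tmp); /* Do not leave a partial temp file behind. */
        return -1;
    }
    return 0;
}
```

The sketch does not call fsync(), so it only illustrates visibility to concurrent readers, not durability across a crash.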

Unknown-size payloads still use the old staged path. In practice this preserves the existing distributed-session behavior, since distributed payload size is not predicted by ds4_session_payload_bytes().

Implementation details:

add ds4_session_save_payload_counted(), a small wrapper around the existing ds4_session_save_payload() serializer;
keep the payload format in one implementation instead of duplicating serialization code;
server and agent save paths use direct writes only when the expected payload size is known;
after direct write, verify that the measured bytes written match the expected payload size;
no loader or cache format change.
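A rough sketch of the counted-wrapper idea, under stated assumptions: the function names, the callback type and the signatures below are hypothetical stand-ins, not the actual ds4 API. It measures bytes by comparing the stream position before and after the serializer runs, then checks the result against the predicted size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the real serializer; name and signature are hypothetical. */
typedef int (*payload_writer_fn)(FILE *fp, void *ctx);

/* Run the serializer and report how many bytes it appended to fp by
 * comparing the stream position before and after. Returns 0 on success.
 * ftell() returns long; where long is 32 bits, multi-GiB payloads would
 * need ftello() or an explicit byte counter instead. */
static int save_payload_counted(payload_writer_fn write_payload, void *ctx,
                                FILE *fp, uint64_t *written) {
    assert(write_payload != NULL && fp != NULL && written != NULL);
    long start = ftell(fp);
    if (start < 0) return -1;
    if (write_payload(fp, ctx) != 0) return -1;
    long end = ftell(fp);
    if (end < start) return -1;
    *written = (uint64_t)(end - start);
    return 0;
}

/* Direct-write path: succeed only if the measured size matches the
 * prediction. The caller is expected to unlink the temp file on failure. */
static int write_payload_direct(payload_writer_fn write_payload, void *ctx,
                                FILE *fp, uint64_t expected) {
    uint64_t written = 0;
    if (save_payload_counted(write_payload, ctx, fp, &written) != 0) return -1;
    return written == expected ? 0 : -1;
}

/* Toy serializer for illustration only: writes ctx as a C string. */
static int demo_writer(FILE *fp, void *ctx) {
    const char *s = ctx;
    return fputs(s, fp) >= 0 ? 0 : -1;
}
```

With the toy serializer, write_payload_direct(demo_writer, "12345", fp, 5) succeeds, while an expected size of 4 is rejected. Keeping the serializer behind one entry point is what lets the staged and direct paths share a single payload format.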

Benchmark

I used a long-context server run because this is where disk KV saves are large enough for the extra copy to matter. The benchmark does not try to show faster prefill; it only measures the save path around normal long-context checkpoints.

Machine: Apple M2 Max, 64 GiB unified memory, Metal SSD streaming, 32GB routed expert cache.

Command:

./ds4-server \
  -m ./ds4flash.gguf \
  --ssd-streaming \
  --ssd-streaming-cache-experts 32GB \
  --ctx 524288 \
  --kv-disk-dir "$CACHE" \
  --kv-disk-space-mb 200000 \
  --kv-cache-min-tokens 512 \
  --kv-cache-cold-max-tokens 520192 \
  --kv-cache-continued-interval-tokens 10000 \
  --kv-cache-boundary-trim-tokens 32 \
  --kv-cache-boundary-align-tokens 2048

Request:

prompt: first 405,181 bytes of speed-bench/promessi_sposi.txt;
prompt tokens: 128,190;
model=deepseek-chat, thinking=false, temperature=0, max_tokens=1;
fresh KV cache dir for each run.

For main, I used a temporary timing-only log around ds4_session_stage_payload() so the old path can be measured as serialize to temp + copy temp to cache. That instrumentation is not part of this PR.

Per-checkpoint save times, excluding shutdown:

tokens saved	KV file size	old: serialize to temp	old: copy temp to cache	old total	new direct	delta (new - old total)
20,480	291.77 MiB	142.0 ms	76.3 ms	218.3 ms	170.6 ms	-47.7 ms
40,960	560.66 MiB	267.7 ms	144.2 ms	411.9 ms	289.0 ms	-122.9 ms
61,440	829.55 MiB	392.3 ms	221.8 ms	614.1 ms	378.5 ms	-235.6 ms
81,920	1098.44 MiB	526.4 ms	441.3 ms	967.7 ms	515.9 ms	-451.8 ms
102,400	1367.33 MiB	616.5 ms	955.7 ms	1572.2 ms	602.6 ms	-969.6 ms
122,880	1636.22 MiB	740.3 ms	1049.0 ms	1789.3 ms	705.0 ms	-1084.3 ms
126,976	1690.00 MiB	728.4 ms	1014.2 ms	1742.6 ms	716.7 ms	-1025.9 ms
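To make the table easier to compare across sizes, the small helper below derives effective save throughput and speedup from the figures above. It only does arithmetic on the reported numbers; the struct and function names are made up for this note, and the throughput is computed over the whole KV file size, which is an approximation of the payload size. For the largest checkpoint this comes out to roughly 970 MiB/s for the old path versus roughly 2358 MiB/s for the direct path, about 2.4x.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One row of the table above: KV file size and total save times. */
struct save_row {
    int tokens;
    double mib;
    double old_total_ms;
    double new_direct_ms;
};

static const struct save_row rows[] = {
    { 20480,  291.77,  218.3, 170.6},
    { 40960,  560.66,  411.9, 289.0},
    { 61440,  829.55,  614.1, 378.5},
    { 81920, 1098.44,  967.7, 515.9},
    {102400, 1367.33, 1572.2, 602.6},
    {122880, 1636.22, 1789.3, 705.0},
    {126976, 1690.00, 1742.6, 716.7},
};

/* Effective throughput in MiB/s for a save that took `ms` milliseconds. */
static double mib_per_s(double mib, double ms) {
    assert(ms > 0.0);
    return mib * 1000.0 / ms;
}

/* How many times faster the direct path is than the staged path. */
static double speedup(const struct save_row *r) {
    return r->old_total_ms / r->new_direct_ms;
}

/* Print one summary line per checkpoint. */
static void print_save_summary(void) {
    for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++) {
        const struct save_row *r = &rows[i];
        printf("%7d tokens: old %7.1f MiB/s, new %7.1f MiB/s, %.2fx\n",
               r->tokens,
               mib_per_s(r->mib, r->old_total_ms),
               mib_per_s(r->mib, r->new_direct_ms),
               speedup(r));
    }
}
```

Calling print_save_summary() from main() prints one line per checkpoint. The speedup grows with checkpoint size because the old copy step takes a larger share of the total on the bigger files.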

Tests

make clean
make -j4
git diff --check
./ds4_test --server
./ds4-eval --self-test-extractors
make q4k-dot-test
make cpu

Additional live cache check on the direct-save branch, reusing the 128k-token benchmark cache:

cache hit restored 126,976 tokens from disk
only 1,214 prompt suffix tokens were prefilled (128,190 prompt tokens minus the 126,976 restored)
reported usage: cached_tokens=126976, cache_write_tokens=1214
cache load log: load=355.5 ms for the 1690.00 MiB (about 1.65 GiB) cold checkpoint

Stream KV payload saves

8f97b28

leblancfg marked this pull request as ready for review June 8, 2026 01:45

leblancfg wants to merge 1 commit into antirez:main from leblancfg:kv-streaming-write
