fix(ccr): protect headroom_retrieve outputs on the chat-completions path#1176
Open
maxsturmb wants to merge 2 commits into
Open
fix(ccr): protect headroom_retrieve outputs on the chat-completions path#1176maxsturmb wants to merge 2 commits into
maxsturmb wants to merge 2 commits into
Conversation
The Responses path tracks retrieve call_ids and skips compressing their outputs (proxy/handlers/openai.py ~L766-837). The Chat-Completions path had no equivalent, so a retrieved original — the expanded content the model explicitly asked for — got re-compressed back into a <<ccr:...>> marker on the next turn (marker-in / marker-out: retrieval never returned content). Capture pristine headroom_retrieve outputs (keyed by tool_call_id) before compression and restore them after, mirroring the Responses-path guard. Non-retrieve tool outputs stay compressible. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
PR governanceThis PR follows the template and is marked ready for human review. |
JerrettDavis
approved these changes
Jun 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
The Chat-Completions path does not protect
headroom_retrievetool outputs fromre-compression, so CCR retrieval is not actually reversible there.
The Responses path already guards this — it collects the
call_ids ofheadroom_retrievefunction calls and skips compressing their outputs (
proxy/handlers/openai.py, ~L766–837,reason
headroom_retrieve_output_protected).handle_openai_chathas no equivalent.Effect: a model calls
headroom_retrieve, the tool layer returns the expanded original as arole:"tool"message, and on the next turn the proxy compresses that tool output again —turning the retrieved original straight back into a
<<ccr:...>>marker (or a lossy rewrite).From the model's side, retrieval returns a marker instead of content: marker-in / marker-out.
compress_unit_with_router's existing_CCR_MARKER_REhandling does not catch it, because afreshly-retrieved original contains no marker.
Closes #1077
Type of Change
Changes Made
Mirror the Responses-path guard on the chat path, keyed by
tool_call_id(survives reordering):_headroom_retrieve_tool_call_ids(messages)— find thetool_call_ids of assistanttool_callsnamedheadroom_retrieve(or*__headroom_retrieve, covering MCP namespacing).capture_headroom_retrieve_outputs(messages)— before compression, snapshot the pristinecontent of every
role:"tool"message whosetool_call_idbelongs to such a call.restore_headroom_retrieve_outputs(messages, protected)— after compression, restore thosemessages in place (matched by
tool_call_id). Returns the count restored.handle_openai_chat: capture immediately before theCompressionDecisionruns, restore right after the
INPUT_COMPRESSEDevent.Non-retrieve tool outputs stay fully compressible — only
headroom_retrieveresults are protected.Files:
headroom/proxy/handlers/openai.py(+84),tests/test_ccr_chat_retrieve_protection.py(+88).Testing
pytest)ruff check .) — ruff not available in the verification env (see Additional Notes)mypy headroom) — mypy not available in the verification env (see Additional Notes)Test Output
Real Behavior Proof
fix/ccr-chat-retrieve-reentrancy, Python 3.13; live A/B against the running proxy with a recording mock upstream (this deployment serves only/v1/chat/completions).headroom_retrieveassistant tool call plus its ~59.8 KBrole:"tool"output, and an identical-content non-retrieve tool output as a control; send it through the proxy and diff the body forwarded upstream byte-for-byte against the original. Unit side:pytest tests/test_ccr_chat_retrieve_protection.py(7 passed) plus an 11/11 router round-trip.Review Readiness
Checklist
Additional Notes
introducing a new mechanism, to keep behaviour and reasoning consistent across both API formats.
ruff/mypywere not present in the verification environment, so those two boxes are leftunchecked rather than claimed; the change is small, type-annotated, and follows the surrounding
style of
openai.py. Happy to paste linter/type-check output if a maintainer points me at theexpected toolchain.
CHANGELOG.md/ docs entry added — flagging as N/A; will add on request if the project expects one.