fix(ccr): protect headroom_retrieve outputs on the chat-completions path #1176

Open
maxsturmb wants to merge 2 commits into headroomlabs-ai:main from maxsturmb:fix/ccr-chat-retrieve-reentrancy

maxsturmb commented Jun 19, 2026

Description

The Chat-Completions path does not protect headroom_retrieve tool outputs from
re-compression, so CCR retrieval is not actually reversible there.

The Responses path already guards this — it collects the call_ids of headroom_retrieve
function calls and skips compressing their outputs (proxy/handlers/openai.py, ~L766–837,
reason headroom_retrieve_output_protected). handle_openai_chat has no equivalent.
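
For orientation, the Responses-path guard described above can be pictured roughly as follows. This is a minimal sketch, not the code in openai.py: the helper names retrieve_call_ids and is_protected_output are hypothetical, and it assumes Responses-style input items where a function_call item carries name and call_id and a function_call_output item carries the matching call_id.

```python
from typing import Any

RETRIEVE_TOOL = "headroom_retrieve"


def retrieve_call_ids(items: list[dict[str, Any]]) -> set[str]:
    """Collect the call_ids of function_call items that invoke headroom_retrieve."""
    return {
        item["call_id"]
        for item in items
        if item.get("type") == "function_call"
        and item.get("name") == RETRIEVE_TOOL
        and isinstance(item.get("call_id"), str)
    }


def is_protected_output(item: dict[str, Any], protected_ids: set[str]) -> bool:
    """True when the item is the output of a headroom_retrieve call."""
    return (
        item.get("type") == "function_call_output"
        and item.get("call_id") in protected_ids
    )


# Illustrative input: one retrieve call and one unrelated call.
items = [
    {"type": "function_call", "call_id": "call_1", "name": "headroom_retrieve", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_1", "output": "expanded original"},
    {"type": "function_call", "call_id": "call_2", "name": "search", "arguments": "{}"},
    {"type": "function_call_output", "call_id": "call_2", "output": "search results"},
]

protected_ids = retrieve_call_ids(items)
# Outputs that a compressor following this guard would leave untouched.
skipped = [i["call_id"] for i in items if is_protected_output(i, protected_ids)]
```

Only the output paired with the retrieve call is skipped; the other output remains a normal compression candidate.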

Effect: a model calls headroom_retrieve, the tool layer returns the expanded original as a
role:"tool" message, and on the next turn the proxy compresses that tool output again —
turning the retrieved original straight back into a <<ccr:...>> marker (or a lossy rewrite).
From the model's side, retrieval returns a marker instead of content: marker-in / marker-out.
compress_unit_with_router's existing _CCR_MARKER_RE handling does not catch it, because a
freshly-retrieved original contains no marker.
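
The failure mode can be reproduced with a toy stand-in. The sketch below is illustrative only: toy_compress is a hypothetical compressor, not compress_unit_with_router, and the marker pattern is an assumed shape for <<ccr:...>> markers. It shows why a marker check alone cannot help: the retrieved original carries no marker, so it is compressed like any other long text.

```python
import hashlib
import re

# Assumed marker shape for illustration; the real _CCR_MARKER_RE may differ.
CCR_MARKER_RE = re.compile(r"<<ccr:[^>]*>>")


def toy_compress(content: str, limit: int = 64) -> str:
    """Stand-in compressor: skip marker-bearing or short text, else emit a marker."""
    if CCR_MARKER_RE.search(content) or len(content) <= limit:
        return content
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:8]
    return f"<<ccr:{digest}>>"


# What headroom_retrieve handed back: the expanded original, with no marker in it.
retrieved = "original log line\n" * 20

# On the next turn the unguarded path compresses it again.
next_turn = toy_compress(retrieved)
```

Here next_turn is a marker again, so the model that asked for the original sees only another marker.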

Closes #1077

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

Mirror the Responses-path guard on the chat path, keyed by tool_call_id (survives reordering):

  • _headroom_retrieve_tool_call_ids(messages): finds the tool_call_ids of assistant
    tool_calls named headroom_retrieve (or *__headroom_retrieve, covering MCP namespacing).
  • capture_headroom_retrieve_outputs(messages): before compression, snapshots the pristine
    content of every role:"tool" message whose tool_call_id belongs to such a call.
  • restore_headroom_retrieve_outputs(messages, protected): after compression, restores those
    messages in place (matched by tool_call_id) and returns the count restored.
  • Two-line wiring in handle_openai_chat: capture immediately before the CompressionDecision
    runs, restore right after the INPUT_COMPRESSED event.

Non-retrieve tool outputs stay fully compressible — only headroom_retrieve results are protected.

Files: headroom/proxy/handlers/openai.py (+84), tests/test_ccr_chat_retrieve_protection.py (+88).

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check .) — ruff not available in the verification env (see Additional Notes)
  • Type checking passes (mypy headroom) — mypy not available in the verification env (see Additional Notes)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ pytest tests/test_ccr_chat_retrieve_protection.py -q
collected 7 items
tests/test_ccr_chat_retrieve_protection.py .......                       [100%]
========================= 7 passed, 1 warning in 0.13s =========================

Real Behavior Proof

  • Environment: Headroom v0.26.0, branch fix/ccr-chat-retrieve-reentrancy, Python 3.13; live A/B against the running proxy with a recording mock upstream (this deployment serves only /v1/chat/completions).
  • Exact command / steps: craft a request whose history holds a headroom_retrieve assistant tool call plus its ~59.8 KB role:"tool" output, and an identical-content non-retrieve tool output as a control; send it through the proxy and diff the body forwarded upstream byte-for-byte against the original. Unit side: pytest tests/test_ccr_chat_retrieve_protection.py (7 passed) plus an 11/11 router round-trip.
  • Observed result: unpatched, the retrieve output was forwarded as 59787 -> 819 bytes (mangled); patched, it is forwarded 59787 -> 59787 bytes byte-identical with timestamps intact, while the identical-content non-retrieve control still compresses to 819 bytes (protection is scoped, not blanket).
  • Not tested: streaming responses (the affected path operates on request-side message history, not the response stream); and a host agent's own context compression re-compressing the retrieved output in very long sessions (out of scope for this proxy fix).
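
A byte-for-byte comparison of the kind described in the steps above could be scripted along these lines. This is a sketch under assumptions: the helper names, the call ids, and the sample payload are all hypothetical, and the two request bodies are built inline instead of being captured from a recording upstream.

```python
import json


def tool_outputs(body: bytes) -> dict[str, bytes]:
    """Map tool_call_id to the UTF-8 bytes of each role:"tool" message's content."""
    payload = json.loads(body)
    outputs: dict[str, bytes] = {}
    for msg in payload.get("messages", []):
        call_id, content = msg.get("tool_call_id"), msg.get("content")
        if msg.get("role") == "tool" and isinstance(call_id, str) and isinstance(content, str):
            outputs[call_id] = content.encode("utf-8")
    return outputs


def changed_outputs(original: bytes, forwarded: bytes) -> dict[str, tuple[int, int]]:
    """Report (bytes before, bytes after) for every tool output that was altered."""
    before, after = tool_outputs(original), tool_outputs(forwarded)
    return {
        call_id: (len(data), len(after.get(call_id, b"")))
        for call_id, data in before.items()
        if after.get(call_id) != data
    }


def _body(retrieve_text: str, control_text: str) -> bytes:
    """Build an illustrative request body with a retrieve output and a control."""
    return json.dumps({"messages": [
        {"role": "tool", "tool_call_id": "call_retrieve", "content": retrieve_text},
        {"role": "tool", "tool_call_id": "call_control", "content": control_text},
    ]}).encode("utf-8")


payload_text = "2026-06-19T00:00:00Z sample line\n" * 50
sent = _body(payload_text, payload_text)
# Patched behaviour: retrieve output untouched, identical-content control compressed.
forwarded = _body(payload_text, "<<ccr:...>>")
report = changed_outputs(sent, forwarded)
```

With the patched behaviour the report lists only the control output; an entry for the retrieve output would indicate that the protection did not apply.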

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

  • The fix deliberately mirrors the existing, accepted Responses-path protection rather than
    introducing a new mechanism, to keep behaviour and reasoning consistent across both API formats.
  • ruff / mypy were not present in the verification environment, so those two boxes are left
    unchecked rather than claimed; the change is small, type-annotated, and follows the surrounding
    style of openai.py. Happy to paste linter/type-check output if a maintainer points me at the
    expected toolchain.
  • No CHANGELOG.md / docs entry added — flagging as N/A; will add on request if the project expects one.

The Responses path tracks retrieve call_ids and skips compressing their
outputs (proxy/handlers/openai.py ~L766-837). The Chat-Completions path had
no equivalent, so a retrieved original — the expanded content the model
explicitly asked for — got re-compressed back into a <<ccr:...>> marker on
the next turn (marker-in / marker-out: retrieval never returned content).

Capture pristine headroom_retrieve outputs (keyed by tool_call_id) before
compression and restore them after, mirroring the Responses-path guard.
Non-retrieve tool outputs stay compressible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
github-actions Bot commented Jun 19, 2026

PR governance

This PR follows the template and is marked ready for human review.

github-actions Bot added the labels "status: needs author action" and "status: ready for review", then removed "status: needs author action", on Jun 19, 2026.

Labels

status: ready for review Pull request body is complete and the author marked it ready for human review

Development

Successfully merging this pull request may close these issues.

Headroom CCR proxy re-compresses its own headroom_retrieve responses, causing an infinite retrieval
