
fix(ccr): store opaque blobs from lossless:table compaction (#1083) #1182

Open
jichaowang02-lang wants to merge 1 commit into chopratejas:main from jichaowang02-lang:fix/ccr-lossless-table-store


@jichaowang02-lang

Contributor

Description

SmartCrusher's lossless:table compaction path emits opaque-blob CCR markers
(<<ccr:HASH,KIND,SIZE>>) but never wrote the original payload to the CCR
store. As a result GET /v1/retrieve/{hash} and the headroom_retrieve tool
return 404 for those hashes. The opaque-string path
(walker::emit_opaque_ccr_marker) already stores its payload; the table
compactor diverged simply because no store was threaded into it.
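
For readers unfamiliar with the marker shape, here is a minimal sketch of how a `<<ccr:HASH,KIND,SIZE>>` marker could be split into its three fields. The `CcrMarker` struct and `parse_ccr_marker` function are hypothetical names for illustration, not the crate's API, and the sketch assumes that HASH and KIND contain no commas and that SIZE is a decimal integer.

```rust
/// Hypothetical view of the three fields in a `<<ccr:HASH,KIND,SIZE>>` marker.
#[derive(Debug, PartialEq)]
pub struct CcrMarker<'a> {
    pub hash: &'a str,
    pub kind: &'a str,
    pub size: usize,
}

/// Parses a marker, returning `None` for anything that does not match the
/// assumed `<<ccr:HASH,KIND,SIZE>>` layout.
pub fn parse_ccr_marker(text: &str) -> Option<CcrMarker<'_>> {
    // Strip the fixed delimiters first; either missing means "not a marker".
    let inner = text.strip_prefix("<<ccr:")?.strip_suffix(">>")?;
    let mut parts = inner.split(',');
    let hash = parts.next()?;
    let kind = parts.next()?;
    let size = parts.next()?.parse::<usize>().ok()?;
    // Reject extra fields and an empty hash, which could never be looked up.
    if parts.next().is_some() || hash.is_empty() {
        return None;
    }
    Some(CcrMarker { hash, kind, size })
}
```

The `hash` field is the key a retrieval call would look up, which is why a marker without a matching store entry cannot be resolved.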

Closes #1083

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • compaction/compactor.rs: add compact_with_store(items, cfg, store) and a
    private compact_inner; thread Option<&Arc<dyn CcrStore>> through
    build_homogeneous_table, build_row, cell_from_value and the recursive
    bucket/nested calls. In the Opaque branch, call store.put(&hash, payload)
    under the same hash_opaque value that becomes the marker hash (mirrors
    walker::emit_opaque_ccr_marker). Public compact is unchanged; it delegates
    with None.
  • compaction/mod.rs: add CompactionStage::run_with_store; run is unchanged.
  • crusher.rs: the lossless branch now calls
    stage.run_with_store(items, self.ccr_store.as_ref()) instead of stage.run(items).
  • Two new unit tests in compactor.rs (see below).

The IR (and therefore the rendered marker text) is identical whether or not a
store is supplied — the store only gains the write that should already have
happened, so existing output stays byte-for-byte the same.
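
The store-threading pattern described above can be sketched in a reduced form. This is an illustration under stated assumptions, not the code in this PR: the `CcrStore` trait shape, `InMemoryStore`, the `OPAQUE_MIN_LEN` threshold, the `blob` kind, the flat `&[&str]` input, and the `DefaultHasher`-based `hash_opaque` are all stand-ins (the real signatures take `items` and `cfg`, and the real hash algorithm is not shown here).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the crate's CCR store trait.
pub trait CcrStore: Send + Sync {
    fn put(&self, hash: &str, payload: &str);
    fn get(&self, hash: &str) -> Option<String>;
}

/// Hypothetical in-memory store used only by this sketch.
#[derive(Default)]
pub struct InMemoryStore {
    entries: Mutex<HashMap<String, String>>,
}

impl CcrStore for InMemoryStore {
    fn put(&self, hash: &str, payload: &str) {
        self.entries
            .lock()
            .unwrap()
            .insert(hash.to_string(), payload.to_string());
    }

    fn get(&self, hash: &str) -> Option<String> {
        self.entries.lock().unwrap().get(hash).cloned()
    }
}

/// Placeholder hash: an assumption standing in for the real `hash_opaque`.
pub fn hash_opaque(payload: &str) -> String {
    let mut hasher = DefaultHasher::new();
    payload.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Assumed cutoff above which a cell is treated as an opaque blob.
const OPAQUE_MIN_LEN: usize = 32;

/// Shared body: the marker text is computed the same way with or without a
/// store; the store only receives an extra write.
fn compact_inner(cells: &[&str], store: Option<&Arc<dyn CcrStore>>) -> Vec<String> {
    cells
        .iter()
        .map(|cell| {
            if cell.len() < OPAQUE_MIN_LEN {
                return cell.to_string();
            }
            let hash = hash_opaque(cell);
            if let Some(store) = store {
                // Same hash as the marker, so the marker can be resolved later.
                store.put(&hash, cell);
            }
            format!("<<ccr:{hash},blob,{}>>", cell.len())
        })
        .collect()
}

/// Store-aware entry point.
pub fn compact_with_store(cells: &[&str], store: Option<&Arc<dyn CcrStore>>) -> Vec<String> {
    compact_inner(cells, store)
}

/// Existing entry point keeps its signature and delegates with `None`.
pub fn compact(cells: &[&str]) -> Vec<String> {
    compact_inner(cells, None)
}
```

Keeping `compact` as a thin wrapper is what lets existing callers compile unchanged, while the `Option` parameter makes the write an opt-in side effect rather than a change to the output.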

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Note: this change is in the Rust core (crates/headroom-core), so the
Python-specific checks above are N/A. The Rust equivalents were run:

Test Output

$ cargo test -p headroom-core --lib compaction
test result: ok. 70 passed; 0 failed; 0 ignored; 0 measured; 766 filtered out; finished in 0.01s

$ cargo fmt -p headroom-core -- --check
# clean (exit 0)

New tests:

  • opaque_payload_is_stored_under_marker_hash: after compact_with_store, the
    original blob is retrievable via store.get(marker_hash), and the stored key
    equals hash_opaque(payload) (locks the key↔marker contract).
  • store_presence_does_not_change_the_ir: compact and compact_with_store
    produce identical IR; only the store write is added.

(The full cargo test -p headroom-core --lib run has 18 pre-existing failures,
all in transforms::magika_detector — they require the ONNX runtime/model and
are unrelated to this change. All 70 compaction + crusher tests pass.)

Real Behavior Proof

  • Environment: Windows, Rust 1.95.0, cargo test -p headroom-core (no live proxy).
  • Exact command / steps: build a 2-item array with a long opaque-blob field →
    compact_with_store(&items, &cfg, Some(&InMemoryCcrStore)) → read the
    OpaqueRef.ccr_hash from the IR → store.get(ccr_hash).
  • Observed result: before the fix the store is empty (retrieval would 404);
    after the fix store.get(ccr_hash) == Some(original_payload) and the marker
    hash is unchanged.
  • Not tested: end-to-end through a running proxy / a real GET /v1/retrieve/{hash}
    HTTP round-trip. Verified at the unit level that the store now receives the
    payload under the exact marker hash, which is the write that was missing.
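
The before/after observation can be modeled in a few lines. This is a simplified sketch, not the proxy's handler: `Retrieval`, `retrieve`, `emit_marker`, the `write_to_store` flag, and the `blob` kind are hypothetical, and a plain `HashMap` stands in for the CCR store. `Retrieval::NotFound` plays the role of the 404.

```rust
use std::collections::HashMap;

/// Hypothetical lookup outcome standing in for the HTTP response of
/// GET /v1/retrieve/{hash}.
#[derive(Debug, PartialEq)]
pub enum Retrieval {
    Found(String),
    NotFound,
}

/// Looks a hash up in the store; a missing key models the 404.
pub fn retrieve(store: &HashMap<String, String>, hash: &str) -> Retrieval {
    match store.get(hash) {
        Some(payload) => Retrieval::Found(payload.clone()),
        None => Retrieval::NotFound,
    }
}

/// Emits a marker for `payload`. With `write_to_store == false` this models
/// the pre-fix table path (marker only); with `true` it models the fixed path.
pub fn emit_marker(
    store: &mut HashMap<String, String>,
    hash: &str,
    payload: &str,
    write_to_store: bool,
) -> String {
    if write_to_store {
        store.insert(hash.to_string(), payload.to_string());
    }
    // The marker text does not depend on whether the write happened.
    format!("<<ccr:{hash},blob,{}>>", payload.len())
}
```

Calling `emit_marker` with `write_to_store` set to `false` and then `retrieve` yields `NotFound`, while the same call with `true` yields `Found` with the original payload and an identical marker string.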

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

  • Docs/CHANGELOG checklist items are N/A — this is an internal correctness fix
    with no user-facing API change.
  • Scope is intentionally minimal: public compact/run signatures are
    preserved (delegating with None), so all existing callers and the 68
    in-crate compaction tests are unaffected. Only the lossless crush_array
    branch opts into the store-threading via run_with_store.

…jas#1083)

SmartCrusher's lossless:table compaction emits `<<ccr:HASH,KIND,SIZE>>`
markers for opaque blobs but never wrote the original payload to the CCR
store, so `GET /v1/retrieve/{hash}` and the `headroom_retrieve` tool 404'd
on them. The opaque-string path (`walker::emit_opaque_ccr_marker`) already
stored; the table compactor diverged because no store was threaded into it.

Thread the CCR store from `crush_array`'s lossless branch down to the
opaque-cell builder:

- compactor.rs: add `compact_with_store`; thread `Option<&Arc<dyn CcrStore>>`
  through `compact_inner` / `build_homogeneous_table` / `build_row` /
  `cell_from_value` and the recursive bucket + nested calls; in the Opaque
  branch call `store.put(&hash, s)` under the same `hash_opaque` that becomes
  the marker hash. Public `compact` is unchanged (delegates with `None`).
- compaction/mod.rs: add `CompactionStage::run_with_store`; `run` unchanged.
- crusher.rs: the lossless branch calls
  `run_with_store(items, self.ccr_store.as_ref())`.

Marker text is store-independent, so rendered output stays byte-identical;
the store only gains the write that should already have happened.

Adds two compactor unit tests: the opaque payload is retrievable under the
marker hash, and store presence does not change the IR.

Fixes chopratejas#1083
Copilot AI review requested due to automatic review settings June 20, 2026 02:31
@github-actions

Contributor

PR governance

This PR follows the template and is marked ready for human review.

@github-actions github-actions bot added the label "status: ready for review" (Pull request body is complete and the author marked it ready for human review) on Jun 20, 2026

Copilot AI left a comment

Contributor


Pull request overview

Fixes a CCR correctness gap in the Rust SmartCrusher “lossless:table” compaction path: when compaction emits opaque <<ccr:...>> markers, the original payload is now written to the configured CCR store under the same marker hash so /v1/retrieve/{hash} (and headroom_retrieve) can resolve it.

Changes:

  • Add a store-aware compaction entry point (compact_with_store) and thread an optional CCR store through table/bucket/nested compaction so CellClass::Opaque writes payloads via store.put(hash, payload).
  • Add CompactionStage::run_with_store to run compaction+format while optionally stashing opaque payloads.
  • Update SmartCrusher::crush_array lossless branch to call run_with_store(...) with self.ccr_store, plus add unit tests that lock the marker-hash ↔ store-key contract and verify store presence doesn’t affect output.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| crates/headroom-core/src/transforms/smart_crusher/crusher.rs | Lossless array compaction now threads the CCR store so opaque markers produced by lossless:table are retrievable. |
| crates/headroom-core/src/transforms/smart_crusher/compaction/mod.rs | Exposes compact_with_store and adds CompactionStage::run_with_store to support store-aware compaction while keeping existing APIs intact. |
| crates/headroom-core/src/transforms/smart_crusher/compaction/compactor.rs | Implements store-aware compaction plumbing, performs store.put for opaque cells, and adds unit tests validating storage and store-independence. |

Comment on lines +840 to +842
// Marker text is store-independent — only a side-effecting write is
// added, so the two IRs are byte-for-byte identical.
assert_eq!(format!("{without:?}"), format!("{with:?}"));
@codecov

codecov Bot commented Jun 20, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.




Development

Successfully merging this pull request may close these issues.

[BUG] lossless:table compaction emits CCR markers that never reach the store — /v1/retrieve 404s

2 participants