fix(ccr): store opaque blobs from lossless:table compaction (#1083)#1182
Open
jichaowang02-lang wants to merge 1 commit into
Open
fix(ccr): store opaque blobs from lossless:table compaction (#1083)#1182jichaowang02-lang wants to merge 1 commit into
jichaowang02-lang wants to merge 1 commit into
Conversation
…jas#1083) SmartCrusher's lossless:table compaction emits `<<ccr:HASH,KIND,SIZE>>` markers for opaque blobs but never wrote the original payload to the CCR store, so `GET /v1/retrieve/{hash}` and the `headroom_retrieve` tool 404'd on them. The opaque-string path (`walker::emit_opaque_ccr_marker`) already stored; the table compactor diverged because no store was threaded into it. Thread the CCR store from `crush_array`'s lossless branch down to the opaque-cell builder: - compactor.rs: add `compact_with_store`; thread `Option<&Arc<dyn CcrStore>>` through `compact_inner` / `build_homogeneous_table` / `build_row` / `cell_from_value` and the recursive bucket + nested calls; in the Opaque branch call `store.put(&hash, s)` under the same `hash_opaque` that becomes the marker hash. Public `compact` is unchanged (delegates with `None`). - compaction/mod.rs: add `CompactionStage::run_with_store`; `run` unchanged. - crusher.rs: the lossless branch calls `run_with_store(items, self.ccr_store.as_ref())`. Marker text is store-independent, so rendered output stays byte-identical; the store only gains the write that should already have happened. Adds two compactor unit tests: the opaque payload is retrievable under the marker hash, and store presence does not change the IR. Fixes chopratejas#1083
Contributor
PR governanceThis PR follows the template and is marked ready for human review. |
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a CCR correctness gap in the Rust SmartCrusher “lossless:table” compaction path: when compaction emits opaque <<ccr:...>> markers, the original payload is now written to the configured CCR store under the same marker hash so /v1/retrieve/{hash} (and headroom_retrieve) can resolve it.
Changes:
- Add a store-aware compaction entry point (
compact_with_store) and thread an optional CCR store through table/bucket/nested compaction soCellClass::Opaquewrites payloads viastore.put(hash, payload). - Add
CompactionStage::run_with_storeto run compaction+format while optionally stashing opaque payloads. - Update
SmartCrusher::crush_arraylossless branch to callrun_with_store(...)withself.ccr_store, plus add unit tests that lock the marker-hash ↔ store-key contract and verify store presence doesn’t affect output.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/headroom-core/src/transforms/smart_crusher/crusher.rs | Lossless array compaction now threads the CCR store so opaque markers produced by lossless:table are retrievable. |
| crates/headroom-core/src/transforms/smart_crusher/compaction/mod.rs | Exposes compact_with_store and adds CompactionStage::run_with_store to support store-aware compaction while keeping existing APIs intact. |
| crates/headroom-core/src/transforms/smart_crusher/compaction/compactor.rs | Implements store-aware compaction plumbing, performs store.put for opaque cells, and adds unit tests validating storage and store-independence. |
Comment on lines
+840
to
+842
| // Marker text is store-independent — only a side-effecting write is | ||
| // added, so the two IRs are byte-for-byte identical. | ||
| assert_eq!(format!("{without:?}"), format!("{with:?}")); |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
SmartCrusher's
lossless:tablecompaction path emits opaque-blob CCR markers(
<<ccr:HASH,KIND,SIZE>>) but never wrote the original payload to the CCRstore. As a result
GET /v1/retrieve/{hash}and theheadroom_retrievetoolreturn 404 for those hashes. The opaque-string path
(
walker::emit_opaque_ccr_marker) already stores its payload; the tablecompactor diverged simply because no store was threaded into it.
Closes #1083
Type of Change
Changes Made
compaction/compactor.rs: addcompact_with_store(items, cfg, store)and aprivate
compact_inner; threadOption<&Arc<dyn CcrStore>>throughbuild_homogeneous_table→build_row→cell_from_valueand the recursivebucket/nested calls. In the
Opaquebranch,store.put(&hash, payload)underthe same
hash_opaquevalue that becomes the marker hash (mirrorswalker::emit_opaque_ccr_marker). Publiccompactis unchanged — it delegateswith
None.compaction/mod.rs: addCompactionStage::run_with_store;runis unchanged.crusher.rs: the lossless branch now callsstage.run_with_store(items, self.ccr_store.as_ref())instead ofstage.run(items).compactor.rs(see below).The IR (and therefore the rendered marker text) is identical whether or not a
store is supplied — the store only gains the write that should already have
happened, so existing output stays byte-for-byte the same.
Testing
pytest)ruff check .)mypy headroom)Test Output
New tests:
opaque_payload_is_stored_under_marker_hash— aftercompact_with_store, theoriginal blob is retrievable via
store.get(marker_hash), and the stored keyequals
hash_opaque(payload)(locks the key↔marker contract).store_presence_does_not_change_the_ir—compactandcompact_with_storeproduce identical IR; only the store write is added.
(The full
cargo test -p headroom-core --librun has 18 pre-existing failures,all in
transforms::magika_detector— they require the ONNX runtime/model andare unrelated to this change. All 70 compaction + crusher tests pass.)
Real Behavior Proof
cargo test -p headroom-core(no live proxy).compact_with_store(&items, &cfg, Some(&InMemoryCcrStore))→ read theOpaqueRef.ccr_hashfrom the IR →store.get(ccr_hash).after the fix
store.get(ccr_hash) == Some(original_payload)and the markerhash is unchanged.
GET /v1/retrieve/{hash}HTTP round-trip. Verified at the unit level that the store now receives the
payload under the exact marker hash, which is the write that was missing.
Review Readiness
Checklist
Additional Notes
with no user-facing API change.
compact/runsignatures arepreserved (delegating with
None), so all existing callers and the 68in-crate compaction tests are unaffected. Only the lossless
crush_arraybranch opts into the store-threading via
run_with_store.