feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) #1143
RubenAAA wants to merge 7 commits into
Conversation
Ports `headroom.transforms.kompress_compressor` (the ModernBERT token compressor behind PlainText compression) to Rust, the last unported compressor alongside CodeCompressor. Engine only; live-zone dispatch wiring follows in a separate PR (matches how the other compressors landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a fine-tune reuses its base vocab). Runs the ONNX/proxy compression path: whitespace-split → 350-word chunks → pre-tokenized encode (is_split_into_words) → ONNX `final_scores` → max-score-per-word → keep `> 0.5` → join kept words. <10 words passes through.
- Inference via `ort` (direct dep added here; first session-API consumer; unifies to the ORT instance fastembed/magika already vendor). Tokenization via the `tokenizers` crate. Both reproduce the Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures recorded via recorder.py (enable_ccr=False → deterministic output). `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched. Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>` convention), not this engine; the inline Python marker is intentionally not reproduced.
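The word-selection steps described in this commit (whitespace split, 350-word chunks, max score per word, keep `> 0.5`, pass through under 10 words) can be sketched as below. This is an illustrative sketch, not the code in the PR: `compress_words`, `max_score_per_word`, and the `score_chunk` closure are hypothetical names, and the closure stands in for the tokenizer + ONNX inference step, which is not reproduced here.

```rust
/// Inputs with fewer words than this pass through unchanged (per the commit message).
const MIN_WORDS: usize = 10;
/// Chunk size in words (per the commit message).
const CHUNK_WORDS: usize = 350;
/// A word is kept when its best token score is strictly above this value.
const KEEP_THRESHOLD: f32 = 0.5;

/// Collapse per-token scores into per-word scores by taking the maximum.
/// `word_ids[i]` is the word index token `i` belongs to, or `None` for
/// special tokens, which are ignored.
fn max_score_per_word(n_words: usize, word_ids: &[Option<usize>], scores: &[f32]) -> Vec<f32> {
    let mut best = vec![f32::NEG_INFINITY; n_words];
    for (wid, &score) in word_ids.iter().zip(scores) {
        if let Some(w) = *wid {
            if w < n_words && score > best[w] {
                best[w] = score;
            }
        }
    }
    best
}

/// Hypothetical driver. `score_chunk` is an assumption of this sketch: it
/// stands in for "pre-tokenized encode + ONNX `final_scores`" and returns
/// (word_ids, per-token scores) for one chunk of words.
fn compress_words<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> (Vec<Option<usize>>, Vec<f32>),
{
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < MIN_WORDS {
        return text.to_string(); // short inputs are returned as-is
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let (word_ids, scores) = score_chunk(chunk);
        let per_word = max_score_per_word(chunk.len(), &word_ids, &scores);
        for (word, score) in chunk.iter().zip(per_word) {
            if score > KEEP_THRESHOLD {
                kept.push(*word);
            }
        }
    }
    kept.join(" ")
}
```

Keeping the scorer behind a closure lets the selection logic be tested without the model present, which is the same concern the model-gated parity test addresses.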
…harness

Ports headroom.transforms.code_compressor (CodeAwareCompressor) to Rust on the same branch as the Kompress port, so one PR delivers both new compressors (SmartCrusher is already upstream).

Engine (crates/headroom-core/src/transforms/code_compressor.rs):
- tree-sitter AST parsing for python/js/ts/go/rust/java/c/cpp
- language detection (regex prefilter -> fewest-errors tree-sitter)
- symbol-importance scoring (min-max normalized, round-3 half-even) + body budget allocation, statement-level body truncation, omitted-line comments with call info, Python docstring handling (first_line/full/remove incl. multiline first-line reconstruction), syntax-validity guard (re-parse; return original on ERROR/MISSING).

Grammar-version parity is the precondition: the Rust tree-sitter-<lang> crates are pinned to the exact versions of the Python wheels the fixtures were recorded against (same version number on crates.io + PyPI = same grammar source = identical ASTs). A canary over 9 samples x 8 languages confirmed node-for-node identical node-type + line-span trees at these pins.

Parity: cargo run -p headroom-parity -- run --only code_aware_compressor -> 30/30 matched (18 non-trivial across all 8 languages + unknown + invalid-syntax fallback; all 3 docstring modes). py_round_int/py_round3 half-to-even verified against CPython. Integration test + record_code_compressor_fixtures.py mirror the Kompress harness. Fixtures recorded with enable_ccr=False + fallback_to_kompress=False for determinism; live-zone dispatch wiring (SourceCode slot) is a deliberate follow-up.
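For readers unfamiliar with why rounding needs special care here: CPython's `round` sends ties to the even neighbour, while Rust's `f64::round` sends them away from zero. The sketch below illustrates the idea with `f64::round_ties_even` (stable since Rust 1.77). The function names `py_round_int` and `py_round3` are taken from the commit message, but the bodies are assumptions of this sketch, as is `normalize_scores` and its handling of a zero score range. The scale-and-round shortcut in `py_round3` is not guaranteed to agree with CPython's correctly rounded `round(x, 3)` for every input.

```rust
/// Round to the nearest integer, ties to even, as CPython's `round(x)` does.
fn py_round_int(x: f64) -> i64 {
    x.round_ties_even() as i64
}

/// Approximation of CPython's `round(x, 3)` by scaling, rounding ties to
/// even, and scaling back. Assumption: good enough for illustration only.
fn py_round3(x: f64) -> f64 {
    (x * 1000.0).round_ties_even() / 1000.0
}

/// Min-max normalize raw scores into [0, 1] and round to three decimals.
/// Assumption of this sketch: when all scores are equal, every score
/// normalizes to 0.0.
fn normalize_scores(raw: &[f64]) -> Vec<f64> {
    let min = raw.iter().copied().fold(f64::INFINITY, f64::min);
    let max = raw.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    raw.iter()
        .map(|&v| if span > 0.0 { py_round3((v - min) / span) } else { 0.0 })
        .collect()
}
```

The tie cases are where a naive port diverges: `2.5_f64.round()` is `3.0`, whereas half-to-even yields `2`.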
Mirrors the Python content_router so the two newly-ported compressors actually run in the proxy's live zone (they were engine-only before).

- ContentType::SourceCode → CodeCompressor. The grammars are statically linked, so the singleton constructs in microseconds: a synchronous one-liner like Diff/Log/Search. Flips the existing `source_code_tool_result_routes_to_no_op` contract test (whose comment invited a future "wire it up" PR) to assert code_aware_compressor.
- ContentType::PlainText → Kompress, loaded CACHE-ONLY. This mirrors the Python reference's `allow_download=False` preload path: never download on a hot/startup thread; when the ~261 MB model isn't in the local HF cache, yield None and pass the text through unchanged, exactly as Python does when Kompress is unavailable. New `Kompress::from_cache` constructor + `hf_cache_file` resolver back this. The PlainText routing test is model-gated (asserts strategy "kompress" when cached, passthrough when not), matching the kompress parity test's gating.
- `warm_live_zone_compressors()` (exported) mirrors Python's `eager_load_compressors`: force the cache-only singletons off the request path. Proxy startup can call it; the lazy path works without it.

No regressions: headroom-core 918 tests + 7 live_zone_dispatch tests pass, full parity unchanged (all comparators green), clippy clean across core/parity/proxy, workspace builds.
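The routing contract described above can be illustrated with a small stand-alone sketch. Everything here is hypothetical scaffolding rather than the PR's code: the `Compressor` trait, the `Stub` type, the `dispatch_sketch` function, the `Other` variant, and the `"passthrough"` strategy label are assumptions. Only the two routes and the "absent model means unchanged text" rule come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ContentType {
    SourceCode,
    PlainText,
    Other, // hypothetical catch-all for this sketch
}

/// Hypothetical minimal interface for a live-zone compressor.
trait Compressor {
    fn name(&self) -> &'static str;
    fn compress(&self, input: &str) -> String;
}

/// Stand-in compressor that keeps only the first line of its input.
struct Stub(&'static str);

impl Compressor for Stub {
    fn name(&self) -> &'static str {
        self.0
    }
    fn compress(&self, input: &str) -> String {
        input.lines().next().unwrap_or("").to_string()
    }
}

/// Route by content type. `kompress` is `None` when the model is not in the
/// local cache (or the feature is disabled); plain text is then returned
/// unchanged instead of blocking on a download.
fn dispatch_sketch(
    content: ContentType,
    text: &str,
    code: &dyn Compressor,
    kompress: Option<&dyn Compressor>,
) -> (&'static str, String) {
    match content {
        ContentType::SourceCode => (code.name(), code.compress(text)),
        ContentType::PlainText => match kompress {
            Some(k) => (k.name(), k.compress(text)),
            None => ("passthrough", text.to_string()),
        },
        ContentType::Other => ("passthrough", text.to_string()),
    }
}
```

Returning the strategy name alongside the output mirrors how the model-gated routing test can assert either `"kompress"` or a pass-through depending on whether the model is cached.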
Kompress carries a ~261 MB ONNX model, so, unlike the always-on structural compressors and the AST CodeCompressor, it now loads only when an operator opts in, mirroring the Python reference's `config.enable_kompress`.

- core: process-wide `KOMPRESS_ENABLED` (default off) + `set_kompress_enabled`. `kompress()` checks it before the OnceLock, so a disabled proxy never loads the model and PlainText passes through. `warm_live_zone_compressors` only warms Kompress when enabled.
- proxy: `--enable-kompress` / `HEADROOM_PROXY_ENABLE_KOMPRESS` flag (default false), threaded through CliArgs → Config → for_test. main.rs sets the gate from config and, when enabled, fires a `spawn_blocking` cache-only warm-up off the request path (mirrors Python's `eager_load_compressors`); a cold cache just leaves it deferred rather than stalling the bind.
- the model-gated PlainText dispatch test enables the gate explicitly, like the proxy does at startup.

No regressions: core 918 passed, proxy 407 passed, clippy clean across core/parity/proxy, workspace builds.
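The "check the flag before the OnceLock" ordering is the detail that keeps a disabled proxy from ever touching the model. A minimal sketch of that pattern follows, using only the standard library. The names `KOMPRESS_ENABLED`, `set_kompress_enabled`, and `kompress` come from the commit message; `Model`, `LOADS`, and `load_from_cache` are hypothetical stand-ins, and the loader here always pretends the cache is warm.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Process-wide opt-in gate, off by default.
static KOMPRESS_ENABLED: AtomicBool = AtomicBool::new(false);
/// Lazily initialised singleton; `None` inside means "not in the cache".
static KOMPRESS: OnceLock<Option<Model>> = OnceLock::new();
/// Hypothetical counter so the sketch can show how often the loader runs.
static LOADS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical placeholder for the loaded model + tokenizer.
struct Model {
    name: &'static str,
}

fn set_kompress_enabled(on: bool) {
    KOMPRESS_ENABLED.store(on, Ordering::SeqCst);
}

/// Hypothetical cache-only loader. A real one would return `None` on a
/// cold cache rather than download anything.
fn load_from_cache() -> Option<Model> {
    LOADS.fetch_add(1, Ordering::SeqCst);
    Some(Model { name: "kompress-stub" })
}

/// Returns the singleton only when the gate is on. The gate is checked
/// first, so a disabled process never runs the loader.
fn kompress() -> Option<&'static Model> {
    if !KOMPRESS_ENABLED.load(Ordering::SeqCst) {
        return None;
    }
    KOMPRESS.get_or_init(load_from_cache).as_ref()
}
```

One consequence of this shape is that the `OnceLock` caches a cold-cache `None` as well, so a model that appears in the cache later would not be picked up without a restart; whether the PR behaves that way is not stated in the commit message.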
PR governance
This PR does not yet satisfy the required template fields:
Please update the PR body, or move the PR back to draft while it is still in progress.
This will need a deep review - any chance this can be broken into multiple PRs?
Codecov Report
✅ All modified and coverable lines are covered by tests.
For a bit of context: I'd actually been building something along these lines independently on my own side, so I was genuinely happy to find this repo. It's a much better foundation than what I had. My setup needs the engine in Rust (both for how it's wired in and for the speed), which is what motivated porting these compressors over rather than calling the Python ones. Sorry for the size, I'll break it into a stack of 3 PRs that review in order. Each builds green on its own:
So #1 and #2 are pure additive engine ports (dead code until #3 lands them), and #3 is the behavior change in isolation. That should make the deep review tractable one engine at a time.
As promised, split into a 3-PR stack (each builds green on its own, reviews in order):
Since I don't have push access to base branches here, #1154 and #1155 are stacked on the fork, so their diffs against
Happy to close this PR in favor of the stack once you've had a look; leaving it open for now so the discussion thread isn't lost.
This PR is still a draft and GitHub reports mergeStateStatus=UNSTABLE, so I am not approving it in the ready-to-merge pass. Please mark it ready and get required CI green before final review.
Description
Ports two of the live-zone compressors documented in the README architecture to the Rust `headroom-core` engine, and wires both into the live-zone dispatcher (they fill the `SourceCode`/`PlainText` slots that `dispatch_compressor` already had reserved with TODOs):

- `headroom/transforms/code_compressor.py` (`CodeAwareCompressor`).
- `kompress-v2-base` ONNX + ModernBERT tokenizer), a byte-for-byte port of `headroom/transforms/kompress_compressor.py`, gated behind `--enable-kompress` because it loads a ~261 MB model.

Both are verified against the Python reference via the existing `headroom-parity` harness (replays fixtures recorded from the Python implementations and asserts identical output). The Kompress engine landed first in this branch; the CodeCompressor port, the dispatch wiring, and the gate follow.

No tracking issue: this completes the in-repo `PR-B4` / "Rust port pending" TODOs in `crates/headroom-core/src/transforms/live_zone.rs` (the dispatcher and a contract test were written to be flipped once these crates landed).

Type of Change
Changes Made
- headroom-core: `transforms/code_compressor.rs` - full CodeCompressor port: language detection (regex prefilter → fewest-errors tree-sitter), symbol-importance scoring (min-max normalized, CPython half-to-even rounding), per-function body-budget allocation, statement-level body truncation with `# [N lines omitted; calls: …]` summaries, Python docstring handling (`first_line`/`full`/`remove`, incl. multi-line first-line reconstruction), and a re-parse syntax-validity guard that returns the original on any invalid output.
- headroom-core: `transforms/kompress.rs` - adds `Kompress::from_cache` (cache-only constructor) + `hf_cache_file` resolver.
- headroom-core: `transforms/live_zone.rs` - `dispatch_compressor` now routes `SourceCode → CodeCompressor` and `PlainText → Kompress` (cache-only, gated). Adds `set_kompress_enabled` + `warm_live_zone_compressors`.
- headroom-proxy: `config.rs` / `main.rs` - `--enable-kompress` / `HEADROOM_PROXY_ENABLE_KOMPRESS` flag (default off); startup sets the gate and fires an off-request-path cache-only warm-up when enabled.
- headroom-parity - `CodeCompressorComparator`; recorder + `scripts/record_code_compressor_fixtures.py`; 30 code + 21 kompress fixtures.
- `code_compressor_parity.rs` integration test; flipped/added live-zone dispatch routing tests; module unit tests (rounding parity, detection, passthrough).
- `CHANGELOG.md` - Unreleased entries for both compressors.

Design decisions & rationale
- `tree-sitter-<lang>` crates are pinned (`=`) to the exact versions of the Python `tree-sitter-*` wheels the fixtures were recorded against: the same version number on crates.io + PyPI means the same `grammar.js` source, hence the same generated `parser.c`. A canary over 9 samples × 8 languages confirmed 100% identical node-type + line-span trees before a line of the compressor was written (see Real Behavior Proof).
- `--enable-kompress`, default off), mirroring the Python reference's `config.enable_kompress`.
- `Kompress::from_cache` resolves the model + tokenizer from the local HF cache and never downloads; it is the Rust mirror of the Python reference's `allow_download=False` preload path. A cold cache yields `None`, and the dispatcher passes plain text through unchanged (exactly as Python does when Kompress is unavailable), so neither a request nor the startup bind can block on a download.
- `# [N tokens compressed… hash=]` / `[… hash=]` markers are intentionally not reproduced (live-zone CCR uses the `<<ccr:>>` convention). Fixtures are therefore recorded with `enable_ccr=False` for determinism.
- `str` is reproduced as correct UTF-8 byte slicing; for ASCII these are identical, so fixtures are ASCII (non-ASCII identifiers are the one documented out-of-scope case, noted in the module docs).

Dependencies & supply chain
This PR adds Rust crates only (no new Python deps; the `tree-sitter-*` Python wheels are used at fixture-record time, not at runtime).

- `tree-sitter` (0.25.2) + `tree-sitter-{python,javascript,typescript,go,rust,java,c,cpp}`: the canonical tree-sitter org grammars. Why these: required for AST code compression; they are the same grammars the Python reference uses, which is what makes byte-parity attainable. Maintenance: the official `tree-sitter/tree-sitter-*` projects, actively released. Install surface: each compiles a vendored C `parser.c` via `cc` (already required by `rusqlite`'s bundled build); no install/runtime network. Why these versions: required new functionality; pinned `=` to match the Python wheels exactly; bumping any pin requires re-running the canary and re-recording fixtures.
- `ort` (2.0.0-rc.12): already in the tree via `fastembed`/`magika`; pulled as a direct dep so the Kompress port shares the single ONNX Runtime instance (no second runtime, no extra download). Cache-only at runtime.

Testing
- `ruff check .` / `cargo clippy -D warnings`)
- `pytest` / `mypy headroom`: N/A (Rust-only change; the only Python touched is dev-time fixture tooling under `tests/parity/`, which passes `ruff check` + `ruff format`)

Test Output
Real Behavior Proof
Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain, Python 3.13.9 for reference recording. `kompress-v2-base` + `ModernBERT-base` present in the local HF cache.

Exact steps:
- `cargo run -p headroom-parity -- run --only code_aware_compressor` and `--only kompress`.

Observed result, grammar canary (precondition):
Signatures, imports, the first statement of each body, and inter-function call edges are preserved; the rest is summarized. The Rust port reproduces this output byte-for-byte (parity 30/30 above). Output re-parses cleanly (the syntax-validity guard returns the original otherwise).
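The "return the original if the output does not re-parse" rule is a small pattern worth seeing in isolation. The sketch below is an assumption-laden illustration: `guard_output` and `brackets_balanced` are hypothetical names, and the bracket-balance check is only a stand-in for the tree-sitter re-parse (which rejects output containing ERROR/MISSING nodes) so that the example needs no external crates.

```rust
/// Return `candidate` only if `is_valid` accepts it; otherwise fall back to
/// `original`, so a bad compression can never corrupt the content.
fn guard_output<'a>(
    original: &'a str,
    candidate: &'a str,
    is_valid: impl Fn(&str) -> bool,
) -> &'a str {
    if is_valid(candidate) {
        candidate
    } else {
        original
    }
}

/// Stand-in validity check for this sketch: brackets must be balanced and
/// properly nested. A real guard would re-parse with the language grammar.
fn brackets_balanced(src: &str) -> bool {
    let mut stack: Vec<char> = Vec::new();
    for ch in src.chars() {
        match ch {
            '(' | '[' | '{' => stack.push(ch),
            ')' | ']' | '}' => {
                let open = match ch {
                    ')' => '(',
                    ']' => '[',
                    _ => '{',
                };
                if stack.pop() != Some(open) {
                    return false;
                }
            }
            _ => {}
        }
    }
    stack.is_empty()
}
```

Because the guard only ever returns one of its two inputs, the worst case for a faulty truncation is zero compression rather than invalid code reaching the model.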
Review Readiness
Checklist
Additional Notes
- `main` and shares no commits with the OpenVINO execution-provider work; its only `ort` touch is the direct-dependency line that unifies onto the existing ONNX Runtime instance.
- `warm_live_zone_compressors()` from proxy startup is wired behind the gate but the eager warm is optional (the lazy cache-only path works without it); Kompress live-zone CCR offload of dropped words via `<<ccr:>>` is handled by the dispatcher layer, not this engine.
- `pytest`/`mypy`) are N/A: this is a Rust change; the equivalent gates are `cargo test` + `cargo clippy`, run above.