feat(rust): wire CodeCompressor + Kompress into live-zone dispatch + gate (3/3) by RubenAAA · Pull Request #1155 · chopratejas/headroom

RubenAAA · 2026-06-19T09:06:10Z

Description

Wires the two ported compressors (Kompress + CodeCompressor) into the live-zone dispatcher and adds the Kompress opt-in gate. This is PR 3 of 3 splitting #1143 — the only behavior-changing piece. dispatch_compressor now routes SourceCode -> CodeCompressor and PlainText -> Kompress, filling the slots it had reserved with TODOs.

Stacked on PR 1 (Kompress) and PR 2 (CodeCompressor). Until those merge, this PR's diff against main also includes their content; review PR 1 and PR 2 first. Once they land, this PR shrinks to the dispatch wiring + flag (~250 lines, mostly the routing test).

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Performance improvement
Code refactoring (no functional changes)

Changes Made

headroom-core: transforms/live_zone.rs — dispatch_compressor routes SourceCode -> CodeCompressor (always-on; grammars are statically linked, constructs in microseconds with no I/O) and PlainText -> Kompress (cache-only, gated). Adds set_kompress_enabled + warm_live_zone_compressors.
headroom-core: transforms/kompress.rs — small additions to support the dispatcher's cache-only path.
headroom-proxy: config.rs / main.rs — --enable-kompress / HEADROOM_PROXY_ENABLE_KOMPRESS flag (default off; Kompress carries a ~261 MB model so it is opt-in, mirroring the Python config.enable_kompress). Startup sets the gate and fires an off-request-path cache-only warm-up when enabled.
tests: flipped/added live-zone dispatch routing tests.
CHANGELOG.md: Unreleased entries for both compressors.

Testing

Unit tests pass (pytest) — N/A for runtime (Rust change)
Linting passes (ruff check .) — also cargo clippy -D warnings
Type checking passes (mypy headroom) — N/A (Rust-only runtime change)
New tests added for new functionality
Manual testing performed

Test Output

# Rust suites
cargo test -p headroom-core   ->  918 passed, 3 ignored (14 suites)
cargo test -p headroom-proxy  ->  407 passed (34 suites)
cargo test -p headroom-core --test live_zone_dispatch  ->  routing tests green

# Parity still byte-exact after wiring
[code_aware_compressor] total=30 matched=30 skipped=0 diffed=0
[kompress             ] total=21 matched=21 skipped=0 diffed=0

cargo clippy -p headroom-core -p headroom-proxy -p headroom-parity -- -D warnings  ->  No issues found
cargo fmt --check  ->  clean
ruff check . && ruff format --check .  ->  All checks passed

Real Behavior Proof

Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain.
Exact command / steps: cargo test -p headroom-core and cargo test -p headroom-proxy; cargo test -p headroom-core --test live_zone_dispatch; re-ran the parity harness to confirm wiring did not perturb output.
Observed result: 918 core + 407 proxy tests pass; live-zone dispatch routing tests green; parity remains 30/30 code + 21/21 kompress; clippy and fmt clean. With the gate off (default), PlainText passes through unchanged; with --enable-kompress and a cold cache, dispatch still passes plain text through (Kompress unavailable) rather than blocking.
Not tested: end-to-end against a live provider (dispatcher is covered by integration tests, not live traffic in this PR); alternate ONNX execution providers (default CPU EP here).

Review Readiness

I have performed a self-review
This PR is ready for human review

Additional Notes

Split of feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) #1143 — PR 3 of 3. Lands the engines from PR 1 + PR 2 into the dispatcher.
CCR markers are left to the dispatcher: both engines return the compressed string only; live-zone CCR uses the <<ccr:>> convention, so fixtures are recorded with enable_ccr=False for determinism.
pytest/mypy checklist items are N/A — Rust change; equivalent gates are cargo test + cargo clippy, run above.

Update — non-blocking model load + rebase

Pushed fix(kompress): never block the request path on model load and rebased this branch onto the updated #1153/#1154.

Non-blocking kompress() — the request-path accessor previously used OnceLock::get_or_init, so a request thread that hit a PlainText block while the ~261 MB model was still loading (or while a slow EP graph compile ran — the OpenVINO NPU compile takes ~13s+) blocked on that init and stalled the proxy. It's now a non-blocking OnceLock::get (returns None ⇒ pass through until ready); the one-time load happens only in warm_live_zone_compressors, off the request path. So an enabled-but-not-yet-warm Kompress degrades to passthrough instead of hanging live traffic.
Rebase — from_cache/hf_cache_file moved to the engine PR (feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153) where they belong; this branch's wiring commit reduces to the dispatch glue. Stack integrity intact (stack/1 ← stack/2 ← stack/3).

Validated downstream against a live Intel NPU run: proxy stays responsive through the NPU compile, then kompress_ready:true and prose blocks compress on-device. cargo fmt/clippy clean, dispatch + core tests green.
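The difference between the blocking and non-blocking accessors can be sketched as follows. This is a minimal illustration, not the code in this PR: `Engine`, `warm`, and `compress_plain_text` are hypothetical stand-ins, and the stub `compress` only collapses whitespace. The only std behavior relied on is that `OnceLock::get` never blocks and returns `None` while the cell is uninitialized or still being initialized.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the loaded Kompress engine.
pub struct Engine {
    pub name: &'static str,
}

impl Engine {
    /// Stub "compression": collapses runs of whitespace (illustration only).
    pub fn compress(&self, text: &str) -> String {
        text.split_whitespace().collect::<Vec<_>>().join(" ")
    }
}

/// An inner `None` means a load was attempted but the model was unavailable.
static KOMPRESS: OnceLock<Option<Engine>> = OnceLock::new();

/// Request path: never initializes and never waits on another thread's init.
pub fn kompress() -> Option<&'static Engine> {
    KOMPRESS.get().and_then(|slot| slot.as_ref())
}

/// Warm-up path: the only place the one-time (possibly slow) load runs.
/// Intended to be called off the request path, e.g. from a background task.
pub fn warm(load: impl FnOnce() -> Option<Engine>) {
    let _ = KOMPRESS.get_or_init(load);
}

/// Returns (strategy label, output). "passthrough" is a hypothetical label.
pub fn compress_plain_text(text: &str) -> (&'static str, String) {
    match kompress() {
        Some(engine) => (engine.name, engine.compress(text)),
        None => ("passthrough", text.to_string()),
    }
}
```

Before `warm` completes, `compress_plain_text` returns the input unchanged; afterwards it uses the engine. With `get_or_init` on the request path instead, the first caller would run the load itself and concurrent callers would wait for it.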

Ports `headroom.transforms.kompress_compressor` (the ModernBERT token compressor behind PlainText compression) to Rust, the last unported compressor alongside CodeCompressor. Engine only; live-zone dispatch wiring follows in a separate PR (matches how the other compressors landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a fine-tune reuses its base vocab). Runs the ONNX/proxy compression path: whitespace-split → 350-word chunks → pre-tokenized encode (is_split_into_words) → ONNX `final_scores` → max-score-per-word → keep `> 0.5` → join kept words. <10 words passes through.
- Inference via `ort` (direct dep added here; first session-API consumer; unifies to the ORT instance fastembed/magika already vendor). Tokenization via the `tokenizers` crate. Both reproduce the Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures recorded via recorder.py (enable_ccr=False → deterministic output).

`cargo run -p headroom-parity -- run --only kompress`: 21/21 matched. Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>` convention), not this engine; the inline Python marker is intentionally not reproduced.
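The word-level pipeline described in this commit message can be sketched without the model. In this hedged illustration the ONNX call is replaced by a caller-supplied `score_chunk` closure (hypothetical), `max_score_per_word` is a hypothetical helper for the token-to-word reduction, and joining kept words with a single space is an assumption; the 10-word floor, 350-word chunks, and `> 0.5` threshold come from the text above.

```rust
/// Reduce per-token scores to one score per word by taking the maximum.
/// `word_ids[i]` is the word index of token `i`, or `None` for special tokens.
/// Words with no tokens keep `f32::NEG_INFINITY` (an assumption of this sketch).
pub fn max_score_per_word(
    token_scores: &[f32],
    word_ids: &[Option<usize>],
    n_words: usize,
) -> Vec<f32> {
    let mut per_word = vec![f32::NEG_INFINITY; n_words];
    for (score, word_id) in token_scores.iter().zip(word_ids) {
        if let Some(w) = word_id {
            if *w < n_words && *score > per_word[*w] {
                per_word[*w] = *score;
            }
        }
    }
    per_word
}

/// Whitespace-split, chunk, score, and keep words scoring above the threshold.
/// `score_chunk` stands in for tokenizer + ONNX inference and must return
/// one score per word in the chunk.
pub fn compress_words<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> Vec<f32>,
{
    const MIN_WORDS: usize = 10;
    const CHUNK_WORDS: usize = 350;
    const KEEP_THRESHOLD: f32 = 0.5;

    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < MIN_WORDS {
        // Short inputs pass through unchanged.
        return text.to_string();
    }

    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let scores = score_chunk(chunk);
        for (word, score) in chunk.iter().zip(scores) {
            if score > KEEP_THRESHOLD {
                kept.push(*word);
            }
        }
    }
    kept.join(" ")
}
```

Because the keep decision is a strict comparison against 0.5, score differences around 1e-6 only matter for words whose score sits almost exactly on the threshold.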

github-actions · 2026-06-19T09:06:25Z

PR governance

This PR follows the template and is marked ready for human review.

…ape ONNX

Three self-contained additions to the cache-only loader, all in kompress.rs:
- Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE / HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the model on native Windows, not just Linux.
- Load diagnostics: from_cache now logs WHY it defers (kompress_cache_miss / kompress_session_build_failed with the searched roots + the real session build error) instead of returning None silently, which was the #1 cause of an unexplained kompress_ready=false.
- Static-shape ONNX support: detect a fixed input_ids seq dimension on the loaded model and right-pad each chunk to it (masked padding => identical scores); dynamic models keep their natural length at zero padding cost. Enables execution providers that cannot compile dynamic shapes (OpenVINO NPU).

Parity 21/21 in both static and dynamic modes.
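One way the cross-platform cache lookup could be ordered is sketched below. This is an assumption-laden illustration, not the `hf_cache_file` resolver from the PR: `hf_cache_roots` is a hypothetical name, the environment is injected as a closure so the function stays testable, and the `hub` and `.cache/huggingface/hub` suffixes follow the default Hugging Face Hub cache layout.

```rust
use std::path::PathBuf;

/// Returns candidate Hugging Face hub cache roots, most specific first.
/// `env` looks up an environment variable; pass `|k| std::env::var(k).ok()`
/// to read the real process environment.
pub fn hf_cache_roots(env: impl Fn(&str) -> Option<String>) -> Vec<PathBuf> {
    // Treat empty values as unset.
    let lookup = |key: &str| env(key).filter(|v| !v.is_empty());

    let mut roots = Vec::new();
    // An explicit hub cache directory wins.
    if let Some(dir) = lookup("HF_HUB_CACHE") {
        roots.push(PathBuf::from(dir));
    }
    // HF_HOME holds the hub cache in its `hub` subdirectory.
    if let Some(dir) = lookup("HF_HOME") {
        roots.push(PathBuf::from(dir).join("hub"));
    }
    // Fall back to the per-user default on Unix (HOME) and Windows (USERPROFILE).
    for home_var in ["HOME", "USERPROFILE"] {
        if let Some(home) = lookup(home_var) {
            roots.push(
                PathBuf::from(home)
                    .join(".cache")
                    .join("huggingface")
                    .join("hub"),
            );
        }
    }
    roots
}
```

Returning every candidate rather than the first hit makes it straightforward to log the searched roots on a cache miss, in the spirit of the load diagnostics described above.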

…harness

Ports headroom.transforms.code_compressor (CodeAwareCompressor) to Rust on the same branch as the Kompress port, so one PR delivers both new compressors (SmartCrusher is already upstream).

Engine (crates/headroom-core/src/transforms/code_compressor.rs):
- tree-sitter AST parsing for python/js/ts/go/rust/java/c/cpp
- language detection (regex prefilter -> fewest-errors tree-sitter)
- symbol-importance scoring (min-max normalized, round-3 half-even) + body budget allocation, statement-level body truncation, omitted-line comments with call info, Python docstring handling (first_line/full/remove incl. multiline first-line reconstruction), syntax-validity guard (re-parse; return original on ERROR/MISSING).

Grammar-version parity is the precondition: the Rust tree-sitter-<lang> crates are pinned to the exact versions of the Python wheels the fixtures were recorded against (same version number on crates.io + PyPI = same grammar source = identical ASTs). A canary over 9 samples x 8 languages confirmed node-for-node identical node-type + line-span trees at these pins.

Parity: cargo run -p headroom-parity -- run --only code_aware_compressor -> 30/30 matched (18 non-trivial across all 8 languages + unknown + invalid-syntax fallback; all 3 docstring modes). py_round_int/py_round3 half-to-even verified against CPython.

Integration test + record_code_compressor_fixtures.py mirror the Kompress harness. Fixtures recorded with enable_ccr=False + fallback_to_kompress=False for determinism; live-zone dispatch wiring (SourceCode slot) is a deliberate follow-up.
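The "min-max normalized, round-3 half-even" scoring step can be illustrated with a small sketch. The function names here are hypothetical, returning 0.0 for a constant input is an assumption, and `round3` is a naive scale-and-round: CPython's `round(x, 3)` rounds the float's exact decimal value, so this version may disagree with it on edge cases, and the PR's `py_round3` may be implemented differently.

```rust
/// Round to the nearest integer, sending exact .5 ties to the even neighbor
/// (the tie rule Python's `round` uses).
pub fn round_half_even(x: f64) -> f64 {
    let floor = x.floor();
    let diff = x - floor;
    if diff < 0.5 {
        floor
    } else if diff > 0.5 {
        floor + 1.0
    } else if floor % 2.0 == 0.0 {
        floor
    } else {
        floor + 1.0
    }
}

/// Naive 3-decimal rounding built on `round_half_even` (see caveat above).
pub fn round3(x: f64) -> f64 {
    round_half_even(x * 1000.0) / 1000.0
}

/// Min-max normalize scores into [0, 1], rounded to 3 decimals.
/// A constant (or empty) input maps to zeros; that choice is an assumption.
pub fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    scores
        .iter()
        .map(|&s| if span > 0.0 { round3((s - min) / span) } else { 0.0 })
        .collect()
}
```

Tie handling matters for byte-exact parity because a score that rounds differently by 0.001 can change how the body budget is allocated across symbols.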

Mirrors the Python content_router so the two newly-ported compressors actually run in the proxy's live zone (they were engine-only before).
- ContentType::SourceCode → CodeCompressor. The grammars are statically linked, so the singleton constructs in microseconds: a synchronous one-liner like Diff/Log/Search. Flips the existing `source_code_tool_result_routes_to_no_op` contract test (whose comment invited a future "wire it up" PR) to assert code_aware_compressor.
- ContentType::PlainText → Kompress, loaded CACHE-ONLY. This mirrors the Python reference's `allow_download=False` preload path: never download on a hot/startup thread; when the ~261 MB model isn't in the local HF cache, yield None and pass the text through unchanged, exactly as Python does when Kompress is unavailable. New `Kompress::from_cache` constructor + `hf_cache_file` resolver back this. The PlainText routing test is model-gated (asserts strategy "kompress" when cached, passthrough when not), matching the kompress parity test's gating.
- `warm_live_zone_compressors()` (exported) mirrors Python's `eager_load_compressors`: force the cache-only singletons off the request path. Proxy startup can call it; the lazy path works without it.

No regressions: headroom-core 918 tests + 7 live_zone_dispatch tests pass, full parity unchanged (all comparators green), clippy clean across core/parity/proxy, workspace builds.
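The routing rule described above can be sketched as a single match. Everything here is a simplified stand-in rather than the real `dispatch_compressor`: the `Compressor` trait, `Routed`, `route_block`, the stub engines, and the "passthrough" label are hypothetical, while the "code_aware_compressor" and "kompress" strategy names come from the commit message.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    SourceCode,
    PlainText,
    Other,
}

/// Hypothetical minimal interface shared by the engines in this sketch.
pub trait Compressor {
    fn compress(&self, text: &str) -> String;
}

pub struct Routed {
    pub strategy: &'static str,
    pub output: String,
}

/// `kompress` is `None` when the gate is off or the model is not loaded yet.
pub fn route_block(
    content_type: ContentType,
    text: &str,
    code: &dyn Compressor,
    kompress: Option<&dyn Compressor>,
) -> Routed {
    match (content_type, kompress) {
        // Always-on: no model, no I/O.
        (ContentType::SourceCode, _) => Routed {
            strategy: "code_aware_compressor",
            output: code.compress(text),
        },
        // Only when an engine is actually available.
        (ContentType::PlainText, Some(engine)) => Routed {
            strategy: "kompress",
            output: engine.compress(text),
        },
        // Unavailable engine or unrouted content: return the text unchanged.
        _ => Routed {
            strategy: "passthrough",
            output: text.to_string(),
        },
    }
}

/// Stub code engine: drops blank lines (illustration only).
pub struct StubCode;
impl Compressor for StubCode {
    fn compress(&self, text: &str) -> String {
        text.lines()
            .filter(|l| !l.trim().is_empty())
            .collect::<Vec<_>>()
            .join("\n")
    }
}

/// Stub prose engine: collapses whitespace (illustration only).
pub struct StubProse;
impl Compressor for StubProse {
    fn compress(&self, text: &str) -> String {
        text.split_whitespace().collect::<Vec<_>>().join(" ")
    }
}
```

Passing the prose engine as an `Option` keeps the "unavailable" case in the type, so the passthrough branch cannot be forgotten at the call site.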

Kompress carries a ~261 MB ONNX model, so, unlike the always-on structural compressors and the AST CodeCompressor, it now loads only when an operator opts in, mirroring the Python reference's `config.enable_kompress`.
- core: process-wide `KOMPRESS_ENABLED` (default off) + `set_kompress_enabled`. `kompress()` checks it before the OnceLock, so a disabled proxy never loads the model and PlainText passes through. `warm_live_zone_compressors` only warms Kompress when enabled.
- proxy: `--enable-kompress` / `HEADROOM_PROXY_ENABLE_KOMPRESS` flag (default false), threaded through CliArgs → Config → for_test. main.rs sets the gate from config and, when enabled, fires a `spawn_blocking` cache-only warm-up off the request path (mirrors Python's `eager_load_compressors`); a cold cache just leaves it deferred rather than stalling the bind.
- the model-gated PlainText dispatch test enables the gate explicitly, like the proxy does at startup.

No regressions: core 918 passed, proxy 407 passed, clippy clean across core/parity/proxy, workspace builds.
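A process-wide opt-in gate of this kind is commonly built on an `AtomicBool`. The sketch below is an assumption about shape, not the PR's code: `KOMPRESS_ENABLED` and `set_kompress_enabled` are names from the commit message, but their bodies, the `kompress_enabled` reader, `warm_if_enabled`, and the `Relaxed` ordering are choices made for this illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Default off: nothing loads the model unless an operator opts in.
static KOMPRESS_ENABLED: AtomicBool = AtomicBool::new(false);

/// Called once at startup from the parsed configuration.
pub fn set_kompress_enabled(enabled: bool) {
    // Relaxed is enough for a standalone flag that guards no other data;
    // a stricter ordering would also be a reasonable choice.
    KOMPRESS_ENABLED.store(enabled, Ordering::Relaxed);
}

/// Hypothetical reader, checked before any model state is touched.
pub fn kompress_enabled() -> bool {
    KOMPRESS_ENABLED.load(Ordering::Relaxed)
}

/// Runs the (potentially expensive) warm-up only when the gate is on.
/// Returns whether the warm-up was attempted.
pub fn warm_if_enabled(load: impl FnOnce()) -> bool {
    if !kompress_enabled() {
        return false;
    }
    load();
    true
}
```

At startup the proxy would call `set_kompress_enabled` from its config and then schedule the warm-up on a blocking worker, so a cold cache or slow load never delays binding the listener.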

…compressors

kompress() used OnceLock::get_or_init, so a request thread that hit a PlainText block while the ~261 MB model was still loading (or while a slow EP graph compile ran) blocked on that init and stalled the proxy. Make the request-path accessor a non-blocking OnceLock::get (None => pass through until ready); perform the one-time load only in warm_live_zone_compressors, off the request path.

RubenAAA · 2026-06-19T11:15:33Z

Rebased onto the updated #1153/#1154 and added fix(kompress): never block the request path on model load — the old get_or_init accessor blocked request threads on the (slow, ~13s+ on NPU) model load and stalled the proxy; it's now a non-blocking get that degrades to passthrough until the off-path warm completes. Stack integrity intact. Description updated.

JerrettDavis

This PR is not merge-ready in current GitHub state (mergeStateStatus=UNSTABLE). Please update from current main, resolve any conflicts if present, and rerun/clear required CI before this can be approved.

Ruben Avanesov added 2 commits June 19, 2026 10:56

style: cargo fmt + ruff format

b24dc61

RubenAAA mentioned this pull request Jun 19, 2026

feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) #1143

Draft

21 tasks

github-actions Bot added the status: ready for review label Jun 19, 2026

Ruben Avanesov added 8 commits June 19, 2026 13:10

style: cargo fmt + ruff format

c5dd978

docs(changelog): note Rust CodeCompressor + gated Kompress live-zone …

1c2a645

…compressors

style: cargo fmt + ruff format

f88be24

RubenAAA force-pushed the stack/3-dispatch-gate branch from ebb76ed to 80ce58e Compare June 19, 2026 11:13

RubenAAA mentioned this pull request Jun 19, 2026

feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153

Open

13 tasks

JerrettDavis requested changes Jun 19, 2026

RubenAAA wants to merge 10 commits into chopratejas:main from RubenAAA:stack/3-dispatch-gate
