feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) by RubenAAA · Pull Request #1143 · chopratejas/headroom

RubenAAA · 2026-06-19T00:56:49Z

Description

Ports two of the live-zone compressors documented in the README architecture to the Rust headroom-core engine, and wires both into the live-zone dispatcher (they fill the SourceCode / PlainText slots that dispatch_compressor already had reserved with TODOs):

CodeCompressor — an AST, syntax-preserving source-code compressor (tree-sitter), a byte-for-byte port of headroom/transforms/code_compressor.py (CodeAwareCompressor).
Kompress — the ML prose compressor (kompress-v2-base ONNX + ModernBERT tokenizer), a byte-for-byte port of headroom/transforms/kompress_compressor.py, gated behind --enable-kompress because it loads a ~261 MB model.

Both are verified against the Python reference via the existing headroom-parity harness (replays fixtures recorded from the Python implementations and asserts identical output). The Kompress engine landed first in this branch; the CodeCompressor port, the dispatch wiring, and the gate follow.
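To make the replay-and-compare idea concrete, here is a minimal Rust sketch of a fixture-replay loop. It produces the same total/matched/skipped/diffed counts that the harness reports. `Fixture`, `Tally` and `replay` are hypothetical names rather than the headroom-parity API, and the engine is a toy closure standing in for a ported compressor.

```rust
// Sketch only: `Fixture`, `Tally` and `replay` are hypothetical names,
// not the headroom-parity harness's real types.

/// One recorded {input, config, output} triple captured from the Python reference.
struct Fixture {
    input: String,
    config: String, // kept opaque here; a real harness would deserialize it
    output: String,
}

/// Per-comparator counters, mirroring the total/matched/skipped/diffed report.
#[derive(Debug, Default, PartialEq)]
struct Tally {
    total: usize,
    matched: usize,
    skipped: usize,
    diffed: usize,
}

/// Replays each fixture through `engine` and compares outputs byte-for-byte.
/// `engine` returns None when it cannot run (e.g. a model missing from the
/// local cache); that counts as skipped, not failed.
fn replay<F>(fixtures: &[Fixture], engine: F) -> Tally
where
    F: Fn(&str, &str) -> Option<String>,
{
    let mut t = Tally::default();
    for fx in fixtures {
        t.total += 1;
        match engine(&fx.input, &fx.config) {
            None => t.skipped += 1,
            Some(out) if out.as_bytes() == fx.output.as_bytes() => t.matched += 1,
            Some(_) => t.diffed += 1,
        }
    }
    t
}

fn main() {
    let fixtures = vec![
        Fixture { input: "a b".into(), config: "{}".into(), output: "A B".into() },
        Fixture { input: "c".into(), config: "{}".into(), output: "x".into() },
    ];
    // Toy engine standing in for a ported compressor.
    let t = replay(&fixtures, |input, _config| Some(input.to_uppercase()));
    println!(
        "[demo] total={} matched={} skipped={} diffed={}",
        t.total, t.matched, t.skipped, t.diffed
    );
}
```

Treating "engine unavailable" as `None` is what lets a model-gated comparator report skips instead of failures.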

No tracking issue — this completes the in-repo PR-B4 / "Rust port pending" TODOs in crates/headroom-core/src/transforms/live_zone.rs (the dispatcher and a contract test were written to be flipped once these crates landed).

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Performance improvement
Code refactoring (no functional changes)

Changes Made

headroom-core: transforms/code_compressor.rs — full CodeCompressor port: language detection (regex prefilter → fewest-errors tree-sitter), symbol-importance scoring (min-max normalized, CPython half-to-even rounding), per-function body-budget allocation, statement-level body truncation with # [N lines omitted; calls: …] summaries, Python docstring handling (first_line / full / remove, incl. multi-line first-line reconstruction), and a re-parse syntax-validity guard that returns the original on any invalid output.
headroom-core: transforms/kompress.rs — adds Kompress::from_cache (cache-only constructor) + hf_cache_file resolver.
headroom-core: transforms/live_zone.rs — dispatch_compressor now routes SourceCode → CodeCompressor and PlainText → Kompress (cache-only, gated). Adds set_kompress_enabled + warm_live_zone_compressors.
headroom-proxy: config.rs / main.rs — --enable-kompress / HEADROOM_PROXY_ENABLE_KOMPRESS flag (default off); startup sets the gate and fires an off-request-path cache-only warm-up when enabled.
headroom-parity — CodeCompressorComparator; recorder + scripts/record_code_compressor_fixtures.py; 30 code + 21 kompress fixtures.
Tests — code_compressor_parity.rs integration test; flipped/added live-zone dispatch routing tests; module unit tests (rounding parity, detection, passthrough).
CHANGELOG.md — Unreleased entries for both compressors.
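The following is a minimal sketch of the routing decision that dispatch_compressor now makes for the two newly filled slots. It relies on simplified stand-ins. The `ContentType` enum here models only those two slots, and every other type is lumped into `Other`. The hypothetical `dispatch` function returns the strategy name the routing tests assert, or `None` for unchanged passthrough.

```rust
// Sketch only: hypothetical stand-ins for ContentType / dispatch_compressor.

#[derive(Clone, Copy)]
enum ContentType {
    SourceCode,
    PlainText,
    // Not modeled here; the real dispatcher routes Diff/Log/Search etc.
    // to their own compressors.
    Other,
}

/// Returns the strategy chosen for a content type, or None for passthrough.
fn dispatch(ct: ContentType, kompress_enabled: bool, model_cached: bool) -> Option<&'static str> {
    match ct {
        // Always on: statically linked grammars, no I/O at construction.
        ContentType::SourceCode => Some("code_aware_compressor"),
        // Gated and cache-only: gate off or cold cache means unchanged passthrough.
        ContentType::PlainText => (kompress_enabled && model_cached).then_some("kompress"),
        ContentType::Other => None,
    }
}

fn main() {
    let cases = [
        ("code", ContentType::SourceCode),
        ("prose", ContentType::PlainText),
        ("other", ContentType::Other),
    ];
    for (label, ct) in cases {
        // Gate off (the default): prose passes through even with a warm cache.
        println!("{label}: {:?}", dispatch(ct, false, true));
    }
}
```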

Design decisions & rationale

Grammar-version parity is the make-or-break invariant. Byte-parity for an AST compressor requires the Rust and Python parsers to produce node-for-node identical trees. The Rust tree-sitter-<lang> crates are pinned (=) to the exact versions of the Python tree-sitter-* wheels the fixtures were recorded against. The same version number on crates.io and PyPI means the same grammar.js source, and hence the same generated parser.c. A canary over 9 samples across 8 languages confirmed 100% identical node-type + line-span trees before a line of the compressor was written (see Real Behavior Proof).
CodeCompressor is always-on; Kompress is gated. CodeCompressor's grammars are statically linked — its singleton constructs in microseconds with no I/O — so it dispatches synchronously like Diff/Log/Search. Kompress carries a ~261 MB model, so it is opt-in (--enable-kompress, default off), mirroring the Python reference's config.enable_kompress.
Kompress loads cache-only. Kompress::from_cache resolves the model + tokenizer from the local HF cache and never downloads — the Rust mirror of the Python reference's allow_download=False preload path. A cold cache yields None, and the dispatcher passes plain text through unchanged (exactly as Python does when Kompress is unavailable), so neither a request nor the startup bind can block on a download.
CCR markers are left to the dispatcher. Both engines return the compressed string only; the Python inline # [N tokens compressed… hash=] / [… hash=] markers are intentionally not reproduced (live-zone CCR uses the <<ccr:>> convention). Fixtures are therefore recorded with enable_ccr=False for determinism.
Parity scope. The Python reference slices a str using byte offsets; the Rust port reproduces this as correct UTF-8 byte slicing. For ASCII input the two are identical, so all fixtures are ASCII. Non-ASCII identifiers are the one documented out-of-scope case and are noted in the module docs.
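ASCII-only fixtures sidestep the issue because byte offsets and code-point indices agree on ASCII but drift apart after the first multi-byte character. The short illustration below shows this. Both helpers are hypothetical and exist only for this demonstration.

```rust
// Illustration only: contrasts byte-offset slicing with code-point indexing.

/// Slices by byte offsets, which is what a parser's start/end byte positions mean.
/// Panics if an offset does not fall on a UTF-8 char boundary.
fn byte_slice(src: &str, start: usize, end: usize) -> &str {
    &src[start..end]
}

/// Emulates Python's `s[start:end]`, which indexes by code point, when it is
/// handed the same numeric offsets.
fn codepoint_slice(src: &str, start: usize, end: usize) -> String {
    src.chars().skip(start).take(end.saturating_sub(start)).collect()
}

fn main() {
    // ASCII: byte offsets and code-point indices coincide.
    let ascii = "def f(): pass";
    assert_eq!(byte_slice(ascii, 4, 5), codepoint_slice(ascii, 4, 5));

    // Non-ASCII: 'é' takes 2 bytes, so 'x' sits at byte 8 but code point 7.
    let s = "é = 1; x = 2";
    let start = s.find('x').unwrap(); // byte offset
    println!("byte slice:       {:?}", byte_slice(s, start, start + 1));
    println!("code-point slice: {:?}", codepoint_slice(s, start, start + 1));
}
```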

Dependencies & supply chain

This PR adds Rust crates only (no new Python deps; the tree-sitter-* Python wheels are used at fixture-record time, not at runtime).

tree-sitter (0.25.2) + tree-sitter-{python,javascript,typescript,go,rust,java,c,cpp}: the canonical tree-sitter org grammars. Why these: they are required for AST code compression, and they are the same grammars the Python reference uses, which is what makes byte-parity attainable. Maintenance: the official tree-sitter/tree-sitter-* projects, actively released. Install surface: each compiles a vendored C parser.c via cc (already required by rusqlite's bundled build); no network access at install time or runtime. Why these versions: required new functionality; each is pinned with = to match the Python wheels exactly, and bumping any pin requires re-running the canary and re-recording fixtures.
ort (2.0.0-rc.12) — already in the tree via fastembed/magika; pulled as a direct dep so the Kompress port shares the single ONNX Runtime instance (no second runtime, no extra download). Cache-only at runtime.

Testing

New tests added for new functionality
Linting passes (ruff check . / cargo clippy -D warnings)
Unit + integration tests pass (Rust)
pytest / mypy headroom — N/A (Rust-only change; the only Python touched is dev-time fixture tooling under tests/parity/, which passes ruff check + ruff format)
Manual testing performed (see Real Behavior Proof)

Test Output

# Byte-parity: Rust port vs Python reference (headroom-parity harness)
[code_aware_compressor] total=30 matched=30 skipped=0 diffed=0
[kompress             ] total=21 matched=21 skipped=0 diffed=0
# (full run: diff/tokenizer/smart_crusher/content_detector all green;
#  log/cache_aligner/ccr are pre-existing Phase-0 stubs, unchanged)

# Rust test suites
cargo test -p headroom-core   ->  918 passed, 3 ignored (14 suites)
cargo test -p headroom-proxy  ->  407 passed (34 suites)
cargo test -p headroom-core --test live_zone_dispatch  ->  7 passed

# Lint
cargo clippy -p headroom-core -p headroom-parity -p headroom-proxy -- -D warnings  ->  No issues found
ruff check tests/parity/recorder.py scripts/record_code_compressor_fixtures.py    ->  All checks passed!

Real Behavior Proof

Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain, Python 3.13.9 for reference recording. kompress-v2-base + ModernBERT-base present in the local HF cache.
Exact steps:
1. Grammar canary — dump node-type + line-span trees from the Python tree-sitter stack and the pinned Rust crates over identical samples, then compare.
2. cargo run -p headroom-parity -- run --only code_aware_compressor and --only kompress.
3. Run the CodeCompressor over a sample and inspect the compressed output.
Observed result — grammar canary (precondition):

IDENTICAL  c:basic            (222 nodes)
IDENTICAL  cpp:basic          (216 nodes)
IDENTICAL  go:basic           (225 nodes)
IDENTICAL  java:basic         (234 nodes)
IDENTICAL  javascript:basic   (212 nodes)
IDENTICAL  python:basic       (279 nodes)
IDENTICAL  python:nodoc       (163 nodes)
IDENTICAL  rust:basic         (261 nodes)
IDENTICAL  typescript:basic   (214 nodes)
=> 9/9 languages produce byte-identical ASTs (node type + line spans)
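The comparison step of the canary can be sketched as follows. The sketch assumes each side has already dumped its tree as a pre-order list of (node type, start line, end line) records. The record layout and `compare_dumps` are hypothetical, since the actual canary tooling is not part of this description.

```rust
// Sketch only: the dump format and these names are hypothetical.

/// One pre-order node record: (node type, start line, end line).
type NodeRec = (String, usize, usize);

#[derive(Debug, PartialEq)]
enum Canary {
    Identical { nodes: usize },
    Diverged { first_index: usize },
}

/// Compares two pre-order dumps of the same sample and reports the first
/// position where node type or line span differs, or where one dump ends.
fn compare_dumps(py: &[NodeRec], rs: &[NodeRec]) -> Canary {
    if let Some(i) = py.iter().zip(rs).position(|(a, b)| a != b) {
        return Canary::Diverged { first_index: i };
    }
    if py.len() != rs.len() {
        return Canary::Diverged { first_index: py.len().min(rs.len()) };
    }
    Canary::Identical { nodes: py.len() }
}

fn rec(kind: &str, start: usize, end: usize) -> NodeRec {
    (kind.to_string(), start, end)
}

fn main() {
    let py = vec![rec("module", 0, 2), rec("function_definition", 0, 1), rec("identifier", 0, 0)];
    let rs = py.clone();
    println!("{:?}", compare_dumps(&py, &rs));
}
```

Comparing only node type and line span (not byte ranges or field names) matches what the canary output above says it checked.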

Observed result — live compression (CodeCompressor, Python source, ratio 0.525):

# INPUT (567 chars): three functions load()/transform()/run()  -->  COMPRESSED OUTPUT:
import json

def load(path):
    with open(path) as fh:
        data = json.load(fh)
    # [6 lines omitted]
    pass
def transform(records, factor):
    out = []
    # [6 lines omitted]
    pass
def run(path, factor):
    data = load(path)
    # [3 lines omitted; calls: load, transform]
    pass
# symbol_scores: {"load": 1.0, "run": 0.0, "transform": 0.0} | language: python (conf 1.0)

Signatures, imports, the first statement of each body, and inter-function call edges are preserved; the rest is summarized. The Rust port reproduces this output byte-for-byte (parity 30/30 above). Output re-parses cleanly (the syntax-validity guard returns the original otherwise).
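The summary lines in the sample output follow a simple shape, which the hypothetical sketch below reproduces. It keeps whole lines rather than statements, a simplification of the port's statement-level truncation. The helper names are illustrative only.

```rust
// Sketch only: hypothetical helpers reproducing the shape of the summaries
// in the sample output; not the port's code.

/// Builds the omitted-lines comment, adding call edges when there are any.
fn omitted_comment(indent: &str, omitted: usize, calls: &[&str]) -> String {
    if calls.is_empty() {
        format!("{indent}# [{omitted} lines omitted]")
    } else {
        format!("{indent}# [{omitted} lines omitted; calls: {}]", calls.join(", "))
    }
}

/// Keeps the first `keep` body lines. If anything was dropped, appends the
/// summary and a `pass`, as every truncated body in the sample does.
fn truncate_body(indent: &str, body: &[&str], keep: usize, calls: &[&str]) -> Vec<String> {
    let keep = keep.min(body.len());
    let mut out: Vec<String> = body[..keep].iter().map(|l| l.to_string()).collect();
    let omitted = body.len() - keep;
    if omitted > 0 {
        out.push(omitted_comment(indent, omitted, calls));
        out.push(format!("{indent}pass"));
    }
    out
}

fn main() {
    let body = [
        "    data = load(path)",
        "    recs = transform(data, factor)",
        "    print(len(recs))",
        "    return recs",
    ];
    for line in truncate_body("    ", &body, 1, &["load", "transform"]) {
        println!("{line}");
    }
}
```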

Not tested: non-ASCII identifiers/string contents (documented out-of-parity-scope); end-to-end against a live provider (the dispatcher is covered by integration tests, not live traffic in this PR); NPU/alternate ONNX execution providers (Kompress runs on the default CPU EP here — EP selection is independent of this change).

Review Readiness

I have performed a self-review
This PR is ready for human review

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (CHANGELOG + module docs)
My changes generate no new warnings
I have added tests that prove my feature works
New and existing unit tests pass locally with my changes
I have updated the CHANGELOG.md

Additional Notes

Independent of the OpenVINO EP branch/PR. This branch is based directly on main and shares no commits with the OpenVINO execution-provider work; its only ort touch is the direct-dependency line that unifies onto the existing ONNX Runtime instance.
Follow-ups (out of scope here): calling warm_live_zone_compressors() from proxy startup is wired behind the gate but the eager warm is optional (the lazy cache-only path works without it); Kompress live-zone CCR offload of dropped words via <<ccr:>> is handled by the dispatcher layer, not this engine.
Python checklist items (pytest/mypy) are N/A — this is a Rust change; the equivalent gates are cargo test + cargo clippy, run above.

Ports `headroom.transforms.kompress_compressor` (the ModernBERT token compressor behind PlainText compression) to Rust: the last unported compressor alongside CodeCompressor. Engine only; live-zone dispatch wiring follows in a separate PR (matching how the other compressors landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a fine-tune reuses its base vocab). Runs the ONNX/proxy compression path: whitespace-split → 350-word chunks → pre-tokenized encode (is_split_into_words) → ONNX `final_scores` → max-score-per-word → keep `> 0.5` → join kept words. Inputs of fewer than 10 words pass through unchanged.
- Inference via `ort` (direct dep added here; first session-API consumer; unifies to the ORT instance fastembed/magika already vendor). Tokenization via the `tokenizers` crate. Both reproduce the Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures recorded via recorder.py (enable_ccr=False → deterministic output). `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched. Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>` convention), not this engine; the inline Python marker is intentionally not reproduced.
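The word-level keep step can be sketched in Rust by replacing the tokenizer and ONNX session with a `score_chunk` closure. The closure's shape is an assumption modeled on a pre-tokenized encode: per-token scores plus per-token word indices, with `None` for special tokens. Joining kept words with single spaces is also an assumption. The constants come from the description above.

```rust
// Sketch only: `score_chunk` stands in for tokenizer + ONNX inference.

const MIN_WORDS: usize = 10;
const CHUNK_WORDS: usize = 350;
const KEEP_THRESHOLD: f32 = 0.5;

fn compress<F>(text: &str, score_chunk: F) -> String
where
    F: Fn(&[&str]) -> (Vec<f32>, Vec<Option<usize>>),
{
    let words: Vec<&str> = text.split_whitespace().collect();
    // Short inputs pass through unchanged.
    if words.len() < MIN_WORDS {
        return text.to_string();
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let (scores, word_ids) = score_chunk(chunk);
        // Max score per word across its sub-word tokens; special tokens (None) are ignored.
        let mut best = vec![f32::NEG_INFINITY; chunk.len()];
        for (&s, wid) in scores.iter().zip(&word_ids) {
            if let Some(w) = *wid {
                if w < best.len() && s > best[w] {
                    best[w] = s;
                }
            }
        }
        // Keep words whose best score is strictly above the threshold.
        kept.extend(
            chunk.iter().zip(&best).filter(|(_, s)| **s > KEEP_THRESHOLD).map(|(w, _)| *w),
        );
    }
    kept.join(" ")
}

fn main() {
    let text = "the quick brown fox jumps over the lazy dog again today";
    // Toy scorer: a leading special token, then one token per word; long words score high.
    let out = compress(text, |chunk| {
        let mut scores = vec![1.0_f32];
        let mut ids = vec![None];
        for (i, w) in chunk.iter().enumerate() {
            scores.push(if w.len() > 3 { 0.9 } else { 0.1 });
            ids.push(Some(i));
        }
        (scores, ids)
    });
    println!("{out}");
}
```

Because the keep test is a strict `> 0.5`, a word whose best token scores exactly 0.5 is dropped.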

…harness

Ports headroom.transforms.code_compressor (CodeAwareCompressor) to Rust on the same branch as the Kompress port, so one PR delivers both new compressors (SmartCrusher is already upstream).

Engine (crates/headroom-core/src/transforms/code_compressor.rs):
- tree-sitter AST parsing for python/js/ts/go/rust/java/c/cpp
- language detection (regex prefilter -> fewest-errors tree-sitter)
- symbol-importance scoring (min-max normalized, round-3 half-even) + body budget allocation
- statement-level body truncation, omitted-line comments with call info
- Python docstring handling (first_line/full/remove incl. multiline first-line reconstruction)
- syntax-validity guard (re-parse; return original on ERROR/MISSING)

Grammar-version parity is the precondition: the Rust tree-sitter-<lang> crates are pinned to the exact versions of the Python wheels the fixtures were recorded against (same version number on crates.io + PyPI = same grammar source = identical ASTs). A canary over 9 samples across 8 languages confirmed node-for-node identical node-type + line-span trees at these pins.

Parity: cargo run -p headroom-parity -- run --only code_aware_compressor -> 30/30 matched (18 non-trivial across all 8 languages + unknown + invalid-syntax fallback; all 3 docstring modes). py_round_int/py_round3 half-to-even verified against CPython. Integration test + record_code_compressor_fixtures.py mirror the Kompress harness. Fixtures recorded with enable_ccr=False + fallback_to_kompress=False for determinism; live-zone dispatch wiring (SourceCode slot) is a deliberate follow-up.
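Below is a minimal sketch of min-max normalization with CPython-style rounding. The `py_round_int`/`py_round3` functions here are hypothetical re-creations, not the port's functions. The zero-span behavior is an assumption, and `f64::round_ties_even` requires Rust 1.77 or later.

```rust
// Sketch only: hypothetical helpers in the spirit of py_round_int / py_round3.

/// CPython `round(x)`: ties go to the even integer (round(2.5) == 2).
fn py_round_int(x: f64) -> i64 {
    x.round_ties_even() as i64
}

/// Approximates CPython `round(x, 3)`. Exact binary ties such as 0.0625 go to
/// even, but CPython rounds the exact decimal value of `x`, so scaling by 1000
/// first can disagree in rare edge cases; treat this as a simplification.
fn py_round3(x: f64) -> f64 {
    (x * 1000.0).round_ties_even() / 1000.0
}

/// Min-max normalizes raw importance scores to [0, 1], rounded to 3 places.
/// Assumption: when all scores are equal, every symbol maps to 0.0.
fn normalize(raw: &[f64]) -> Vec<f64> {
    let min = raw.iter().copied().fold(f64::INFINITY, f64::min);
    let max = raw.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let span = max - min;
    raw.iter()
        .map(|&x| py_round3(if span > 0.0 { (x - min) / span } else { 0.0 }))
        .collect()
}

fn main() {
    // Rust's f64::round sends ties away from zero; Python sends them to even.
    println!("{} vs {}", 2.5_f64.round(), py_round_int(2.5));
    println!("{:?}", normalize(&[3.0, 1.0, 1.0]));
}
```

The `[3.0, 1.0, 1.0]` case mirrors the shape of the sample's `symbol_scores` (one symbol at 1.0, two tied at 0.0).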

Mirrors the Python content_router so the two newly ported compressors actually run in the proxy's live zone (they were engine-only before).
- ContentType::SourceCode → CodeCompressor. The grammars are statically linked, so the singleton constructs in microseconds, making this a synchronous one-liner like Diff/Log/Search. Flips the existing `source_code_tool_result_routes_to_no_op` contract test (whose comment invited a future "wire it up" PR) to assert code_aware_compressor.
- ContentType::PlainText → Kompress, loaded CACHE-ONLY. This mirrors the Python reference's `allow_download=False` preload path: never download on a hot/startup thread; when the ~261 MB model isn't in the local HF cache, yield None and pass the text through unchanged, exactly as Python does when Kompress is unavailable. New `Kompress::from_cache` constructor + `hf_cache_file` resolver back this. The PlainText routing test is model-gated (asserts strategy "kompress" when cached, passthrough when not), matching the kompress parity test's gating.
- `warm_live_zone_compressors()` (exported) mirrors Python's `eager_load_compressors`: force the cache-only singletons off the request path. Proxy startup can call it; the lazy path works without it.

No regressions: headroom-core 918 tests + 7 live_zone_dispatch tests pass, full parity unchanged (all comparators green), clippy clean across core/parity/proxy, workspace builds.
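Here is a hypothetical sketch of a cache-only resolver in the spirit of `hf_cache_file`. It assumes the standard Hugging Face hub cache layout, in which `models--{org}--{name}/refs/main` names a commit under `snapshots/`. The signature and file names are illustrative, and the port's actual resolver may differ.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical cache-only resolver: finds `filename` for `repo_id` ("org/name")
/// in a Hugging Face hub cache root without touching the network, returning
/// None on a cold cache. Locating the root itself (HF_HUB_CACHE, HF_HOME,
/// ~/.cache/huggingface/hub) is left to the caller in this sketch.
fn hf_cache_file(cache_root: &Path, repo_id: &str, filename: &str) -> Option<PathBuf> {
    // Hub cache layout: models--{org}--{name}/refs/main + snapshots/{commit}/...
    let repo_dir = cache_root.join(format!("models--{}", repo_id.replace('/', "--")));
    // refs/main stores the commit hash of the snapshot that "main" points to.
    let commit = fs::read_to_string(repo_dir.join("refs").join("main")).ok()?;
    let path = repo_dir.join("snapshots").join(commit.trim()).join(filename);
    // is_file follows symlinks (snapshot entries usually link into blobs/).
    path.is_file().then_some(path)
}

fn main() {
    // A missing cache resolves to None instead of triggering a download.
    let root = Path::new("/nonexistent-hf-cache");
    println!("{:?}", hf_cache_file(root, "org/model", "model.onnx"));
}
```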

Kompress carries a ~261 MB ONNX model, so (unlike the always-on structural compressors and the AST CodeCompressor) it now loads only when an operator opts in, mirroring the Python reference's `config.enable_kompress`.
- core: process-wide `KOMPRESS_ENABLED` (default off) + `set_kompress_enabled`. `kompress()` checks it before the OnceLock, so a disabled proxy never loads the model and PlainText passes through. `warm_live_zone_compressors` only warms Kompress when enabled.
- proxy: `--enable-kompress` / `HEADROOM_PROXY_ENABLE_KOMPRESS` flag (default false), threaded through CliArgs → Config → for_test. main.rs sets the gate from config and, when enabled, fires a `spawn_blocking` cache-only warm-up off the request path (mirrors Python's `eager_load_compressors`); a cold cache just leaves it deferred rather than stalling the bind.
- the model-gated PlainText dispatch test enables the gate explicitly, like the proxy does at startup.

No regressions: core 918 passed, proxy 407 passed, clippy clean across core/parity/proxy, workspace builds.
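The gate-before-OnceLock ordering can be sketched as follows. `GatedEngine` and its load counter are hypothetical. The PR describes a `KOMPRESS_ENABLED` static plus `set_kompress_enabled`, and wrapping them in a struct here only makes the behavior testable without shared global state.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for the loaded engine (the real one holds an ONNX session + tokenizer).
struct Kompress;

/// Hypothetical gate + lazy singleton.
struct GatedEngine {
    enabled: AtomicBool,
    cell: OnceLock<Option<Kompress>>,
    loads: AtomicUsize, // counts load attempts, for illustration
}

impl GatedEngine {
    const fn new() -> Self {
        Self { enabled: AtomicBool::new(false), cell: OnceLock::new(), loads: AtomicUsize::new(0) }
    }

    fn set_enabled(&self, on: bool) {
        self.enabled.store(on, Ordering::SeqCst);
    }

    /// Checks the gate *before* the OnceLock, so a disabled process never loads.
    /// Whatever the first load returns (including None on a cold cache) is
    /// cached here; whether the real port retries later is not shown.
    fn get(&self, load_from_cache: impl FnOnce() -> Option<Kompress>) -> Option<&Kompress> {
        if !self.enabled.load(Ordering::SeqCst) {
            return None; // passthrough
        }
        self.cell
            .get_or_init(|| {
                self.loads.fetch_add(1, Ordering::SeqCst);
                load_from_cache()
            })
            .as_ref()
    }
}

// Process-wide instance, default off.
static KOMPRESS: GatedEngine = GatedEngine::new();

fn main() {
    println!("disabled -> loaded? {}", KOMPRESS.get(|| Some(Kompress)).is_some());
    KOMPRESS.set_enabled(true);
    println!("enabled  -> loaded? {}", KOMPRESS.get(|| Some(Kompress)).is_some());
}
```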

…compressors

github-actions · 2026-06-19T00:57:00Z

PR governance

This PR does not yet satisfy the required template fields:

Fill in Real Behavior Proof → Environment.
Fill in Real Behavior Proof → Exact command / steps.
Fill in Real Behavior Proof → Observed result.
Fill in Real Behavior Proof → Not tested.

Please update the PR body, or move the PR back to draft while it is still in progress.

chopratejas · 2026-06-19T05:48:27Z

This will need a deep review - any chance this can be broken into multiple PRs?
Also - I see a lot of fixtures/*.json files - can you help me understand why we need those?

codecov · 2026-06-19T06:12:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


RubenAAA · 2026-06-19T08:54:34Z

This will need a deep review - any chance this can be broken into multiple PRs? Also - I see a lot of fixtures/*.json files - can you help me understand why we need those?

For a bit of context: I'd actually been building something along these lines independently on my own side, so I was genuinely happy to find this repo. It's a much better foundation than what I had. My setup needs the engine in Rust (both for how it's wired in and for the speed), which is what motivated porting these compressors over rather than calling the Python ones.

Sorry for the size. I'll break it into a stack of 3 PRs that review in order; each builds green on its own:

feat(rust): port Kompress ML prose compressor + parity harness
The kompress.rs engine + the headroom-parity Kompress comparator/fixtures. Self-contained — adds the engine module but doesn't touch the dispatcher yet.
feat(rust): port CodeCompressor AST compressor (stacked on #1)
The code_compressor.rs engine + its comparator/fixtures + the pinned tree-sitter-* grammar deps. Also dispatcher-free.
feat(rust): wire both compressors into live-zone dispatch + gate Kompress (stacked on #2)
The small bit that flips the SourceCode/PlainText TODOs in dispatch_compressor, plus the --enable-kompress flag and changelog. This is where the two engines actually go live.

So #1 and #2 are pure additive engine ports (dead code until #3 lands them), and #3 is the behavior change in isolation. That should make the deep review tractable one engine at a time.
As for the fixtures/*.json: these are the parity contract. Each is a recorded {input, config, output} triple captured from the Python reference implementations (headroom/transforms/{kompress,code_compressor}.py) via the recorder scripts. The headroom-parity harness replays each one through the Rust port and asserts byte-identical output (30 code + 21 kompress). For an AST compressor, "looks equivalent" isn't good enough: a one-node grammar difference silently changes the compression. The fixtures are how I prove that the Rust output matches Python exactly, and they are also what would catch a tree-sitter grammar-version bump regressing parity. I record them rather than generating them at test time so that the Python reference isn't a build/test dependency for the Rust crates.
I'll get the stack up shortly and link them here. Also pushed a fmt-only commit that fixes the red lint / cargo fmt checks on this PR in the meantime.

RubenAAA · 2026-06-19T09:06:23Z

As promised, split into a 3-PR stack (each builds green on its own, reviews in order):

feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153: Kompress ML prose compressor + parity harness (engine only, not yet wired)
feat(rust): port CodeCompressor AST compressor to Rust (parity-only, 2/3) #1154: CodeCompressor AST compressor + parity fixtures (engine only), stacked on #1153
feat(rust): wire CodeCompressor + Kompress into live-zone dispatch + gate (3/3) #1155: wires both into live-zone dispatch + --enable-kompress gate (the only behavior change), stacked on #1154

Since I don't have push access to base branches here, #1154 and #1155 are stacked on the fork, so their diffs against main show the upstream PRs' content until those merge — review #1153 first and each subsequent diff shrinks as they land. Bodies pass the PR-governance template check, and the red fmt/lint checks are fixed in each branch.

Happy to close this PR in favor of the stack once you've had a look — leaving it open for now so the discussion thread isn't lost.

JerrettDavis · 2026-06-19T13:14:39Z

This PR is still a draft and GitHub reports mergeStateStatus=UNSTABLE, so I am not approving it in the ready-to-merge pass. Please mark it ready and get required CI green before final review.

Ruben Avanesov added 6 commits June 19, 2026 01:24

docs(changelog): note Rust CodeCompressor + gated Kompress live-zone compressors (77615ef)

style(parity): apply ruff format + import sort to fixture tooling (45cd045)

github-actions added the "status: needs author action" label (Pull request body or readiness checklist still needs author updates) on Jun 19, 2026

style: apply cargo fmt + ruff format to satisfy CI lint gates (a127430)

This was referenced Jun 19, 2026:

feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153 (Open)
feat(rust): port CodeCompressor AST compressor to Rust (parity-only, 2/3) #1154 (Open)
feat(rust): wire CodeCompressor + Kompress into live-zone dispatch + gate (3/3) #1155 (Open)

RubenAAA marked this pull request as draft June 19, 2026 09:07

RubenAAA wants to merge 7 commits into chopratejas:main from RubenAAA:feat/rust-kompress-port
