feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated)#1143

Draft
RubenAAA wants to merge 7 commits into chopratejas:main from RubenAAA:feat/rust-kompress-port

@RubenAAA

Description

Ports two of the live-zone compressors documented in the README architecture to the Rust headroom-core engine, and wires both into the live-zone dispatcher (they fill the SourceCode / PlainText slots that dispatch_compressor already had reserved with TODOs):

  • CodeCompressor — an AST, syntax-preserving source-code compressor (tree-sitter), a byte-for-byte port of headroom/transforms/code_compressor.py (CodeAwareCompressor).
  • Kompress — the ML prose compressor (kompress-v2-base ONNX + ModernBERT tokenizer), a byte-for-byte port of headroom/transforms/kompress_compressor.py, gated behind --enable-kompress because it loads a ~261 MB model.

Both are verified against the Python reference via the existing headroom-parity harness (replays fixtures recorded from the Python implementations and asserts identical output). The Kompress engine landed first in this branch; the CodeCompressor port, the dispatch wiring, and the gate follow.
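
The replay-and-compare loop behind that verification can be sketched roughly as follows. This is a minimal illustration, not the headroom-parity harness itself: `Fixture`, `Report`, and `replay` are hypothetical names, and the config is treated as an opaque string where the real fixtures are JSON files.

```rust
/// One recorded parity fixture: an {input, config, output} triple.
/// Field types are assumptions made for this sketch.
#[derive(Debug, Clone)]
pub struct Fixture {
    pub input: String,
    pub config: String, // assumed: serialized config, opaque here
    pub output: String, // output recorded from the Python reference
}

/// Counters in the spirit of the harness summary line (total/matched/diffed).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Report {
    pub total: usize,
    pub matched: usize,
    pub diffed: usize,
}

/// Replay every fixture through `port` and compare the result byte for byte.
pub fn replay<F>(fixtures: &[Fixture], port: F) -> Report
where
    F: Fn(&str, &str) -> String,
{
    let mut report = Report::default();
    for fixture in fixtures {
        report.total += 1;
        let actual = port(&fixture.input, &fixture.config);
        if actual.as_bytes() == fixture.output.as_bytes() {
            report.matched += 1;
        } else {
            report.diffed += 1;
        }
    }
    report
}
```

Comparing raw bytes rather than a normalized form is the point of the sketch: any whitespace or ordering drift counts as a diff.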

No tracking issue — this completes the in-repo PR-B4 / "Rust port pending" TODOs in crates/headroom-core/src/transforms/live_zone.rs (the dispatcher and a contract test were written to be flipped once these crates landed).

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • headroom-core: transforms/code_compressor.rs — full CodeCompressor port: language detection (regex prefilter → fewest-errors tree-sitter), symbol-importance scoring (min-max normalized, CPython half-to-even rounding), per-function body-budget allocation, statement-level body truncation with # [N lines omitted; calls: …] summaries, Python docstring handling (first_line / full / remove, incl. multi-line first-line reconstruction), and a re-parse syntax-validity guard that returns the original on any invalid output.
  • headroom-core: transforms/kompress.rs — adds Kompress::from_cache (cache-only constructor) + hf_cache_file resolver.
  • headroom-core: transforms/live_zone.rs: dispatch_compressor now routes SourceCode → CodeCompressor and PlainText → Kompress (cache-only, gated). Adds set_kompress_enabled + warm_live_zone_compressors.
  • headroom-proxy: config.rs / main.rs: --enable-kompress / HEADROOM_PROXY_ENABLE_KOMPRESS flag (default off); startup sets the gate and fires an off-request-path cache-only warm-up when enabled.
  • headroom-parity: CodeCompressorComparator; recorder + scripts/record_code_compressor_fixtures.py; 30 code + 21 kompress fixtures.
  • Tests: code_compressor_parity.rs integration test; flipped/added live-zone dispatch routing tests; module unit tests (rounding parity, detection, passthrough).
  • CHANGELOG.md — Unreleased entries for both compressors.
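
To make the scoring arithmetic in the first bullet concrete, here is a small sketch of min-max normalization and integer round-half-to-even. The function names are hypothetical (the port's own helpers are not shown in this PR body), and returning all zeros when every score is equal is an assumption. Rounding to three decimals the way CPython's `round(x, 3)` does depends on the exact decimal expansion of the double and is deliberately not covered here.

```rust
/// Round to the nearest integer with ties going to the even neighbour,
/// which is what CPython's `round(x)` does for floats.
pub fn round_half_even(x: f64) -> f64 {
    let floor = x.floor();
    let diff = x - floor;
    if diff > 0.5 {
        floor + 1.0
    } else if diff < 0.5 {
        floor
    } else if floor % 2.0 == 0.0 {
        // Exact tie and the lower neighbour is even: keep it.
        floor
    } else {
        // Exact tie and the lower neighbour is odd: go up to the even one.
        floor + 1.0
    }
}

/// Min-max normalize scores into [0, 1].
/// Assumption: when all scores are equal, every normalized score is 0.0.
pub fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().copied().fold(f64::INFINITY, f64::min);
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|&s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}
```

Half-to-even differs from Rust's `f64::round` (ties away from zero) only on exact ties such as 0.5 or 2.5, which is exactly where a naive port would lose parity.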

Design decisions & rationale

  • Grammar-version parity is the make-or-break invariant. Byte-parity for an AST compressor requires the Rust and Python parsers to produce node-for-node identical trees. The Rust tree-sitter-<lang> crates are pinned (=) to the exact versions of the Python tree-sitter-* wheels the fixtures were recorded against: the same version number on crates.io and PyPI means the same grammar.js source, hence the same generated parser.c. A canary over 9 samples across 8 languages (Python has two samples) confirmed 100% identical node-type + line-span trees before a line of the compressor was written (see Real Behavior Proof).
  • CodeCompressor is always-on; Kompress is gated. CodeCompressor's grammars are statically linked — its singleton constructs in microseconds with no I/O — so it dispatches synchronously like Diff/Log/Search. Kompress carries a ~261 MB model, so it is opt-in (--enable-kompress, default off), mirroring the Python reference's config.enable_kompress.
  • Kompress loads cache-only. Kompress::from_cache resolves the model + tokenizer from the local HF cache and never downloads — the Rust mirror of the Python reference's allow_download=False preload path. A cold cache yields None, and the dispatcher passes plain text through unchanged (exactly as Python does when Kompress is unavailable), so neither a request nor the startup bind can block on a download.
  • CCR markers are left to the dispatcher. Both engines return the compressed string only; the Python inline # [N tokens compressed… hash=] / [… hash=] markers are intentionally not reproduced (live-zone CCR uses the <<ccr:>> convention). Fixtures are therefore recorded with enable_ccr=False for determinism.
  • Parity scope. Where the Python reference slices a str by byte offset, the Rust port uses correct UTF-8 byte slicing. For ASCII input the two are identical, so the fixtures are ASCII-only (non-ASCII identifiers are the one documented out-of-scope case, noted in the module docs).
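
The grammar canary from the first bullet boils down to comparing two flattened tree dumps. A minimal sketch of that comparison, assuming each side has already been dumped to a list of (node type, start line, end line) records: `NodeRecord` and `first_divergence` are hypothetical names, and producing the dumps from tree-sitter is out of scope for this example.

```rust
/// One flattened AST node: its node type plus its start/end line span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NodeRecord {
    pub kind: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Return the index of the first node where the two dumps disagree,
/// or `None` when they are identical node for node.
pub fn first_divergence(a: &[NodeRecord], b: &[NodeRecord]) -> Option<usize> {
    let common = a.len().min(b.len());
    for i in 0..common {
        if a[i] != b[i] {
            return Some(i);
        }
    }
    // Same prefix but different lengths: the shorter dump ends early.
    if a.len() != b.len() {
        Some(common)
    } else {
        None
    }
}
```

Reporting the first divergent index, rather than a plain boolean, makes it easier to see which construct a grammar bump changed.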

Dependencies & supply chain

This PR adds Rust crates only (no new Python deps; the tree-sitter-* Python wheels are used at fixture-record time, not at runtime).

  • tree-sitter (0.25.2) + tree-sitter-{python,javascript,typescript,go,rust,java,c,cpp} — the canonical tree-sitter org grammars. Why these: required for AST code compression; they are the same grammars the Python reference uses, which is what makes byte-parity attainable. Maintenance: the official tree-sitter/tree-sitter-* projects, actively released. Install surface: each compiles a vendored C parser.c via cc (already required by rusqlite's bundled build); no install/runtime network. Why these versions: required new functionality — pinned = to match the Python wheels exactly; bumping any pin requires re-running the canary and re-recording fixtures.
  • ort (2.0.0-rc.12) — already in the tree via fastembed/magika; pulled as a direct dep so the Kompress port shares the single ONNX Runtime instance (no second runtime, no extra download). Cache-only at runtime.

Testing

  • New tests added for new functionality
  • Linting passes (ruff check . / cargo clippy -D warnings)
  • Unit + integration tests pass (Rust)
  • pytest / mypy headroom — N/A (Rust-only change; the only Python touched is dev-time fixture tooling under tests/parity/, which passes ruff check + ruff format)
  • Manual testing performed (see Real Behavior Proof)

Test Output

# Byte-parity: Rust port vs Python reference (headroom-parity harness)
[code_aware_compressor] total=30 matched=30 skipped=0 diffed=0
[kompress             ] total=21 matched=21 skipped=0 diffed=0
# (full run: diff/tokenizer/smart_crusher/content_detector all green;
#  log/cache_aligner/ccr are pre-existing Phase-0 stubs, unchanged)

# Rust test suites
cargo test -p headroom-core   ->  918 passed, 3 ignored (14 suites)
cargo test -p headroom-proxy  ->  407 passed (34 suites)
cargo test -p headroom-core --test live_zone_dispatch  ->  7 passed

# Lint
cargo clippy -p headroom-core -p headroom-parity -p headroom-proxy -- -D warnings  ->  No issues found
ruff check tests/parity/recorder.py scripts/record_code_compressor_fixtures.py    ->  All checks passed!

Real Behavior Proof

  • Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain, Python 3.13.9 for reference recording. kompress-v2-base + ModernBERT-base present in the local HF cache.

  • Exact steps:

    1. Grammar canary — dump node-type + line-span trees from the Python tree-sitter stack and the pinned Rust crates over identical samples, then compare.
    2. cargo run -p headroom-parity -- run --only code_aware_compressor and --only kompress.
    3. Run the CodeCompressor over a sample and inspect the compressed output.
  • Observed result — grammar canary (precondition):

IDENTICAL  c:basic            (222 nodes)
IDENTICAL  cpp:basic          (216 nodes)
IDENTICAL  go:basic           (225 nodes)
IDENTICAL  java:basic         (234 nodes)
IDENTICAL  javascript:basic   (212 nodes)
IDENTICAL  python:basic       (279 nodes)
IDENTICAL  python:nodoc       (163 nodes)
IDENTICAL  rust:basic         (261 nodes)
IDENTICAL  typescript:basic   (214 nodes)
=> 9/9 languages produce byte-identical ASTs (node type + line spans)
  • Observed result — live compression (CodeCompressor, Python source, ratio 0.525):
# INPUT (567 chars): three functions load()/transform()/run()  -->  COMPRESSED OUTPUT:
import json

def load(path):
    with open(path) as fh:
        data = json.load(fh)
    # [6 lines omitted]
    pass
def transform(records, factor):
    out = []
    # [6 lines omitted]
    pass
def run(path, factor):
    data = load(path)
    # [3 lines omitted; calls: load, transform]
    pass
# symbol_scores: {"load": 1.0, "run": 0.0, "transform": 0.0} | language: python (conf 1.0)

Signatures, imports, the first statement of each body, and inter-function call edges are preserved; the rest is summarized. The Rust port reproduces this output byte-for-byte (parity 30/30 above). Output re-parses cleanly (the syntax-validity guard returns the original otherwise).

  • Not tested: non-ASCII identifiers/string contents (documented out-of-parity-scope); end-to-end against a live provider (the dispatcher is covered by integration tests, not live traffic in this PR); NPU/alternate ONNX execution providers (Kompress runs on the default CPU EP here — EP selection is independent of this change).
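
For illustration, the two mechanisms visible in the sample output (the omitted-lines summary comment and the syntax-validity guard) might look roughly like this. Both function names are hypothetical. The `#` comment prefix is taken from the Python sample only, the wording for a single omitted line is not covered, and the validator is passed in as a closure because the real guard re-parses with tree-sitter.

```rust
/// Build the summary comment that replaces truncated body lines.
/// Assumption: the Python-style `#` prefix shown in the sample output.
pub fn omitted_summary(omitted_lines: usize, calls: &[&str]) -> String {
    if calls.is_empty() {
        format!("# [{} lines omitted]", omitted_lines)
    } else {
        format!(
            "# [{} lines omitted; calls: {}]",
            omitted_lines,
            calls.join(", ")
        )
    }
}

/// Return the compressed text only if it still passes validation;
/// otherwise fall back to the original, untouched input.
pub fn guard_output<'a, V>(original: &'a str, compressed: &'a str, is_valid: V) -> &'a str
where
    V: Fn(&str) -> bool,
{
    if is_valid(compressed) {
        compressed
    } else {
        original
    }
}
```

Falling back to the original on any validation failure means the guard can only cost compression ratio, never correctness.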

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (CHANGELOG + module docs)
  • My changes generate no new warnings
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md

Additional Notes

  • Independent of the OpenVINO EP branch/PR. This branch is based directly on main and shares no commits with the OpenVINO execution-provider work; its only ort touch is the direct-dependency line that unifies onto the existing ONNX Runtime instance.
  • Follow-ups (out of scope here): calling warm_live_zone_compressors() from proxy startup is wired behind the gate but the eager warm is optional (the lazy cache-only path works without it); Kompress live-zone CCR offload of dropped words via <<ccr:>> is handled by the dispatcher layer, not this engine.
  • Python checklist items (pytest/mypy) are N/A — this is a Rust change; the equivalent gates are cargo test + cargo clippy, run above.
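
The gate plus lazy cache-only load described in this PR can be sketched with an `AtomicBool` checked before a `OnceLock`. This is a simplified illustration under stated assumptions: `Model`, `load_from_cache`, `plain_text_strategy`, and the `KOMPRESS_SKETCH_MODEL_FILE` environment variable are hypothetical stand-ins (the real loader resolves files from the local HF cache), and caching a `None` result for the life of the process is an assumption of this sketch.

```rust
use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::OnceLock;

/// Stand-in for the loaded model handle (hypothetical).
pub struct Model {
    pub path: PathBuf,
}

/// Process-wide opt-in gate, off by default.
static KOMPRESS_ENABLED: AtomicBool = AtomicBool::new(false);
/// Lazily initialized singleton; `None` means the cache was cold.
static KOMPRESS: OnceLock<Option<Model>> = OnceLock::new();

pub fn set_kompress_enabled(enabled: bool) {
    KOMPRESS_ENABLED.store(enabled, Ordering::SeqCst);
}

/// Hypothetical cache-only loader: never downloads, only checks that a
/// local file exists. The env var name is made up for this sketch.
fn load_from_cache() -> Option<Model> {
    std::env::var_os("KOMPRESS_SKETCH_MODEL_FILE")
        .map(PathBuf::from)
        .filter(|p| p.is_file())
        .map(|path| Model { path })
}

/// Check the gate before touching the OnceLock, so a disabled process
/// never attempts to load the model at all.
pub fn kompress() -> Option<&'static Model> {
    if !KOMPRESS_ENABLED.load(Ordering::SeqCst) {
        return None;
    }
    KOMPRESS.get_or_init(load_from_cache).as_ref()
}

/// Strategy the dispatcher would report for plain text.
pub fn plain_text_strategy() -> &'static str {
    if kompress().is_some() {
        "kompress"
    } else {
        "passthrough"
    }
}
```

With the gate off, or with the gate on and a cold cache, plain text passes through unchanged and nothing blocks on I/O beyond a local file check.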

Ruben Avanesov added 6 commits June 19, 2026 01:24
Ports `headroom.transforms.kompress_compressor` (the ModernBERT token
compressor behind PlainText compression) to Rust — the last unported
compressor alongside CodeCompressor. Engine only; live-zone dispatch
wiring follows in a separate PR (matches how the other compressors
landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the
  inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a
  fine-tune reuses its base vocab). Runs the ONNX/proxy compression
  path: whitespace-split → 350-word chunks → pre-tokenized encode
  (is_split_into_words) → ONNX `final_scores` → max-score-per-word →
  keep `> 0.5` → join kept words. <10 words passes through.
- Inference via `ort` (direct dep added here — first session-API
  consumer; unifies to the ORT instance fastembed/magika already
  vendor). Tokenization via the `tokenizers` crate. Both reproduce the
  Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures
  recorded via recorder.py (enable_ccr=False → deterministic output).
  `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched.
  Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>`
convention), not this engine — the inline Python marker is intentionally
not reproduced.
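
As a rough sketch of the selection logic described in this commit message (whitespace split, 350-word chunks, max score per word, keep above 0.5, passthrough under 10 words): the tokenizer and ONNX model are replaced by a caller-supplied closure, all names are hypothetical, and joining kept words across chunks with a single space is an assumption.

```rust
const CHUNK_WORDS: usize = 350;
const MIN_WORDS: usize = 10;
const KEEP_THRESHOLD: f32 = 0.5;

/// Collapse token-level scores to one score per word by taking the maximum.
/// `word_ids[i]` is the word index of token `i`, or `None` for special tokens.
pub fn max_score_per_word(
    n_words: usize,
    word_ids: &[Option<usize>],
    token_scores: &[f32],
) -> Vec<f32> {
    let mut best = vec![f32::NEG_INFINITY; n_words];
    for (word_id, &score) in word_ids.iter().zip(token_scores) {
        if let Some(w) = *word_id {
            if w < n_words && score > best[w] {
                best[w] = score;
            }
        }
    }
    best
}

/// Keep only words whose best token score exceeds the threshold.
/// `score_chunk` stands in for tokenizer + model and returns
/// (word_ids, token_scores) for one chunk of words.
pub fn compress<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> (Vec<Option<usize>>, Vec<f32>),
{
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < MIN_WORDS {
        // Short inputs pass through unchanged.
        return text.to_string();
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let (word_ids, token_scores) = score_chunk(chunk);
        let per_word = max_score_per_word(chunk.len(), &word_ids, &token_scores);
        for (word, score) in chunk.iter().zip(per_word) {
            if score > KEEP_THRESHOLD {
                kept.push(*word);
            }
        }
    }
    kept.join(" ")
}
```

Taking the maximum over a word's tokens means a word survives if any of its sub-tokens scores above the threshold.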
…harness

Ports headroom.transforms.code_compressor (CodeAwareCompressor) to Rust on the
same branch as the Kompress port, so one PR delivers both new compressors
(SmartCrusher is already upstream).

Engine (crates/headroom-core/src/transforms/code_compressor.rs):
- tree-sitter AST parsing for python/js/ts/go/rust/java/c/cpp
- language detection (regex prefilter -> fewest-errors tree-sitter)
- symbol-importance scoring (min-max normalized, round-3 half-even) + body
  budget allocation, statement-level body truncation, omitted-line comments
  with call info, Python docstring handling (first_line/full/remove incl.
  multiline first-line reconstruction), syntax-validity guard (re-parse;
  return original on ERROR/MISSING).

Grammar-version parity is the precondition: the Rust tree-sitter-<lang> crates
are pinned to the exact versions of the Python wheels the fixtures were
recorded against (same version number on crates.io + PyPI = same grammar
source = identical ASTs). A canary over 9 samples x 8 languages confirmed
node-for-node identical node-type + line-span trees at these pins.

Parity: cargo run -p headroom-parity -- run --only code_aware_compressor
-> 30/30 matched (18 non-trivial across all 8 languages + unknown +
invalid-syntax fallback; all 3 docstring modes). py_round_int/py_round3
half-to-even verified against CPython. Integration test +
record_code_compressor_fixtures.py mirror the Kompress harness. Fixtures
recorded with enable_ccr=False + fallback_to_kompress=False for determinism;
live-zone dispatch wiring (SourceCode slot) is a deliberate follow-up.

Mirrors the Python content_router so the two newly-ported compressors
actually run in the proxy's live zone (they were engine-only before).

- ContentType::SourceCode → CodeCompressor. The grammars are statically
  linked, so the singleton constructs in microseconds — a synchronous
  one-liner like Diff/Log/Search. Flips the existing
  `source_code_tool_result_routes_to_no_op` contract test (whose comment
  invited a future "wire it up" PR) to assert code_aware_compressor.

- ContentType::PlainText → Kompress, loaded CACHE-ONLY. This mirrors the
  Python reference's `allow_download=False` preload path: never download
  on a hot/startup thread; when the ~261 MB model isn't in the local HF
  cache, yield None and pass the text through unchanged, exactly as Python
  does when Kompress is unavailable. New `Kompress::from_cache` constructor
  + `hf_cache_file` resolver back this. The PlainText routing test is
  model-gated (asserts strategy "kompress" when cached, passthrough when
  not), matching the kompress parity test's gating.

- `warm_live_zone_compressors()` (exported) mirrors Python's
  `eager_load_compressors`: force the cache-only singletons off the request
  path. Proxy startup can call it; the lazy path works without it.

No regressions: headroom-core 918 tests + 7 live_zone_dispatch tests pass,
full parity unchanged (all comparators green), clippy clean across
core/parity/proxy, workspace builds.

Kompress carries a ~261 MB ONNX model, so (unlike the always-on structural
compressors and the AST CodeCompressor) it now loads only when an operator
opts in, mirroring the Python reference's `config.enable_kompress`.

- core: process-wide `KOMPRESS_ENABLED` (default off) + `set_kompress_enabled`.
  `kompress()` checks it before the OnceLock, so a disabled proxy never loads
  the model and PlainText passes through. `warm_live_zone_compressors` only
  warms Kompress when enabled.
- proxy: `--enable-kompress` / `HEADROOM_PROXY_ENABLE_KOMPRESS` flag (default
  false), threaded through CliArgs → Config → for_test. main.rs sets the gate
  from config and, when enabled, fires a `spawn_blocking` cache-only warm-up
  off the request path (mirrors Python's `eager_load_compressors`) — a cold
  cache just leaves it deferred rather than stalling the bind.
- the model-gated PlainText dispatch test enables the gate explicitly, like
  the proxy does at startup.

No regressions: core 918 passed, proxy 407 passed, clippy clean across
core/parity/proxy, workspace builds.
@github-actions

github-actions Bot commented Jun 19, 2026

Contributor

PR governance

This PR does not yet satisfy the required template fields:

  • Fill in Real Behavior Proof: Environment.
  • Fill in Real Behavior Proof: Exact command / steps.
  • Fill in Real Behavior Proof: Observed result.
  • Fill in Real Behavior Proof: Not tested.

Please update the PR body, or move the PR back to draft while it is still in progress.

@github-actions github-actions Bot added the "status: needs author action" label Jun 19, 2026
@chopratejas

Owner

This will need a deep review - any chance this can be broken into multiple PRs?
Also - i see a lot of fixtures/*.json files - can you help me understand why we need those?

@codecov

codecov Bot commented Jun 19, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.


@RubenAAA

Author

This will need a deep review - any chance this can be broken into multiple PRs? Also - i see a lot of fixtures/*.json files - can you help me understand why we need those?

For a bit of context: I'd actually been building something along these lines independently on my own side, so I was genuinely happy to find this repo. It's a much better foundation than what I had. My setup needs the engine in Rust (both for how it's wired in and for the speed), which is what motivated porting these compressors over rather than calling the Python ones.

Sorry for the size, I'll break it into a stack of 3 PRs that review in order. Each builds green on its own:

  1. feat(rust): port Kompress ML prose compressor + parity harness
    The kompress.rs engine + the headroom-parity Kompress comparator/fixtures. Self-contained — adds the engine module but doesn't touch the dispatcher yet.
  2. feat(rust): port CodeCompressor AST compressor (stacked on #1)
    The code_compressor.rs engine + its comparator/fixtures + the pinned tree-sitter-* grammar deps. Also dispatcher-free.
  3. feat(rust): wire both compressors into live-zone dispatch + gate Kompress (stacked on #2)
    The small bit that flips the SourceCode/PlainText TODOs in dispatch_compressor, plus the --enable-kompress flag and changelog. This is where the two engines actually go live.

So #1 and #2 are pure additive engine ports (dead code until #3 lands them), and #3 is the behavior change in isolation. That should make the deep review tractable one engine at a time.
On the fixtures/*.json, these are the parity contract. Each is a recorded {input, config, output} triple captured from the Python reference implementations (headroom/transforms/{kompress,code_compressor}.py) via the recorder scripts. The headroom-parity harness replays each one through the Rust port and asserts byte-identical output (30 code + 21 kompress). For an AST compressor "looks equivalent" isn't good enough — a one-node grammar difference silently changes the compression. The fixtures are how I prove that the Rust output matches Python exactly. They're also what would catch a tree-sitter grammar-version bump regressing parity. I record them rather than generate at test-time so the Python reference isn't a build/test dependency for the Rust crates.
I'll get the stack up shortly and link them here. Also pushed a fmt-only commit that fixes the red lint / cargo fmt checks on this PR in the meantime.

@RubenAAA

Author

As promised, split into a 3-PR stack (each builds green on its own, reviews in order):

Since I don't have push access to base branches here, #1154 and #1155 are stacked on the fork, so their diffs against main show the upstream PRs' content until those merge — review #1153 first and each subsequent diff shrinks as they land. Bodies pass the PR-governance template check, and the red fmt/lint checks are fixed in each branch.

Happy to close this PR in favor of the stack once you've had a look — leaving it open for now so the discussion thread isn't lost.

@RubenAAA RubenAAA marked this pull request as draft June 19, 2026 09:07
@JerrettDavis

Collaborator

This PR is still a draft and GitHub reports mergeStateStatus=UNSTABLE, so I am not approving it in the ready-to-merge pass. Please mark it ready and get required CI green before final review.
