feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) by RubenAAA · Pull Request #1153 · headroomlabs-ai/headroom

RubenAAA · 2026-06-19T09:05:59Z

Description

Ports the Kompress ML prose compressor to the Rust headroom-core engine: a port of headroom/transforms/kompress_compressor.py (kompress-v2-base ONNX + ModernBERT tokenizer) whose output matches the Python reference byte-for-byte. This is PR 1 of 3 splitting #1143 into reviewable pieces. This PR adds only the engine and the parity harness; it does not wire Kompress into the live-zone dispatcher (that is PR 3). The engine is therefore additive and inert until dispatch lands.

The port is verified against the Python reference via the headroom-parity harness (replays recorded fixtures and asserts identical output).
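As an illustration of what "replays recorded fixtures and asserts identical output" amounts to, here is a minimal sketch of a byte-exact fixture comparator. The `Outcome`/`Fixture` types, the `replay`/`summary` functions and the "`None` means skip" convention are hypothetical and not the headroom-parity API. The PR states only that the harness compares outputs byte-for-byte and skips, rather than fails, when the model is not cached.

```rust
/// Outcome of replaying one recorded fixture (hypothetical, not the harness API).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Matched,
    Diffed,
    Skipped,
}

/// A recorded fixture: the input text and the Python reference output.
pub struct Fixture {
    pub input: String,
    pub expected: String,
}

/// Replays each fixture through `compress` and compares the bytes exactly.
/// `compress` returns `None` when the engine is unavailable (e.g. the model is
/// not in the local cache), which is counted as a skip rather than a failure.
pub fn replay<F>(fixtures: &[Fixture], mut compress: F) -> Vec<Outcome>
where
    F: FnMut(&str) -> Option<String>,
{
    fixtures
        .iter()
        .map(|fx| match compress(&fx.input) {
            None => Outcome::Skipped,
            Some(out) if out.as_bytes() == fx.expected.as_bytes() => Outcome::Matched,
            Some(_) => Outcome::Diffed,
        })
        .collect()
}

/// Formats a summary in the style of the harness's
/// `[kompress] total=21 matched=21 skipped=0 diffed=0` line.
pub fn summary(name: &str, outcomes: &[Outcome]) -> String {
    let count = |kind: Outcome| outcomes.iter().filter(|o| **o == kind).count();
    format!(
        "[{}] total={} matched={} skipped={} diffed={}",
        name,
        outcomes.len(),
        count(Outcome::Matched),
        count(Outcome::Skipped),
        count(Outcome::Diffed)
    )
}
```

Comparing `as_bytes()` instead of applying any normalization is the point: "parity" here means identical bytes, not semantically similar text.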

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Performance improvement
Code refactoring (no functional changes)

Changes Made

headroom-core: transforms/kompress.rs — full Kompress engine (ModernBERT tokenizer + ONNX inference session via the shared ort runtime), plus Kompress::from_cache cache-only constructor and hf_cache_file resolver. Cache-only: a cold cache yields None and never downloads (mirrors the Python allow_download=False path).
headroom-core: transforms/mod.rs — exports the new module.
headroom-parity: KompressComparator + recorder (scripts/record_kompress_fixtures.py) + 21 recorded fixtures under tests/parity/fixtures/kompress/.
Cargo.toml: adds ort as a direct dependency, pinned (=2.0.0-rc.12) to the existing lock entry so Cargo unifies onto the single ONNX Runtime already in the tree via fastembed/magika (no second runtime, no extra binary download).
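The cache-only behaviour of `from_cache`/`hf_cache_file` (a miss yields `None` and nothing is downloaded) can be sketched against the standard huggingface_hub cache layout, where `models--{org}--{name}/refs/main` holds a commit hash and files live under `snapshots/{commit}/`. The function name mirrors the PR, but this signature and body are a hypothetical sketch, not the engine's code.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical cache-only lookup of `filename` from `repo_id`
/// (e.g. "answerdotai/ModernBERT-base") under one hub cache root.
/// Any miss returns `None`; the network is never touched.
pub fn hf_cache_file(root: &Path, repo_id: &str, filename: &str) -> Option<PathBuf> {
    // huggingface_hub stores model repos as `models--{org}--{name}`.
    let repo_dir = root.join(format!("models--{}", repo_id.replace('/', "--")));
    // `refs/main` contains the commit hash of the snapshot `main` points to.
    let commit = fs::read_to_string(repo_dir.join("refs").join("main")).ok()?;
    let commit = commit.trim();
    if commit.is_empty() {
        return None;
    }
    let path = repo_dir.join("snapshots").join(commit).join(filename);
    // `is_file` follows symlinks, so a dangling snapshot link also counts as a miss.
    path.is_file().then_some(path)
}
```

Returning `Option` instead of an error mirrors the "cold cache yields None" contract. A caller that needs to explain the miss can log the root it searched.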

Testing

Unit tests pass (pytest): N/A for runtime (Rust change); dev-time fixture tooling under scripts/ and tests/parity/ passes ruff check + ruff format
Linting passes (ruff check .): also cargo clippy -D warnings
Type checking passes (mypy headroom): N/A (Rust-only runtime change)
New tests added for new functionality
Manual testing performed

Test Output

# Byte-parity: Rust port vs Python reference (headroom-parity harness)
cargo run -p headroom-parity -- run --only kompress
[kompress] total=21 matched=21 skipped=0 diffed=0

# Rust suites
cargo test -p headroom-core   ->  911 passed, 3 ignored (13 suites)
cargo clippy -p headroom-core -p headroom-parity -- -D warnings  ->  No issues found
cargo fmt --check  ->  clean
ruff check . && ruff format --check .  ->  All checks passed

Real Behavior Proof

Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain, Python 3.13 for reference recording; kompress-v2-base + ModernBERT-base present in the local HF cache.
Exact command / steps: cargo run -p headroom-parity -- run --only kompress; then cargo test -p headroom-core.
Observed result: 21/21 kompress fixtures match the Python reference byte-for-byte; cargo test -p headroom-core reports 911 passed, 3 ignored; clippy and fmt clean.
Not tested: live-zone dispatch routing (intentionally not wired here — lands in PR 3); non-ASCII identifiers/content (documented out-of-parity-scope); alternate ONNX execution providers (default CPU EP here).

Review Readiness

I have performed a self-review
This PR is ready for human review

Additional Notes

Split of feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) #1143 — PR 1 of 3. PR 2 ports the CodeCompressor; PR 3 wires both into the live-zone dispatcher and adds the --enable-kompress gate. PRs 2 and 3 are stacked on this branch, so until this merges their diffs against main show this PR's content as well.
pytest/mypy checklist items are N/A — this is a Rust change; the equivalent gates are cargo test + cargo clippy, run above.

Update — Windows/NPU engine support (one commit added)

Pushed feat(kompress): Windows cache resolution, load diagnostics, static-shape ONNX — three self-contained additions to the cache-only loader, all in kompress.rs, validated against real NPU hardware downstream:

Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE / HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the model on native Windows, not just Linux.
Load diagnostics: from_cache now logs why it defers (kompress_cache_miss / kompress_session_build_failed, with the searched roots + the real session-build error) instead of returning None silently. This was the #1 cause of an unexplained kompress_ready=false (e.g. HF cache symlinks being unreadable over a \\wsl$ mount, or an EP rejecting the graph).
Static-shape ONNX support: detect a fixed input_ids seq dimension on the loaded model (detect_static_seq) and right-pad each chunk to it (masked padding ⇒ identical scores); dynamic models keep their natural length at zero padding cost. This enables execution providers that can't compile dynamic shapes (OpenVINO NPU; see #1139, "feat(proxy): HEADROOM_ORT_EP for OpenVINO/CUDA EP selection; fix Phase E skips when auth-mode-policy-enforcement disabled").
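The cross-platform cache resolution listed in this update can be sketched as a function that builds candidate hub-cache roots in priority order. The ordering follows the variables the PR names. The `HF_HOME/hub` and `~/.cache/huggingface/hub` suffixes follow huggingface_hub's documented defaults. The function name, the injected `env` lookup and the deduplication step are hypothetical.

```rust
use std::path::PathBuf;

/// Hypothetical sketch (not the engine's API): candidate Hugging Face hub
/// cache roots in priority order. `env` abstracts `std::env::var` so the
/// logic can be exercised without touching the real environment.
pub fn hf_cache_roots<F>(env: F) -> Vec<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat unset and empty variables alike.
    let var = |key: &str| env(key).filter(|v| !v.is_empty()).map(PathBuf::from);
    let mut candidates = Vec::new();
    // HF_HUB_CACHE names the hub cache directory itself.
    if let Some(p) = var("HF_HUB_CACHE") {
        candidates.push(p);
    }
    // HF_HOME is the parent of the hub cache.
    if let Some(p) = var("HF_HOME") {
        candidates.push(p.join("hub"));
    }
    // Default ~/.cache/huggingface/hub: HOME on Unix, USERPROFILE on native Windows.
    for key in ["HOME", "USERPROFILE"] {
        if let Some(p) = var(key) {
            candidates.push(p.join(".cache").join("huggingface").join("hub"));
        }
    }
    // Drop duplicates (e.g. HOME and USERPROFILE pointing at the same place), keeping order.
    let mut roots: Vec<PathBuf> = Vec::new();
    for c in candidates {
        if !roots.contains(&c) {
            roots.push(c);
        }
    }
    roots
}
```

Returning every candidate, instead of stopping at the first variable that is set, makes it straightforward to report "the searched roots" in a `kompress_cache_miss` diagnostic.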

Parity: 21/21 in both static and dynamic modes. cargo fmt/clippy/ruff clean, cargo test -p headroom-core green.
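The static-versus-dynamic split described in this update (right-pad each chunk to a fixed seq dimension with masked padding; otherwise keep the natural length) can be sketched as below. This assumes the declared `input_ids` shape is available as `[batch, seq]` with a non-positive value marking a dynamic dimension. That convention, and the `static_seq`/`pad_chunk` names, are hypothetical and not the engine's `detect_static_seq`.

```rust
/// Hypothetical: given the declared `input_ids` shape `[batch, seq]`, where a
/// non-positive value marks a dynamic dimension (assumed convention), return
/// the fixed sequence length if the model is static.
pub fn static_seq(dims: &[i64]) -> Option<usize> {
    match dims {
        [_, seq] if *seq > 0 => Some(*seq as usize),
        _ => None,
    }
}

/// Builds `(input_ids, attention_mask)` for one chunk. With a static `seq`,
/// the chunk is right-padded with `pad_id` and the padding gets mask 0, so the
/// model ignores it. With `None` (dynamic model), the chunk keeps its natural
/// length. Returns `None` if the chunk is longer than the static length.
pub fn pad_chunk(ids: &[i64], seq: Option<usize>, pad_id: i64) -> Option<(Vec<i64>, Vec<i64>)> {
    let target = seq.unwrap_or(ids.len());
    if ids.len() > target {
        return None;
    }
    let mut input_ids = ids.to_vec();
    let mut attention_mask = vec![1i64; ids.len()];
    input_ids.resize(target, pad_id);
    attention_mask.resize(target, 0);
    Some((input_ids, attention_mask))
}
```

Padded positions carry no word id (as with HF `word_ids` returning `None` for padding), so they never enter the max-score-per-word step. This is why static and dynamic modes can yield the same kept words.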

Note: the stack was rebased — from_cache/hf_cache_file (previously introduced in the wiring PR #1155) now live here, in the engine, which is their correct home.

Ports `headroom.transforms.kompress_compressor` (the ModernBERT token compressor behind PlainText compression) to Rust. It is the last unported compressor alongside CodeCompressor. Engine only; live-zone dispatch wiring follows in a separate PR (matching how the other compressors landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a fine-tune reuses its base vocab). Runs the ONNX/proxy compression path: whitespace-split → 350-word chunks → pre-tokenized encode (is_split_into_words) → ONNX `final_scores` → max-score-per-word → keep `> 0.5` → join kept words. Inputs of fewer than 10 words pass through.
- Inference via `ort` (direct dep added here as the first session-API consumer; it unifies to the ORT instance fastembed/magika already vendor). Tokenization via the `tokenizers` crate. Both reproduce the Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures recorded via recorder.py (enable_ccr=False → deterministic output). `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched.

Model-gated: the parity run skips (rather than fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>` convention), not this engine; the inline Python marker is intentionally not reproduced.
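The compression path listed under "What it does" can be sketched in plain Rust. The tokenizer and ONNX session are replaced by a hypothetical `score_chunk` callback that returns per-token `(word index, score)` pairs shaped like HF `word_ids`. The constants (350 words, 10 words, 0.5) come from the description. The single-space join and returning short inputs unchanged are assumptions, not confirmed details of the engine.

```rust
/// Constants taken from the description above.
const CHUNK_WORDS: usize = 350;
const MIN_WORDS: usize = 10;
const KEEP_THRESHOLD: f32 = 0.5;

/// One scored token: the index of the word it belongs to within its chunk
/// (`None` for special/padding tokens, like HF `word_ids`) and its score.
pub type TokenScore = (Option<usize>, f32);

/// Sketch of the compression path. `score_chunk` stands in for the
/// pre-tokenized encode + ONNX `final_scores` step (hypothetical hook).
pub fn compress<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> Vec<TokenScore>,
{
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() < MIN_WORDS {
        // Assumed: short inputs are returned as-is.
        return text.to_string();
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        // A word's score is the max over its sub-word tokens.
        let mut best = vec![f32::NEG_INFINITY; chunk.len()];
        for (word_id, score) in score_chunk(chunk) {
            if let Some(w) = word_id.filter(|&w| w < chunk.len()) {
                best[w] = best[w].max(score);
            }
        }
        // Keep words strictly above the threshold, preserving order.
        kept.extend(
            chunk
                .iter()
                .zip(&best)
                .filter(|&(_, &s)| s > KEEP_THRESHOLD)
                .map(|(w, _)| *w),
        );
    }
    kept.join(" ")
}
```

Because the keep decision only compares each word's max score against 0.5, score drift on the order of 1e-6 can change the output only for a word whose score sits within that distance of the threshold.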

github-actions · 2026-06-19T09:06:14Z

PR governance

This PR follows the template and is marked ready for human review.

…ape ONNX

Three self-contained additions to the cache-only loader, all in kompress.rs:
- Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE / HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the model on native Windows, not just Linux.
- Load diagnostics: from_cache now logs WHY it defers (kompress_cache_miss / kompress_session_build_failed with the searched roots + the real session build error) instead of returning None silently. This was the #1 cause of an unexplained kompress_ready=false.
- Static-shape ONNX support: detect a fixed input_ids seq dimension on the loaded model and right-pad each chunk to it (masked padding => identical scores); dynamic models keep their natural length at zero padding cost. Enables execution providers that cannot compile dynamic shapes (OpenVINO NPU).

Parity 21/21 in both static and dynamic modes.

RubenAAA · 2026-06-19T11:15:32Z

Added one engine commit (Windows HF-cache resolution + load diagnostics + static-shape ONNX support) — all self-contained in kompress.rs, parity 21/21 in both static and dynamic modes. Heads-up: I rebased the stack, so from_cache/hf_cache_file (previously in the wiring PR #1155) now live here in the engine, which is their correct home. Description updated with the rationale.

JerrettDavis

This PR is not merge-ready in current GitHub state (mergeStateStatus=UNSTABLE). Please update from current main, resolve any conflicts if present, and rerun/clear required CI before this can be approved.

Ruben Avanesov added 2 commits June 19, 2026 10:56

style: cargo fmt + ruff format

b24dc61

github-actions Bot added the "status: ready for review" label (Pull request body is complete and the author marked it ready for human review) Jun 19, 2026

RubenAAA mentioned this pull request Jun 19, 2026

feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated) #1143

Draft

21 tasks

This was referenced Jun 19, 2026

feat(proxy): HEADROOM_ORT_EP for OpenVINO/CUDA EP selection; fix Phase E skips when auth-mode-policy-enforcement disabled #1139

Open

feat(rust): wire CodeCompressor + Kompress into live-zone dispatch + gate (3/3) #1155

Open

JerrettDavis requested changes Jun 19, 2026


github-actions Bot added the "status: ci failing" label (Required or reported CI checks are failing) and removed the "status: ready for review" label Jun 23, 2026

RubenAAA wants to merge 3 commits into headroomlabs-ai:main from RubenAAA:stack/1-kompress-engine