feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153
Open
RubenAAA wants to merge 3 commits into
added 2 commits
June 19, 2026 10:56
Ports `headroom.transforms.kompress_compressor` (the ModernBERT token compressor behind PlainText compression) to Rust: the last unported compressor alongside CodeCompressor. Engine only; live-zone dispatch wiring follows in a separate PR (matches how the other compressors landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a fine-tune reuses its base vocab). Runs the ONNX/proxy compression path: whitespace-split → 350-word chunks → pre-tokenized encode (is_split_into_words) → ONNX `final_scores` → max-score-per-word → keep `> 0.5` → join kept words. Inputs of fewer than 10 words pass through.
- Inference via `ort` (direct dep added here; first session-API consumer; unifies to the ORT instance fastembed/magika already vendor). Tokenization via the `tokenizers` crate. Both reproduce the Python `transformers`/onnxruntime path exactly.

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures recorded via recorder.py (enable_ccr=False → deterministic output). `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched. Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>` convention), not this engine. The inline Python marker is intentionally not reproduced.
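For reviewers who want the keep/drop step in isolation, here is a minimal sketch of the post-inference part of the pipeline described above (max score per word, keep `> 0.5`, join). It is not the code in `kompress.rs`: the function names, the `score_chunk` callback standing in for tokenizer + ONNX inference, the single-space join, and returning short inputs unchanged are all assumptions made for illustration.

```rust
// Values taken from the commit message; the constant names are hypothetical.
const KEEP_THRESHOLD: f32 = 0.5;
const CHUNK_WORDS: usize = 350;
const MIN_WORDS: usize = 10;

/// Collapse per-token scores to per-word scores by taking the max over each
/// word's tokens. `word_ids[i]` is the word index of token `i`, or `None`
/// for special tokens. Words with no tokens keep a score of -inf.
fn max_score_per_word(word_ids: &[Option<usize>], token_scores: &[f32], n_words: usize) -> Vec<f32> {
    let mut best = vec![f32::NEG_INFINITY; n_words];
    for (wid, &score) in word_ids.iter().zip(token_scores.iter()) {
        if let Some(w) = *wid {
            if w < n_words && score > best[w] {
                best[w] = score;
            }
        }
    }
    best
}

/// Hypothetical driver. `score_chunk` is a stand-in for "encode the
/// pre-split words, run the model, return (word_ids, final_scores)".
fn compress<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> (Vec<Option<usize>>, Vec<f32>),
{
    let words: Vec<&str> = text.split_whitespace().collect();
    // Short inputs pass through (assumed here to mean "returned unchanged").
    if words.len() < MIN_WORDS {
        return text.to_string();
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let (word_ids, scores) = score_chunk(chunk);
        let per_word = max_score_per_word(&word_ids, &scores, chunk.len());
        for (word, score) in chunk.iter().zip(per_word) {
            // Strictly greater than the threshold, as in the description.
            if score > KEEP_THRESHOLD {
                kept.push(*word);
            }
        }
    }
    // Joining with a single space is an assumption of this sketch.
    kept.join(" ")
}
```

Taking the max rather than the mean means a word survives if any one of its sub-tokens scores above the threshold, which is why the ~1e-6 score drift noted above cannot flip a decision unless a score sits almost exactly on 0.5.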
Contributor
PR governance: This PR follows the template and is marked ready for human review.
21 tasks
…ape ONNX

Three self-contained additions to the cache-only loader, all in kompress.rs:
- Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE / HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the model on native Windows, not just Linux.
- Load diagnostics: from_cache now logs WHY it defers (kompress_cache_miss / kompress_session_build_failed with the searched roots + the real session build error) instead of returning None silently, which was the #1 cause of an unexplained kompress_ready=false.
- Static-shape ONNX support: detect a fixed input_ids seq dimension on the loaded model and right-pad each chunk to it (masked padding => identical scores); dynamic models keep their natural length at zero padding cost. Enables execution providers that cannot compile dynamic shapes (OpenVINO NPU).

Parity 21/21 in both static and dynamic modes.
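As a rough illustration of the cross-platform cache resolution, the sketch below builds a list of candidate hub-cache roots from the four environment variables named above. The priority order, the `hub` and `.cache/huggingface/hub` suffixes, and the `models--org--name` directory naming follow the usual Hugging Face hub cache layout; whether `kompress.rs` applies them in exactly this way is an assumption, and both function names are hypothetical.

```rust
use std::path::PathBuf;

/// Candidate hub-cache roots, highest priority first. `get` abstracts the
/// environment lookup so the logic can be exercised without touching the
/// real process environment.
fn hf_cache_roots<F>(get: F) -> Vec<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat an empty value the same as an unset variable.
    let lookup = |key: &str| get(key).filter(|v| !v.is_empty());

    let mut roots = Vec::new();
    if let Some(p) = lookup("HF_HUB_CACHE") {
        roots.push(PathBuf::from(p));
    }
    if let Some(p) = lookup("HF_HOME") {
        roots.push(PathBuf::from(p).join("hub"));
    }
    // HOME covers Linux/macOS; USERPROFILE covers native Windows.
    for var in ["HOME", "USERPROFILE"] {
        if let Some(p) = lookup(var) {
            roots.push(PathBuf::from(p).join(".cache").join("huggingface").join("hub"));
        }
    }
    roots
}

/// Directory name the hub cache uses for a model repo id such as "org/name".
fn repo_dir(repo_id: &str) -> String {
    format!("models--{}", repo_id.replace('/', "--"))
}
```

In real use the lookup would be `hf_cache_roots(|k| std::env::var(k).ok())`. Returning every candidate instead of only the first hit also gives the load diagnostics a list of searched roots to log on a cache miss.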
Author
Added one engine commit (Windows HF-cache resolution + load diagnostics + static-shape ONNX support) — all self-contained in |
JerrettDavis
requested changes
Jun 19, 2026
JerrettDavis
left a comment
Collaborator
This PR is not merge-ready in current GitHub state (mergeStateStatus=UNSTABLE). Please update from current main, resolve any conflicts if present, and rerun/clear required CI before this can be approved.
Description
Ports the Kompress ML prose compressor to the Rust `headroom-core` engine: a byte-for-byte port of `headroom/transforms/kompress_compressor.py` (`kompress-v2-base` ONNX + ModernBERT tokenizer). This is PR 1 of 3 splitting #1143 into reviewable pieces: this PR adds the engine + the parity harness only; it does not wire Kompress into the live-zone dispatcher (that is PR 3). The engine is therefore additive and inert until dispatch lands.
The port is verified against the Python reference via the `headroom-parity` harness (replays recorded fixtures and asserts identical output).
Type of Change
Changes Made
- `headroom-core`: `transforms/kompress.rs`, the full Kompress engine (ModernBERT tokenizer + ONNX inference session via the shared `ort` runtime), plus the `Kompress::from_cache` cache-only constructor and the `hf_cache_file` resolver. Cache-only: a cold cache yields `None` and never downloads (mirrors the Python `allow_download=False` path).
- `headroom-core`: `transforms/mod.rs` exports the new module.
- `headroom-parity`: `KompressComparator` + recorder (`scripts/record_kompress_fixtures.py`) + 21 recorded fixtures under `tests/parity/fixtures/kompress/`.
- `Cargo.toml`: adds `ort` as a direct dependency, pinned (`=2.0.0-rc.12`) to the existing lock entry so Cargo unifies onto the single ONNX Runtime already in the tree via `fastembed`/`magika` (no second runtime, no extra binary download).
Testing
- `pytest`: N/A for runtime (Rust change); dev-time fixture tooling under `scripts/` / `tests/parity/` passes `ruff check` + `ruff format`
- `ruff check .`: also `cargo clippy -D warnings`
- `mypy headroom`: N/A (Rust-only runtime change)
Test Output
Real Behavior Proof
- `kompress-v2-base` + `ModernBERT-base` present in the local HF cache.
- `cargo run -p headroom-parity -- run --only kompress`; then `cargo test -p headroom-core`.
- `cargo test -p headroom-core` reports 911 passed, 3 ignored; clippy and fmt clean.
Review Readiness
Additional Notes
- `--enable-kompress` gate. PRs 2 and 3 are stacked on this branch, so until this merges their diffs against `main` show this PR's content as well.
- `pytest`/`mypy` checklist items are N/A: this is a Rust change; the equivalent gates are `cargo test` + `cargo clippy`, run above.
Update: Windows/NPU engine support (one commit added)
Pushed `feat(kompress): Windows cache resolution, load diagnostics, static-shape ONNX`: three self-contained additions to the cache-only loader, all in `kompress.rs`, validated against real NPU hardware downstream:
- Cross-platform HF cache resolution: resolve the model via `HF_HUB_CACHE` / `HF_HOME` / `HOME` / `USERPROFILE` (was `$HOME`-only), so `from_cache` finds the model on native Windows, not just Linux.
- Load diagnostics: `from_cache` now logs why it defers (`kompress_cache_miss` / `kompress_session_build_failed`, with the searched roots + the real session-build error) instead of returning `None` silently. This was the #1 cause of an unexplained `kompress_ready=false` (e.g. HF cache symlinks being unreadable over a `\\wsl$` mount, or an EP rejecting the graph).
- Static-shape ONNX support: detect a fixed `input_ids` seq dimension on the loaded model (`detect_static_seq`) and right-pad each chunk to it (masked padding ⇒ identical scores); dynamic models keep their natural length at zero padding cost. Enables execution providers that can't compile dynamic shapes (OpenVINO NPU, see #1139).
Parity: 21/21 in both static and dynamic modes.
`cargo fmt` / `clippy` / `ruff` clean, `cargo test -p headroom-core` green.
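To make the static-shape path concrete, here is a small sketch of right-padding a chunk to a fixed sequence length with a matching attention mask. It is illustrative only: `static_seq_len` and `pad_to_static` are hypothetical names (not the PR's `detect_static_seq`), the pad token id is left to the caller, the input shape is assumed to be `[batch, seq]`, and rejecting over-long chunks with `None` is an assumption about a case the description does not cover.

```rust
/// Read the sequence dimension from an input shape of the form [batch, seq].
/// A positive value is treated as fixed; anything else (dynamic dims are
/// commonly reported as -1) is treated as dynamic.
fn static_seq_len(dims: &[i64]) -> Option<usize> {
    match dims.get(1) {
        Some(&d) if d > 0 => Some(d as usize),
        _ => None,
    }
}

/// Right-pad `input_ids` to `static_len` and build the attention mask
/// (1 = real token, 0 = padding). Returns `None` if the chunk does not fit.
fn pad_to_static(input_ids: &[i64], static_len: usize, pad_id: i64) -> Option<(Vec<i64>, Vec<i64>)> {
    if input_ids.len() > static_len {
        return None;
    }
    let mut ids = input_ids.to_vec();
    let mut mask = vec![1i64; input_ids.len()];
    ids.resize(static_len, pad_id);
    mask.resize(static_len, 0);
    Some((ids, mask))
}
```

After inference the caller would read only the first `input_ids.len()` scores and ignore the padded tail. With the padded positions masked out of attention, the scores for real tokens should match the unpadded run, which is consistent with the 21/21 parity result in both modes.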