feat(rust): port Kompress ML prose compressor to Rust (parity-only, 1/3) #1153

Open
RubenAAA wants to merge 3 commits into headroomlabs-ai:main from RubenAAA:stack/1-kompress-engine

Conversation

RubenAAA commented Jun 19, 2026

Description

Ports the Kompress ML prose compressor to the Rust headroom-core engine — a byte-for-byte port of headroom/transforms/kompress_compressor.py (kompress-v2-base ONNX + ModernBERT tokenizer). This is PR 1 of 3 splitting #1143 into reviewable pieces: this PR adds the engine + the parity harness only; it does not wire Kompress into the live-zone dispatcher (that is PR 3). The engine is therefore additive and inert until dispatch lands.

The port is verified against the Python reference via the headroom-parity harness (replays recorded fixtures and asserts identical output).
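To illustrate the replay-and-compare idea, here is a minimal sketch of a fixture comparator. It is not the headroom-parity implementation: `Fixture`, `Summary`, and `replay` are hypothetical names, and the real fixture schema is not shown in this description. The sketch only assumes that each fixture pairs an input with the recorded Python output, and that a missing model counts as a skip rather than a failure.

```rust
/// One recorded fixture: the input text and the output the Python reference produced.
/// Hypothetical shape, for illustration only.
pub struct Fixture {
    pub input: String,
    pub expected: String,
}

/// Tally in the same shape as a `total/matched/skipped/diffed` summary line.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Summary {
    pub total: usize,
    pub matched: usize,
    pub skipped: usize,
    pub diffed: usize,
}

/// Replay every fixture through `compress`.
/// `None` from `compress` stands for "model not available" and is counted as skipped.
pub fn replay<F>(fixtures: &[Fixture], mut compress: F) -> Summary
where
    F: FnMut(&str) -> Option<String>,
{
    let mut summary = Summary::default();
    for fixture in fixtures {
        summary.total += 1;
        match compress(&fixture.input) {
            None => summary.skipped += 1,
            // Byte-level comparison: no normalization, no trimming.
            Some(out) if out.as_bytes() == fixture.expected.as_bytes() => summary.matched += 1,
            Some(_) => summary.diffed += 1,
        }
    }
    summary
}
```

Comparing raw bytes instead of normalized strings is what makes a whitespace or casing drift show up as a diff rather than a silent match.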

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • headroom-core: transforms/kompress.rs — full Kompress engine (ModernBERT tokenizer + ONNX inference session via the shared ort runtime), plus Kompress::from_cache cache-only constructor and hf_cache_file resolver. Cache-only: a cold cache yields None and never downloads (mirrors the Python allow_download=False path).
  • headroom-core: transforms/mod.rs — exports the new module.
  • headroom-parity: KompressComparator + recorder (scripts/record_kompress_fixtures.py) + 21 recorded fixtures under tests/parity/fixtures/kompress/.
  • Cargo.toml: adds ort as a direct dependency, pinned (=2.0.0-rc.12) to the existing lock entry so Cargo unifies onto the single ONNX Runtime already in the tree via fastembed/magika (no second runtime, no extra binary download).

Testing

  • Unit tests pass (pytest): N/A for runtime (Rust change); dev-time fixture tooling under scripts/ and tests/parity/ passes ruff check + ruff format
  • Linting passes (ruff check .) — also cargo clippy -D warnings
  • Type checking passes (mypy headroom) — N/A (Rust-only runtime change)
  • New tests added for new functionality
  • Manual testing performed

Test Output

# Byte-parity: Rust port vs Python reference (headroom-parity harness)
cargo run -p headroom-parity -- run --only kompress
[kompress] total=21 matched=21 skipped=0 diffed=0

# Rust suites
cargo test -p headroom-core   ->  911 passed, 3 ignored (13 suites)
cargo clippy -p headroom-core -p headroom-parity -- -D warnings  ->  No issues found
cargo fmt --check  ->  clean
ruff check . && ruff format --check .  ->  All checks passed

Real Behavior Proof

  • Environment: Linux (WSL2, kernel 6.6), Rust stable toolchain, Python 3.13 for reference recording; kompress-v2-base + ModernBERT-base present in the local HF cache.
  • Exact command / steps: cargo run -p headroom-parity -- run --only kompress; then cargo test -p headroom-core.
  • Observed result: 21/21 kompress fixtures match the Python reference byte-for-byte; cargo test -p headroom-core reports 911 passed, 3 ignored; clippy and fmt clean.
  • Not tested: live-zone dispatch routing (intentionally not wired here — lands in PR 3); non-ASCII identifiers/content (documented out-of-parity-scope); alternate ONNX execution providers (default CPU EP here).

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Additional Notes

  • Split of #1143 ("feat(rust): port CodeCompressor + Kompress live-zone compressors to Rust (parity-gated)"): this is PR 1 of 3. PR 2 ports the CodeCompressor; PR 3 wires both into the live-zone dispatcher and adds the --enable-kompress gate. PRs 2 and 3 are stacked on this branch, so until this merges their diffs against main show this PR's content as well.
  • pytest/mypy checklist items are N/A — this is a Rust change; the equivalent gates are cargo test + cargo clippy, run above.

Update — Windows/NPU engine support (one commit added)

Pushed feat(kompress): Windows cache resolution, load diagnostics, static-shape ONNX — three self-contained additions to the cache-only loader, all in kompress.rs, validated against real NPU hardware downstream:

  • Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE / HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the model on native Windows, not just Linux.
  • Load diagnostics: from_cache now logs why it defers (kompress_cache_miss / kompress_session_build_failed, with the searched roots + the real session-build error) instead of returning None silently. This was the #1 cause of an unexplained kompress_ready=false (e.g. HF cache symlinks being unreadable over a \\wsl$ mount, or an EP rejecting the graph).
  • Static-shape ONNX support: detect a fixed input_ids seq dimension on the loaded model (detect_static_seq) and right-pad each chunk to it (masked padding ⇒ identical scores); dynamic models keep their natural length at zero padding cost. Enables execution providers that can't compile dynamic shapes (OpenVINO NPU; see #1139, "feat(proxy): HEADROOM_ORT_EP for OpenVINO/CUDA EP selection; fix Phase E skips when auth-mode-policy-enforcement disabled").
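The cache-root lookup in the first bullet can be sketched roughly as follows. This is an illustration, not the code in kompress.rs: `hf_cache_roots` and `repo_dir_name` are hypothetical names, and the priority order is an assumption based on the order the variables are listed above. The directory conventions (`$HF_HOME/hub`, `~/.cache/huggingface/hub`, and `models--{org}--{name}`) follow the standard Hugging Face hub cache layout. The environment lookup is passed in as a closure so the logic can be exercised without touching the process environment.

```rust
use std::path::PathBuf;

/// Candidate Hugging Face hub cache roots, in assumed priority order.
/// `env` abstracts the environment lookup.
pub fn hf_cache_roots<F>(env: F) -> Vec<PathBuf>
where
    F: Fn(&str) -> Option<String>,
{
    // Treat empty values as unset.
    let get = |key: &str| env(key).filter(|v| !v.is_empty());
    let mut roots = Vec::new();

    // Explicit hub cache directory wins.
    if let Some(v) = get("HF_HUB_CACHE") {
        roots.push(PathBuf::from(v));
    }
    // HF_HOME holds the hub cache in its `hub` subdirectory.
    if let Some(v) = get("HF_HOME") {
        roots.push(PathBuf::from(v).join("hub"));
    }
    // Fall back to the default location under the user's home directory.
    // HOME covers Linux/macOS, USERPROFILE covers native Windows.
    for key in ["HOME", "USERPROFILE"] {
        if let Some(v) = get(key) {
            roots.push(PathBuf::from(v).join(".cache").join("huggingface").join("hub"));
        }
    }
    roots
}

/// Directory name the hub cache uses for a model repo: `models--{org}--{name}`.
pub fn repo_dir_name(repo_id: &str) -> String {
    format!("models--{}", repo_id.replace('/', "--"))
}
```

In production code the closure would typically be `|k| std::env::var(k).ok()`. Returning every candidate root, rather than only the first, also gives a diagnostic log the full list of searched locations to report on a cache miss.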

Parity: 21/21 in both static and dynamic modes. cargo fmt/clippy/ruff clean, cargo test -p headroom-core green.

Note: the stack was rebased — from_cache/hf_cache_file (previously introduced in the wiring PR #1155) now live here, in the engine, which is their correct home.

Ruben Avanesov added 2 commits June 19, 2026 10:56
Ports `headroom.transforms.kompress_compressor` (the ModernBERT token
compressor behind PlainText compression) to Rust — the last unported
compressor alongside CodeCompressor. Engine only; live-zone dispatch
wiring follows in a separate PR (matches how the other compressors
landed: port + parity first, then wire).

What it does:
- Loads the trained `chopratejas/kompress-v2-base` ONNX model (the
  inference weights) + the `answerdotai/ModernBERT-base` tokenizer (a
  fine-tune reuses its base vocab). Runs the ONNX/proxy compression
  path: whitespace-split → 350-word chunks → pre-tokenized encode
  (is_split_into_words) → ONNX `final_scores` → max-score-per-word →
  keep `> 0.5` → join kept words. <10 words passes through.
- Inference via `ort` (direct dep added here — first session-API
  consumer; unifies to the ORT instance fastembed/magika already
  vendor). Tokenization via the `tokenizers` crate. Both reproduce the
  Python `transformers`/onnxruntime path exactly.
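The word-level flow described above can be sketched as follows, with the tokenizer and ONNX session replaced by a caller-supplied closure. This is a simplified illustration rather than the ported engine: `compress`, `max_score_per_word`, and `score_chunk` are hypothetical names, joining with a single space is an assumption, and so is dropping a word that received no token score.

```rust
/// Constants taken from the commit message.
const CHUNK_WORDS: usize = 350;
const MIN_WORDS: usize = 10;
const KEEP_THRESHOLD: f32 = 0.5;

/// Collapse per-token scores to one score per word by taking the maximum.
/// `word_ids[i]` is the word index of token `i`, or `None` for special tokens.
pub fn max_score_per_word(word_ids: &[Option<usize>], scores: &[f32], n_words: usize) -> Vec<f32> {
    // Words that never receive a token keep NEG_INFINITY and are dropped (assumption).
    let mut out = vec![f32::NEG_INFINITY; n_words];
    for (wid, &score) in word_ids.iter().zip(scores) {
        if let Some(w) = *wid {
            if w < n_words && score > out[w] {
                out[w] = score;
            }
        }
    }
    out
}

/// `score_chunk` stands in for "pre-tokenized encode + ONNX final_scores":
/// given a chunk's words it returns (word_ids, per-token scores).
pub fn compress<F>(text: &str, mut score_chunk: F) -> String
where
    F: FnMut(&[&str]) -> (Vec<Option<usize>>, Vec<f32>),
{
    let words: Vec<&str> = text.split_whitespace().collect();
    // Short inputs pass through unchanged.
    if words.len() < MIN_WORDS {
        return text.to_string();
    }
    let mut kept: Vec<&str> = Vec::new();
    for chunk in words.chunks(CHUNK_WORDS) {
        let (word_ids, scores) = score_chunk(chunk);
        let per_word = max_score_per_word(&word_ids, &scores, chunk.len());
        for (word, score) in chunk.iter().zip(per_word) {
            // Strictly greater than the threshold, as in "keep > 0.5".
            if score > KEEP_THRESHOLD {
                kept.push(*word);
            }
        }
    }
    kept.join(" ")
}
```

Taking the maximum over a word's sub-tokens means a word survives if any of its pieces is judged important, which is consistent with words being kept or dropped whole.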

Parity (byte-exact against the Python reference):
- Tokenizer `input_ids`/`word_ids` reproduce HF on pre-tokenized input.
- ONNX scores match to ~1e-6 (far below the 0.5 keep threshold).
- Kept-word set + joined output match byte-for-byte.
- `KompressComparator` wired into the parity harness; 21 fixtures
  recorded via recorder.py (enable_ccr=False → deterministic output).
  `cargo run -p headroom-parity -- run --only kompress`: 21/21 matched.
  Model-gated: skips (not fails) when the model isn't in the HF cache.

Tests:
- crates/headroom-core/tests/kompress_parity.rs (byte-parity + passthrough)
- module unit tests (config defaults, result helpers)
- clippy clean, no regressions across the full parity suite.

CCR offload of dropped words is left to the dispatcher (the `<<ccr:>>`
convention), not this engine — the inline Python marker is intentionally
not reproduced.

@github-actions (Contributor)

PR governance

This PR follows the template and is marked ready for human review.

github-actions bot added the label "status: ready for review" (pull request body is complete and the author marked it ready for human review) on Jun 19, 2026
…ape ONNX

Three self-contained additions to the cache-only loader, all in kompress.rs:

- Cross-platform HF cache resolution: resolve the model via HF_HUB_CACHE /
  HF_HOME / HOME / USERPROFILE (was $HOME-only), so from_cache finds the
  model on native Windows, not just Linux.
- Load diagnostics: from_cache now logs WHY it defers (kompress_cache_miss /
  kompress_session_build_failed with the searched roots + the real session
  build error) instead of returning None silently — the #1 cause of an
  unexplained kompress_ready=false.
- Static-shape ONNX support: detect a fixed input_ids seq dimension on the
  loaded model and right-pad each chunk to it (masked padding => identical
  scores); dynamic models keep their natural length at zero padding cost.
  Enables execution providers that cannot compile dynamic shapes (OpenVINO
  NPU). Parity 21/21 in both static and dynamic modes.
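
The right-padding step in the last bullet can be sketched as below. This is a hedged illustration, not the code in kompress.rs: `pad_to_static` and `trim_scores` are hypothetical names, `pad_id` is a caller-supplied placeholder, `i64` ids are assumed because that is the usual input type for transformer ONNX exports, and returning `None` for a chunk longer than the fixed dimension is an assumption (the commit does not say how that case is handled).

```rust
/// Right-pad one chunk's token ids to a fixed sequence length.
/// Returns (input_ids, attention_mask); padded positions get mask 0.
/// `static_seq == None` models a dynamic-shape model: no padding is added.
pub fn pad_to_static(
    ids: &[i64],
    static_seq: Option<usize>,
    pad_id: i64,
) -> Option<(Vec<i64>, Vec<i64>)> {
    let real_len = ids.len();
    let target = match static_seq {
        // Dynamic model: keep the natural length.
        None => real_len,
        // Assumption: a chunk that does not fit is rejected rather than truncated.
        Some(n) if real_len > n => return None,
        Some(n) => n,
    };

    let mut input_ids = ids.to_vec();
    input_ids.resize(target, pad_id);

    // 1 for real tokens, 0 for padding, so attention ignores the padded tail.
    let mut attention_mask = vec![1i64; real_len];
    attention_mask.resize(target, 0);

    Some((input_ids, attention_mask))
}

/// Keep only the scores that belong to real (unpadded) tokens.
pub fn trim_scores(scores: &[f32], real_len: usize) -> &[f32] {
    &scores[..real_len.min(scores.len())]
}
```

Because the padded positions are masked out and their scores are discarded afterwards, the word-level decisions depend only on the real tokens, which is the property the "identical scores" claim relies on.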

@RubenAAA (Author)

Added one engine commit (Windows HF-cache resolution + load diagnostics + static-shape ONNX support) — all self-contained in kompress.rs, parity 21/21 in both static and dynamic modes. Heads-up: I rebased the stack, so from_cache/hf_cache_file (previously in the wiring PR #1155) now live here in the engine, which is their correct home. Description updated with the rationale.

@JerrettDavis left a comment (Collaborator)

This PR is not merge-ready in current GitHub state (mergeStateStatus=UNSTABLE). Please update from current main, resolve any conflicts if present, and rerun/clear required CI before this can be approved.

github-actions bot added the label "status: ci failing" (required or reported CI checks are failing) and removed the label "status: ready for review" (pull request body is complete and the author marked it ready for human review) on Jun 23, 2026
Labels

status: ci failing Required or reported CI checks are failing

2 participants