37 commits
f6d706d
patch transform: stop folding inputs; add select_inputs
kali May 28, 2026
b2dc0a0
tract-cli, tract-libcli: make cudarc + tract-cuda truly optional
kali May 28, 2026
5c11d4a
rename: concretize_symbols / substitute_symbols → set_symbols
kali May 28, 2026
3f0176a
cli: `run --set` accepts TDim expressions
kali May 28, 2026
ee52857
core/ops/scan: reuse body state across iterations
czoli1976 May 28, 2026
43a9fee
onnx: com.microsoft GroupQueryAttention (prefill), ORT-validated
czoli1976 May 26, 2026
a5eeab5
build: track Cargo.lock
kali May 28, 2026
5bf8e45
ci/binaries: attach CycloneDX + SPDX SBOMs to release binaries
kali May 28, 2026
8e3b24e
ci/binaries: sign release .tgz with SBOM attestations
kali May 28, 2026
cfd91a9
ci/binaries: embed SBOM in binary + attest build provenance
kali May 28, 2026
d902aa4
ci/binaries: SHA-pin sbom + attestation actions
kali May 28, 2026
86ba417
ci/wheels: cargo-auditable in wheel build + PEP 770 SBOMs
kali May 28, 2026
ff8abf2
changelog update
kali May 28, 2026
cde69ad
drop the unmaintained atty crate; use std::io::IsTerminal
kali May 28, 2026
b577fd6
core/runtime: add virtual `gpu` / `gpu-or-cpu` names
kali May 28, 2026
33d4840
core/runtime: rename `default` to `cpu`, alias `default` in runtime_f…
kali May 28, 2026
5ca8dab
fmt
kali May 29, 2026
bf123a5
drop the libtensorflow binding + `conform` feature
kali May 28, 2026
2614eea
setup deny for cli
kali May 29, 2026
19394ea
onnx: fix LayerNorm output dtype mismatch with F16 inputs
May 24, 2026
1d6d1aa
fmt
kali May 28, 2026
b2239b0
linalg/mmm: cache-adaptive 2D-blocking for the single-thread tile walk
czoli1976 May 23, 2026
091427f
linalg/mmm: rustfmt the Linux L2 cache-detection branch
czoli1976 May 24, 2026
9dcf880
release 0.23.0-dev.6
mathieupoumeyrolsonos Jun 1, 2026
3ed1478
post-release v0.23.0-pre
mathieupoumeyrolsonos Jun 1, 2026
a5e5fb1
release scripts: reject 'v'-prefixed versions
kali Jun 1, 2026
4ffe58b
post-release 0.23.0-pre
kali Jun 1, 2026
57f236a
ci/pydoc: push gh-pages via explicit token URL
kali Jun 1, 2026
9398b06
changelog
kali Jun 1, 2026
bf90b4b
release 0.23.0
mathieupoumeyrolsonos Jun 1, 2026
688b476
post-release 0.23.1-pre
mathieupoumeyrolsonos Jun 1, 2026
176ada9
arm64 SDOT int8 matmul kernel (FEAT_DotProd)
czoli1976 May 26, 2026
9fbbf31
linalg/x86_64: add AVX-512 VNNI int8 GEMM kernel (avx512vnni_mmm_i32_…
czoli1976 May 28, 2026
6037d80
onnx: auto-enable external_state for caller-managed recurrent state
czoli1976 May 27, 2026
9d41153
scan: detect caller-managed state by graph reachability, not an impor…
czoli1976 May 27, 2026
47d0428
linalg/x86_64: add AVX-512 element-wise activations
czoli1976 May 28, 2026
a468525
Merge branch 'main' into feat/int8-sdot-kernel
kali Jun 2, 2026
2 changes: 1 addition & 1 deletion .all_crates.sh
@@ -1,2 +1,2 @@

-ALL_CRATES_PATH="data linalg core nnef nnef/nnef-resources pulse-opl pulse extra transformers hir tflite tensorflow onnx-opl onnx gpu metal cuda libcli api api/rs api/ffi api/proxy/sys api/proxy cli"
+ALL_CRATES_PATH="data linalg core nnef nnef/nnef-resources transformers pulse-opl pulse extra hir tflite tensorflow onnx-opl onnx gpu metal cuda libcli api api/rs api/ffi api/proxy/sys api/proxy cli"
85 changes: 85 additions & 0 deletions .github/scripts/inject_wheel_sboms.py
@@ -0,0 +1,85 @@
+"""Inject CycloneDX + SPDX SBOMs into a wheel's `.dist-info/sboms/` per PEP 770.
+
+Reads the wheel, generates SBOMs from its unpacked contents via `syft`
+(which understands the `cargo-auditable` section embedded in the
+bundled Rust dylib), drops them into the wheel's metadata directory,
+and re-packs the wheel. `wheel pack` regenerates RECORD so the new
+files are properly listed and hashed.
+
+Usage: python inject_wheel_sboms.py wheel-1.whl wheel-2.whl ...
+(in-place; replaces each input wheel with the SBOM-bearing one)
+
+Requires: `syft` on PATH, and the `wheel` Python package installed.
+"""
+
+import shutil
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+
+
+def inject(wheel_path: Path) -> None:
+    wheel_path = wheel_path.resolve()
+    with tempfile.TemporaryDirectory() as tmp:
+        tmp = Path(tmp)
+        unpack_root = tmp / "unpacked"
+        repack_root = tmp / "repacked"
+        unpack_root.mkdir()
+        repack_root.mkdir()
+
+        subprocess.check_call(
+            [sys.executable, "-m", "wheel", "unpack", "-d", str(unpack_root), str(wheel_path)]
+        )
+
+        # `wheel unpack` writes one top-level dir named `<name>-<version>`
+        (unpacked,) = list(unpack_root.iterdir())
+        (dist_info,) = list(unpacked.glob("*.dist-info"))
+
+        sboms_dir = dist_info / "sboms"
+        sboms_dir.mkdir(exist_ok=True)
+
+        # syft scans the unpacked tree. Its rust-audit-binary cataloger
+        # reads the `.dep-v0` ELF/Mach-O section that `cargo-auditable`
+        # embedded; the Python cataloger picks up METADATA.
+        subprocess.check_call(
+            [
+                "syft",
+                "scan",
+                f"dir:{unpacked}",
+                "--source-name",
+                unpacked.name,
+                "-o",
+                f"cyclonedx-json={sboms_dir / 'sbom.cdx.json'}",
+                "-o",
+                f"spdx-json={sboms_dir / 'sbom.spdx.json'}",
+            ]
+        )
+
+        # `wheel pack` rewrites RECORD with hashes for every file under
+        # `unpacked/`, including the two SBOMs we just added.
+        subprocess.check_call(
+            [sys.executable, "-m", "wheel", "pack", "-d", str(repack_root), str(unpacked)]
+        )
+
+        (repacked_wheel,) = list(repack_root.glob("*.whl"))
+        # Names should match; if `wheel pack` produced a different
+        # filename (e.g. build-tag difference), prefer the new name.
+        target = wheel_path.parent / repacked_wheel.name
+        if target != wheel_path:
+            wheel_path.unlink()
+        shutil.move(str(repacked_wheel), str(target))
+        print(f"injected SBOMs into {target.name}")
+
+
+def main(argv: list[str]) -> int:
+    if not argv:
+        print(__doc__, file=sys.stderr)
+        return 2
+    for w in argv:
+        inject(Path(w))
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv[1:]))
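A quick way to sanity-check the script's result is to list what ended up under `.dist-info/sboms/` in the repacked wheel. The sketch below is not part of the PR: `list_wheel_sboms` is a hypothetical helper name, and it relies only on the fact that a wheel is a zip archive.

```python
import zipfile
from pathlib import PurePosixPath


def list_wheel_sboms(wheel_path):
    """Return archive paths located under `<dist>.dist-info/sboms/` in a wheel."""
    found = []
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            parts = PurePosixPath(name).parts
            # Keep only entries shaped like <dist>.dist-info/sboms/<file>.
            if len(parts) == 3 and parts[0].endswith(".dist-info") and parts[1] == "sboms":
                found.append(name)
    return sorted(found)
```

Running it on a wheel processed by `inject_wheel_sboms.py` should list `sbom.cdx.json` and `sbom.spdx.json`; an empty list means the injection step did not run.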
53 changes: 51 additions & 2 deletions .github/workflows/binaries.yml
@@ -22,6 +22,11 @@ jobs:
     name: Upload Release Binaries
     permissions:
       contents: write
+      # OIDC + attestations: cryptographically sign each released .tgz
+      # against its SBOMs via GitHub's attestation store. Anyone can
+      # then verify with `gh attestation verify <tgz> --owner sonos`.
+      id-token: write
+      attestations: write
     strategy:
       fail-fast: false
       matrix:
@@ -98,15 +103,59 @@ jobs:
             export CARGO_TARGET_${RUST_TRIPLE_ENV}_LINKER=$TARGET_CC
           fi
 
-          cargo build --target ${target} --release -p tract-cli
+          # cargo-auditable wraps `cargo build` to embed the resolved
+          # dependency graph into the binary so consumers can recover
+          # the SBOM directly from the binary via `cargo audit bin`.
+          cargo install --locked cargo-auditable --version 0.7.0
+          cargo auditable build --target ${target} --release -p tract-cli --locked
           mkdir tract-$name
           cp target/${target}/release/tract tract-${name}
           tar czf tract-${name}.tgz tract-${name}
 
+      - name: Generate CycloneDX SBOM
+        uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
+        with:
+          path: .
+          format: cyclonedx-json
+          artifact-name: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json
+          output-file: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json
+          upload-artifact: false
+          upload-release-assets: false
+
+      - name: Generate SPDX SBOM
+        uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
+        with:
+          path: .
+          format: spdx-json
+          artifact-name: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json
+          output-file: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json
+          upload-artifact: false
+          upload-release-assets: false
+
+      - name: Attest build provenance
+        uses: actions/attest-build-provenance@a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32 # v4.1.0
+        with:
+          subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz
+
+      - name: Attest release tarball with CycloneDX SBOM
+        uses: actions/attest-sbom@c604332985a26aa8cf1bdc465b92731239ec6b9e # v4.1.0
+        with:
+          subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz
+          sbom-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json
+
+      - name: Attest release tarball with SPDX SBOM
+        uses: actions/attest-sbom@c604332985a26aa8cf1bdc465b92731239ec6b9e # v4.1.0
+        with:
+          subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz
+          sbom-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json
+
       - name: Upload asset
         uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3
         with:
-          files: tract-${{matrix.target}}-${{ steps.version.outputs.value }}.tgz
+          files: |
+            tract-${{matrix.target}}-${{ steps.version.outputs.value }}.tgz
+            tract-${{matrix.target}}-${{ steps.version.outputs.value }}.cdx.json
+            tract-${{matrix.target}}-${{ steps.version.outputs.value }}.spdx.json
           name: ${{ steps.version.outputs.value }}
           tag_name: v${{ steps.version.outputs.value }}
         env:
7 changes: 5 additions & 2 deletions .github/workflows/pydoc.yml
@@ -85,13 +85,16 @@ jobs:
               json.dump(versions, f, indent=2)
           "
 
-          # commit and push
+          # commit and push. actions/checkout ran with persist-credentials:
+          # false, so no auth is wired into .git/config — push via an
+          # explicit token URL instead.
           git add -A
           git commit -m "Update Python docs ($version)" || true
-          git push origin gh-pages
+          git push "https://x-access-token:${GH_TOKEN}@github.com/${GITHUB_REPOSITORY}.git" gh-pages
 
           # clean up worktree
           cd -
           git worktree remove "$workdir"
         env:
           STEPS_VERSION_OUTPUTS_VALUE: ${{ steps.version.outputs.value }}
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
12 changes: 12 additions & 0 deletions .github/workflows/wheels.yml
@@ -63,6 +63,18 @@ jobs:
           timeout_seconds: 54000 # 15 hours :/
           command: uvx cibuildwheel --output-dir wheelhouse api/py
 
+      - name: Install syft
+        uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
+
+      - name: Inject CycloneDX + SPDX SBOMs into wheels (PEP 770)
+        shell: bash
+        run: |
+          set -ex
+          uv pip install --system wheel
+          for w in wheelhouse/*.whl ; do
+            python .github/scripts/inject_wheel_sboms.py "$w"
+          done
+
       - uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
         with:
           name: wheels-${{github.run_id}}-${{matrix.os}}
1 change: 0 additions & 1 deletion .gitignore
@@ -2,7 +2,6 @@ target
 **/*.rs.bk
 *.rustfmt
 *.back
-Cargo.lock
 examples/data
 .idea
 .cached/**
5 changes: 4 additions & 1 deletion .travis/cargo-deny-check.sh
@@ -7,4 +7,7 @@ else
     CARGO_DENY="cargo deny"
 fi
 
-(cd api/rs ; $CARGO_DENY check)
+set -e
+
+(cd api/rs ; $CARGO_DENY check -c deny.toml)
+(cd cli ; $CARGO_DENY check -c deny.toml)
11 changes: 0 additions & 11 deletions .travis/tf.sh

This file was deleted.

48 changes: 44 additions & 4 deletions CHANGELOG.md
@@ -1,22 +1,48 @@
-# 0.23.0 — soon

# 0.22.x → 0.23 in a nutshell

- **`tract` facade is now the recommended public API.** Renamed from `tract-rs`; sole crate under semver, one curated surface (`Model`/`Runnable`/`State`/`Tensor`/`TDim` + `nnef()`/`onnx()`/`runtime_for_name`). `ndarray` removed from public types in favour of opt-in `impl_ndarray_interop!()`. Most other API renames in this release fall out of this consolidation (`Value`→`Tensor`, `concretize_symbols`→`set_symbols`, `default` runtime → `cpu`, …).
- **GPU is first-class.** `cuda` + `metal` `Runtime` impls with f16 conv + cuDNN, CUDA 13 (CUDA 12 dropped), automatic per-node CPU fallback when GPU rejects a shape; virtual `gpu` / `gpu-or-cpu` names for portable downstream code.
- **Supply-chain hardening.** `Cargo.lock` tracked; CDX + SPDX SBOMs on release binaries plus PEP 770 SBOMs in wheels (`cargo-auditable` + GitHub attestations).

## Migrating from 0.22.x to 0.23

For normal usage we recommend adopting the **`tract` facade crate** (the public API at `api/rs`) instead of wiring `tract-core`, `tract-nnef`, `tract-onnx`, `tract-pulse`, `tract-cuda`, `tract-metal`, etc. directly. The facade exposes one stable surface — `nnef()`, `onnx()`, `runtime_for_name("cpu" | "gpu" | "gpu-or-cpu" | "cuda" | "metal" | ...)`, plus `Model`, `Runnable`, `State`, `Tensor`, `TDim`, and a `SetSymbols` transform builder — with all the backends curated behind it. `impl_ndarray_interop!()` (0.23.0-dev.5) keeps `ndarray` interop opt-in without leaking an `ndarray` version into the public API. Downstream code that pinned `tract-core` + `tract-onnx` directly can usually drop those deps in favour of `tract = "0.23"` and `use tract::prelude::*;`. Examples are now organised around this facade — see `examples/onnx-mobilenet-v2`, `examples/nnef-mobilenet-v2`, and `examples/causal_llm`.

# 0.23.0 - 2026-05-1

This section lists changes since 0.23.0-dev.5 only; the dev.2…dev.5 sections below cover the rest of the 0.22.x→0.23.0 delta.

### API — breaking

- **`concretize_symbols` / `substitute_symbols` renamed to `set_symbols`.** Affects `TypedModel::set_symbols`, the `SetSymbols` transform (was `ConcretizeSymbols`), and the `--transform set_symbols=...` CLI form. No deprecation aliases — call sites must be updated.
- **`default` runtime renamed to `cpu`.** `runtime_for_name("default")` still resolves to the CPU runtime (ad-hoc alias), but `Runtime::name()` returns `"cpu"`. JSON loading configs and `--loading-config-path` payloads that pin `"default"` keep working.
- **`nnef().with_tract_core()` removed.** The `tract_core` extension is opt-out since 0.23.0-pre — call `disable_tract_core()` instead, or just drop the `with_tract_core()?` line.
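
Since these renames ship without deprecation aliases, a text scan of downstream sources can help find call sites before the compiler does. The following is a minimal sketch, not a tool from this PR: `find_stale_names` and its tables are hypothetical, and only the old/new names themselves are taken from the entries above.

```python
import re

# Old identifier -> replacement, as listed in the breaking-change notes.
RENAMES = {
    "concretize_symbols": "set_symbols",
    "substitute_symbols": "set_symbols",
    "ConcretizeSymbols": "SetSymbols",
}
# Removed call with no one-to-one replacement; flagged for manual review.
REMOVED = {
    "with_tract_core": "drop the call, or use disable_tract_core() to opt out",
}


def find_stale_names(source):
    """Return (line_number, old_name, suggestion) for each 0.22.x name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for table in (RENAMES, REMOVED):
            for old, suggestion in table.items():
                # Whole-identifier match, so `set_symbols` itself is never flagged.
                if re.search(rf"\b{re.escape(old)}\b", line):
                    hits.append((lineno, old, suggestion))
    return hits
```

The scan is purely textual, so it will also flag names in comments or strings; treat the hits as a review list rather than an automatic rewrite.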


### CPU / linalg

- **ARM SME backend (ARMv9.2-A).** New `linalg/arm64/sme` module provides SME GEMM (Phase 1) and SME2 GEMV (`sme_mmv_f32_64x1`, Phase 2A) micro-kernels for Apple M4+, Cortex-X4, and other ARMv9.2-A+ chips. Dispatch is gated on a 512-bit streaming vector length at runtime; SME2 assembler detection skips kernels when the assembler lacks support. Force via `TRACT_CPU_AARCH64_KIND=applem` (or `generic` to disable).
- **ARMv9 SVE**: f32/f16 GEMM+GEMV and int8→i32 GEMM+GEMV kernels
- **Apple AMX: shape-aware dispatch.** AMX kernel selection is now M/N/K-aware at runtime, yielding 5–43% wins across canary models.
- **NEON element-wise kernels.** `HardSwish`, `SiLU`, and `GELU` get dedicated aarch64 NEON kernels wired as single graph ops.
- **x86_64.** M-aware kernel picker; AVX-512 GEMM routed to the 16×8 / 32×5 / 32×6 kernels.
- **WASM SIMD.** Relaxed-SIMD FMA in all MMM kernels; 32×1 GEMV kernel (8 v128 accumulators, 8-way ILP); vectorised sigmoid and tanh; `rustfft wasm_simd` enabled. Low-accumulator MMM paths recover 8–23% under `+relaxed-simd`; M-band GEMV dispatch woken up (30–37% on small-M). `Executor::RayonGlobal` for `wasm-bindgen-rayon`.
- **WASM**: relaxed-dot int8 fast path + PackedI8K4 + SIMD int8 matmul.
- **Multithreaded GEMM.** TLS borrow and sync hoisted out of the per-tile inner loop; 2D chunked dispatch with a small-MMM threshold avoids rayon overhead on small operands.
- **im2col.** Contiguous-x fast path for valid (zero-padding, unit stride) convolutions; grouped lazy im2col extended; depthwise convolutions excluded from lazy im2col path; N=1/2/3 zone dispatch for depthwise.
- **General.** Same-shape fast path in `BinMiniOp::generic_eval`; `rbytes=96/128` fast paths for mn-major packing.
- **EinSum.** Fold contiguous same-role axes in standard codegen.
- **BLAS / SGemm integration dropped.**
- **Cache-adaptive 2D-blocking** for the single-thread MMM tile walk; per-OS L2-size detection on Linux.
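
For readers unfamiliar with the idea behind the last entry, here is a minimal Python sketch of a 2D-blocked tile walk driven by a cache size. It is not tract's kernel code: every function name, the half-of-L2 budget, and the square-block heuristic are assumptions made for illustration. `parse_cache_size` only handles the size strings Linux exposes in sysfs cache `size` files (for example `1024K`).

```python
def parse_cache_size(text):
    """Parse a sysfs-style cache size such as '1024K' or '2M' into bytes."""
    text = text.strip().upper()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def pick_block(l2_bytes, k, mr, nr, elem_size=4):
    """Hypothetical heuristic: tiles per block side so panels fit in half of L2."""
    budget = l2_bytes // 2
    # Bytes added by one more tile row (mr x k) plus one more tile column (k x nr).
    per_step = (mr + nr) * k * elem_size
    return max(1, budget // per_step)


def blocked_tile_walk(m_tiles, n_tiles, block_m, block_n):
    """Yield (row, col) tile indices block by block instead of row by row."""
    for bm in range(0, m_tiles, block_m):
        for bn in range(0, n_tiles, block_n):
            for i in range(bm, min(bm + block_m, m_tiles)):
                for j in range(bn, min(bn + block_n, n_tiles)):
                    yield (i, j)
```

Visiting tiles one block at a time keeps the same few input panels in use before moving on, which is the general motivation for blocking; the actual block shapes and detection logic in `linalg/mmm` may differ.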

### ONNX

- `Resize`: `pytorch_half_pixel` coordinate transformer.
- `Reshape` with 0-dims and rank change fixed (issue #2104).
- Support for GroupQueryAttention, MultiHeadAttention, MatMulNBits (4-bit), SkipLayerNormalization, SimplifiedLayerNormalization, BiasGelu / FastGelu /
QuickGelu, LpNormalization, MeanVarianceNormalization, GroupNormalization, RotaryEmbedding, opset-24 Attention, Swish, Mish, Gelu, RMSNormalization.
- LayerNorm: fixed output dtype mismatch with F16 inputs.

### NNEF

@@ -26,16 +52,20 @@
### Pulse for chunked attention layers (experimental)

- **`pulse::Blockify` rewrite pass.** Translates block-diagonal multi-time-axis subgraphs into chunk-parallel form: recognises quadratic sections (EinSum terminators, `DiagGather` initiators, banded masks, Softmax body chains) and rewrites each into a per-chunk section. Covered by ex01–ex10 synthetic harness cases.
-- **`DiagGather` op** (moved from `transformers` into `core`): causal skew-trick gather with ROI-driven narrowing and re-anchoring.
+- **`DiagGather` op**: causal skew-trick gather with ROI-driven narrowing and re-anchoring.
- **`WindowOnAxis` op**: windowed gather over the streaming axis with configurable pad value.
- **`AxisOp::Reshape` pulsifier**: auto-inserts alignment `Delay` on streaming-axis size change.
- Stream-axis LCM merge + slope-based per-pulse sizing for `Range`.
- **Scan body state reused across iterations** instead of reallocated per step.
- **`scaled_masked_softmax`** gains a `bool`-mask variant and a `post_softmax_mask` variant on both cuda and metal.
- **`GpuPulsePad`**: stride-aware initial copy fixes 26% drift on pulsified encoders where a fused move axis fed a non-contiguous view to the pad.

### Runtime / plan

- Per-node shape resolve skipped once all symbols are bound.
- `TDim::Sym` fast-path in shape resolve; lock-free `guess_scenario` on empty scope.
- `PropagateRoi` iterates to fixed point and simplifies.
- **Virtual runtime names `gpu` and `gpu-or-cpu`.** `runtime_for_name("gpu")` returns the first available GPU backend (cuda or metal) or errors; `"gpu-or-cpu"` falls back to CPU if none is present.
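
The resolution rule for the virtual names can be modelled in a few lines. This is a sketch of the behaviour described above, not tract's implementation: `resolve_runtime_name` is a hypothetical function, the cuda-before-metal probing order is an assumption, and the `default` alias mirrors the rename noted in the breaking-changes section.

```python
GPU_BACKENDS = ("cuda", "metal")  # probing order is an assumption
ALIASES = {"default": "cpu"}      # legacy name kept as an alias


def resolve_runtime_name(name, available):
    """Map a requested runtime name to a concrete backend name.

    `available` is the set of GPU backends usable on this machine.
    """
    name = ALIASES.get(name, name)
    if name in ("gpu", "gpu-or-cpu"):
        for backend in GPU_BACKENDS:
            if backend in available:
                return backend
        if name == "gpu-or-cpu":
            return "cpu"  # documented fallback when no GPU backend is present
        raise LookupError("no GPU runtime available")
    if name == "cpu" or name in available:
        return name
    raise LookupError(f"unknown or unavailable runtime: {name}")
```

Downstream code that wants to stay portable would request `"gpu-or-cpu"` and let the lookup pick whatever the host supports.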

### Scan

@@ -53,7 +83,17 @@
- `doc/symbolic-shapes.md`: TDim, Symbol, and how to bind them.
- `doc/op.md`: working with a `Tensor`'s data.
- `doc/cli-recipe.md`: `--audit-json`, `--save-outputs`, timing pitfalls, environment-variable table.
-- `README.md`: refreshed — current backends, modern examples table, Python bindings section, torch-to-nnef pointer.
+- `README.md`: refreshed — current runtimes, modern examples table, Python bindings section, torch-to-nnef pointer.

### Supply chain / build / CI

- **`Cargo.lock` tracked.** All workspace + binary builds are now reproducible against the same dependency snapshot.
- **SBOMs on release binaries (CycloneDX + SPDX).** `tract-cli` release binaries are built with `cargo-auditable` (Rust dep tree embedded in the `.dep-v0` section) and shipped alongside CDX + SPDX SBOMs generated by `syft`. Both SBOMs are signed via GitHub attestations (`actions/attest-sbom` + `actions/attest-build-provenance`).
- **PEP 770 SBOMs in Python wheels.** Wheels are built with `cargo-auditable` and have `sbom.cdx.json` + `sbom.spdx.json` injected into `.dist-info/sboms/` per PEP 770.
- **Release builds pinned to current stable rustc** via `dtolnay/rust-toolchain@stable`.
- **`cargo-deny` lints wired up for `tract-cli`.**
- **zizmor SARIF upload** to GitHub's security tab.


# 0.23.0-dev.5 - 2026-04-22

@@ -146,7 +186,7 @@
- **`into_tract()` renamed to `into_model()`** in all API layers.
- **`DatumType` variant names shortened** — the `TRACT_DATUM_TYPE_` prefix is dropped (C API).
- **Deprecated state methods removed**: `init_states()`, `state_initializers`, and the `n_states` parameter are gone from `State` trait and `RunTensors`.
-- **Python**: `concretize_symbols` and `pulse` methods replaced by typed transform classes; `TransformSpec` is now an abstract base class.
+- **Python**: `set_symbols` and `pulse` methods replaced by typed transform classes; `TransformSpec` is now an abstract base class.

### Improvements
