85 changes: 85 additions & 0 deletions .github/scripts/inject_wheel_sboms.py
@@ -0,0 +1,85 @@
"""Inject CycloneDX + SPDX SBOMs into a wheel's `.dist-info/sboms/` per PEP 770.

Reads the wheel, generates SBOMs from its unpacked contents via `syft`
(which understands the `cargo-auditable` section embedded in the
bundled Rust dylib), drops them into the wheel's metadata directory,
and re-packs the wheel. `wheel pack` regenerates RECORD so the new
files are properly listed and hashed.

Usage: python inject_wheel_sboms.py wheel-1.whl wheel-2.whl ...
(in-place; replaces each input wheel with the SBOM-bearing one)

Requires: `syft` on PATH, and the `wheel` Python package installed.
"""

import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def inject(wheel_path: Path) -> None:
    wheel_path = wheel_path.resolve()
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        unpack_root = tmp / "unpacked"
        repack_root = tmp / "repacked"
        unpack_root.mkdir()
        repack_root.mkdir()

        subprocess.check_call(
            [sys.executable, "-m", "wheel", "unpack", "-d", str(unpack_root), str(wheel_path)]
        )

        # `wheel unpack` writes one top-level dir named `<name>-<version>`
        (unpacked,) = list(unpack_root.iterdir())
        (dist_info,) = list(unpacked.glob("*.dist-info"))

        sboms_dir = dist_info / "sboms"
        sboms_dir.mkdir(exist_ok=True)

        # syft scans the unpacked tree. Its rust-audit-binary cataloger
        # reads the `.dep-v0` ELF/Mach-O section that `cargo-auditable`
        # embedded; the Python cataloger picks up METADATA.
        subprocess.check_call(
            [
                "syft",
                "scan",
                f"dir:{unpacked}",
                "--source-name",
                unpacked.name,
                "-o",
                f"cyclonedx-json={sboms_dir / 'sbom.cdx.json'}",
                "-o",
                f"spdx-json={sboms_dir / 'sbom.spdx.json'}",
            ]
        )

        # `wheel pack` rewrites RECORD with hashes for every file under
        # `unpacked/`, including the two SBOMs we just added.
        subprocess.check_call(
            [sys.executable, "-m", "wheel", "pack", "-d", str(repack_root), str(unpacked)]
        )

        (repacked_wheel,) = list(repack_root.glob("*.whl"))
        # Names should match; if `wheel pack` produced a different
        # filename (e.g. build-tag difference), prefer the new name.
        target = wheel_path.parent / repacked_wheel.name
        if target != wheel_path:
            wheel_path.unlink()
        shutil.move(str(repacked_wheel), str(target))
        print(f"injected SBOMs into {target.name}")


def main(argv: list[str]) -> int:
    if not argv:
        print(__doc__, file=sys.stderr)
        return 2
    for w in argv:
        inject(Path(w))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
53 changes: 51 additions & 2 deletions .github/workflows/binaries.yml
@@ -22,6 +22,11 @@ jobs:
name: Upload Release Binaries
permissions:
contents: write
# OIDC + attestations: sign build-provenance and SBOM attestations for
# each released .tgz and store them in GitHub's attestation store.
# Anyone can then verify with `gh attestation verify <tgz> --owner sonos`.
id-token: write
attestations: write
strategy:
fail-fast: false
matrix:
@@ -98,15 +103,59 @@ jobs:
export CARGO_TARGET_${RUST_TRIPLE_ENV}_LINKER=$TARGET_CC
fi

-cargo build --target ${target} --release -p tract-cli
# cargo-auditable wraps `cargo build` to embed the resolved
# dependency graph in the binary, so tools can recover the
# dependency list from the binary itself: syft reads it for SBOMs
# and `cargo audit bin` checks it for known vulnerabilities.
cargo install --locked cargo-auditable --version 0.7.0
cargo auditable build --target ${target} --release -p tract-cli --locked
mkdir tract-$name
cp target/${target}/release/tract tract-${name}
tar czf tract-${name}.tgz tract-${name}

- name: Generate CycloneDX SBOM
uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
with:
path: .
format: cyclonedx-json
artifact-name: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json
output-file: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json
upload-artifact: false
upload-release-assets: false

- name: Generate SPDX SBOM
uses: anchore/sbom-action@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0
with:
path: .
format: spdx-json
artifact-name: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json
output-file: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json
upload-artifact: false
upload-release-assets: false

- name: Attest build provenance
uses: actions/attest-build-provenance@a2bbfa25375fe432b6a289bc6b6cd05ecd0c4c32 # v4.1.0
with:
subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz

- name: Attest release tarball with CycloneDX SBOM
uses: actions/attest-sbom@c604332985a26aa8cf1bdc465b92731239ec6b9e # v4.1.0
with:
subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz
sbom-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.cdx.json

- name: Attest release tarball with SPDX SBOM
uses: actions/attest-sbom@c604332985a26aa8cf1bdc465b92731239ec6b9e # v4.1.0
with:
subject-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.tgz
sbom-path: tract-${{ matrix.target }}-${{ steps.version.outputs.value }}.spdx.json

- name: Upload asset
uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3
with:
-files: tract-${{matrix.target}}-${{ steps.version.outputs.value }}.tgz
files: |
tract-${{matrix.target}}-${{ steps.version.outputs.value }}.tgz
tract-${{matrix.target}}-${{ steps.version.outputs.value }}.cdx.json
tract-${{matrix.target}}-${{ steps.version.outputs.value }}.spdx.json
name: ${{ steps.version.outputs.value }}
tag_name: v${{ steps.version.outputs.value }}
env:
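The release step above now uploads a CycloneDX SBOM next to each tarball. A minimal sketch of how a consumer might list the Rust crates it records, assuming only the standard CycloneDX JSON layout (a top-level `components` array, possibly nested, whose entries carry `name`, `version`, and a `purl`). That syft tags every crate with a `pkg:cargo/` purl is an assumption, and the helper names are hypothetical:

```python
import json
from collections.abc import Iterable, Iterator


def iter_components(components: Iterable[dict]) -> Iterator[dict]:
    """Walk a CycloneDX `components` list, including nested sub-components."""
    for comp in components:
        yield comp
        yield from iter_components(comp.get("components", []))


def cargo_crates(bom: dict) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (name, version) pairs for pkg:cargo/ components."""
    crates = {
        (comp["name"], comp.get("version", ""))
        for comp in iter_components(bom.get("components", []))
        if comp.get("purl", "").startswith("pkg:cargo/")
    }
    return sorted(crates)


def load_bom(path: str) -> dict:
    """Read a CycloneDX JSON document from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage, with the placeholders filled in from a real release asset name: `for name, version in cargo_crates(load_bom("tract-<target>-<version>.cdx.json")): print(name, version)`.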
12 changes: 12 additions & 0 deletions .github/workflows/wheels.yml
@@ -63,6 +63,18 @@ jobs:
timeout_seconds: 54000 # 15 hours :/
command: uvx cibuildwheel --output-dir wheelhouse api/py

- name: Install syft
uses: anchore/sbom-action/download-syft@e22c389904149dbc22b58101806040fa8d37a610 # v0.24.0

- name: Inject CycloneDX + SPDX SBOMs into wheels (PEP 770)
shell: bash
run: |
set -ex
uv pip install --system wheel
for w in wheelhouse/*.whl ; do
python .github/scripts/inject_wheel_sboms.py "$w"
done

- uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: wheels-${{github.run_id}}-${{matrix.os}}
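Once an SBOM-bearing wheel is installed, PEP 770 places the files under `<name>-<version>.dist-info/sboms/` in site-packages, and they are listed in RECORD like any other file. A minimal sketch of discovering them with the standard `importlib.metadata` API; the helper names are hypothetical, and the distribution name `tract` in the usage line is an assumption:

```python
from collections.abc import Iterable
from importlib.metadata import PackageNotFoundError, distribution
from pathlib import PurePosixPath


def sbom_members(files: Iterable[PurePosixPath]) -> list[PurePosixPath]:
    """Keep the RECORD entries that live in `<name>-<version>.dist-info/sboms/`."""
    return sorted(
        f
        for f in files
        if len(f.parts) >= 3 and f.parts[0].endswith(".dist-info") and f.parts[1] == "sboms"
    )


def installed_sboms(dist_name: str) -> list[PurePosixPath]:
    """List the SBOM files an installed distribution declares in its RECORD.

    Returns an empty list if the distribution is not installed or has no RECORD.
    """
    try:
        files = distribution(dist_name).files  # None when RECORD is missing
    except PackageNotFoundError:
        return []
    return sbom_members(files or [])
```

Each entry returned for an installed distribution is an `importlib.metadata.PackagePath`, so `installed_sboms("tract")[0].read_text()` would return the SBOM JSON, and `.locate()` its absolute path.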
1 change: 0 additions & 1 deletion .gitignore
@@ -2,7 +2,6 @@ target
**/*.rs.bk
*.rustfmt
*.back
-Cargo.lock
examples/data
.idea
.cached/**
10 changes: 7 additions & 3 deletions CHANGELOG.md
@@ -3,10 +3,12 @@
### CPU / linalg

- **ARM SME backend (ARMv9.2-A).** New `linalg/arm64/sme` module provides SME GEMM (Phase 1) and SME2 GEMV (`sme_mmv_f32_64x1`, Phase 2A) micro-kernels for Apple M4+, Cortex-X4, and other ARMv9.2-A+ chips. Dispatch is gated on a 512-bit streaming vector length at runtime; SME2 assembler detection skips kernels when the assembler lacks support. Force via `TRACT_CPU_AARCH64_KIND=applem` (or `generic` to disable).
- **ARMv9 SVE.** f32/f16 and int8→i32 GEMM and GEMV kernels.
- **Apple AMX: shape-aware dispatch.** AMX kernel selection is now M/N/K-aware at runtime, yielding 5–43% wins across canary models.
- **NEON element-wise kernels.** `HardSwish`, `SiLU`, and `GELU` get dedicated aarch64 NEON kernels wired as single graph ops.
- **x86_64.** M-aware kernel picker; AVX-512 GEMM routed to the 16×8 / 32×5 / 32×6 kernels.
- **WASM SIMD.** Relaxed-SIMD FMA in all MMM kernels; 32×1 GEMV kernel (8 v128 accumulators, 8-way ILP); vectorised sigmoid and tanh; `rustfft wasm_simd` enabled. Low-accumulator MMM paths recover 8–23% under `+relaxed-simd`; M-band GEMV dispatch woken up (30–37% on small-M). `Executor::RayonGlobal` for `wasm-bindgen-rayon`.
- **WASM int8.** Relaxed-dot int8 fast path, `PackedI8K4`, and SIMD int8 matmul.
- **Multithreaded GEMM.** TLS borrow and sync hoisted out of the per-tile inner loop; 2D chunked dispatch with a small-MMM threshold avoids rayon overhead on small operands.
- **im2col.** Contiguous-x fast path for valid (zero-padding, unit stride) convolutions; grouped lazy im2col extended; depthwise convolutions excluded from lazy im2col path; N=1/2/3 zone dispatch for depthwise.
- **General.** Same-shape fast path in `BinMiniOp::generic_eval`; `rbytes=96/128` fast paths for mn-major packing.
@@ -17,6 +19,8 @@

- `Resize`: `pytorch_half_pixel` coordinate transformer.
- `Reshape` with 0-dims and rank change fixed (issue #2104).
- Support for GroupQueryAttention, MultiHeadAttention, MatMulNBits (4-bit), SkipLayerNormalization, SimplifiedLayerNormalization, BiasGelu / FastGelu /
QuickGelu, LpNormalization, MeanVarianceNormalization, GroupNormalization, RotaryEmbedding, opset-24 Attention, Swish, Mish, Gelu, RMSNormalization.
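For reference, the ONNX `Resize` specification defines `pytorch_half_pixel` as `half_pixel` with one special case: when the resized axis has length 1, every output position maps to input coordinate 0. A minimal sketch of that coordinate mapping as the spec writes it (not tract's code; the function names are illustrative, and `scale` is assumed to be `length_resized / length_original`):

```python
def half_pixel(x_resized: float, length_original: int, length_resized: int) -> float:
    """ONNX `half_pixel`: map an output coordinate back onto the input axis."""
    scale = length_resized / length_original
    return (x_resized + 0.5) / scale - 0.5


def pytorch_half_pixel(x_resized: float, length_original: int, length_resized: int) -> float:
    """ONNX `pytorch_half_pixel`: like `half_pixel`, except a length-1 output maps to 0."""
    if length_resized <= 1:
        return 0.0
    return half_pixel(x_resized, length_original, length_resized)
```

The two modes agree whenever the output axis is longer than 1; they diverge only when resizing down to a single sample, e.g. a 4 → 1 resize maps output 0 to input coordinate 1.5 under `half_pixel` but to 0 under `pytorch_half_pixel`.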

### NNEF

@@ -26,7 +30,7 @@
### Pulse for chunked attention layers (experimental)

- **`pulse::Blockify` rewrite pass.** Translates block-diagonal multi-time-axis subgraphs into chunk-parallel form: recognises quadratic sections (EinSum terminators, `DiagGather` initiators, banded masks, Softmax body chains) and rewrites each into a per-chunk section. Covered by ex01–ex10 synthetic harness cases.
-- **`DiagGather` op** (moved from `transformers` into `core`): causal skew-trick gather with ROI-driven narrowing and re-anchoring.
- **`DiagGather` op**: causal skew-trick gather with ROI-driven narrowing and re-anchoring.
- **`WindowOnAxis` op**: windowed gather over the streaming axis with configurable pad value.
- **`AxisOp::Reshape` pulsifier**: auto-inserts alignment `Delay` on streaming-axis size change.
- Stream-axis LCM merge + slope-based per-pulse sizing for `Range`.
@@ -53,7 +57,7 @@
- `doc/symbolic-shapes.md`: TDim, Symbol, and how to bind them.
- `doc/op.md`: working with a `Tensor`'s data.
- `doc/cli-recipe.md`: `--audit-json`, `--save-outputs`, timing pitfalls, environment-variable table.
-- `README.md`: refreshed — current backends, modern examples table, Python bindings section, torch-to-nnef pointer.
- `README.md`: refreshed — current runtimes, modern examples table, Python bindings section, torch-to-nnef pointer.

# 0.23.0-dev.5 - 2026-04-22

@@ -146,7 +150,7 @@
- **`into_tract()` renamed to `into_model()`** in all API layers.
- **`DatumType` variant names shortened** — the `TRACT_DATUM_TYPE_` prefix is dropped (C API).
- **Deprecated state methods removed**: `init_states()`, `state_initializers`, and the `n_states` parameter are gone from `State` trait and `RunTensors`.
-- **Python**: `concretize_symbols` and `pulse` methods replaced by typed transform classes; `TransformSpec` is now an abstract base class.
- **Python**: `set_symbols` and `pulse` methods replaced by typed transform classes; `TransformSpec` is now an abstract base class.

### Improvements
