288 commits
42733a0
broadcast: fix MultiBroadcastTo axes_mapping for rank-changing ops
kali May 1, 2026
41b7b02
cli-tests: add trunet pulse=1 stream-vs-batch numerical check
kali May 1, 2026
7b2b8c2
transformers: ScaledMaskedSoftmax declares its axes mapping
kali May 4, 2026
7a0fe69
linalg/multithread: add Executor::RayonGlobal for wasm-bindgen-rayon
czoli1976 May 2, 2026
fb97a8f
linalg/mmm: 2D chunked dispatch + small-MMM threshold for multithread-mm
czoli1976 May 3, 2026
edc4801
linalg/multithread: make THREADING_PANEL_THRESHOLD runtime-configurable
czoli1976 May 3, 2026
c6bd1d5
core/einsum: lower K=1 ConvTranspose contractions as broadcast Mul
czoli1976 May 1, 2026
fd81283
ci: cross-platform.yml accepts pr_number input for fork PR benching
kali May 4, 2026
daed149
harness: pulse-multi-axis synthetic test cases
kali Apr 29, 2026
103a442
pulse: Blockify graph rewrite for block-diagonal multi-T-axis subgraphs
kali Apr 29, 2026
d53b783
pulse: replace string-based block-diag recogniser with AST destructure
kali Apr 29, 2026
95a1402
pulse: blockify honours streaming-axis positions on einsum I/O
kali Apr 29, 2026
c7909cf
harness: ex02-block-diag-bilinear synthetic (SDPA without softmax)
kali Apr 29, 2026
04340ad
harness: probe test for ex02 current failure mode
kali Apr 29, 2026
c0ebd9d
pulse: layered Blockify recogniser — QuadraticSection + per-op bridge
kali Apr 29, 2026
412db9b
pulse: split QuadraticSection by connected component
kali Apr 29, 2026
eca12bc
pulse: rewrite the rewrite — terminator dispatch by op-type
kali Apr 29, 2026
a16cb44
pulse: blockify handles EinSum terminator (ex02 goes green)
kali Apr 29, 2026
ed02ddc
pulse: drop Pattern, inline op-identification into the rewriter
kali Apr 29, 2026
b63ccad
pulse: rewrite as topological per-role dispatch — no more "find THE n…
kali Apr 29, 2026
7d92367
pulse: blockify uses substitute_symbols + one TypedModelPatch per sec…
kali Apr 29, 2026
f9dfcbb
pulse: blockify is a ModelTransform, ancillary outputs in properties
kali Apr 29, 2026
0e81721
pulse: split build_section_patch into per-role + per-op-type sub-func…
kali Apr 30, 2026
74a5333
harness: simplify mask construction with shape_of(a)[0]
kali Apr 30, 2026
d3b4df2
harness: drop the f32-roundtrip on chunk-id, integer div is fine
kali Apr 30, 2026
f9aa079
pulse: WindowOnAxis op for windowed gathers over the streaming axis
kali Apr 30, 2026
02fed5e
pulse: blockify recognises banded masks and wires WindowOnAxis
kali Apr 30, 2026
e0b06b8
pulse: rescale stream.delay through PulsedReshape on streaming-axis s…
kali Apr 30, 2026
cb4c258
harness: ex03 + ex04 banded synthetics; pulse-multi-axis tests via ru…
kali Apr 30, 2026
ea64e12
pulse: drop collect_dead_nodes, TypedModelPatch::apply already does it
kali Apr 30, 2026
499a68d
pulse: blockify dispatches windowing by terminator's contracted axis
kali Apr 30, 2026
460bb41
harness: ex05 banded + EinSum terminator (ex02 + ex03 mask)
kali Apr 30, 2026
d2fa087
pulse: blockify fails loudly when a recognised section can't be rewri…
kali May 4, 2026
7e5a6f4
pulse: blockify body replay via axes_mapping + neutral-element substi…
kali May 4, 2026
f7f78a4
pulse: blockify body — mask is a constant block of within-chunk size
kali May 4, 2026
6fa8021
pulse: WindowOnAxis grows configurable pad_value, ex07 boundary regre…
kali May 4, 2026
e355e07
pulse: PulsedRange + faithful chunkification of mask-construction chain
kali May 4, 2026
527af7e
core: Reshape propagates uniform_tdim through pure split / pure merge
kali May 4, 2026
96f6de3
pulse: blockify contracted_axis lives in mask frame, not score frame
kali May 4, 2026
c826bdd
pulse: blockify MultiBroadcastTo initiator + Softmax body axes shift
kali May 4, 2026
ebe9e7f
pulse + cli: bind Blockify chunk symbol in compare --stream
kali May 4, 2026
d5a09f4
cli: compare --stream skips incompatible-shape intermediates as unche…
kali May 4, 2026
8460cfd
pulse: blockify cleanup — clippy nits, docstring refresh, REVISIT notes
kali May 4, 2026
72306a0
harness: ex09 two-chained-sdpa — multi-section recogniser regression
kali May 4, 2026
dc360eb
harness: ex10 conv-then-sdpa — pulse-delay carry-through into chunked…
kali May 4, 2026
317c3ad
tdim: expand_polynomial + use it in Reshape volume check
kali May 4, 2026
edc6fe4
cargo fmt
kali May 4, 2026
0fc6250
tdim: factor integer GCD across Add terms in simplifier
kali May 4, 2026
a70babf
build(deps): update string-interner requirement
dependabot[bot] May 4, 2026
4d3f8e2
.travis/make_bundle: restore main fallback inside GitHub Actions
kali May 5, 2026
e924ca2
cross-platform.yml: pass PR head ref through for internal PRs
kali May 5, 2026
b29780d
cross-platform.yml: tag workflow_dispatch runs with their source branch
kali May 5, 2026
d53c79f
linalg/wasm: wake up M-band GEMV dispatch (30-37% on small-M)
czoli1976 May 4, 2026
347f42b
tdim: PartialEq second-chance via diff-and-simplify
kali May 5, 2026
52577ca
tdim: silence derived_hash_with_manual_eq on TDim
kali May 6, 2026
b98bbc3
nnef: make tract_core extension opt-out instead of opt-in
kali May 6, 2026
b169591
cli: drop allow_hyphen_values and restore --nnef-tract-core no-op
kali May 6, 2026
f54647f
core/conv: extend LazyIm2col to support grouped convolutions
czoli1976 May 3, 2026
97c1436
core/conv: add TRACT_LAZY_IM2COL_MIN_KERNEL env var for lazy-im2col t…
czoli1976 May 3, 2026
6388828
core/conv: exclude depthwise convs from lazy_im2col path
czoli1976 May 4, 2026
64c458c
linalg/wasm: add WASM SIMD sigmoid + tanh kernels (relaxed-simd FMA)
czoli1976 May 6, 2026
8a1a4b5
cli: --set X=value accepts TDim expressions
kali May 6, 2026
722a7ee
fmt
mathieupoumeyrolsonos May 6, 2026
11b3106
tdim: simplify Div(Add([k·X, …, c]), k) → X when X≥0 and 0≤c<k
kali May 6, 2026
99fdb37
tdim: don't emit unsound (k·X+c)/k → X+c/k variant in wiggle
kali May 6, 2026
4280839
tdim: add (Y − q·(Y/q))/q = 0 simplification rule
kali May 7, 2026
9819f72
linalg: enable rustfft `wasm_simd` feature
czoli1976 May 7, 2026
bb44206
pulse: add Deconv bias once after overlap-add
JulienBalianSonos May 7, 2026
5a1db26
cross-platform.yml: diagnose prepare inputs + fall back to pull_reque…
kali May 11, 2026
3a8b4fc
cross-platform.yml: export GITHUB_HEAD_REF/SHA via shell instead of s…
kali May 11, 2026
28a632e
ci: use TRACT_BENCH_BRANCH_NAME instead of GITHUB_HEAD_REF for bundle…
kali May 11, 2026
2a3a4a6
core/cnn/depth_wise: dispatch N=1, 2, 3 zones to process_zone_n_generic
czoli1976 May 9, 2026
062365c
core/binary: same-shape fast path for `BinMiniOp::generic_eval`
czoli1976 May 8, 2026
8a9c904
linalg/pack: add rbytes=96 and 128 fast paths for mn_major packing
czoli1976 May 8, 2026
bca9da2
core/benches: add plan-loop dispatch overhead micro-bench
czoli1976 May 8, 2026
b985d86
linalg/mmm: hoist TLS borrow + sync out of per-tile loop
czoli1976 May 8, 2026
2ba1c61
core: PropagateRoi iterates to fixed point, simplifies, skips ROI=1
kali May 11, 2026
8f62299
pulse: warn when a model has unannotated superlinear streaming wires
kali May 11, 2026
f212b45
linalg/mmm: hoist TLS borrow + sync once per rayon worker
czoli1976 May 12, 2026
b93cf5d
linalg/wasm: relaxed-simd FMA in MMM kernels (madd_f32x4! macro)
czoli1976 May 7, 2026
66423d4
docs: add linalg/WASM_RELAXED_SIMD.md
czoli1976 May 7, 2026
f74951b
pulse: DiagGather op + detect pass for skew-trick pulsification
kali May 11, 2026
0e97eee
blockify: handle DiagGather as section initiator
kali May 5, 2026
4bd01d6
transformers: move DiagGather op + detect pass from pulse
kali May 11, 2026
ee779b9
harness/parakeet: allow DiagGather under --cuda/--metal
kali May 12, 2026
bdd1e98
pulse: Reshape pulsifier auto-inserts alignment Delay
kali May 6, 2026
5aa98a6
core: forward-propagate uniform_tdim to refresh stale facts
kali May 13, 2026
4a41b19
pulse: LCM-merge stream-axis dims + slope-based per-pulse for Range
JulienBalianSonos May 7, 2026
5def537
review: address kali's feedback
JulienBalianSonos May 11, 2026
3965176
fmt: collapse stream_axis_lcm filter_map onto one line
JulienBalianSonos May 12, 2026
efb3179
pulse: declutter after detect_diag_gather inside PulseTransform
kali May 6, 2026
eac4366
cli+pulse: bind streaming symbol on session for pulse run path
kali May 6, 2026
2824023
api/rs: fix missing streaming_input_len field in RunTensors initializer
kali May 13, 2026
623e52c
api/rs tests: update property key assertions for pulse.streaming_symbol
kali May 13, 2026
08bb3ad
pulse+blockify: pulse-driven substitute, blockify becomes a section r…
kali May 6, 2026
f902c2c
core/ops/nn/rms_norm: cast eps to F32 in eval
czoli1976 May 13, 2026
c57c277
quick ref to ref comparison on top of bundle entrypoint
kali May 18, 2026
fb8de37
linalg/wasm: tag wasm_f32_8x8 ManuallyOptimized so strategize picks i…
czoli1976 May 14, 2026
18976d4
blockify: streaming MultiBroadcastTo initiator
kali May 6, 2026
e8c21d8
blockify: chunked split tolerates an affine `+c` offset on the stream…
kali May 6, 2026
2cd6320
blockify: fused einsum→DiagGather initiator (rel-pos skew pattern)
kali May 6, 2026
5dbc991
blockify: fix off-by-one in MultiBroadcastTo Move axis tracker
kali May 7, 2026
6561431
blockify: translate body op axes through chunk-insertion
kali May 7, 2026
bf93ca3
linalg/arm64: extend f32 _a53 prefetch coverage + fix 16x4 duplicate …
czoli1976 May 18, 2026
f087256
harness: update hey_snips golden file for pulse.streaming_symbol prop…
kali May 18, 2026
456b4d0
api/py tests: update property key assertions for pulse.streaming_symbol
kali May 18, 2026
c775925
core/cnn: make eager-vs-lazy Im2col choice shape-aware
czoli1976 May 14, 2026
9ceee1b
core/array/gather: axes_mapping for pulse axis tracking
kali May 18, 2026
b7e36aa
PR interactions rules
kali May 18, 2026
22d6af6
some fixes
kali May 18, 2026
8c76a10
pulse: allow non-streaming aux inputs/outputs in into_typed
kali May 18, 2026
c1d1d98
fix: update expected NNEF output for pulse.streaming_symbol property
kali May 19, 2026
badda83
linalg/arm64: swap ld1{v.d}[0] -> ldr d in per_col cols=3/6 paths
czoli1976 May 19, 2026
318eb26
core/array/broadcast: swap-through AxisOp under fan-out (E1 prep)
kali May 19, 2026
bb255a8
core: declare absorbing_element on Mul, fix declutter_absorbing broad…
kali May 19, 2026
52b3b3a
blockify: generic chunked-TypedBinOp section initiator (E1)
kali May 19, 2026
111ccde
core: fix declutter_absorbing type mismatch for quantized outputs
kali May 19, 2026
e58d14b
blockify: derive chunked DiagGather offset from op.offset + window_start
kali May 19, 2026
649c269
blockify: pad affine tail at section terminator output
kali May 7, 2026
124780d
blockify: AffineChunkTrim — pulse-aware affine-tail trim op
kali May 7, 2026
a24e047
DiagGather::input_roi: substitute c → r + q − offset
kali May 20, 2026
d755732
ScaledMaskedSoftmax::input_roi: remap mask coord symbols across rank gap
kali May 20, 2026
803bdad
optim: include 0 in PushSliceUp boundaries when no slicer covers prefix
JulienBalianSonos May 19, 2026
167af91
fmt
kali May 20, 2026
3fa2af0
ROI: closed-form chunked-band projection in EinSum::input_roi
kali May 20, 2026
e492c63
ROI-triggered DiagGather narrow + re-anchor
kali May 20, 2026
0c771a4
ScaledMaskedSoftmax::declutter: pre-align mask rank to scores rank
kali May 20, 2026
1b4fbf7
linalg/arm64/apple_amx: shape-aware AMX dispatch (5-43% wins across c…
czoli1976 May 13, 2026
150b098
linalg/arm64/apple_amx: rustfmt fixup
czoli1976 May 13, 2026
a7064d9
linalg/wasm: keep low-accumulator MMM kernels on mul+add under +relax…
czoli1976 May 13, 2026
37e6654
Fix Reshape with 0-dims and rank change (issue #2104)
kali Apr 17, 2026
36d9d7a
onnx: promote Cast(to=i32) to Cast(to=TDim) like i64
kali Apr 17, 2026
764293a
hir: replace StridedSlice axis panic with a proper error
kali Apr 18, 2026
24303a0
fmt
kali May 20, 2026
6466bad
compare --stream: attach node/outlet context to symbol-eval failures
kali May 20, 2026
114795b
pulse: suppress superlinear-wire warnings when an input is annotated
kali May 20, 2026
4d9081c
optim: document FoldUniformMask's interaction with ScaledMaskedSoftmax
kali May 20, 2026
9d73dcd
CI: wire nemotron-speech-streaming-en-0.6b harness into large_models …
kali May 20, 2026
5dfdcdc
example: nemotron streaming ASR demo with live mic (cpal)
kali May 20, 2026
f070a6d
fix ci large models
kali May 20, 2026
d6a5eff
harness/nemotron: allow DiagGather + ScaledMaskedSoftmax on GPU runtimes
czoli1976 May 20, 2026
4bceadd
fix pulsification prereq on dry run
kali May 20, 2026
203304e
pulse meets large models in ci !
kali May 21, 2026
13f3df4
dim/parse: accept broadcast(a, b, …) function-call syntax
kali May 21, 2026
f1c5796
linalg: fix nr=1 kernel add_unicast bugs and add generic regression test
kali May 20, 2026
fc9722b
AGENTS.md: streaming, model inspection, public API pointer
kali May 20, 2026
d620140
doc/: consolidate, fix stale signatures, drop dead crate tour
kali May 20, 2026
b638f75
AGENTS.md: link doc/ for conceptual background
kali May 20, 2026
740e9e4
README: refresh — current backends, modern examples, drop op walls an…
kali May 20, 2026
b7abeeb
README: mention torch2nnef as PyTorch -> NNEF path
kali May 20, 2026
88b8003
README: fix torch-to-nnef URL
kali May 20, 2026
c350b5f
README: link torch-to-nnef docs first, github source second
kali May 20, 2026
2f13530
AGENTS.md: comments describe current code, not the diff
kali May 20, 2026
ee1d936
Update AGENTS.md
kali May 21, 2026
29e5af3
doc: pipeline.md + timing pitfalls in cli-recipe
kali May 21, 2026
76a04bc
doc/op.md: working with a Tensor's data
kali May 21, 2026
7c34c4d
doc/symbolic-shapes.md: TDim, Symbol, and how to bind them
kali May 21, 2026
1caf914
doc/pipeline.md: split Example rewrites into Declutter and Lowering
kali May 21, 2026
c5c0724
doc: TRACT_* environment variables table
kali May 21, 2026
cf1bbca
zizmor auto fix attempt
kali May 21, 2026
153fdaf
more zizmor fixes
kali May 21, 2026
fcdb4d8
CI: zizmor workflow audit on PRs touching .github
kali May 21, 2026
53643a3
CI: schedule zizmor weekly on monday 04:00 UTC
kali May 21, 2026
d8a8b5c
zizmor zizmoring itself
kali May 21, 2026
8e7a621
build(deps): update cpal requirement in the rust-dependencies group
dependabot[bot] May 21, 2026
c3c1e0d
build(deps): bump astral-sh/setup-uv in the actions group
dependabot[bot] May 21, 2026
6cc25cc
onnx: narrow i32->TDim cast promotion to TDim inputs
kali May 21, 2026
9242eb6
linalg/arm64/sme: SME backend for ARMv9.2-A+SME chips (M4+, Cortex-X4+)
czoli1976 May 15, 2026
684b542
linalg: add SME_PHASE1_BENCH.xlsx with measured bench data
czoli1976 May 15, 2026
60d1777
linalg/arm64/sme: SME2 GEMV kernel sme_mmv_f32_64x1 (Phase 2A)
czoli1976 May 16, 2026
3d94d7c
linalg/SME_PHASE1_BENCH.xlsx: fix Phase 1 column in "Phase 2A delta" …
czoli1976 May 16, 2026
7df502f
linalg/arm64+core: add NEON kernels for HardSwish, SiLU, GELU + wire …
czoli1976 May 17, 2026
ef09f81
linalg/arm64/sme: gate SME dispatch on 512-bit streaming vector length
czoli1976 May 20, 2026
551d7ec
linalg/arm64/sme: skip SME kernels when the assembler lacks SME support
czoli1976 May 20, 2026
1426b09
onnx: stop panicking on unknown DataType / AttributeType enum values
kali May 20, 2026
1e90575
test-rt/suite-onnx: enable 113 passing-but-ignored tests
kali May 20, 2026
7eaf9ea
nnef: fix NNEF cycle for ConvInteger, SameLower conv, and QLinearConv
kali May 20, 2026
7f0abd9
fmt
kali May 22, 2026
1d3a689
core/einsum: fold contiguous same-role axes in standard codegen
czoli1976 May 15, 2026
d23d8ba
core/einsum: skip same-role-axes merge on broadcast group axes
czoli1976 May 19, 2026
d0f410e
core/einsum: skip same-role merge when it folds only unit axes
czoli1976 May 21, 2026
1a6ad72
draft 0.23.0 changelog entry
kali May 22, 2026
e51436c
docs: address Julien's review remarks (i-iv)
kali May 22, 2026
b02df7f
test-rt/suite-onnx: gate test_convtranspose_autopad_same to since:18
kali May 22, 2026
3ab80c7
core/binary: OptBinUnicast fallback when input tensors are not regular
kali May 22, 2026
2709c1d
core/binary: also check natural strides in OptBinUnicast guard + add …
kali May 22, 2026
954eb64
core/binary: mirror OptBinUnicast natural-strides guard onto OptBinBy…
kali May 22, 2026
188103c
onnx: add Swish, Mish, Gelu, and RMSNormalization op handlers
kali May 21, 2026
2e2a5a4
onnx: implement Attention op (opset 24) wired to Sdpa
kali May 21, 2026
1279dc8
linalg/arm64/sve: VLA SVE f32 GEMM backend for ARMv9 cores
czoli1976 May 21, 2026
a5e7d0e
linalg/arm64/sve: VLA SVE int8->i32 GEMM kernel (qmmm_i32)
czoli1976 May 21, 2026
66a4d81
linalg/arm64/sve: VLA SVE int8->i32 GEMV kernel (qmmv_i32)
czoli1976 May 21, 2026
e3292f2
linalg/arm64/sve: VLA SVE f32 GEMV kernel (mmv_f32)
czoli1976 May 21, 2026
eb3f6df
linalg/arm64/sve: VLA SVE f16 GEMM kernel (mmm_f16)
czoli1976 May 22, 2026
2d657cb
linalg/arm64/sve: VLA SVE f16 GEMV kernel (mmv_f16)
czoli1976 May 22, 2026
cb9b1b1
readme: replace "Backend" with "Runtime", show runtime.prepare() in q…
kali May 26, 2026
854fe5a
wasm: SIMD int8 (i8->i32) matmul kernel (wasm_i32_4x4)
czoli1976 May 25, 2026
774d8bd
wasm: relaxed-dot int8 fast path + PackedI8K4 + dyn-packing lowering
czoli1976 May 25, 2026
a5279e8
core/cnn: direct register-blocked conv for small-ocg grouped convs
czoli1976 May 20, 2026
54ce06a
core/cnn: clippy-clean BlockedConv
czoli1976 May 20, 2026
4fa823e
onnx: add LpNormalization, MeanVarianceNormalization, GroupNormalizat…
czoli1976 May 26, 2026
f43dc4c
Fix int8 matmul kernel selection picking the 64x1 GEMV for 2D matmuls
czoli1976 May 24, 2026
7611c28
cuda/diag_gather: GPU kernel for the rel-pos skew gather
kali May 26, 2026
890cd61
metal/diag_gather: mirror of the CUDA kernel
kali May 26, 2026
4476e58
metal/diag_gather: cargo +1.91.0 fmt
kali May 26, 2026
eeafefc
cuda/gather: GPU kernel for axis Gather (plain tensors)
kali May 27, 2026
6c8d7d1
metal/gather: mirror of the CUDA Gather kernel
kali May 27, 2026
ac4c206
onnx: add com.microsoft SimplifiedLayerNormalization + SkipSimplified…
czoli1976 May 26, 2026
dcc7230
onnx: add com.microsoft BiasGelu, FastGelu, QuickGelu handlers
czoli1976 May 26, 2026
7e442ab
tract-cuda: expose cuda-12XXX feature gates, default still cuda-13000
kali May 26, 2026
84fbbf3
tract (api/rs): pass-through CUDA features, default cuda-13000
kali May 26, 2026
44b7c8b
ci/large_models: add cuda-13000 to the no-default-features build
kali May 27, 2026
287a622
onnx: add com.microsoft MatMulNBits (4-bit weight quantized matmul)
czoli1976 May 26, 2026
4876cc7
ci/zizmor: grant security-events: write for SARIF upload
kali May 21, 2026
7b2ea86
onnx: add com.microsoft MultiHeadAttention handler
czoli1976 May 26, 2026
85255fd
cuda+metal/transform: fall back to CPU when gpu_op rejects target shape
kali May 27, 2026
fcda9de
harness/nemotron: CI for pulsified preprocessor on --cuda/--metal
kali May 27, 2026
0dcf8a7
pulse: fix output_facts for PulsedSameAxisConcat and AffineChunkTrim
kali May 27, 2026
0d94ee8
harness/nemotron: extend pulse-on-GPU CI to cover encoder too
kali May 27, 2026
f003781
cli/compare: handle device tensors in close_enough + cumulative-off r…
kali May 27, 2026
c4d6e4e
pulse-opl/delay: zero-init streaming buffer
kali May 27, 2026
98f1317
cli/compare + cuda/transform: helpers for GPU per-node bisection
kali May 27, 2026
3a25cc9
gpu/pulse: stride-aware initial copy in GpuPulsePad
kali May 27, 2026
27162d5
cuda/transform: pre-check on post-sync device facts, not raw target i…
kali May 27, 2026
ab5a2e5
metal/transform: mirror cuda pre-check fix on post-sync device facts
kali May 27, 2026
45c50fd
cuda+metal/scaled_masked_softmax: bool mask + post_softmax_mask
kali May 26, 2026
db4b410
harness/nemotron: tighten gpu allowlists (drop SMS, DiagGather, IsNan…
kali May 27, 2026
8b26d85
harness/nemotron: inline decoder Scan via force_scan_external_state
kali May 27, 2026
8517c5e
harness/nemotron: drop Gather from gpu allowlists
kali May 27, 2026
a79bca9
onnx: add com.microsoft SkipLayerNormalization handler
czoli1976 May 26, 2026
57e10a3
linalg/x86_64: add AVX-512 VNNI int8 GEMM kernel (avx512vnni_mmm_i32_…
czoli1976 May 28, 2026
677ccc9
linalg/build.rs: add Intel AMX int8 assembler probe
czoli1976 May 30, 2026
cfaa335
linalg/x86_64: add Intel AMX int8 GEMM kernel (avx512amx_mmm_i32_8x8)
czoli1976 May 30, 2026
60248b1
linalg: add amx_i32 Criterion microbench (AVX2 / VNNI / AMX i8)
czoli1976 May 30, 2026
208c5b6
linalg/core: wire PackedAmxA through OptMatMulPack and ungated re-export
czoli1976 Jun 1, 2026
59b2eb4
linalg/x86_64: prefetch hints in AMX i8 inner K loop
czoli1976 Jun 1, 2026
0aee23f
linalg/x86_64: add 16x16 AMX int8 GEMM kernel (4x mul-adds per tdpbssd)
czoli1976 Jun 1, 2026
573a6fc
linalg/bench: include avx512amx_mmm_i32_16x16 in amx_i32 microbench
czoli1976 Jun 1, 2026
7243c1f
linalg/x86_64: AMX 16x16 prefetch follows oneDNN cache-aware pattern
czoli1976 Jun 1, 2026
fe67666
linalg/x86_64: shape-adaptive AMX dispatch (16x16 for big, 8x8 for sm…
czoli1976 Jun 1, 2026
8f7af4a
linalg/x86_64: add CPUID-leaf-4 cache-size detection for AMX dispatch
czoli1976 Jun 1, 2026
fc44d07
linalg/x86_64: add AMX bf16 16x16 kernel for f32 matmul (TDPBF16PS)
czoli1976 Jun 1, 2026
8d799f5
linalg/bench: include avx512amx_mmm_f32_16x16 in amx_f32 microbench
czoli1976 Jun 2, 2026
f4272d3
linalg/x86_64: add AVX-VNNI ymm int8 GEMM kernel (avxvnni_mmm_i32_8x8)
czoli1976 Jun 2, 2026
ab415be
linalg/x86_64: boost AMX 16x16 kernels + add avxvnni_i32 microbench
czoli1976 Jun 2, 2026
99eb75b
linalg/x86_64: fix swapped operands in AMX 16x16 sub fused-op handlers
claude Jun 2, 2026
26726db
linalg/x86_64: add AVX-512 VNNI zmm 16x16 int8 GEMM kernel
claude Jun 2, 2026
9e8f1c5
linalg/x86_64: document the int8 GEMM kernel cascade
claude Jun 2, 2026
7a23812
linalg/x86_64: add AMX validation + benchmark runbook for AMX-capable…
claude Jun 2, 2026
9b90c6f
linalg/x86_64: add AMX int8/bf16 GEMM validation + benchmark results
claude Jun 3, 2026
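Two of the `tdim` simplifier commits above, `Div(Add([k·X, …, c]), k) → X when X≥0 and 0≤c<k` and `(Y − q·(Y/q))/q = 0`, rest on elementary integer-division facts. The derivation below is a sketch that assumes `/` is integer floor division with a positive divisor, and writes the non-constant part of the sum as k·X. It is not taken from the tract source.

```latex
% Rule 1: Div(Add([k·X, ..., c]), k) -> X, for integers X >= 0, k > 0, 0 <= c < k
\left\lfloor \frac{kX + c}{k} \right\rfloor
  = X + \left\lfloor \frac{c}{k} \right\rfloor
  = X \qquad \text{because } 0 \le \tfrac{c}{k} < 1

% Rule 2: (Y - q·(Y/q))/q -> 0, for integers Y and q > 0
0 \le Y - q\left\lfloor \frac{Y}{q} \right\rfloor < q
\quad\Longrightarrow\quad
\left\lfloor \frac{Y - q\lfloor Y/q \rfloor}{q} \right\rfloor = 0
```

Because the guards X ≥ 0 and 0 ≤ c < k keep kX + c non-negative, rule 1 also holds if `/` truncates toward zero. Rule 2 holds under truncation too, since the remainder then satisfies |Y − q·(Y/q)| < q.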
2 changes: 1 addition & 1 deletion .all_crates.sh
@@ -1,2 +1,2 @@

ALL_CRATES_PATH="data linalg core nnef nnef/nnef-resources pulse-opl pulse extra transformers hir tflite tensorflow onnx-opl onnx gpu metal cuda libcli api/rs api/ffi api/proxy/sys api/proxy cli"
ALL_CRATES_PATH="data linalg core nnef nnef/nnef-resources pulse-opl pulse extra transformers hir tflite tensorflow onnx-opl onnx gpu metal cuda libcli api api/rs api/ffi api/proxy/sys api/proxy cli"
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -9,6 +9,8 @@ updates:
actions:
patterns:
- "*"
cooldown:
default-days: 7

- package-ecosystem: "cargo"
directory: "/"
@@ -20,6 +22,8 @@ updates:
rust-dependencies:
patterns:
- "*"
cooldown:
default-days: 7

- package-ecosystem: "pip"
directory: "/api/py"
@@ -29,3 +33,5 @@ updates:
schedule:
interval: "weekly"
day: "monday"
cooldown:
default-days: 7
7 changes: 6 additions & 1 deletion .github/workflows/asan.yml
@@ -9,6 +9,9 @@ env:
CARGO_INCREMENTAL: false
FORCE_JAVASCRIPT_ACTIONS_TO_NODE20: true

permissions:
contents: read

jobs:
sanitizer-address:
strategy:
@@ -19,7 +22,9 @@ jobs:
runs-on: ${{matrix.os}}

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false
- name: Rustup update
run: rustup update
- name: Run sanitized tests
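Each `uses:` line in these workflows moves from a tag (`@v6`) to a full commit SHA, with the tag kept as a trailing comment. A minimal POSIX shell sketch for spotting references that are still tag- or branch-pinned follows. `is_sha_pinned` is a hypothetical helper and is much cruder than zizmor's own checks: it does not verify that the `# v6` comment matches the pinned commit, and it also flags local `./` actions, which carry no `@ref`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a workflow `uses:` line pins its
# action to a full 40-character hex commit SHA.
is_sha_pinned() {
  ref="${1#*@}"      # drop everything up to and including the first '@'
  ref="${ref%% *}"   # drop a trailing " # v6"-style comment, if any
  printf '%s\n' "$ref" | grep -Eq '^[0-9a-f]{40}$'
}

# Usage: list tag- or branch-pinned references across the workflows.
for f in .github/workflows/*.yml; do
  [ -e "$f" ] || continue            # glob matched nothing
  grep -n 'uses:' "$f" | while IFS= read -r line; do
    is_sha_pinned "$line" || echo "not SHA-pinned: $f:$line"
  done
done
```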
32 changes: 23 additions & 9 deletions .github/workflows/binaries.yml
@@ -14,9 +14,14 @@
CARGO_INCREMENTAL: false
FORCE_JAVASCRIPT_ACTIONS_TO_NODE20: true

permissions:
contents: read

jobs:
assets:
name: Upload Release Binaries
permissions:
contents: write
strategy:
fail-fast: false
matrix:
@@ -48,30 +53,39 @@
runs-on: ${{ matrix.os }}
steps:
- name: Checkout code
uses: actions/checkout@v6
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- name: Extract version tag
id: version
env:
VERSION_OVERRIDE: ${{ inputs.version_override }}
GH_REF: ${{ github.ref }}
run: |
if [ -n "${{ inputs.version_override }}" ]; then
echo "value=${{ inputs.version_override }}" >> $GITHUB_OUTPUT
if [ -n "$VERSION_OVERRIDE" ]; then
echo "value=$VERSION_OVERRIDE" >> "$GITHUB_OUTPUT"
else
echo "value=$(echo ${{ github.ref }} | cut -f 3 -d / | sed 's/^v//' )" >> $GITHUB_OUTPUT
echo "value=$(echo "$GH_REF" | cut -f 3 -d / | sed 's/^v//' )" >> "$GITHUB_OUTPUT"
fi

- name: Build tract
env:
TARGET: ${{ matrix.target }}
MUSL: ${{ matrix.musl }}
VERSION: ${{ steps.version.outputs.value }}
run: |
set -ex
target=${{matrix.target}}
version=${{steps.version.outputs.value}}
target="$TARGET"
version="$VERSION"
name=${target}-${version}

rustup update
rustup target add ${target}

if [ -n "${{matrix.musl}}" ]
if [ -n "$MUSL" ]
then
MUSL_TRIPLE=${{matrix.musl}}
MUSL_TRIPLE="$MUSL"
curl -s https://s3.amazonaws.com/tract-ci-builds/toolchains/${MUSL_TRIPLE}-cross.tgz | tar zx

MUSL_BIN=`pwd`/${MUSL_TRIPLE}-cross/bin
@@ -90,7 +104,7 @@
tar czf tract-${name}.tgz tract-${name}

- name: Upload asset
uses: softprops/action-gh-release@v3
uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3

Code scanning / zizmor (notice): action functionality is already included by the runner: use gh release in a script step
with:
files: tract-${{matrix.target}}-${{ steps.version.outputs.value }}.tgz
name: ${{ steps.version.outputs.value }}
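The rewritten "Extract version tag" step reads the override and `github.ref` through environment variables, then either echoes the override or takes the third `/`-separated field of the ref and strips one leading `v`. The same logic as a testable function, where `extract_version` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical function mirroring the "Extract version tag" step.
# $1 stands in for VERSION_OVERRIDE, $2 for GH_REF (github.ref).
extract_version() {
  if [ -n "$1" ]; then
    echo "$1"                               # explicit override wins
  else
    # refs/tags/v0.23.0 -> field 3 "v0.23.0" -> strip one leading "v"
    echo "$2" | cut -f 3 -d / | sed 's/^v//'
  fi
}

extract_version "" "refs/tags/v0.23.0"           # prints 0.23.0
extract_version "0.23.1-rc" "refs/tags/v0.23.0"  # prints 0.23.1-rc
```

One consequence of `cut -f 3` is that a tag containing a slash, such as `refs/tags/release/v1`, yields only `release`.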
10 changes: 8 additions & 2 deletions .github/workflows/cost_model.yml
@@ -12,6 +12,9 @@ env:
CARGO_INCREMENTAL: false
FORCE_JAVASCRIPT_ACTIONS_TO_NODE20: true

permissions:
contents: read

jobs:
build:
name: Upload cost model tasks
@@ -22,11 +25,14 @@ jobs:
target: [ "aarch64", "armv7" ]
steps:
- name: Checkout code
uses: actions/checkout@v6
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- name: Build and upload
run: ./.travis/cost_model_task_build.sh ${{matrix.target}} ${{github.event.inputs.dataset_id}}
run: ./.travis/cost_model_task_build.sh ${{matrix.target}} ${GITHUB_EVENT_INPUTS_DATASET_ID}
env:
AWS_ACCESS_KEY_ID: ${{secrets.TRACT_CI_AWS_ACCESS_KEY_ID}}
AWS_SECRET_ACCESS_KEY: ${{secrets.TRACT_CI_AWS_SECRET_ACCESS_KEY}}
AWS_EC2_METADATA_DISABLED: true
GITHUB_EVENT_INPUTS_DATASET_ID: ${{github.event.inputs.dataset_id}}
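Moving `${{ inputs.version_override }}` and `${{github.event.inputs.dataset_id}}` out of `run:` and into `env:` is the remediation that zizmor's template-injection audit points toward: an expression inside `run:` is substituted into the script text before the shell parses it, whereas an environment variable reaches the shell only as data. The sketch below reproduces the difference locally with `sh -c`; the payload and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical hostile value for a workflow input.
payload='x"; echo INJECTED; echo "'

# Like `${{ inputs.x }}` inside `run:`: the value is spliced into the script
# text, so its quotes and semicolons are parsed as shell syntax.
template_style() {
  sh -c "echo \"dataset=$1\""
}

# Like `env: { DATASET_ID: ${{ inputs.x }} }`: the script text is fixed and
# the value only ever appears as the expansion of a quoted variable.
env_style() {
  DATASET_ID="$1" sh -c 'echo "dataset=$DATASET_ID"'
}

template_style "$payload"  # prints "dataset=x", then "INJECTED", then an empty line
env_style "$payload"       # prints the payload verbatim after "dataset="
```

In the cost-model step the variable is still expanded unquoted (`${GITHUB_EVENT_INPUTS_DATASET_ID}`), so a value containing spaces would be split into several arguments; writing `"${GITHUB_EVENT_INPUTS_DATASET_ID}"` avoids that.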
27 changes: 21 additions & 6 deletions .github/workflows/crates.yml
@@ -10,6 +10,9 @@ env:
CARGO_INCREMENTAL: false
FORCE_JAVASCRIPT_ACTIONS_TO_NODE20: true

permissions:
contents: read

jobs:
prepare-matrix:
runs-on: ubuntu-latest
@@ -50,7 +53,9 @@ jobs:
RUSTUP_TOOLCHAIN: ${{matrix.rust}}

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- name: Cargo test
run: cargo test -p ${{matrix.crate}}
@@ -65,7 +70,9 @@ jobs:
env:
RUSTUP_TOOLCHAIN: ${{matrix.rust}}
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- name: Cargo test
run: cargo test -p tract-cuda -p test-cuda
@@ -80,7 +87,9 @@ jobs:
env:
RUSTUP_TOOLCHAIN: "1.91.0"
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false
- name: Minimum-BOM GPU smoke
run: harness/cuda-minimum-deploy/gpu-ci.sh

@@ -94,7 +103,9 @@ jobs:
env:
RUSTUP_TOOLCHAIN: ${{matrix.rust}}
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false

- name: Cargo test
run: cargo test -p tract-metal -p test-metal
@@ -111,7 +122,9 @@ jobs:
env:
RUSTUP_TOOLCHAIN: ${{matrix.rust}}
steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false
- run: rustup component add clippy && cargo clippy
- name: fmt
run: rustup component add rustfmt && cargo fmt --check
@@ -127,7 +140,9 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
persist-credentials: false
- name: Install cargo-deny
run: |
curl -L https://github.com/EmbarkStudios/cargo-deny/releases/download/$VERSION/cargo-deny-$VERSION-x86_64-unknown-linux-musl.tar.gz \
80 changes: 71 additions & 9 deletions .github/workflows/cross-platform.yml
@@ -5,14 +5,57 @@
schedule:
- cron: '0 5 * * *'
workflow_dispatch:
inputs:
pr_number:
description: "Optional PR number to test (from fork ok). Leave empty to run on selected branch."
required: false
type: number

env:
CARGO_INCREMENTAL: false
FORCE_JAVASCRIPT_ACTIONS_TO_NODE20: true
RUSTUP_TOOLCHAIN: 1.91.0

permissions:
contents: read

jobs:
prepare:
runs-on: ubuntu-latest
outputs:
test_ref: ${{ steps.set.outputs.test_ref }}
tract_bench_branch_name: ${{ steps.set.outputs.tract_bench_branch_name }}
steps:
- id: set
uses: actions/github-script@3a2844b7e9c422d3c10d287c895573f7108da1b3 # v9
with:
script: |
core.info(`event: ${context.eventName}`);
core.info(`payload.inputs: ${JSON.stringify(context.payload.inputs)}`);
core.info(`payload.pull_request.number: ${context.payload.pull_request?.number}`);
const prInput = context.payload.inputs?.pr_number ?? context.payload.pull_request?.number;
core.info(`prInput: ${prInput} (type: ${typeof prInput})`);
let ref, branch;
if (prInput) {
const pr = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: Number(prInput),
});
ref = pr.data.head.sha;
branch = `pr-${prInput}-${pr.data.head.ref}`;
} else {
ref = process.env.GITHUB_SHA;
branch = process.env.GITHUB_HEAD_REF || process.env.GITHUB_REF_NAME || 'main';
}
const benchBranch = branch.replace(/\//g, '_');
core.info(`test_ref: ${ref}`);
core.info(`tract_bench_branch_name: ${benchBranch}`);
core.setOutput('test_ref', ref);
core.setOutput('tract_bench_branch_name', benchBranch);

linux:
needs: prepare
strategy:
fail-fast: false
matrix:
@@ -40,20 +83,24 @@
contents: read

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
ref: ${{ needs.prepare.outputs.test_ref }}
fetch-depth: 0
persist-credentials: false

- name: Get current date
id: date
run: echo "date=$(date +'%Y-%m-%d')" >> $GITHUB_OUTPUT

- name: Configure AWS Credentials
continue-on-error: true
uses: aws-actions/configure-aws-credentials@v6
uses: aws-actions/configure-aws-credentials@d979d5b3a71173a29b74b5b88418bfda9437d885 # v6

Code scanning / zizmor (warning): action's hash pin has mismatched or missing version comment: points to commit e7f100cf4c00
with:
role-to-assume: arn:aws:iam::567805100031:role/github-runner-tract-ci
aws-region: us-east-2

- uses: actions/cache@v5
- uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
with:
path: |
~/.rustup
@@ -65,16 +112,22 @@
key: ${{ runner.os }}-${{matrix.platform}}-${{steps.date.outputs.date}}

- name: Setup wasmtime
if: ${{ matrix.platform }} == "wasm32-wasi"
uses: bytecodealliance/actions/wasmtime/setup@v1
if: matrix.platform == 'wasm32-wasi'
uses: bytecodealliance/actions/wasmtime/setup@9152e710e9f7182e4c29ad218e4f335a7b203613 # v1

- name: Cross script
env:
PLATFORM: ${{matrix.platform}}
AWS_EC2_METADATA_DISABLED: true
run: .travis/cross.sh
NEEDS_PREPARE_OUTPUTS_TRACT_BENCH_BRANCH_NAME: ${{ needs.prepare.outputs.tract_bench_branch_name }}
NEEDS_PREPARE_OUTPUTS_TEST_REF: ${{ needs.prepare.outputs.test_ref }}
run: |
export TRACT_BENCH_BRANCH_NAME="${NEEDS_PREPARE_OUTPUTS_TRACT_BENCH_BRANCH_NAME}"
export GITHUB_SHA="${NEEDS_PREPARE_OUTPUTS_TEST_REF}"
.travis/cross.sh

apple:
needs: prepare
strategy:
fail-fast: false
matrix:
@@ -88,11 +141,15 @@
contents: read

steps:
- uses: actions/checkout@v6
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
with:
ref: ${{ needs.prepare.outputs.test_ref }}
fetch-depth: 0
persist-credentials: false

- name: Configure AWS Credentials
continue-on-error: true
uses: aws-actions/configure-aws-credentials@v6
uses: aws-actions/configure-aws-credentials@d979d5b3a71173a29b74b5b88418bfda9437d885 # v6

Code scanning / zizmor (warning): action's hash pin has mismatched or missing version comment: points to commit e7f100cf4c00
with:
role-to-assume: arn:aws:iam::567805100031:role/github-runner-tract-ci
aws-region: us-east-2
@@ -104,4 +161,9 @@
- name: Cross script
env:
PLATFORM: ${{matrix.platform}}
run: .travis/cross.sh
NEEDS_PREPARE_OUTPUTS_TRACT_BENCH_BRANCH_NAME: ${{ needs.prepare.outputs.tract_bench_branch_name }}
NEEDS_PREPARE_OUTPUTS_TEST_REF: ${{ needs.prepare.outputs.test_ref }}
run: |
export TRACT_BENCH_BRANCH_NAME="${NEEDS_PREPARE_OUTPUTS_TRACT_BENCH_BRANCH_NAME}"
export GITHUB_SHA="${NEEDS_PREPARE_OUTPUTS_TEST_REF}"
.travis/cross.sh
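The new `prepare` job resolves which commit to test and a branch name for benchmark results. A PR number (from the `pr_number` dispatch input or the `pull_request` payload) takes precedence; otherwise the name falls back to `GITHUB_HEAD_REF`, then `GITHUB_REF_NAME`, then `main`, and finally every `/` is replaced with `_`. Below is a POSIX shell rendering of just the naming logic, for experimentation. `bench_branch_name` is hypothetical, and the PR head ref, which the real job fetches with `github.rest.pulls.get`, is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical POSIX-shell rendering of the prepare job's naming logic.
# $1: PR number (dispatch input or pull_request payload), may be empty
# $2: that PR's head branch, which the real job fetches from the GitHub API
bench_branch_name() {
  if [ -n "$1" ]; then
    branch="pr-$1-$2"
  else
    # Same fallback chain as the script: head ref, then ref name, then main.
    branch="${GITHUB_HEAD_REF:-${GITHUB_REF_NAME:-main}}"
  fi
  # Equivalent of branch.replace(/\//g, '_'): every '/' becomes '_'.
  printf '%s\n' "$branch" | tr '/' '_'
}

bench_branch_name 1234 feature/fast-mmm   # prints pr-1234-feature_fast-mmm
```

Replacing `/` presumably keeps a name such as `feature/x` from being read as a path separator wherever the bench tooling uses it; that rationale is an assumption, not something the diff states.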