linalg/cache sorcery: LLC/SLC-aware budget for the L3 outer blocking tier by czoli1976 · Pull Request #2352 · sonos/tract

czoli1976 · 2026-06-07T21:22:11Z

Stacked on #2349 (cache module) + #2350 (L3 outer tier). Review only the top commit — GitHub shows the whole stack until the lower PRs merge (a fork branch can't be a PR base in upstream).

Problem

#2350's outer tier fires only when the per-CPU cache topology exposes an architectural L3 (cpu/cache level 3). Many Arm SoCs (and Apple) have no cluster L3 but a System-Level Cache (SLC) shared with the GPU/NPU/display (Qualcomm LLCC, Apple SLC), which is never listed under cpu/cache/index* — so those parts see l3 == 0 and miss a real last-level cache the tier could use.

"LLC" = role (last cache before DRAM). "SLC" = a specific shared interconnect cache that is one kind of LLC. This generalises "detect L3" → "establish the LLC budget," and treats a contended SLC more conservatively than a dedicated L3.

Change

New cache::last_level_cache() -> Option<(usize, LlcKind)> resolving, first hit wins:

1. TRACT_LLC_BYTES env override ("8M", "33554432"); kind = SystemLevel iff TRACT_LLC_CONTENDED is set, else Dedicated.
2. Architectural L3 (cache_info().l3 > l2) → Dedicated.
3. Best-effort Linux devicetree SLC probe (cache-level == 3 + cache-size, outside /cpus) → SystemLevel.
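
The priority order above can be sketched as a pure function. This is an illustration only, not the code in the PR: the parameter list of `resolve_llc`, the `parse_size` helper, the 1024-based suffixes, and the choice to fall through on an unparseable override are all assumptions made for the example.

```rust
/// Hypothetical mirror of the PR's `LlcKind`; the real definition may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlcKind {
    Dedicated,
    SystemLevel,
}

/// Parse a size such as "8M" or "33554432" into bytes.
/// Assumption: K/M/G suffixes are binary multiples (1024-based).
pub fn parse_size(s: &str) -> Option<usize> {
    let s = s.trim();
    let (digits, mult) = match s.chars().last()? {
        'k' | 'K' => (&s[..s.len() - 1], 1usize << 10),
        'm' | 'M' => (&s[..s.len() - 1], 1usize << 20),
        'g' | 'G' => (&s[..s.len() - 1], 1usize << 30),
        _ => (s, 1),
    };
    digits.trim().parse::<usize>().ok()?.checked_mul(mult)
}

/// Pure resolver: every input is passed in, so the priority order can be
/// tested without touching the environment or the filesystem.
pub fn resolve_llc(
    env_bytes: Option<&str>, // value of TRACT_LLC_BYTES, if set
    env_contended: bool,     // whether TRACT_LLC_CONTENDED is set
    l2: usize,               // detected L2 size in bytes (0 = unknown)
    l3: usize,               // detected architectural L3 size (0 = none)
    dt_slc: Option<usize>,   // result of the devicetree probe, if any
) -> Option<(usize, LlcKind)> {
    // 1. An explicit override wins. Assumption: an unparseable or zero
    //    value is ignored and resolution falls through to the next source.
    if let Some(bytes) = env_bytes.and_then(parse_size).filter(|&b| b > 0) {
        let kind = if env_contended {
            LlcKind::SystemLevel
        } else {
            LlcKind::Dedicated
        };
        return Some((bytes, kind));
    }
    // 2. Architectural L3, accepted only when it is larger than L2.
    if l3 > l2 {
        return Some((l3, LlcKind::Dedicated));
    }
    // 3. Devicetree SLC, treated as contended. Anything else is unknown.
    dt_slc.filter(|&b| b > 0).map(|b| (b, LlcKind::SystemLevel))
}
```

Keeping the resolver free of I/O is what makes the priority and regression-safety checks cheap to unit-test: each source can be switched on or off by argument.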

l3_block_budget_bytes() now budgets a Dedicated L3 at ~½ and a contended SystemLevel cache at ~¼ (can't assume residency of lines the GPU/NPU keep evicting). Unknown ⇒ None ⇒ no outer tier (unchanged, regression-safe).
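
As a worked illustration of the two fractions, a minimal sketch follows. The function name `budget_for` and its signature are hypothetical (the PR's `l3_block_budget_bytes()` takes no arguments), and exact halving and quartering stand in for the approximate ½ and ¼ described above.

```rust
/// Hypothetical mirror of the PR's `LlcKind`; the real definition may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LlcKind {
    Dedicated,
    SystemLevel,
}

/// Turn a resolved last-level cache into an outer-block budget in bytes.
/// `None` in means `None` out, so the outer tier stays disabled.
pub fn budget_for(llc: Option<(usize, LlcKind)>) -> Option<usize> {
    let (bytes, kind) = llc?;
    Some(match kind {
        // A dedicated L3 is budgeted at about one half.
        LlcKind::Dedicated => bytes / 2,
        // A contended SLC is budgeted at about one quarter, because
        // other agents (GPU/NPU/display) keep evicting lines.
        LlcKind::SystemLevel => bytes / 4,
    })
}
```

Under these assumptions a 32 MiB dedicated L3 yields a 16 MiB budget, while an 8 MiB SLC yields 2 MiB.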

Notes

Purely additive; default behavior on parts with a normal L3 is identical (L3 path, ½ budget).
SLCs whose size is fixed in the controller (e.g. Qualcomm LLCC carries no cache-size in DT) fall to the TRACT_LLC_BYTES override — documented.
Pure resolver resolve_llc() is unit-tested for priority/regression-safety; the DT probe is panic-tested.

Prior art

Runtime cache sizing: Eigen queryCacheSizes (CPUID/sysctl) + manage_caching_sizes; glibc sysconf(_SC_LEVELx_CACHE_SIZE); ACPI PPTT; hwloc.
SLC exposure: Qualcomm LLCC (drivers/soc/qcom/llcc-qcom.c, devicetree qcom,llcc; LWN "SDM845 System Cache Driver"); generic devicetree cache bindings (cache-level/cache-size).
Gap this addresses: mainstream GEMM libs (OpenBLAS/BLIS/Eigen/MKL) detect L1/L2/L3 but don't chase the SLC — they equate "L3" with "LLC" — so SLC-aware blocking on SoCs whose LLC is an SLC is under-addressed.

Caveats (honest)

The ¼ SLC fraction and the depth-4 DT walk are heuristics; they want validation on a real SLC SoC (Snapdragon/Apple). None was available where this was authored (x86/Apple-Silicon hosts — the SLC path is inert on both: x86 takes the Dedicated L3 branch, Apple Silicon exposes no DT SLC, so locally only the regression-safe path is exercised).
The robust core is the TRACT_LLC_BYTES override; the devicetree autodetect is best-effort and only kicks in when there is no architectural L3 and a DT node actually carries a numeric cache-size.
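
For readers unfamiliar with the devicetree side, the bounded walk might look roughly like the sketch below. It is not the PR's probe: `probe_dt_slc` and `read_be_u32` are hypothetical names, skipping every directory called `cpus` is a simplification of "outside /cpus", and the sketch assumes `cache-level` and `cache-size` are single big-endian 32-bit cells, as devicetree properties normally are. On Linux the root would typically be `/sys/firmware/devicetree/base`.

```rust
use std::fs;
use std::path::Path;

/// Read a devicetree property holding a single big-endian 32-bit cell.
fn read_be_u32(path: &Path) -> Option<u32> {
    let bytes = fs::read(path).ok()?;
    let b = bytes.get(..4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Depth-limited search for a node with `cache-level == 3` and a numeric,
/// non-zero `cache-size`. Returns the size in bytes of the first match.
pub fn probe_dt_slc(dir: &Path, depth: usize) -> Option<usize> {
    if read_be_u32(&dir.join("cache-level")) == Some(3) {
        if let Some(size) = read_be_u32(&dir.join("cache-size")).filter(|&s| s > 0) {
            return Some(size as usize);
        }
    }
    if depth == 0 {
        return None;
    }
    // Any I/O error simply ends the search; the probe never panics.
    for entry in fs::read_dir(dir).ok()?.flatten() {
        let path = entry.path();
        // Simplification: skip per-CPU cache nodes by directory name.
        if entry.file_name() == "cpus" || !path.is_dir() {
            continue;
        }
        if let Some(size) = probe_dt_slc(&path, depth - 1) {
            return Some(size);
        }
    }
    None
}
```

A node without a numeric `cache-size` (the Qualcomm LLCC case noted above) produces no match here, which is why the `TRACT_LLC_BYTES` override remains the dependable path.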

This falls in the Extreme Category

The single-thread MMM block-budget probed L2 with detection logic inlined in frame/mmm/mod.rs, reusable nowhere and limited to macOS/Linux L2. Move it into a cache module that exposes L1d/L2/L3 through one memoised probe (macOS/iOS via sysctlbyname, Linux/Android via /sys, Windows via wmic) and have the block budget read it. The existing L2 budget is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
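
A memoised probe of this kind could be sketched as follows. Only the Linux `/sys` path is shown; the `CacheInfo` struct, the `probe_linux` and `parse_sysfs_size` helpers, and the choice to read only `cpu0` are assumptions for illustration and may not match the module in #2349.

```rust
use std::fs;
use std::sync::OnceLock;

/// Hypothetical result type: sizes in bytes, 0 meaning "not detected".
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct CacheInfo {
    pub l1d: usize,
    pub l2: usize,
    pub l3: usize,
}

/// Parse sysfs cache sizes such as "32K" or "8M" (1024-based).
pub fn parse_sysfs_size(s: &str) -> Option<usize> {
    let s = s.trim();
    if let Some(n) = s.strip_suffix('K') {
        n.parse::<usize>().ok()?.checked_mul(1 << 10)
    } else if let Some(n) = s.strip_suffix('M') {
        n.parse::<usize>().ok()?.checked_mul(1 << 20)
    } else {
        s.parse().ok()
    }
}

/// Read cpu0's cache hierarchy from sysfs; missing files leave zeros.
fn probe_linux() -> CacheInfo {
    let mut info = CacheInfo::default();
    for idx in 0..8 {
        let dir = format!("/sys/devices/system/cpu/cpu0/cache/index{idx}");
        let read = |f: &str| fs::read_to_string(format!("{dir}/{f}")).ok();
        let (Some(level), Some(ty), Some(size)) = (read("level"), read("type"), read("size"))
        else {
            continue;
        };
        let Some(bytes) = parse_sysfs_size(&size) else {
            continue;
        };
        match (level.trim(), ty.trim()) {
            ("1", "Data") | ("1", "Unified") => info.l1d = bytes,
            ("2", _) => info.l2 = bytes,
            ("3", _) => info.l3 = bytes,
            _ => {} // e.g. the L1 instruction cache
        }
    }
    info
}

/// Memoised accessor: the probe runs at most once per process.
pub fn cache_info() -> &'static CacheInfo {
    static INFO: OnceLock<CacheInfo> = OnceLock::new();
    INFO.get_or_init(probe_linux)
}
```

On a platform without these sysfs files the sketch returns all zeros, which callers would read as "unknown".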

…walk

The single-thread tile walk blocked one level, sizing panel blocks to L2 only; at large k a grid that exceeds L2 still re-fetches shared A/B panels from DRAM as it sweeps. Wrap the L2 inner block in an outer super-block sized to L3 (from the crate::cache probe) so a group of inner blocks stays L3-resident across the sweep. The outer tier engages only when an L3 larger than L2 is detected; otherwise the edge is the whole grid and the walk is identical to before. Still pure tile reordering, so bit-exact with the naive loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
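
To illustrate why a two-level walk is only a reordering, here is a minimal sketch that produces the visiting order for a grid of tiles. It is not the tract implementation: `two_level_walk` and its parameters are hypothetical, block edges are counted in tiles rather than derived from cache budgets, and the sketch ignores the k dimension entirely.

```rust
/// Visit every tile of a `rows x cols` grid exactly once. Tiles are grouped
/// into outer super-blocks of `outer` tiles per side, each subdivided into
/// inner blocks of `inner` tiles per side.
pub fn two_level_walk(
    rows: usize,
    cols: usize,
    inner: usize,
    outer: usize,
) -> Vec<(usize, usize)> {
    let inner = inner.max(1); // guard against a zero step
    let outer = outer.max(inner); // an outer block holds at least one inner block
    let mut order = Vec::with_capacity(rows * cols);
    for oi in (0..rows).step_by(outer) {
        for oj in (0..cols).step_by(outer) {
            let (oi_end, oj_end) = ((oi + outer).min(rows), (oj + outer).min(cols));
            // Sweep the inner blocks of this super-block before moving on,
            // so their shared panels can stay resident in the larger cache.
            for bi in (oi..oi_end).step_by(inner) {
                for bj in (oj..oj_end).step_by(inner) {
                    for i in bi..(bi + inner).min(oi_end) {
                        for j in bj..(bj + inner).min(oj_end) {
                            order.push((i, j));
                        }
                    }
                }
            }
        }
    }
    order
}
```

Setting `outer` to the larger grid dimension collapses the outer loops to a single iteration, which corresponds to the "edge is the whole grid" fallback. Because the output is a permutation of all tile coordinates and each tile's own computation is untouched, the result matches the naive loop.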

sonos#2350's outer tier fires only when the per-CPU cache topology exposes an architectural L3 (cpu/cache level 3). Many Arm SoCs (and Apple) have no cluster L3 but a System-Level Cache (SLC) shared with the GPU/NPU/display (Qualcomm LLCC, Apple SLC), which is never listed under cpu/cache/index*, so those parts see l3 == 0 and miss a real last-level cache the tier could use.

Add cache::last_level_cache() -> Option<(usize, LlcKind)> resolving, first hit wins: (1) TRACT_LLC_BYTES env override (+ TRACT_LLC_CONTENDED); (2) architectural L3 → Dedicated; (3) best-effort Linux devicetree SLC (cache-level == 3 + cache-size, outside /cpus) → SystemLevel.

l3_block_budget_bytes() now budgets a Dedicated L3 at ~1/2 and a contended SystemLevel cache at ~1/4. Unknown ⇒ None ⇒ no outer tier (unchanged, regression-safe). Purely additive: default behaviour on parts with a normal L3 is identical. Pure resolver unit-tested for priority/regression-safety; DT probe is panic-tested.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

czoli1976 · 2026-06-09T08:40:00Z

@kali, can you please review this cache PR trio?

czoli1976 and others added 3 commits June 7, 2026 16:12

czoli1976 changed the title ~~linalg/cache: LLC/SLC-aware budget for the L3 outer blocking tier~~ linalg/cache sorcery: LLC/SLC-aware budget for the L3 outer blocking tier Jun 7, 2026

czoli1976 marked this pull request as ready for review June 9, 2026 08:37

czoli1976 wants to merge 3 commits into sonos:main from czoli1976:feature/mmm-st-llc-slc
