Pre-fault large allocations to reduce PTE lock contention by zkfriendly · Pull Request #291 · worldfnd/provekit

zkfriendly · 2026-02-16T09:55:55Z

Pre-fault large allocations to reduce PTE lock contention

Use madvise(MADV_POPULATE_WRITE) and MADV_HUGEPAGE on large allocations (>=128 KiB) so that rayon worker threads don't all hit anonymous page faults simultaneously, avoiding contention on the kernel's PTE spinlock.
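For orientation, here is a minimal sketch of where such a hook could sit in a Rust global allocator. It is not the PR's code: `PrefaultAlloc`, `PREFAULT_THRESHOLD`, and `PREFAULT_CALLS` are hypothetical names, `prefault` is a counting placeholder rather than the real madvise logic, and `std::alloc::System` stands in for the actual backing allocator (the commit title mentions mimalloc) to keep the example dependency-free.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Threshold taken from the PR description: allocations of at least 128 KiB.
pub const PREFAULT_THRESHOLD: usize = 128 * 1024;

/// Counts hook invocations. Exists only to make this sketch observable.
pub static PREFAULT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for the real hook, which would issue the madvise calls.
fn prefault(_ptr: *mut u8, _size: usize) {
    PREFAULT_CALLS.fetch_add(1, Ordering::Relaxed);
}

/// Hypothetical wrapper that forwards to `System` and runs the hook
/// on large allocations only.
pub struct PrefaultAlloc;

unsafe impl GlobalAlloc for PrefaultAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        // Small allocations skip the hook, so the common path stays cheap.
        if !ptr.is_null() && layout.size() >= PREFAULT_THRESHOLD {
            prefault(ptr, layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

Registering it would be a single `#[global_allocator] static GLOBAL: PrefaultAlloc = PrefaultAlloc;` at the crate root. The default `alloc_zeroed` and `realloc` implementations route through `alloc`, so they pick up the hook as well.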

Changes

Added prefault() in the global allocator that hints huge pages and pre-faults writable pages for large allocations on Linux.
Uses runtime page size via sysconf(_SC_PAGESIZE) instead of a hardcoded value.
Overflow-safe alignment arithmetic; only advises pages fully within the owned allocation.
No-op on non-Linux targets.
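The items above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `inner_page_range` is a hypothetical helper name, errors from `madvise` are simply ignored, and the numeric constants are the Linux values (the `libc` crate exports the same ones), declared by hand so the block has no dependencies. `MADV_POPULATE_WRITE` requires Linux 5.14 or newer; on older kernels the call fails and this sketch leaves the memory to be faulted in lazily as before.

```rust
use std::ffi::{c_int, c_long, c_void};

/// Largest page-aligned range `[start, end)` that lies fully inside
/// `[addr, addr + len)`. Returns `None` on arithmetic overflow or when
/// no whole page fits.
pub fn inner_page_range(addr: usize, len: usize, page: usize) -> Option<(usize, usize)> {
    if page == 0 || !page.is_power_of_two() {
        return None;
    }
    let mask = page - 1;
    let end = addr.checked_add(len)? & !mask; // round the end down
    let start = addr.checked_add(mask)? & !mask; // round the start up
    if start < end {
        Some((start, end))
    } else {
        None
    }
}

#[cfg(target_os = "linux")]
pub fn prefault(ptr: *mut u8, len: usize) {
    // Linux values, declared by hand (assumption: glibc or musl target).
    const SC_PAGESIZE: c_int = 30;
    const MADV_HUGEPAGE: c_int = 14;
    const MADV_POPULATE_WRITE: c_int = 23; // Linux 5.14+

    // `unsafe extern` needs Rust 1.82 or newer.
    unsafe extern "C" {
        fn sysconf(name: c_int) -> c_long;
        fn madvise(addr: *mut c_void, len: usize, advice: c_int) -> c_int;
    }

    // Query the page size at runtime instead of assuming 4 KiB.
    let page = unsafe { sysconf(SC_PAGESIZE) };
    if page <= 0 {
        return;
    }
    if let Some((start, end)) = inner_page_range(ptr as usize, len, page as usize) {
        let (addr, n) = (start as *mut c_void, end - start);
        unsafe {
            // Both calls are best-effort hints; failures are ignored.
            let _ = madvise(addr, n, MADV_HUGEPAGE);
            let _ = madvise(addr, n, MADV_POPULATE_WRITE);
        }
    }
}

/// No-op on non-Linux targets.
#[cfg(not(target_os = "linux"))]
pub fn prefault(_ptr: *mut u8, _len: usize) {}
```

Rounding the start up and the end down means a partially owned page at either edge of the allocation is never advised, which is the "fully within the owned allocation" property described above.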

Benchmark results

Benchmark 1: prefault
  Time (mean ± σ):      6.984 s ±  0.187 s    [User: 60.346 s, System: 1.147 s]
  Range (min … max):    6.730 s …  7.281 s    10 runs

Benchmark 2: base
  Time (mean ± σ):      7.902 s ±  0.096 s    [User: 63.198 s, System: 2.212 s]
  Range (min … max):    7.680 s …  8.039 s    10 runs

Summary
  prefault ran
    1.13 ± 0.03 times faster than base

About 1.13x faster wall-clock (roughly 12% less time per run) and ~1.1 s less system time, consistent with fewer kernel page faults.
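As a cross-check on the summary line, the ratio and its uncertainty can be recomputed from the two means and standard deviations. The sketch below uses standard first-order error propagation for a ratio, which is an assumption about how the benchmark tool derives its figure; `speedup` and `time_saved` are hypothetical helper names.

```rust
/// Ratio of two means with first-order relative error propagation.
/// Each argument is `(mean, standard_deviation)` in seconds.
pub fn speedup(base: (f64, f64), new: (f64, f64)) -> (f64, f64) {
    let (base_mean, base_sd) = base;
    let (new_mean, new_sd) = new;
    let ratio = base_mean / new_mean;
    // Relative errors of the two means add in quadrature.
    let rel = ((base_sd / base_mean).powi(2) + (new_sd / new_mean).powi(2)).sqrt();
    (ratio, ratio * rel)
}

/// Fraction of wall-clock time saved relative to the baseline mean.
pub fn time_saved(base_mean: f64, new_mean: f64) -> f64 {
    1.0 - new_mean / base_mean
}
```

With the numbers above, `speedup((7.902, 0.096), (6.984, 0.187))` gives roughly `(1.131, 0.033)`, matching the reported 1.13 ± 0.03, while `time_saved(7.902, 6.984)` is about 0.116. The run is therefore about 13% faster as a ratio, or about 11.6% shorter in time.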

`perf stat` comparison

| Metric | prefault | base | Delta |
| --- | --- | --- | --- |
| page-faults | 24,595 | 119,805 | 4.9x fewer |
| dTLB-load-misses | 2,618,368 | 5,210,664 | 2.0x fewer |
| dTLB-store-misses | 1,136,056 | 3,069,966 | 2.7x fewer |
| sys time | 1.98 s | 2.65 s | 25% less |
| user time | 56.44 s | 58.12 s | 2.9% less |
| wall-clock | 8.57 s | 9.44 s | 9.2% faster |

Pre-faulting eliminates ~95k minor faults that would otherwise happen in parallel across rayon threads, which is what drives the system-time reduction.
The TLB miss reduction comes from MADV_HUGEPAGE consolidating pages into 2 MiB mappings.

zkfriendly changed the base branch from main to px/whir-pr215-compat on February 16, 2026 10:10

perf: prefault + mimalloc

79908bc

zkfriendly force-pushed the perf branch from 920afa2 to 79908bc on February 16, 2026 12:26

Bisht13 deleted the branch worldfnd:px/whir-pr215-compat on February 16, 2026 14:53

Bisht13 closed this Feb 16, 2026

zkfriendly wants to merge 1 commit into worldfnd:px/whir-pr215-compat from zkfriendly:perf