Pre-fault large allocations to reduce PTE lock contention #291

Closed
zkfriendly wants to merge 1 commit into worldfnd:px/whir-pr215-compat from zkfriendly:perf

Conversation

@zkfriendly
Collaborator

Pre-fault large allocations to reduce PTE lock contention

Apply madvise(MADV_HUGEPAGE) and madvise(MADV_POPULATE_WRITE) to large allocations (>= 128 KiB) so that their pages are faulted in up front by the allocating thread, instead of rayon worker threads all hitting anonymous page faults simultaneously on first touch. This avoids contention on the kernel's PTE spinlock.

Changes

  • Added prefault() in the global allocator that hints huge pages and pre-faults writable pages for large allocations on Linux.
  • Uses runtime page size via sysconf(_SC_PAGESIZE) instead of a hardcoded value.
  • Overflow-safe alignment arithmetic; only advises pages fully within the owned allocation.
  • No-op on non-Linux targets.
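The changes listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: the names `inner_page_range`, `PrefaultAlloc`, and `PREFAULT_THRESHOLD` are hypothetical, wrapping `System` is an assumption about which allocator sits underneath, and the numeric constants are the Linux values of `_SC_PAGESIZE`, `MADV_HUGEPAGE`, and `MADV_POPULATE_WRITE` declared by hand so the example needs no external crate (real code would more likely take them from the `libc` crate).

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Allocations below this size are left alone (128 KiB, per the PR description).
const PREFAULT_THRESHOLD: usize = 128 * 1024;

/// Returns the page-aligned range `[start, end)` that lies fully inside
/// `[addr, addr + size)`, or `None` if no whole page fits or the
/// arithmetic would overflow. `page` must be a power of two.
fn inner_page_range(addr: usize, size: usize, page: usize) -> Option<(usize, usize)> {
    debug_assert!(page.is_power_of_two());
    let mask = page - 1;
    let start = addr.checked_add(mask)? & !mask; // round the start up
    let end = addr.checked_add(size)? & !mask; // round the end down
    if start < end { Some((start, end)) } else { None }
}

#[cfg(target_os = "linux")]
mod sys {
    use std::ffi::{c_int, c_long, c_void};

    // Assumed Linux values, hand-declared instead of using the `libc` crate.
    pub const SC_PAGESIZE: c_int = 30;
    pub const MADV_HUGEPAGE: c_int = 14;
    pub const MADV_POPULATE_WRITE: c_int = 23; // needs Linux 5.14 or newer

    extern "C" {
        pub fn sysconf(name: c_int) -> c_long;
        pub fn madvise(addr: *mut c_void, len: usize, advice: c_int) -> c_int;
    }
}

/// Hints huge pages and pre-faults writable pages for large allocations.
#[cfg(target_os = "linux")]
pub fn prefault(ptr: *mut u8, size: usize) {
    use std::ffi::c_void;

    if size < PREFAULT_THRESHOLD {
        return;
    }
    // Query the page size at runtime rather than hardcoding 4096.
    let page = unsafe { sys::sysconf(sys::SC_PAGESIZE) };
    if page <= 0 || !(page as usize).is_power_of_two() {
        return;
    }
    if let Some((start, end)) = inner_page_range(ptr as usize, size, page as usize) {
        let len = end - start;
        unsafe {
            // Both calls are best-effort hints, so errors (for example
            // EINVAL on older kernels) are deliberately ignored.
            sys::madvise(start as *mut c_void, len, sys::MADV_HUGEPAGE);
            sys::madvise(start as *mut c_void, len, sys::MADV_POPULATE_WRITE);
        }
    }
}

/// No-op on non-Linux targets.
#[cfg(not(target_os = "linux"))]
pub fn prefault(_ptr: *mut u8, _size: usize) {}

/// Hypothetical global allocator wrapper that pre-faults after allocating.
pub struct PrefaultAlloc;

unsafe impl GlobalAlloc for PrefaultAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            prefault(ptr, layout.size());
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: PrefaultAlloc = PrefaultAlloc;
```

Rounding the start up and the end down means a partial page at either edge of the allocation is never advised, since it may be shared with a neighbouring allocation. With the `#[global_allocator]` in place, any sufficiently large `Vec` or `Box` allocation passes through `prefault` without changes at the call site.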

Benchmark results

Benchmark 1: prefault
  Time (mean ± σ):      6.984 s ±  0.187 s    [User: 60.346 s, System: 1.147 s]
  Range (min … max):    6.730 s …  7.281 s    10 runs

Benchmark 2: base
  Time (mean ± σ):      7.902 s ±  0.096 s    [User: 63.198 s, System: 2.212 s]
  Range (min … max):    7.680 s …  8.039 s    10 runs

Summary
  prefault ran
    1.13 ± 0.03 times faster than base

1.13x faster wall-clock (mean time down ~12%, from 7.902 s to 6.984 s) and ~1 s less system time (fewer kernel page faults).

perf stat comparison

Metric               prefault      base          Delta
page-faults          24,595        119,805       4.9x fewer
dTLB-load-misses     2,618,368     5,210,664     2.0x fewer
dTLB-store-misses    1,136,056     3,069,966     2.7x fewer
sys time             1.98 s        2.65 s        25% less
user time            56.44 s       58.12 s       2.9% less
wall-clock           8.57 s        9.44 s        9.2% faster

Pre-faulting eliminates ~95k minor faults that would otherwise happen in parallel across rayon threads, which is what drives the system-time reduction.
The TLB miss reduction comes from MADV_HUGEPAGE consolidating pages into 2 MiB mappings.
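For anyone re-checking the Delta column, the figures follow directly from the raw perf stat counters. A small sketch of the arithmetic, with hypothetical helper names `times_fewer` and `percent_less`:

```rust
/// How many times smaller `new` is than `base` (4.9 reads as "4.9x fewer").
fn times_fewer(base: f64, new: f64) -> f64 {
    base / new
}

/// Relative reduction from `base` to `new`, in percent.
fn percent_less(base: f64, new: f64) -> f64 {
    (base - new) / base * 100.0
}
```

For example, `times_fewer(119_805.0, 24_595.0)` gives roughly 4.87 for page faults, and `percent_less(2.65, 1.98)` gives roughly 25.3 for system time. The ~95k eliminated minor faults are the plain difference 119,805 - 24,595 = 95,210.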

@zkfriendly zkfriendly changed the base branch from main to px/whir-pr215-compat February 16, 2026 10:10
@Bisht13 Bisht13 deleted the branch worldfnd:px/whir-pr215-compat February 16, 2026 14:53
@Bisht13 Bisht13 closed this Feb 16, 2026