[PeerDAS] Precomputed MSMs - up to 1.62x acceleration#619
Conversation
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (11)
📝 WalkthroughWalkthroughReworks KZG trusted-setup into a context with optional MSM precomputation, adds a PrecomputedMSM implementation and benchmarks, refactors the Toeplitz/FK20 pipeline to accept either raw affine spectra or precomputed tables, updates many callsites/tests/FFI bindings to the new context API, and applies related elliptic API and assertion-message changes. ChangesPrecomputed MSM, KZG context & FK20 pipeline
Elliptic API ergonomics & assertion standardization
Sequence Diagram(s)sequenceDiagram
participant Caller
participant CKZG as EthereumKZGContext
participant Toeplitz as ToeplitzAccumulator
participant MSM as PrecomputedMSM
participant Proof as kzg_coset_prove
Caller->>CKZG: new(filepath, t, b)
CKZG->>CKZG: load_from_file(filepath, t, b)
CKZG->>CKZG: computePolyphaseDecompositionFourier()
CKZG->>MSM: init(rawPoints, t, b) × N
CKZG-->>Caller: ctx initialized (kind=kPrecompute)
Caller->>Toeplitz: init(L)
Caller->>Toeplitz: accumulate(circulant)
Toeplitz->>Toeplitz: FFT circulant -> coeffs
Caller->>Proof: kzg_coset_prove_impl(request)
alt kind == kPrecompute
Proof->>MSM: msm_vartime(scalars)
MSM-->>Proof: weighted sum (precomp)
else kind == kNoPrecompute
Proof->>Proof: multiScalarMul_vartime(scalars, rawPoints)
end
Proof->>Toeplitz: finish(output, polyphaseSpectrumBank)
Toeplitz-->>Proof: ifft(result)
Proof-->>Caller: FK20 proofs
Caller->>CKZG: delete()
CKZG->>MSM: destroy() × N (if precomputed)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Greptile SummaryThis PR introduces Herold-Hagopian precomputed MSMs for FK20 multi-proofs in PeerDAS, achieving a ~1.62x speedup (145 ms to 88 ms per blob) by replacing per-proof Pippenger MSMs with precomputed subset-sum lookup tables parameterised by
Confidence Score: 4/5The core algorithm and dispatch logic are correct; the one memory-safety gap in the new PrecomputedMSM type is easy to close before merge. PrecomputedMSM owns a heap-allocated table pointer via freeHeapAligned in =destroy, but no =copy prohibition is defined. Any accidental by-value copy will produce two objects sharing the same raw pointer and trigger a double-free on destruction. The fix is a one-liner identical to what ToeplitzAccumulator already uses. All other changes (Toeplitz refactor, dispatch logic, bindings) look correct and well-tested. constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim needs the =copy prohibition; constantine/commitments_setups/ethereum_kzg_srs.nim is worth a second look for the rawPoints memory overhead. Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant SRS as ethereum_kzg_srs
participant PSB as PolyphaseSpectrumBank
participant KZG as kzg_multiproofs
participant Toeplitz as ToeplitzAccumulator
participant PMSM as PrecomputedMSM
Caller->>SRS: ctt_eth_kzg_context_new_with_precompute(t, b)
SRS->>SRS: computePolyphaseDecompositionFourier
SRS->>PSB: fill rawPoints (always)
SRS->>PMSM: init(points[pos], t, b) for each pos
Note over PMSM: Builds subset-sum table 2^b entries/window
Caller->>KZG: kzg_coset_prove(..., PSB.precompPoints)
loop offset in 0..L-1
KZG->>Toeplitz: accumulate(circulant FFT)
end
KZG->>Toeplitz: finish(output, polyphaseSpectrumBank)
loop i in 0..size-1
Toeplitz->>PMSM: msm_vartime(output[i], scalars)
loop t_i in 0..t-1
Note over PMSM: double if t_i > 0, lookup table
end
end
Toeplitz->>Toeplitz: ec_ifft_nn(output)
KZG->>Caller: proofs (affine, bit-reversed)
Reviews (1): Last reviewed commit: "Runtime Precomp param" | Re-trigger Greptile |
| proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} = | ||
| if ctx.table != nil: | ||
| freeHeapAligned(cast[pointer](ctx.table)) | ||
| ctx.table = nil | ||
| ctx.tableLen = 0 | ||
| ctx.t = 0 | ||
| ctx.b = 0 |
There was a problem hiding this comment.
PrecomputedMSM defines a custom =destroy that calls freeHeapAligned on ctx.table, but does not define =copy. In Nim with ARC, the compiler will generate a bitwise (shallow) copy for any assignment or pass-by-value, producing two objects that share the same raw table pointer. When both are destroyed, freeHeapAligned is called twice on the same address — a double-free. The sibling type ToeplitzAccumulator already has {.error.} on =copy; PrecomputedMSM needs the same guard.
| proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} = | |
| if ctx.table != nil: | |
| freeHeapAligned(cast[pointer](ctx.table)) | |
| ctx.table = nil | |
| ctx.tableLen = 0 | |
| ctx.t = 0 | |
| ctx.b = 0 | |
| proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} = | |
| if ctx.table != nil: | |
| freeHeapAligned(cast[pointer](ctx.table)) | |
| ctx.table = nil | |
| ctx.tableLen = 0 | |
| ctx.t = 0 | |
| ctx.b = 0 | |
| proc `=copy`*[EC; N](dst: var PrecomputedMSM[EC, N], src: PrecomputedMSM[EC, N]) {.error.} |
| case kind*: PolyphaseKind | ||
| of kNoPrecompute: | ||
| discard | ||
| of kPrecompute: | ||
| precompPoints* {.align: 64.}: array[CELLS_PER_EXT_BLOB, PrecomputedMSM[EC_ShortW_Jac[Fp[BLS12_381], G1], FIELD_ELEMENTS_PER_CELL]] | ||
|
|
||
| # On the number of 𝔾2 points: | ||
| # - In the Deneb specs, https://github.com/ethereum/consensus-specs/blob/v1.3.0/specs/deneb/polynomial-commitments.md |
There was a problem hiding this comment.
rawPoints is unconditionally populated in both modes
rawPoints (~1.18 MiB: 128 × 64 × 144 B) lives in the non-variant part of PolyphaseSpectrumBank and is always filled during load_ckzg4844. When kPrecompute is active the field is never read — every call to kzg_coset_prove dispatches to the precompPoints overload — so this allocation is pure waste. For a default (t=64, b=12) context the tables already consume ~554 MiB; adding a silent extra MiB undermines the memory-budget documentation. Moving rawPoints into the kNoPrecompute branch of the variant would keep both paths honest.
|
|
||
| /** Load trusted setup from path | ||
| * Currently the only format supported `cttEthTSFormat_ckzg4844` | ||
| * is from the reference implementation c-kzg-4844 text file | ||
| /** Create a new KZG context from trusted setup file. | ||
| * Loads SRS, computes polyphase decomposition as raw affine points, | ||
| * and sets the context to kNoPrecompute mode (~1.8 MiB). | ||
| */ | ||
| ctt_eth_trusted_setup_status ctt_eth_trusted_setup_load( | ||
| ctt_eth_trusted_setup_status ctt_eth_kzg_context_new( | ||
| ctt_eth_kzg_context** ctx, | ||
| const char* filepath, | ||
| ctt_eth_trusted_setup_format format | ||
| ) __attribute__((__warn_unused_result__)); | ||
| ) __attribute__((__warn_unused_result__)); | ||
|
|
||
| /** Destroy a trusted setup | ||
| /** Create a new KZG context with precomputed MSM tables. | ||
| * Same as ctt_eth_kzg_context_new but also builds PrecomputedMSM lookup | ||
| * tables for FK20 proofs (PeerDAS). | ||
| * | ||
| * @param t base groups (stride between precomputed layers) | ||
| * @param b bits per window (window size = 2^b) | ||
| * | ||
| * SPEED / MEMORY TRADEOFF (Intel i7-265K, FK20 proofs = 128 MSMs per blob): | ||
| * - no precompute: ~145 ms/blob, ~1.8 MiB | ||
| * - t=64,b=8: ~109 ms/blob, ~101 MiB per MSM (~12.8 GiB total) | ||
| * - t=64,b=12: ~89 ms/blob, ~8.7 MiB per MSM (~1.1 GiB total) | ||
| * - t=128,b=8: ~105 ms/blob, ~50 MiB per MSM (~6.4 GiB total) | ||
| * - t=128,b=12: ~92 ms/blob, ~4.3 MiB per MSM (~0.6 GiB total) | ||
| * - t=256,b=8: ~105 ms/blob, ~25 MiB per MSM (~3.2 GiB total) | ||
| * | ||
| * Larger b = faster per MSM but exponentially more memory (2^b entries). | ||
| * Larger t = fewer doublings but more precomputed layers. | ||
| * Default (t=64, b=12): ~89 ms/blob proving, ~1.1 GiB total memory. | ||
| */ | ||
| void ctt_eth_trusted_setup_delete(ctt_eth_kzg_context* ctx); | ||
| ctt_eth_trusted_setup_status ctt_eth_kzg_context_new_with_precompute( | ||
| ctt_eth_kzg_context** ctx, | ||
| const char* filepath, | ||
| ctt_eth_trusted_setup_format format, | ||
| int t, | ||
| int b | ||
| ) __attribute__((__warn_unused_result__)); | ||
|
|
||
| /** Destroy a KZG context | ||
| */ | ||
| void ctt_eth_kzg_context_delete(ctt_eth_kzg_context* ctx); | ||
|
|
There was a problem hiding this comment.
Breaking C API rename without backward-compat shims
ctt_eth_trusted_setup_load and ctt_eth_trusted_setup_delete are completely removed and replaced by ctt_eth_kzg_context_new / ctt_eth_kzg_context_delete. Any downstream C consumer that linked against the previous header will get a link error with no migration path. The Go and Rust bindings are updated in this PR, but external C users are silently broken. Consider keeping the old names as deprecated thin wrappers (or at minimum a clearly called-out breaking change note in the changelog / migration guide) so existing consumers can adapt incrementally.
There was a problem hiding this comment.
Actionable comments posted: 11
🧹 Nitpick comments (5)
benchmarks/bench_elliptic_template.nim (1)
361-371: 💤 Low valueLikely typo on the export marker for
t.
t, b*: intexports onlyb;tis private. Bothtandbare part of the same configuration tuple and are referenced together everywhere the context is used (fmt"t={ctx.t}, b={ctx.b}",msmPrecompEstimateOps(EC, N, ctx.t, ctx.b)). Either both should be exported or neither — the current asymmetry is most likely an accidental missing*.♻️ Suggested fix
- t, b*: int + t*, b*: int🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/bench_elliptic_template.nim` around lines 361 - 371, The struct PrecompBenchContext has an export typo: the line "t, b*: int" only exports b, leaving t unexported; update the field declaration in PrecompBenchContext so both fields are exported (e.g., mark t with a '*' like b) so that ctx.t and ctx.b are both public and usable in callers such as fmt"t={ctx.t}, b={ctx.b}" and msmPrecompEstimateOps.constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim (1)
55-70: 💤 Low valueDefensive guard:
msmPrecompEstimateOpsdivides bybwithout the samet==0 or b==0short-circuit asmsmPrecompSize.
msmPrecompSize(Line 75) returns 0 for the no-precompute mode(t=0, b=0), butmsmPrecompEstimateOpswill divide by zero on Line 65 ifb == 0and produce a negativedblift == 0. If callers ever pass the no-precompute sentinel through, this asserts/UBs. Mirror the guard for consistency.🛡️ Suggested fix
func msmPrecompEstimateOps*[EC](_: typedesc[EC]; N, t, b: int): tuple[add, dbl: int] = ## Returns (additions, doublings) for one MSM execution + if t == 0 or b == 0: + return (0, 0) const bits = EC.getScalarField().bits() # Doublings: one per t iteration (except first) let dbl = t - 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim` around lines 55 - 70, The function msmPrecompEstimateOps should mirror msmPrecompSize's sentinel handling: if t == 0 or b == 0 return (0, 0) immediately to avoid dividing by zero and producing a negative dbl; update msmPrecompEstimateOps (the func named msmPrecompEstimateOps*) to check for the no-precompute case up-front and early-return (add=0, dbl=0) before computing dbl, totalBitExtractions, numWindows or add.constantine/math/matrix/toeplitz.nim (1)
344-360: ⚡ Quick winKeep the
scratchScalarslayout guard in the precomputed overload too.This overload performs the same
F→F.getBigInt()reinterpret cast as the raw-points path, but it drops thestatic doAssert sizeof(F) == sizeof(F.getBigInt()). Without the same guard, a future field type can make the precomputed path compile into UB while the non-precomputed path still fails fast at compile time.Suggested change
proc finish*[EC, ECaff, F; N: static int]( ctx: var ToeplitzAccumulator[EC, ECaff, F], output: var openArray[EC], polyphaseSpectrumBank: openArray[PrecomputedMSM[EC, N]] ): ToeplitzStatus {.raises: [], meter.} = @@ let n = ctx.size if n == 0 or output.len != n or ctx.offset != ctx.L or polyphaseSpectrumBank.len != n or N != ctx.L: return Toeplitz_MismatchedSizes + static: doAssert sizeof(F) == sizeof(F.getBigInt()), "scratchScalars cast requires sizeof(F) == sizeof(F.getBigInt())" let scalars = cast[ptr UncheckedArray[F.getBigInt()]](ctx.scratchScalars)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/math/matrix/toeplitz.nim` around lines 344 - 360, The precomputed overload `finish` is missing the layout guard used in the raw-points path; add the same compile-time assertion (e.g. `static doAssert sizeof(F) == sizeof(F.getBigInt())`) near the start of `proc finish*[EC, ECaff, F; N: static int](ctx: var ToeplitzAccumulator[EC, ECaff, F], ...)` to ensure `ctx.scratchScalars` can be safely reinterpreted via `cast[ptr UncheckedArray[F.getBigInt()]](ctx.scratchScalars)` and prevent UB for incompatible `F` types.tests/commitments/t_kzg_multiproofs.nim (1)
831-910: ⚡ Quick winCover more than one precompute configuration here.
This new test only exercises the runtime-tunable path for
(t=64, b=8), so a regression in the parameter plumbing can still pass as long as that single point works. Please make this table-driven over at least a couple of representative(t, b)pairs so the new configurable API is actually covered.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/commitments/t_kzg_multiproofs.nim` around lines 831 - 910, The testFK20PrecompMSM currently only exercises one (t,b) pair; make it table-driven by defining a small list/array of representative (t,b) pairs (e.g. [(64,8),(32,4)] or similar) and loop over them, re-running the precompBank init, kzg_coset_prove, comparisons and verifications for each pair; update uses of the local constants t and b (and any derived values) inside the loop so precompBank.init(polyphaseBasis[pos], t = t, b = b), precompProofs, and verification calls are executed per (t,b) configuration and assert success for each configuration within testFK20PrecompMSM.constantine/commitments_setups/ethereum_kzg_srs.nim (1)
374-381: 💤 Low valueOptional: reuse
rawPoints[pos]instead of rebuildingpoints.After lines 370-372,
ctx.polyphaseSpectrumBank.rawPoints[pos]already contains exactly the same data that the inner loop is copying into thepointslocal. SincePrecomputedMSM.initacceptsbasis: openArray[affine(EC)](read-only), you can passrawPoints[pos]directly and avoid the redundant per-position array copy:♻️ Proposed simplification
# Build PrecomputedMSM tables if t > 0 and b > 0. if t > 0 and b > 0: ctx.polyphaseSpectrumBank.kind = kPrecompute for pos in 0 ..< CELLS_PER_EXT_BLOB: - var points {.noInit.}: array[FIELD_ELEMENTS_PER_CELL, BLS12_381_G1_Aff] - for offset in 0 ..< FIELD_ELEMENTS_PER_CELL: - points[offset] = tmp[offset][pos] - ctx.polyphaseSpectrumBank.precompPoints[pos].init(points, t = t, b = b) + ctx.polyphaseSpectrumBank.precompPoints[pos].init( + ctx.polyphaseSpectrumBank.rawPoints[pos], t = t, b = b) else: ctx.polyphaseSpectrumBank.kind = kNoPrecompute🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 374 - 381, Replace the per-position rebuild of a temporary array with the already-populated rawPoints entry: inside the block that sets ctx.polyphaseSpectrumBank.kind = kPrecompute, stop creating and filling the local var points from tmp; instead call ctx.polyphaseSpectrumBank.precompPoints[pos].init with ctx.polyphaseSpectrumBank.rawPoints[pos] (which already contains FIELD_ELEMENTS_PER_CELL BLS12_381_G1_Aff values matching what points would contain) and pass the same t and b; update references to tmp/points only for positions where rawPoints[pos] is not available.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Line 400: The field ctx.precompMemMiB is computed using / 1e6 which gives
decimal megabytes, but the name and output use MiB (mebibytes); change the
divisor to 1024*1024 so ctx.precompMemMiB = float64(msmPrecompSize(EC, N, t, b)
* sizeof(affine(EC))) / (1024*1024) to report true MiB.
- Around line 410-448: The code computes cycles using startClk and stopClk which
are only declared inside when SupportsGetTicks branches (startClk and stopClk in
the msm benchmark loop), causing a compile error when SupportsGetTicks is false;
move the cycles computation into the when SupportsGetTicks: branch (i.e.,
compute cycles = (stopClk - startClk) div iters) so it is only referenced when
startClk/stopClk exist and keep the existing when/else echo branches unchanged.
In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 184-209: The runPrecomp benchmark (proc runPrecomp) is never
executed so the precompute result is missing; either invoke runPrecomp() after
runNoPrecomp() or remove the unused runPrecomp proc to avoid dead code; locate
the procs runNoPrecomp and runPrecomp in this file and add a call to
runPrecomp() (or delete the runPrecomp definition) so the precompute benchmark
actually runs and emits its output.
- Around line 369-376: runConfig currently mutates the discriminant of the case
object polyphaseSpectrumBank by writing into precompPoints before setting
kind=kPrecompute (and runBaseline flips kind=kNoPrecompute), which is undefined;
fix by avoiding in-place discriminant mutation—either create a fresh ctx (or
fresh polyphaseSpectrumBank) per configuration and initialize its precompPoints
there inside runConfig, or allocate a separate precompute-only buffer and
populate that buffer and then atomically swap polyphaseSpectrumBank to point to
the new precompute bank (so you never write into the inactive branch); update
runConfig and any callers (e.g., runBaseline) to construct/swap the new bank
rather than mutating kind on a shared case object.
In `@benchmarks/bench_matrix_toeplitz.nim`:
- Around line 163-167: The allocation for `basis` is too small:
`allocHeapArrayAligned(array[L, BLS12_381_G1_aff], size, 64)` only allocates
`size` arrays but `rng.random_unsafe(basis.toOpenArray(0, size * L - 1))` writes
`size * L` elements; update the allocation to allocate `size * L` elements
(i.e., call `allocHeapArrayAligned` with length `size * L`) so
`basis.toOpenArray(0, size * L - 1)` stays within bounds; ensure you adjust only
the allocation argument and keep the existing `rng.random_unsafe` call and
`toOpenArray` indexing.
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 423-424: The exported FFI proc new_with_precompute
(ctt_eth_kzg_context_new_with_precompute) declares parameters t and b as Nim int
which is pointer-sized; change their types to cint to match the C header's int
and avoid ABI mismatches, i.e. update proc new_with_precompute*(ctx: var ptr
EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat, t, b: cint):
TrustedSetupStatus and likewise update the signature of load_from_file (the
method called inside) to accept t, b: cint so both declarations align with the C
ABI.
In `@constantine/ethereum_eip4844_kzg.nim`:
- Around line 25-26: The public constructors new and new_with_precompute expose
an EthereumKZGContext that can remain allocated when f.open(...) fails
(returning tsMissingOrInaccessibleFile), leaking memory and handing callers a
partially initialized ctx; fix by either moving the allocation of ctx until
after the file open succeeds or explicitly freeing and nulling ctx before every
early return on error (e.g., in the code path that calls f.open(...) and returns
tsMissingOrInaccessibleFile), ensuring EthereumKZGContext is only returned
non-nil on success.
In `@include/constantine/protocols/ethereum_eip4844_kzg.h`:
- Around line 205-223: The benchmark table and description in the comment are
inconsistent and misleading; update the memory entries so they increase with b
(e.g., show t=64,b=12 uses more memory than t=64,b=8), remove or correct the
"Default (t=64, b=12)" claim (since ctt_eth_kzg_context_new_with_precompute /
new_with_precompute in C has no defaults), and explicitly state that larger b
increases memory exponentially (2^b) while larger t trades layers for doublings;
reference the function name ctt_eth_kzg_context_new_with_precompute (or
new_with_precompute) so readers know this applies to that API and ensure the
table + summary match actual benchmark numbers and the PR notes.
In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md:
- Line 31: Replace empty alt text in the markdown image embeds with descriptive
alt text: for the image currently written as

(and the other embeds at the same file around lines 44, 67, 74, 100, 135),
change to the pattern  using brief,
meaningful descriptions (e.g., "Figure 1: Naive algorithm binary decomposition
diagram") so each embedded image in "H23 - Notes on MSMs with Precomputation.md"
has non-empty alt text for accessibility and markdown linting.
- Line 133: Fix the visible misspellings across the note by replacing "verical"
with "vertical", "turns our" with "turns out", "indivudial" with "individual",
"endomorphis" with "endomorphism", and "precompuation" with "precomputation"
(these occur in the paragraph containing "There is a variant of the above using
verical blocks..." and similarly at the other flagged instances); update those
exact tokens in the document so all explanatory sections read correctly while
preserving surrounding phrasing and math.
In `@tests/testutils/eth_consensus_utils.nim`:
- Line 29: The test utility currently calls
ctx.new_with_precompute(TrustedSetupMainnet, kReferenceCKzg4844, 256, 8) which
forces the kPrecompute code path in polyphaseSpectrumBank (building
PrecomputedMSM tables) while the rest of the suite uses ctx.new(...)
(kNoPrecompute); either add a brief comment next to the new_with_precompute call
explaining why t=256, b=8 are required here and that the split intentionally
exercises both precomputed and non-precomputed branches across tests, or replace
the call with ctx.new(...) to match the other tests and avoid extra memory/setup
overhead in CI.
---
Nitpick comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Around line 361-371: The struct PrecompBenchContext has an export typo: the
line "t, b*: int" only exports b, leaving t unexported; update the field
declaration in PrecompBenchContext so both fields are exported (e.g., mark t
with a '*' like b) so that ctx.t and ctx.b are both public and usable in callers
such as fmt"t={ctx.t}, b={ctx.b}" and msmPrecompEstimateOps.
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 374-381: Replace the per-position rebuild of a temporary array
with the already-populated rawPoints entry: inside the block that sets
ctx.polyphaseSpectrumBank.kind = kPrecompute, stop creating and filling the
local var points from tmp; instead call
ctx.polyphaseSpectrumBank.precompPoints[pos].init with
ctx.polyphaseSpectrumBank.rawPoints[pos] (which already contains
FIELD_ELEMENTS_PER_CELL BLS12_381_G1_Aff values matching what points would
contain) and pass the same t and b; update references to tmp/points only for
positions where rawPoints[pos] is not available.
In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`:
- Around line 55-70: The function msmPrecompEstimateOps should mirror
msmPrecompSize's sentinel handling: if t == 0 or b == 0 return (0, 0)
immediately to avoid dividing by zero and producing a negative dbl; update
msmPrecompEstimateOps (the func named msmPrecompEstimateOps*) to check for the
no-precompute case up-front and early-return (add=0, dbl=0) before computing
dbl, totalBitExtractions, numWindows or add.
In `@constantine/math/matrix/toeplitz.nim`:
- Around line 344-360: The precomputed overload `finish` is missing the layout
guard used in the raw-points path; add the same compile-time assertion (e.g.
`static doAssert sizeof(F) == sizeof(F.getBigInt())`) near the start of `proc
finish*[EC, ECaff, F; N: static int](ctx: var ToeplitzAccumulator[EC, ECaff, F],
...)` to ensure `ctx.scratchScalars` can be safely reinterpreted via `cast[ptr
UncheckedArray[F.getBigInt()]](ctx.scratchScalars)` and prevent UB for
incompatible `F` types.
In `@tests/commitments/t_kzg_multiproofs.nim`:
- Around line 831-910: The testFK20PrecompMSM currently only exercises one (t,b)
pair; make it table-driven by defining a small list/array of representative
(t,b) pairs (e.g. [(64,8),(32,4)] or similar) and loop over them, re-running the
precompBank init, kzg_coset_prove, comparisons and verifications for each pair;
update uses of the local constants t and b (and any derived values) inside the
loop so precompBank.init(polyphaseBasis[pos], t = t, b = b), precompProofs, and
verification calls are executed per (t,b) configuration and assert success for
each configuration within testFK20PrecompMSM.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 75893922-7362-46fe-8705-d97cda1dad3d
⛔ Files ignored due to path filters (7)
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 1 - Naive algorithm binary decomposition.pngis excluded by!**/*.pngpapers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 2 - Vertical blocks.pngis excluded by!**/*.pngpapers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 3 - Arbitrary blocks.pngis excluded by!**/*.pngpapers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 4 - Horizontal blocks.pngis excluded by!**/*.pngpapers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 5 - Pippenger precomputation.pngis excluded by!**/*.pngpapers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 6 - Horizontal variant vertical blocks.pngis excluded by!**/*.pngpapers/MSM/images/H24 - Verkle Trees - Another iteration of VKTs MSM/H24 - Fig 1 - Gottfried algorithm visualization.pngis excluded by!**/*.png
📒 Files selected for processing (53)
benchmarks/bench_ec_msm_bls12_381_g1.nimbenchmarks/bench_ec_msm_precomp_bandersnatch.nimbenchmarks/bench_ec_msm_precomp_bls12_381_g1.nimbenchmarks/bench_elliptic_parallel_template.nimbenchmarks/bench_elliptic_template.nimbenchmarks/bench_eth_eip4844_kzg.nimbenchmarks/bench_eth_eip7594_peerdas.nimbenchmarks/bench_eth_evm_precompiles.nimbenchmarks/bench_kzg_multiproofs.nimbenchmarks/bench_matrix_toeplitz.nimbenchmarks/eth_eip7594/benchset_generation.nimbenchmarks/eth_eip7594/perf_compute_cells.nimbenchmarks/eth_eip7594/perf_compute_cells_and_kzg_proofs.nimbenchmarks/eth_eip7594/perf_recover_cells_and_kzg_proofs.nimbenchmarks/eth_eip7594/perf_verify_cell_kzg_proof_batch.nimconstantine-go/constantine.goconstantine-rust/constantine-ethereum-kzg/src/lib.rsconstantine-rust/constantine-sys/src/bindings32.rsconstantine-rust/constantine-sys/src/bindings64.rsconstantine.nimbleconstantine/commitments/kzg_multiproofs.nimconstantine/commitments_setups/ethereum_kzg_srs.nimconstantine/eth_eip7594_peerdas.nimconstantine/ethereum_eip4844_kzg.nimconstantine/ethereum_evm_precompiles.nimconstantine/math/arithmetic/limbs_crandall.nimconstantine/math/arithmetic/limbs_montgomery.nimconstantine/math/elliptic/ec_multi_scalar_mul_precomp.nimconstantine/math/elliptic/ec_scalar_mul.nimconstantine/math/elliptic/ec_scalar_mul_vartime.nimconstantine/math/elliptic/ec_shortweierstrass_batch_ops.nimconstantine/math/elliptic/ec_twistededwards_batch_ops.nimconstantine/math/extension_fields/exponentiations.nimconstantine/math/matrix/toeplitz.nimconstantine/math_arbitrary_precision/arithmetic/limbs_montgomery.nimconstantine/math_compiler/impl_fields_globals.nimconstantine/named/deriv/precompute.nimconstantine/named/zoo_subgroups.niminclude/constantine/protocols/ethereum_eip4844_kzg.hmetering/m_kzg_multiproofs.nimpapers/MSM/H23 - Notes on MSMs with Precomputation.mdpapers/MSM/H24 - Verkle Trees - Another iteration of VKTs MSM.mdtests/commitments/t_kzg_multiproofs.nimtests/eth_eip7594_peerdas/t_cells_and_kzg_proofs_opt.nimtests/eth_eip7594_peerdas/t_compute_cells_opt.nimtests/eth_eip7594_peerdas/t_peerdas_recovery.nimtests/math_elliptic_curves/t_ec_multi_scalar_mul_precomp.nimtests/math_matrix/t_toeplitz.nimtests/t_eth_eip7594_peerdas.nimtests/t_ethereum_eip4844_deneb_kzg.nimtests/t_ethereum_eip4844_deneb_kzg_parallel.nimtests/t_ethereum_evm_precompiles.nimtests/testutils/eth_consensus_utils.nim
There was a problem hiding this comment.
Code Review
This pull request introduces precomputed Multi-Scalar Multiplication (MSM) tables to optimize FK20 proofs for PeerDAS, significantly improving performance for KZG operations. The changes include a new module for precomputed MSM, updates to the KZG context management (replacing trusted_setup_load with new and new_with_precompute), and updates to benchmarks and tests to support these optimizations. I have identified two critical issues in the new precomputation logic: potential processing of uninitialized memory in precomputeWindow and a type safety issue with the scalars parameter in msm_vartime.
| scratchspace[0].setNeutral() | ||
|
|
||
| var currentSize = 1 | ||
| for basisIdx in 0 ..< basisSlice.len: | ||
| let prevSize = currentSize | ||
| currentSize = min(currentSize * 2, windowSize) | ||
| for s in 0..<prevSize: | ||
| if (s + prevSize) < windowSize: | ||
| scratchspace[s + prevSize].sum_vartime(scratchspace[s], basisSlice[basisIdx]) | ||
|
|
||
| tableSlice.batchAffine_vartime(scratchspace) |
There was a problem hiding this comment.
In precomputeWindow, if basisSlice.len < b (which happens for the last window if the total number of points is not a multiple of b), only the first 2^basisSlice.len elements of scratchspace are initialized. However, batchAffine_vartime is called on the entire scratchspace of size windowSize (2^b). This results in processing uninitialized memory, which could lead to undefined behavior or crashes during the affine conversion if the garbage data represents invalid group elements. The conversion should be restricted to the initialized portion of the buffer.
let numPoints = 1 shl basisSlice.len
tableSlice.toOpenArray(0, numPoints - 1).batchAffine_vartime(scratchspace.toOpenArray(0, numPoints - 1))
| func msm_vartime*[EC; N]( | ||
| ctx: PrecomputedMSM[EC, N], | ||
| r: var EC, | ||
| scalars: openArray[BigInt] |
There was a problem hiding this comment.
The scalars parameter is typed as openArray[BigInt]. Since BigInt is a generic type in Constantine (e.g., BigInt[bits: static int]), this might lead to compilation errors or unexpected behavior if the compiler cannot infer the static bit length. It is safer to make the function generic over the bit length of the scalars.
scalars: openArray[BigInt[FrBits]]
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
benchmarks/bench_elliptic_parallel_template.nim (1)
111-169:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReturn body timing here, not wall time around
bench(...).
benchalready measures the benchmark body and then prints the result. Timing around the wholebench(...)call makesperfNaive/perfMSM*include formatting and stdout overhead, soreportMSMParallel()is no longer comparing just MSM runtime. This skews the small-Nspeedups the most.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/bench_elliptic_parallel_template.nim` around lines 111 - 169, The timings are taken around the bench(...) calls which include printing/formatting overhead; inside proc msmParallelBench* you should measure only the benchmark body and store those times into result.perf* — move the getMonotime()/inNanoseconds timing into the closures passed to bench (e.g. around the loop/body used for naive scalarMul, scalarMul_vartime, multiScalarMul_reference_vartime, multiScalarMul_vartime and tp.multiScalarMul_vartime_parallel) and assign result.perfNaive, result.perfMSMbaseline, result.perfMSMopt, result.perfMSMpara from those internal measurements (per-iteration) instead of using the outer startNaive/startMSMbaseline/startMSMopt/startMSMpara and stop* variables; remove the outer wall-clock timing around bench calls so perf* reflects only the MSM work.
♻️ Duplicate comments (2)
papers/MSM/H23 - Notes on MSMs with Precomputation.md (1)
20-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix remaining grammar/typo slips in the explanatory text.
There are a few small wording errors that hurt readability in an otherwise strong note.
Proposed edit
- so we can view $a_i$'s as have this many bits. + so we can view $a_i$'s as having this many bits. - The algorithm has two parameter, namely the blocksize $b$ and the number of rows $s$ where we perform precomputation. + The algorithm has two parameters, namely the block size $b$ and the number of rows $s$ where we perform precomputation. - Here, the situation is somewhat different, as 252 doublings would be very significant relative to the total cost. For this part of of the MSMs, we can use a much larger $s$, e.g. $s=64$ or $s=128$ (giving $t=4$ or $t=2$, which means 3 or 1 doublings). + Here, the situation is different, as 252 doublings would be very significant relative to the total cost. For this part of the MSMs, we can use a much larger $s$, e.g. $s=64$ or $s=128$ (giving $t=4$ or $t=2$, which means 3 or 1 doublings).Also applies to: 188-188, 192-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md at line 20, The sentence "so we can view $a_i$'s as have this many bits." is grammatically incorrect; change it to "so we can view $a_i$'s as having this many bits." and scan the nearby explanatory text for similar slips (the occurrences noted around the same paragraph as well as the other instances referenced) to fix subject-verb agreement and awkward phrasing so the explanations read smoothly and consistently.benchmarks/bench_eth_eip7594_peerdas.nim (1)
188-231:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDiscriminant-mutation pattern still violates
polyphaseSpectrumBankcase-object semantics.
runNoPrecompflipskind = kNoPrecompute(line 190) andrunPrecompwrites intoprecompPoints[pos].init(...)(line 204) before restoringkind = kPrecompute(line 205). On the firstrunPrecompinvocation afterrunNoPrecomp, the writes at line 204 land on the inactive branch of the case object, which is undefined under Nim's case-object rules. On subsequentrunPrecompcalls inside thePrecompConfigsloop,kindis alreadykPrecompute, so the writes silently overwrite the previous configuration'sPrecomputedMSMtables without invoking=destroy, leaking the earlier allocation each iteration. Per snippet 4 (ethereum_kzg_srs.nim:448-457),deleteonly frees the currently-active tables, so all but the last config's tables are leaked, and the inactive-branch write before the first kind flip risks ORC/ARC clobbering the freshly initialized data when the discriminant assignment zero-initializes the new branch.Either build a fresh
ctxper(t, b)(preferred — also avoids cross-config state bleed into the verification/recovery benchmarks that follow), or stage a separatearray[CELLS_PER_EXT_BLOB, PrecomputedMSM[...]]buffer, fully populate it under the existing discriminant, and then atomically swap it in. Avoid touchingkinddirectly across iterations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/bench_eth_eip7594_peerdas.nim` around lines 188 - 231, The benchmark flips polyphaseSpectrumBank.kind and writes into precompPoints while the discriminant may be the inactive branch, causing undefined writes and leaking prior PrecomputedMSM tables; fix by avoiding in-place discriminant mutation across configs — either recreate a fresh ctx for each runPrecomp invocation (preferred) so each (t,b) uses its own polyphaseSpectrumBank, or allocate a temporary array[CELLS_PER_EXT_BLOB, PrecomputedMSM...] buffer, fully init it while the existing kind remains unchanged, then atomically swap that buffer into ctx.polyphaseSpectrumBank and set kind = kPrecompute; ensure you do not write into precompPoints[] before setting the matching kind and that previous active tables are properly deleted via the existing delete path.
🧹 Nitpick comments (1)
include/constantine/protocols/ethereum_eip4844_kzg.h (1)
205-238: ⚡ Quick winDocument the valid range and failure behavior for
tandb.The tradeoff table is useful, but the public API contract still leaves callers guessing whether zero, negative, or oversized values are rejected, clamped, or undefined. Since these parameters directly control potentially very large allocations, please spell out the supported domain and what status is returned for invalid inputs.
Suggested doc tweak
/** Create a new KZG context with precomputed MSM tables. * Same as ctt_eth_kzg_context_new but also builds PrecomputedMSM lookup * tables for FK20 proofs (PeerDAS). * - * `@param` t base groups (stride between precomputed layers) - * `@param` b bits per window (window size = 2^b) + * `@param` t base groups (stride between precomputed layers). Must be positive + * and within the implementation-supported range. + * `@param` b bits per window (window size = 2^b). Must be positive and within + * the implementation-supported range. + * Invalid values return an error status.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@include/constantine/protocols/ethereum_eip4844_kzg.h` around lines 205 - 238, Update the docblock for ctt_eth_kzg_context_new_with_precompute to explicitly state the valid domains and failure semantics for parameters t and b (e.g., t must be positive power-of-two or within [min_t,max_t], b must be non-negative and <= max_b, or whatever the implementation accepts), document what happens for zero/negative/oversized values (rejected with a specific ctt_eth_trusted_setup_status error such as CTT_ETH_TRUSTED_SETUP_INVALID_PARAMETER or CTT_ETH_TRUSTED_SETUP_OOM), and note that oversized b may cause large allocations and will return the appropriate error status rather than undefined behavior; reference the returned ctt_eth_trusted_setup_status values callers should expect on invalid inputs or allocation failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 194-227: benchPrecompMSMInline currently ignores its ctx and
builds a fresh PrecompBenchContext, producing a non-comparable baseline; change
it to reuse the caller's dataset by initializing or populating benchCtx with
ctx.points[0 ..< N] and ctx.coefs[0 ..< N] (respecting N, t, b) before timing
the precomputed MSM. Concretely, instead of new(PrecompBenchContext[EC, N], seed
= 42'u64, t = t, b = b) create/populate benchCtx so its precomp and scalars are
built from ctx.points[0 ..< N] and ctx.coefs[0 ..< N] (or call the existing
PrecompBenchContext initializer/method that accepts these slices), then run
benchCtx.precomp.msm_vartime(resultEC, benchCtx.scalars) as before so
reportPrecompSpeedup compares like-for-like with
ctx.msmParallelBench/perf.perfMSMOpt.
In `@benchmarks/bench_elliptic_template.nim`:
- Around line 475-478: In doBench, the scaled iteration expression can evaluate
to 0 when N is large; ensure the count passed to ctx.benchPrecompMSM is clamped
to at least 1 by computing a scaledIters = max(1, iters div max(1, N div 10))
and passing scaledIters to ctx.benchPrecompMSM; apply the same clamping change
to the other precomp invocation sites (the other occurrences around the
precomp/reference benchmark code) so benchPrecompMSM never receives 0 and
downstream division by benchIters cannot fail.
- Around line 386-391: The reference MSM workload consumes RNG outputs in the
opposite order to PrecompBenchContext.new, causing mismatched inputs; update the
reference MSM blocks so they generate ctx.scalars first using
ctx.rng.random_unsafe(BigInt[bits]) and then generate ctx.basisJac with
ctx.rng.random_unsafe(EC) followed by ctx.basisJac[i].clearCofactor(), matching
the order used by PrecompBenchContext.new; apply this same change to both
occurrences where the reference data is regenerated.
In `@constantine-go/constantine.go`:
- Around line 105-114: EthKzgContextNewWithPrecompute currently casts Go ints t
and b directly to C.int which can truncate or accept negative values; validate
that t and b are within the 32-bit signed range and non-negative (e.g., 0 <= t,b
<= math.MaxInt32) and also enforce any sane upper bounds you expect for
precompute sizes before converting to C.int, returning an error if they are out
of range; then only after validation perform the C.int conversion for the call
to ctt_eth_kzg_context_new_with_precompute and proceed to use ctx.cCtx as
before.
- Around line 128-129: The Delete method currently has a value receiver which
frees the C pointer on a copy; change its receiver to a pointer receiver (func
(ctx *EthKzgContext) Delete()) so the original EthKzgContext.cCtx is freed, and
after calling C.ctt_eth_kzg_context_delete(ctx.cCtx) set ctx.cCtx to nil to
avoid use-after-free or double-free on subsequent calls; locate the method by
the symbol names EthKzgContext, Delete, cCtx and the C call
C.ctt_eth_kzg_context_delete.
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 403-415: The function load_from_file allocates ctx via
alloc0HeapAligned before calling ctx.load_ckzg4844; if load_ckzg4844 returns a
failure (e.g., tsInvalidFile) the function currently returns without freeing the
allocated ctx, leaking memory. Fix by freeing and nulling ctx on any non-success
status: after let status = ctx.load_ckzg4844(f, int(t), int(b)) check if status
!= tsSuccess then free the allocated context (use the matching deallocation
function for alloc0HeapAligned, e.g., dealloc or free), set ctx = nil, ensure
fileio.close(f) still runs, and then return status; otherwise proceed to return
the success status as before.
---
Outside diff comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 111-169: The timings are taken around the bench(...) calls which
include printing/formatting overhead; inside proc msmParallelBench* you should
measure only the benchmark body and store those times into result.perf* — move
the getMonotime()/inNanoseconds timing into the closures passed to bench (e.g.
around the loop/body used for naive scalarMul, scalarMul_vartime,
multiScalarMul_reference_vartime, multiScalarMul_vartime and
tp.multiScalarMul_vartime_parallel) and assign result.perfNaive,
result.perfMSMbaseline, result.perfMSMopt, result.perfMSMpara from those
internal measurements (per-iteration) instead of using the outer
startNaive/startMSMbaseline/startMSMopt/startMSMpara and stop* variables; remove
the outer wall-clock timing around bench calls so perf* reflects only the MSM
work.
---
Duplicate comments:
In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 188-231: The benchmark flips polyphaseSpectrumBank.kind and writes
into precompPoints while the discriminant may be the inactive branch, causing
undefined writes and leaking prior PrecomputedMSM tables; fix by avoiding
in-place discriminant mutation across configs — either recreate a fresh ctx for
each runPrecomp invocation (preferred) so each (t,b) uses its own
polyphaseSpectrumBank, or allocate a temporary array[CELLS_PER_EXT_BLOB,
PrecomputedMSM...] buffer, fully init it while the existing kind remains
unchanged, then atomically swap that buffer into ctx.polyphaseSpectrumBank and
set kind = kPrecompute; ensure you do not write into precompPoints[] before
setting the matching kind and that previous active tables are properly deleted
via the existing delete path.
In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md:
- Line 20: The sentence "so we can view $a_i$'s as have this many bits." is
grammatically incorrect; change it to "so we can view $a_i$'s as having this
many bits." and scan the nearby explanatory text for similar slips (the
occurrences noted around the same paragraph as well as the other instances
referenced) to fix subject-verb agreement and awkward phrasing so the
explanations read smoothly and consistently.
---
Nitpick comments:
In `@include/constantine/protocols/ethereum_eip4844_kzg.h`:
- Around line 205-238: Update the docblock for
ctt_eth_kzg_context_new_with_precompute to explicitly state the valid domains
and failure semantics for parameters t and b (e.g., t must be positive
power-of-two or within [min_t,max_t], b must be non-negative and <= max_b, or
whatever the implementation accepts), document what happens for
zero/negative/oversized values (rejected with a specific
ctt_eth_trusted_setup_status error such as
CTT_ETH_TRUSTED_SETUP_INVALID_PARAMETER or CTT_ETH_TRUSTED_SETUP_OOM), and note
that oversized b may cause large allocations and will return the appropriate
error status rather than undefined behavior; reference the returned
ctt_eth_trusted_setup_status values callers should expect on invalid inputs or
allocation failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 73119252-bcee-4759-96d7-09c8c5f33df4
📒 Files selected for processing (20)
benchmarks/bench_ec_msm_bandersnatch.nimbenchmarks/bench_ec_msm_bls12_381_g1.nimbenchmarks/bench_ec_msm_bls12_381_g2.nimbenchmarks/bench_ec_msm_bn254_snarks_g1.nimbenchmarks/bench_ec_msm_pasta.nimbenchmarks/bench_elliptic_parallel_template.nimbenchmarks/bench_elliptic_template.nimbenchmarks/bench_eth_eip7594_peerdas.nimconstantine-go/constantine.goconstantine-go/eth_kzg7594_peerdas_test.goconstantine-rust/constantine-ethereum-kzg/src/lib.rsconstantine-rust/constantine-ethereum-kzg/tests/t_eth_kzg7594.rsconstantine/commitments_setups/ethereum_kzg_srs.nimconstantine/math/elliptic/ec_multi_scalar_mul_precomp.niminclude/constantine/protocols/ethereum_eip4844_kzg.hpapers/MSM/H23 - Notes on MSMs with Precomputation.mdtests/eth_eip7594_peerdas/t_cells_and_kzg_proofs_opt.nimtests/eth_eip7594_peerdas/t_compute_cells_opt.nimtests/t_eth_eip7594_peerdas.nimtests/testutils/eth_consensus_utils.nim
🚧 Files skipped from review as they are similar to previous changes (1)
- constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim
| func EthKzgContextNewWithPrecompute(trustedSetupFile string, t, b int) (ctx EthKzgContext, err error) { | ||
| cFile := C.CString(trustedSetupFile) | ||
| defer C.free(unsafe.Pointer(cFile)) | ||
| status := C.ctt_eth_kzg_context_new_with_precompute( | ||
| &ctx.cCtx, | ||
| cFile, | ||
| C.cttEthTSFormat_ckzg4844, | ||
| C.int(t), | ||
| C.int(b), | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the file
fd -t f "constantine.go" --path "*constantine-go*"Repository: mratsim/constantine
Length of output: 1945
🏁 Script executed:
# Read the specific function mentioned (lines 105-114)
sed -n '100,125p' constantine-go/constantine.goRepository: mratsim/constantine
Length of output: 1945
🏁 Script executed:
# Search for other similar constructors to understand validation patterns
rg "Context.*New" constantine-go/ -A 10 -B 2Repository: mratsim/constantine
Length of output: 17317
🏁 Script executed:
# Check for any existing validation of int parameters in the codebase
rg "C\.int" constantine-go/ -B 2 -A 2 | head -60Repository: mratsim/constantine
Length of output: 617
🏁 Script executed:
# Look for documentation on t and b parameters
rg "precompute|parameter|knob" constantine-go/ -i -B 1 -A 1 | head -40Repository: mratsim/constantine
Length of output: 3151
Validate t and b parameters before converting to C.int to prevent truncation and invalid allocation.
The function directly casts Go int parameters to C.int without validation. On 64-bit systems, Go int exceeds C's 32-bit signed integer range, and truncation happens silently during the cast. Negative values also pass through unchecked. Since this constructor allocates large precomputed tables (documented as MiB to GiB scale), validation in Go prevents invalid states from reaching the C allocator.
Suggested fix
func EthKzgContextNewWithPrecompute(trustedSetupFile string, t, b int) (ctx EthKzgContext, err error) {
+ const maxCInt = int(^uint32(0) >> 1)
+ if t <= 0 || t > maxCInt {
+ return ctx, errors.New("EthKzgContextNewWithPrecompute: t must be > 0 and fit in C.int")
+ }
+ if b <= 0 || b > maxCInt {
+ return ctx, errors.New("EthKzgContextNewWithPrecompute: b must be > 0 and fit in C.int")
+ }
+
cFile := C.CString(trustedSetupFile)
defer C.free(unsafe.Pointer(cFile))
status := C.ctt_eth_kzg_context_new_with_precompute(
&ctx.cCtx,
cFile,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@constantine-go/constantine.go` around lines 105 - 114,
EthKzgContextNewWithPrecompute currently casts Go ints t and b directly to C.int
which can truncate or accept negative values; validate that t and b are within
the 32-bit signed range and non-negative (e.g., 0 <= t,b <= math.MaxInt32) and
also enforce any sane upper bounds you expect for precompute sizes before
converting to C.int, returning an error if they are out of range; then only
after validation perform the C.int conversion for the call to
ctt_eth_kzg_context_new_with_precompute and proceed to use ctx.cCtx as before.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
constantine/commitments_setups/ethereum_kzg_srs.nim (1)
411-423:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMemory leak on
tsInvalidFilestill present.The previous review marked this as addressed, but
load_from_filestill allocatesctxbefore parsing (line 419) and returnsstatuswithout freeing on failure. Downstream wrappers (e.g.constantine-rust/constantine-ethereum-kzg/src/lib.rs) drop the raw pointer onErr, so every failed load leaks theEthereumKZGContextbuffer (~5 MiB+ once the SRS arrays are accounted for).Additionally, the
t = 64, b = 12parameters at line 411 are no longer used inside the body after the refactor —load_ckzg4844(f)is now called without them, and PeerDAS setup is performed by the callers innew*/new_with_precompute*. They can be dropped.🛡️ Proposed fix
-proc load_from_file(ctx: var ptr EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat, t = 64, b = 12): TrustedSetupStatus = +proc load_from_file(ctx: var ptr EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat): TrustedSetupStatus = ## Load from a trusted setup file. var f: File let ok = f.open(filepath, kRead) if not ok: return tsMissingOrInaccessibleFile - ctx = alloc0HeapAligned(EthereumKZGContext, alignment = 64) + let tmp = alloc0HeapAligned(EthereumKZGContext, alignment = 64) - let status = ctx.load_ckzg4844(f) + let status = tmp.load_ckzg4844(f) fileio.close(f) - return status + if status != tsSuccess: + freeHeapAligned(tmp) + ctx = nil + else: + ctx = tmp + return status🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 411 - 423, The function load_from_file currently allocates ctx before calling ctx.load_ckzg4844(f), which leaks when load_ckzg4844 returns an error (e.g., tsInvalidFile); change the control flow so either (A) defer allocation of the EthereumKZGContext until after load_ckzg4844 succeeds, or (B) if you must allocate before parsing, free/dealloc the allocated ctx immediately when status != tsOK before returning; also remove the unused parameters t and b from the load_from_file signature since they are not used (they were removed from load_ckzg4844 and PeerDAS is handled by callers), and ensure references to load_from_file and its callers are updated accordingly.benchmarks/bench_elliptic_template.nim (1)
486-501:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReference MSM still uses a different RNG consumption order than
PrecompBenchContext.new.
PrecompBenchContext.new(lines 386–393) consumes the RNG asscalars → basisJac, but the reference block here generatesbasisJac2 → scalars2. With the same seed they still produce different inputs, so precomp rows and reference rows aren't comparable side-by-side.Same applies to the clamping concern below:
iters div max(1, N div 10)on lines 478 and 501 evaluates to0whenN > 10 * iters, which makesbenchPrecompMSM(0)a no-op and causes division-by-zero in the reference path's(stop-start) div benchIters.🐛 Proposed fix
- proc doBench(_: typedesc[EC], N, t, b: static int, iters: int) {.noInline.} = - # Wrap in a proc to ensure destruction of the large context - let ctx = new(PrecompBenchContext[EC, N], seed = 42'u64, t = t, b = b) - ctx.benchPrecompMSM(iters div max(1, N div 10)) + proc doBench(_: typedesc[EC], N, t, b: static int, iters: int) {.noInline.} = + # Wrap in a proc to ensure destruction of the large context + let ctx = new(PrecompBenchContext[EC, N], seed = 42'u64, t = t, b = b) + let scaledIters = max(1, iters div max(1, N div 10)) + ctx.benchPrecompMSM(scaledIters) @@ - for i in 0 ..< N: - basisJac2[i] = rng2.random_unsafe(EC) - basisJac2[i].clearCofactor() - - basis2.asUnchecked().batchAffine_vartime(basisJac2.asUnchecked(), N) - - for i in 0 ..< N: - scalars2[i] = rng2.random_unsafe(BigInt[bits]) - - let benchIters = iters div max(1, N div 10) + for i in 0 ..< N: + scalars2[i] = rng2.random_unsafe(BigInt[bits]) + + for i in 0 ..< N: + basisJac2[i] = rng2.random_unsafe(EC) + basisJac2[i].clearCofactor() + + basis2.asUnchecked().batchAffine_vartime(basisJac2.asUnchecked(), N) + + let benchIters = max(1, iters div max(1, N div 10))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/bench_elliptic_template.nim` around lines 486 - 501, The reference MSM block must consume the RNG in the same order as PrecompBenchContext.new (scalars → basisJac) and ensure benchIters never becomes zero; change the seq generation to fill scalars2 from rng2 before filling basisJac2 (i.e., move the scalars2 loop before the basisJac2 loop or swap the two loops that assign rng2.random_unsafe to scalars2 and basisJac2) so it matches PrecompBenchContext.new, and replace the benchIters computation with a safe version that forces a minimum of 1 (e.g., benchIters = max(1, iters div max(1, N div 10))) to avoid zero iterations and division-by-zero when calling benchPrecompMSM or computing timings.
🧹 Nitpick comments (2)
constantine/commitments_setups/ethereum_kzg_srs.nim (2)
350-377: ⚖️ Poor tradeoffInconsistent treatment of
=destroyon the bank betweensetupPolyphaseSpectrumBankanddelete.
delete(lines 462–464) deliberately avoids calling\=destroy`(ctx.polyphaseSpectrumBank)because the destructor was observed to raise/escaperaises: [], and instead destroys eachprecompPoints[i]individually. YetsetupPolyphaseSpectrumBankat line 357 calls exactly that destructor unconditionally inside the same{.push raises: [], checks: off.}region. If the original concern was real, this call site has the same problem; if it was not, the workaround indelete` is unnecessary.Please either:
- replicate the per-element destruction loop here as well, or
- if
\=destroy`(bank)is in fact safe underraises: [], remove the workaround indelete` (just call the destructor) and keep this site consistent.Also worth noting: directly mutating
ctx.polyphaseSpectrumBank.kind(lines 367, 374) after=destroyis an unusual pattern. Constructing a fresh value viactx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute)(orkNoPrecompute) makes the discriminator transition explicit and lets the runtime handle the union reset.Also applies to: 461-473
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 350 - 377, The setupPolyphaseSpectrumBank currently calls `=destroy(ctx.polyphaseSpectrumBank)` while `delete` avoids that call and instead destroys each `precompPoints[i]` individually; make these two sites consistent: either replace the `=destroy(ctx.polyphaseSpectrumBank)` in `proc setupPolyphaseSpectrumBank` with the same per-element destruction loop used in `delete` (iterate `for i in 0 ..< CELLS_PER_EXT_BLOB: ctx.polyphaseSpectrumBank.precompPoints[i].destroy()` or equivalent), or change `delete` to simply call `=destroy(ctx.polyphaseSpectrumBank)` like setup does. Additionally, instead of mutating `ctx.polyphaseSpectrumBank.kind` after destruction, reinitialize the whole union explicitly (e.g., `ctx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute)` or `kNoPrecompute`) so the discriminator transition is handled safely by the runtime; apply the same pattern in both `setupPolyphaseSpectrumBank` and `delete`.
431-459: ⚡ Quick winAdd explicit conversions for
cint→intparameters to match codebase patterns.At line 459, the
cintvariablestandbare passed directly tosetupKzg7594PeerDAS(ctx, t, b: int). While Nim may permit this implicitly, the codebase precedent (e.g., threadpool.nim:997 usingcast[cint](numThreads)) establishes explicit type conversions for allcint↔intinteractions. Use explicit conversion for clarity and consistency:result = ctx.load_from_file(filepath, format) if result == tsSuccess: ctx.setupKzg4844ProtoDanksharding() ctx.setupKzg7594PeerDAS(int(t), int(b))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 431 - 459, The call in new_with_precompute passes C-int parameters t and b directly to setupKzg7594PeerDAS; follow the repo pattern and convert them explicitly to Nim int before calling (e.g. cast or int() on t and b) so update the invocation of ctx.setupKzg7594PeerDAS to pass converted values (reference symbols: new_with_precompute, ctx.setupKzg7594PeerDAS, parameters t and b).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 166-169: The perf assignments use uninitialized MonoTime when the
naive/baseline branches are skipped for numInputs > 100000; either wrap the
result.perfNaive and result.perfMSMbaseline assignments with the same numInputs
<= 100000 guard used when setting startNaive/stopNaive and
startMSMbaseline/stopMSMbaseline, or initialize startNaive/stopNaive and
startMSMbaseline/stopMSMbaseline to a known value (or set
result.perfNaive/result.perfMSMbaseline to 0.0) when those branches are skipped
so the computed perf values are deterministic; locate the assignments to
result.perfNaive, result.perfMSMbaseline and the variables startNaive,
stopNaive, startMSMbaseline, stopMSMbaseline to apply the guard or
initialization.
In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 463-467: The recovery benchmarks reuse the same
"recover_cells_and_kzg_proofs" labels making output ambiguous; update the bench
labeling to include the precompute config (e.g., append "precompute t=256 b=8"
or "no precompute") so both runs are distinct. Specifically, when calling
ctx.setupPolyphaseSpectrumBank(t = 256, b = 8) before
benchRecoverCellsAndKZGProofs_WorstCase and
benchRecoverCellsAndKZGProofs_VaryingAvailability, pass or construct a unique
label into those bench functions (or into the internal
recover_cells_and_kzg_proofs label generator) that includes the t and b values
or an explicit "precompute" suffix so console output and downstream parsers can
distinguish the two runs.
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 219-232: The inline comment in polyphaseSpectrumBank
(constantine/commitments_setups/ethereum_kzg_srs.nim) incorrectly states ~4.33
MiB per table × 128 ≈ 554 MB for the t=64,b=12 default; reconcile it with the
actual calculation used by msmPrecompSize and the benchmark in
new_with_precompute by updating the comment to reflect the correct per-table and
total memory (≈8.25 MiB per table → ~1056 MiB total) or, if the benchmark is
wrong, correct the benchmark math; locate references to pointsPerColumn,
expandedBasisLen, numWindows, windowSize and msmPrecompSize and ensure the
textual numbers match the computed values (pointsPerColumn = ceil(255/64)=4,
expandedBasisLen=256, numWindows=ceil(256/12)=22, windowSize=4096, total affine
points ≈22*4096 and bytes ≈ affine_points*~96) so the comment and
new_with_precompute benchmark agree.
In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`:
- Around line 142-212: The init procedure for PrecomputedMSM leaks a previous
ctx.table when called twice; modify init (the init*[EC; N] routine) to check if
ctx.table is non-nil at the start and, if so, free the existing allocation
(using freeHeapAligned or the same allocator used) and reset
ctx.tableLen/ctx.t/ctx.b as appropriate before proceeding to allocate a new
ctx.table with allocHeapArrayAligned; ensure this cleanup also handles the
kNoPrecompute (t==0 or b==0) path so repeated init calls do not leak memory.
In `@scripts/parse_peerdas_benchmarks.py`:
- Around line 279-283: The final note currently hardcodes "~34% speedup" via the
lines.append call that adds "Precompute=8 (c-kzg-4844) trades 96 MiB memory for
~34% speedup in FK20 operations"; change this to compute the FK20 speedup from
the parsed benchmark rows (use the FK20 timings you already parse earlier in the
script) and format the computed percentage into the appended string, or if those
values aren't available, remove the numeric claim and leave a generic sentence;
update the lines.append call that contains the hardcoded note accordingly so the
report always reflects derived data rather than a fixed value.
---
Duplicate comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Around line 486-501: The reference MSM block must consume the RNG in the same
order as PrecompBenchContext.new (scalars → basisJac) and ensure benchIters
never becomes zero; change the seq generation to fill scalars2 from rng2 before
filling basisJac2 (i.e., move the scalars2 loop before the basisJac2 loop or
swap the two loops that assign rng2.random_unsafe to scalars2 and basisJac2) so
it matches PrecompBenchContext.new, and replace the benchIters computation with
a safe version that forces a minimum of 1 (e.g., benchIters = max(1, iters div
max(1, N div 10))) to avoid zero iterations and division-by-zero when calling
benchPrecompMSM or computing timings.
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 411-423: The function load_from_file currently allocates ctx
before calling ctx.load_ckzg4844(f), which leaks when load_ckzg4844 returns an
error (e.g., tsInvalidFile); change the control flow so either (A) defer
allocation of the EthereumKZGContext until after load_ckzg4844 succeeds, or (B)
if you must allocate before parsing, free/dealloc the allocated ctx immediately
when status != tsOK before returning; also remove the unused parameters t and b
from the load_from_file signature since they are not used (they were removed
from load_ckzg4844 and PeerDAS is handled by callers), and ensure references to
load_from_file and its callers are updated accordingly.
---
Nitpick comments:
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 350-377: The setupPolyphaseSpectrumBank currently calls
`=destroy(ctx.polyphaseSpectrumBank)` while `delete` avoids that call and
instead destroys each `precompPoints[i]` individually; make these two sites
consistent: either replace the `=destroy(ctx.polyphaseSpectrumBank)` in `proc
setupPolyphaseSpectrumBank` with the same per-element destruction loop used in
`delete` (iterate `for i in 0 ..< CELLS_PER_EXT_BLOB:
ctx.polyphaseSpectrumBank.precompPoints[i].destroy()` or equivalent), or change
`delete` to simply call `=destroy(ctx.polyphaseSpectrumBank)` like setup does.
Additionally, instead of mutating `ctx.polyphaseSpectrumBank.kind` after
destruction, reinitialize the whole union explicitly (e.g.,
`ctx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute)` or
`kNoPrecompute`) so the discriminator transition is handled safely by the
runtime; apply the same pattern in both `setupPolyphaseSpectrumBank` and
`delete`.
- Around line 431-459: The call in new_with_precompute passes C-int parameters t
and b directly to setupKzg7594PeerDAS; follow the repo pattern and convert them
explicitly to Nim int before calling (e.g. cast or int() on t and b) so update
the invocation of ctx.setupKzg7594PeerDAS to pass converted values (reference
symbols: new_with_precompute, ctx.setupKzg7594PeerDAS, parameters t and b).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 30be442e-76e9-4fb4-ac17-df21dafa9af0
📒 Files selected for processing (11)
benchmarks/bench_elliptic_parallel_template.nimbenchmarks/bench_elliptic_template.nimbenchmarks/bench_eth_eip7594_peerdas.nimbenchmarks/bench_gmp_modexp.nimbenchmarks/bench_gt_parallel_template.nimbenchmarks/zkalc.nimconstantine/commitments_setups/ethereum_kzg_srs.nimconstantine/math/elliptic/ec_multi_scalar_mul_precomp.nimconstantine/math/elliptic/ec_shortweierstrass_batch_ops.nimconstantine/math/elliptic/ec_twistededwards_batch_ops.nimscripts/parse_peerdas_benchmarks.py
| func init*[EC; N]( | ||
| ctx: var PrecomputedMSM[EC, N], | ||
| basis: openArray[affine(EC)], | ||
| t, b: int) = | ||
| ## Build the precomputed lookup table for `basis` points. | ||
| ## | ||
| ## **Preconditions** | ||
| ## - `basis.len == N` (checked at runtime via `doAssert`) | ||
| ## - `t >= 1` and `b >= 1` (checked at runtime via `doAssert`) | ||
| ## - Passing `(0, 0)` skips table construction (kNoPrecompute mode). | ||
| ## | ||
| ## **Important:** `init` assumes the caller provides *well-formed* basis points: | ||
| ## - Each point MUST lie on the curve | ||
| ## - Each point MUST be in the correct prime-order subgroup | ||
| ## - Each point MUST NOT be the neutral element (infinity) | ||
| ## | ||
| ## No on-curve or subgroup validation is performed. Passing invalid points | ||
| ## silently produces meaningless tables. This is safe in practice because | ||
| ## callers that parse external data (e.g. SRS files) validate during parsing, | ||
| ## and hardcoded/test bases are known-correct. | ||
|
|
||
| doAssert basis.len == N | ||
|
|
||
| # (0, 0) means no precompute | ||
| if t == 0 or b == 0: | ||
| ctx.t = 0 | ||
| ctx.b = 0 | ||
| return | ||
|
|
||
| ctx.t = t | ||
| ctx.b = b | ||
|
|
||
| const FrBits = EC.getScalarField().bits() | ||
| let pointsPerColumn = (FrBits + t - 1) div t | ||
| let expandedBasisLen = N * pointsPerColumn | ||
| let expandedBasis = allocHeapArrayAligned(EC, expandedBasisLen, alignment = 64) | ||
|
|
||
| # Fill expanded basis: P, P^(2^t), P^(2^(2t)), ... for each basis point | ||
| var idx = 0 | ||
| for i in 0 ..< N: | ||
| expandedBasis[idx].fromAffine(basis[i]) | ||
| idx += 1 | ||
| # Expand: P_j = P_0^(2^(j*t)) via doubling chain | ||
| for j in 1 ..< pointsPerColumn: | ||
| expandedBasis[idx].double(expandedBasis[idx - 1]) | ||
| for _ in 1 ..< t: | ||
| expandedBasis[idx].double() | ||
| idx += 1 | ||
|
|
||
| let numWindows = (expandedBasisLen + b - 1) div b | ||
| let windowSize = 1 shl b | ||
| let totalTableSize = msmPrecompSize(EC, N, t, b) | ||
|
|
||
| ctx.table = allocHeapArrayAligned(affine(EC), totalTableSize, alignment = 64) | ||
| ctx.tableLen = totalTableSize | ||
|
|
||
| let scratchspace = allocHeapArrayAligned(EC, windowSize, alignment = 64) | ||
|
|
||
| for windowIdx in 0 ..< numWindows: | ||
| let startIdx = windowIdx * b | ||
| let endIdx = min(startIdx + b, expandedBasisLen) | ||
|
|
||
| precomputeWindow( | ||
| ctx.table.toOpenArray(windowIdx*windowSize, windowIdx*windowSize + windowSize - 1), | ||
| expandedBasis.toOpenArray(startIdx, endIdx - 1), | ||
| scratchspace.toOpenArray(windowSize), | ||
| b | ||
| ) | ||
|
|
||
| freeHeapAligned(scratchspace) | ||
| freeHeapAligned(expandedBasis) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
init leaks the previous ctx.table if called twice.
init allocates ctx.table = allocHeapArrayAligned(...) at line 195 unconditionally. If a caller invokes init on a PrecomputedMSM that already has a non-nil table (e.g. to switch (t, b) parameters), the previous table allocation is leaked. The current in-tree call site (setupPolyphaseSpectrumBank) works around this by =destroy-ing the bank first, but init is exported and should be safe on its own.
Add a defensive cleanup at the top of init:
🛡️ Proposed fix
doAssert basis.len == N
+ # Free any pre-existing table to make init idempotent / re-usable.
+ if ctx.table != nil:
+ freeHeapAligned(cast[pointer](ctx.table))
+ ctx.table = nil
+ ctx.tableLen = 0
+
# (0, 0) means no precompute
if t == 0 or b == 0:
ctx.t = 0
ctx.b = 0
return🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim` around lines 142 -
212, The init procedure for PrecomputedMSM leaks a previous ctx.table when
called twice; modify init (the init*[EC; N] routine) to check if ctx.table is
non-nil at the start and, if so, free the existing allocation (using
freeHeapAligned or the same allocator used) and reset ctx.tableLen/ctx.t/ctx.b
as appropriate before proceeding to allocate a new ctx.table with
allocHeapArrayAligned; ensure this cleanup also handles the kNoPrecompute (t==0
or b==0) path so repeated init calls do not leak memory.
| lines.append("- \u00b9 Recovery: c-kzg-4844 uses precompute=8 (96 MiB); constantine uses t=256, b=8 (24 MiB)") | ||
| lines.append("- \u00b2 c-kzg-4844 verifies 8192 cells (64 blobs); constantine matches this config") | ||
| lines.append("- \u0394% shows constantine relative to c-kzg-4844 (negative = faster)") | ||
| lines.append("- c-kzg-4844 precompute levels and constantine (t, b) configs are not directly comparable") | ||
| lines.append("- Precompute=8 (c-kzg-4844) trades 96 MiB memory for ~34% speedup in FK20 operations") |
There was a problem hiding this comment.
Don't hardcode the FK20 speedup note.
The ~34% speedup statement is fixed text, so rerunning this script on different benchmark inputs can generate a report whose note is wrong even when the table above is correct. Derive that figure from the parsed rows or drop the numeric claim.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/parse_peerdas_benchmarks.py` around lines 279 - 283, The final note
currently hardcodes "~34% speedup" via the lines.append call that adds
"Precompute=8 (c-kzg-4844) trades 96 MiB memory for ~34% speedup in FK20
operations"; change this to compute the FK20 speedup from the parsed benchmark
rows (use the FK20 timings you already parse earlier in the script) and format
the computed percentage into the appended string, or if those values aren't
available, remove the numeric claim and leave a generic sentence; update the
lines.append call that contains the hardcoded note accordingly so the report
always reflects derived data rather than a fixed value.
There was a problem hiding this comment.
A significant part of c-kzg-4844 backend is constant-time and invariant to inputs.
The part that is variable-time, is that their scalar-mul can get passed a bit-length, either 255-bit for fully constant-time operation (blst_fr is 255-bit) or measured if variable-time is OK, typically for FFT since roots of unity can be really small or even 1.
for the MSM though all inputs are cryptographic in nature and so indistinguishable from random. Hence I would be very surprised if the 34% speedup changes from random input to random input.
There was a problem hiding this comment.
@mratsim, that's a fair and well-reasoned point. Since all MSM inputs are cryptographic scalars (indistinguishable from random), the bit-length measurement path in their variable-time scalar-mul doesn't apply here — the measured ~34% figure is effectively stable across runs. I'll withdraw the suggestion.
🐰 ~ curious ears, learning new tricks ~
(\_/)
(•ᴗ•)
/> 🔬
✏️ Learnings added
Learnt from: mratsim
Repo: mratsim/constantine PR: 619
File: scripts/parse_peerdas_benchmarks.py:279-283
Timestamp: 2026-05-06T08:27:20.477Z
Learning: In `scripts/parse_peerdas_benchmarks.py` (constantine repo), the hardcoded `~34% speedup` note for c-kzg-4844 precompute=8 FK20 operations is intentional and stable. c-kzg-4844's backend is largely constant-time; the variable-time scalar-mul (bit-length measurement) only applies to FFT roots of unity, not MSM. MSM inputs are cryptographic scalars (indistinguishable from random), so the speedup ratio does not vary across benchmark runs.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
f9a2d2c to
e90f1cc
Compare
This adds Herold-Hagopian precomputed MSM, with configurable runtime parameters for RAM vs perf tradeoff.
99.9% of the time for compute_cells_and_kzg_proofs is spent in FK20 proofs
Notes:
Screenshots of raw output
vs c-kzg-4844 with precompute=8 (see bench #617)
it's 120ms in C vs 88ms in Nim, a 36% speedup
Summary by CodeRabbit