[PeerDAS] Precomputed MSMs - up to 1.62x acceleration by mratsim · Pull Request #619 · mratsim/constantine

mratsim · 2026-05-05T15:28:14Z

This adds Herold-Hagopian precomputed MSM, with configurable runtime parameters for RAM vs perf tradeoff.
99.9% of the time for compute_cells_and_kzg_proofs is spent in FK20 proofs

Benchmark	Precompute	c-kzg-4844 (serial)	constantine (serial)	Δ%
blob_to_kzg_commitment	—	29.857 ms	19.556 ms	-34.5%
compute_kzg_proof	—	31.482 ms	20.235 ms	-35.7%
compute_blob_kzg_proof	—	31.691 ms	19.858 ms	-37.3%
verify_kzg_proof	—	0.802 ms	0.568 ms	-29.1%
verify_blob_kzg_proof	—	1.196 ms	0.955 ms	-20.2%
verify_blob_kzg_proof_batch 1	—	1.203 ms	1.044 ms	-13.2%
verify_blob_kzg_proof_batch 2	—	2.017 ms	1.608 ms	-20.3%
verify_blob_kzg_proof_batch 4	—	3.600 ms	2.760 ms	-23.3%
verify_blob_kzg_proof_batch 8	—	6.637 ms	4.967 ms	-25.2%
verify_blob_kzg_proof_batch 16	—	13.056 ms	9.205 ms	-29.5%
verify_blob_kzg_proof_batch 32	—	25.704 ms	17.765 ms	-30.9%
verify_blob_kzg_proof_batch 64	—	51.174 ms	34.736 ms	-32.1%
precompute_load (L0)	—	1163.746 ms	—	—
PeerDAS (EIP-7594)
compute_cells	—	1.932 ms	1.020 ms	-47.2%
compute_cells_and_kzg_proofs	no precomp	183.384 ms	141.489 ms	-22.8%
compute_cells_and_kzg_proofs	ckzg precomp=1, 768 KiB	608.452 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=2, 1536 KiB	315.771 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=3, 3 MiB	226.319 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=4, 6 MiB	182.381 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=5, 12 MiB	156.495 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=6, 24 MiB	138.634 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=7, 48 MiB	129.472 ms	—	—
compute_cells_and_kzg_proofs	ckzg precomp=8, 96 MiB	120.608 ms	—	—
compute_cells_and_kzg_proofs	ctt t=64, b=6, 32.2 MiB	—	115.359 ms	—
compute_cells_and_kzg_proofs	ctt t=64, b=8, 96.0 MiB	—	105.269 ms	—
compute_cells_and_kzg_proofs	ctt t=64, b=10, 312.0 MiB	—	95.802 ms	—
compute_cells_and_kzg_proofs	ctt t=64, b=12, 1056.0 MiB	—	87.591 ms	—
compute_cells_and_kzg_proofs	ctt t=128, b=6, 16.5 MiB	—	114.424 ms	—
compute_cells_and_kzg_proofs	ctt t=128, b=8, 48.0 MiB	—	101.874 ms	—
compute_cells_and_kzg_proofs	ctt t=128, b=10, 156.0 MiB	—	95.423 ms	—
compute_cells_and_kzg_proofs	ctt t=128, b=12, 528.0 MiB	—	88.957 ms	—
compute_cells_and_kzg_proofs	ctt t=256, b=6, 8.2 MiB	—	117.055 ms	—
compute_cells_and_kzg_proofs	ctt t=256, b=8, 24.0 MiB	—	98.698 ms	—
compute_cells_and_kzg_proofs	ctt t=256, b=10, 84.0 MiB	—	97.307 ms	—
compute_cells_and_kzg_proofs	ctt t=256, b=12, 288.0 MiB	—	93.098 ms	—
recover_cells_and_kzg_proofs¹	see ¹	137.245 ms	97.272 ms	-29.1%
verify_cell_kzg_proof_batch²	—	439.979 ms	381.058 ms	-13.4%

Notes:

¹ Recovery: c-kzg-4844 uses precompute=8 (96 MiB); constantine uses t=256, b=8 (24 MiB)
² c-kzg-4844 verifies 8192 cells (64 blobs); constantine matches this config
Δ% shows constantine relative to c-kzg-4844 (negative = faster)
c-kzg-4844 precompute levels and constantine (t, b) configs are not directly comparable
Precompute=8 (c-kzg-4844) trades 96 MiB memory for ~34% speedup in FK20 operations

Screenshots of raw output

vs c-kzg-4844 with precompute=8 (see bench #617)

it's 120ms in C vs 88ms in Nim, a 36% speedup

Summary by CodeRabbit

New Features
- Optional precomputed MSM support with configurable windowing and new KZG context constructors that can create contexts with precomputation.
Benchmarks
- New precomp MSM benchmark drivers and enhanced reporting that compares precomputed vs non-precomputed paths and parallel variants.
Tests
- New tests validating precomputed MSM correctness; test suites now run both with and without precompute.
Documentation
- Added comprehensive MSM research and Verkle-tree/MSM analysis.
Tooling
- Improved benchmark parsing/reporting script and small timing/reporting fixes.

coderabbitai · 2026-05-05T15:28:29Z

Warning

Rate limit exceeded

@mratsim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 1 second before requesting another review.

To continue reviewing without waiting, purchase usage credits in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3ceff025-1869-4205-9959-8446f595add1

📥 Commits

Reviewing files that changed from the base of the PR and between 58ff170 and 6590341.

📒 Files selected for processing (11)

benchmarks/bench_elliptic_parallel_template.nim
benchmarks/bench_elliptic_template.nim
benchmarks/bench_eth_eip7594_peerdas.nim
constantine-go/constantine.go
constantine-rust/constantine-sys/src/bindings32.rs
constantine-rust/constantine-sys/src/bindings64.rs
constantine/commitments_setups/ethereum_kzg_srs.nim
constantine/math/elliptic/ec_shortweierstrass_batch_ops.nim
constantine/math/elliptic/ec_twistededwards_batch_ops.nim
papers/MSM/H23 - Notes on MSMs with Precomputation.md
tests/math_elliptic_curves/t_ec_multi_scalar_mul_precomp.nim

📝 Walkthrough

Walkthrough

Reworks KZG trusted-setup into a context with optional MSM precomputation, adds a PrecomputedMSM implementation and benchmarks, refactors the Toeplitz/FK20 pipeline to accept either raw affine spectra or precomputed tables, updates many callsites/tests/FFI bindings to the new context API, and applies related elliptic API and assertion-message changes.

Changes

Precomputed MSM, KZG context & FK20 pipeline

Layer / File(s)	Summary
Data Shape & Types `constantine/commitments_setups/ethereum_kzg_srs.nim`, `constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`	Adds `PolyphaseSpectrumBank` (kNoPrecompute/kPrecompute), `PrecomputedMSM[EC,N]` (affine table, t,b params) and changes `EthereumKZGContext.polyphaseSpectrumBank` to hold either `rawPoints` or `precompPoints`.
Core Implementation `constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`, `constantine/math/matrix/toeplitz.nim`	Implements PrecomputedMSM.init/msm_vartime, msmPrecompSize/EstimateOps; redefines ToeplitzAccumulator to store FFT coeffs, accumulate only circulant FFTs, and have finish read external polyphase spectrum (affine array or PrecomputedMSM).
FK20 / Proof Integration `constantine/commitments/kzg_multiproofs.nim`, `constantine/eth_eip7594_peerdas.nim`	Introduces `kzg_coset_prove_impl` template and updates kzg_coset_prove callsites to branch on `polyphaseSpectrumBank.kind`, passing raw affine arrays or PrecomputedMSM tables accordingly.
FFI / Bindings & API `include/constantine/protocols/ethereum_eip4844_kzg.h`, `constantine-go/constantine.go`, `constantine-rust/.../lib.rs`, `constantine-rust/.../bindings*.rs`	Replaces trusted-setup loader/deleter with context API: `ctt_eth_kzg_context_new`, `ctt_eth_kzg_context_new_with_precompute`, `ctt_eth_kzg_context_delete`; adds Go/Rust wrappers/builders for precompute constructor and updates destructor calls.
Benchmarks / Drivers `benchmarks/bench_elliptic_template.nim`, `benchmarks/bench_elliptic_parallel_template.nim`, `benchmarks/bench_ec_msm_precomp_*.nim`, `benchmarks/bench_ec_msm_bls12_381_g1.nim`	Adds PrecompBenchContext, benchPrecompMSM/benchPrecompMSMTable/runPrecompMSMBench; new benchmark drivers (BLS12-381 G1, Bandersnatch); benchmarks capture and report MsmParallelBenchResult and precomp speedups.
Tests `tests/commitments/t_kzg_multiproofs.nim`, `tests/math_elliptic_curves/t_ec_multi_scalar_mul_precomp.nim`, various tests under `tests/eth_eip7594_peerdas` and `tests/t_*`	Adds test validating PrecomputedMSM results vs reference MSM, updates many tests to initialize/destroy contexts via new/new_with_precompute/delete, and parameterizes some test suites across no-precompute and precompute modes.
Docs / Packaging `papers/MSM/H23...md`, `papers/MSM/H24...md`, `constantine.nimble`	Adds MSM design/benchmark docs and updates nimble test/bench lists and tasks for the new precomp benchmarks.

Elliptic API ergonomics & assertion standardization

Layer / File(s)	Summary
API / Signatures `constantine/math/elliptic/ec_shortweierstrass_batch_ops.nim`, `constantine/math/elliptic/ec_twistededwards_batch_ops.nim`	Replaces fixed-size `array[N,...]` batchAffine / batchAffine_vartime overloads with `openArray` variants that assert length equality at runtime and forward to pointer implementations.
Scalar-mul Control Flow `constantine/math/elliptic/ec_scalar_mul_vartime.nim`	Refactors `scalarMul_jy00_vartime` to precompute both `paff` and `npaff` and defer initial doubling until after first non-zero recoded digit, changing initialization/accumulation flow.
Assertion Messages & Debug Placement multiple files under `constantine/math/`, `constantine/named/`	Standardizes internal assertion error strings to `"[ctt] Internal error: ..."`, moves some checks under `debug:` blocks; no functional changes otherwise.
Misc. Type Tightening `constantine/named/zoo_subgroups.nim`	Restricts exported `clearCofactor` generic to `EC_ShortW

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant CKZG as EthereumKZGContext
    participant Toeplitz as ToeplitzAccumulator
    participant MSM as PrecomputedMSM
    participant Proof as kzg_coset_prove

    Caller->>CKZG: new(filepath, t, b)
    CKZG->>CKZG: load_from_file(filepath, t, b)
    CKZG->>CKZG: computePolyphaseDecompositionFourier()
    CKZG->>MSM: init(rawPoints, t, b) × N
    CKZG-->>Caller: ctx initialized (kind=kPrecompute)

    Caller->>Toeplitz: init(L)
    Caller->>Toeplitz: accumulate(circulant)
    Toeplitz->>Toeplitz: FFT circulant -> coeffs

    Caller->>Proof: kzg_coset_prove_impl(request)
    alt kind == kPrecompute
        Proof->>MSM: msm_vartime(scalars)
        MSM-->>Proof: weighted sum (precomp)
    else kind == kNoPrecompute
        Proof->>Proof: multiScalarMul_vartime(scalars, rawPoints)
    end

    Proof->>Toeplitz: finish(output, polyphaseSpectrumBank)
    Toeplitz-->>Proof: ifft(result)
    Proof-->>Caller: FK20 proofs

    Caller->>CKZG: delete()
    CKZG->>MSM: destroy() × N (if precomputed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

mratsim/constantine#616: Related changes to the Toeplitz/FFT/MSM pipeline and FK20 flow.
mratsim/constantine#613: Overlapping work on precomputed MSM, KZG context lifecycle, and PeerDAS FK20 integration.
mratsim/constantine#609: Related FFT/polynomial ordering and bit-reversal API changes.

Poem

🐇 I built my tables, windows neat and spry,

Scalars hop through precomp rows like pie,
Toeplitz hums its FFT tune so clear,
FK20 proofs now spring without a fear,
Hop, hop—MSM speedup, carrots near!

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch precomp-MSM-runtime-rebase

greptile-apps · 2026-05-05T15:33:57Z

Greptile Summary

This PR introduces Herold-Hagopian precomputed MSMs for FK20 multi-proofs in PeerDAS, achieving a ~1.62x speedup (145 ms to 88 ms per blob) by replacing per-proof Pippenger MSMs with precomputed subset-sum lookup tables parameterised by t (stride) and b (window bits). The context API is restructured around a new PolyphaseSpectrumBank variant type and exposes a new_with_precompute(t, b) constructor alongside a no-precompute default.

New PrecomputedMSM[EC, N]: builds subset-sum tables for each expanded basis point via binary-tree precomputation; msm_vartime iterates t layers with one doubling per layer (Horner/Herold evaluation order).
ToeplitzAccumulator refactored: the internal points buffer is removed; polyphase points are now passed directly to finish, with two overloads (raw-affine array and PrecomputedMSM) dispatched via template kzg_coset_prove_impl.
C/Go/Rust bindings updated: ctt_eth_trusted_setup_load/delete renamed to ctt_eth_kzg_context_new/delete; ctt_eth_kzg_context_new_with_precompute added.

Confidence Score: 4/5

The core algorithm and dispatch logic are correct; the one memory-safety gap in the new PrecomputedMSM type is easy to close before merge.

PrecomputedMSM owns a heap-allocated table pointer via freeHeapAligned in =destroy, but no =copy prohibition is defined. Any accidental by-value copy will produce two objects sharing the same raw pointer and trigger a double-free on destruction. The fix is a one-liner identical to what ToeplitzAccumulator already uses. All other changes (Toeplitz refactor, dispatch logic, bindings) look correct and well-tested.

constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim needs the =copy prohibition; constantine/commitments_setups/ethereum_kzg_srs.nim is worth a second look for the rawPoints memory overhead.

Important Files Changed

Filename	Overview
constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim	New file: Herold-Hagopian precomputed MSM with `PrecomputedMSM[EC, N]` type; custom `=destroy` but missing `=copy` prohibition creates a double-free risk on accidental copy.
constantine/math/matrix/toeplitz.nim	Removed internal `points` buffer from `ToeplitzAccumulator`; caller-supplied `polyphaseSpectrumBank` is now passed to `finish`, with two overloads (affine array and `PrecomputedMSM`). Logic is sound.
constantine/commitments_setups/ethereum_kzg_srs.nim	Adds `PolyphaseSpectrumBank` variant type and wires up precomputed MSM tables; `rawPoints` is always populated even in `kPrecompute` mode, wasting ~1.18 MiB per context.
constantine/commitments/kzg_multiproofs.nim	Refactored `kzg_coset_prove` into a template + two `proc` overloads (affine vs PrecomputedMSM); `accumulate` signature simplified; logic preserved correctly.
constantine/eth_eip7594_peerdas.nim	Adds `case` dispatch on `polyphaseSpectrumBank.kind` to route FK20 proof computation through either raw-affine or precomputed-MSM overload at both call sites.
include/constantine/protocols/ethereum_eip4844_kzg.h	C header: renames `ctt_eth_trusted_setup_load`/`delete` to `ctt_eth_kzg_context_new`/`delete` and adds `ctt_eth_kzg_context_new_with_precompute`; breaking rename has no backward-compat shims.
constantine-go/constantine.go	Go bindings updated to use renamed C symbols and exposes new `EthKzgContextNewWithPrecompute` helper with t/b parameters.
constantine-rust/constantine-ethereum-kzg/src/lib.rs	Rust bindings updated to use renamed symbols; `load_trusted_setup_with_precompute` builder method added with correct error handling.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant SRS as ethereum_kzg_srs
    participant PSB as PolyphaseSpectrumBank
    participant KZG as kzg_multiproofs
    participant Toeplitz as ToeplitzAccumulator
    participant PMSM as PrecomputedMSM

    Caller->>SRS: ctt_eth_kzg_context_new_with_precompute(t, b)
    SRS->>SRS: computePolyphaseDecompositionFourier
    SRS->>PSB: fill rawPoints (always)
    SRS->>PMSM: init(points[pos], t, b) for each pos
    Note over PMSM: Builds subset-sum table 2^b entries/window

    Caller->>KZG: kzg_coset_prove(..., PSB.precompPoints)
    loop offset in 0..L-1
        KZG->>Toeplitz: accumulate(circulant FFT)
    end
    KZG->>Toeplitz: finish(output, polyphaseSpectrumBank)
    loop i in 0..size-1
        Toeplitz->>PMSM: msm_vartime(output[i], scalars)
        loop t_i in 0..t-1
            Note over PMSM: double if t_i > 0, lookup table
        end
    end
    Toeplitz->>Toeplitz: ec_ifft_nn(output)
    KZG->>Caller: proofs (affine, bit-reversed)

_{Reviews (1): Last reviewed commit: "Runtime Precomp param" | Re-trigger Greptile}

greptile-apps · 2026-05-05T15:34:01Z

+proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} =
+  if ctx.table != nil:
+    freeHeapAligned(cast[pointer](ctx.table))
+    ctx.table = nil
+  ctx.tableLen = 0
+  ctx.t = 0
+  ctx.b = 0


PrecomputedMSM defines a custom =destroy that calls freeHeapAligned on ctx.table, but does not define =copy. In Nim with ARC, the compiler will generate a bitwise (shallow) copy for any assignment or pass-by-value, producing two objects that share the same raw table pointer. When both are destroyed, freeHeapAligned is called twice on the same address — a double-free. The sibling type ToeplitzAccumulator already has {.error.} on =copy; PrecomputedMSM needs the same guard.

Suggested change

proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} =

if ctx.table != nil:

freeHeapAligned(cast[pointer](ctx.table))

ctx.table = nil

ctx.tableLen = 0

ctx.t = 0

ctx.b = 0

proc `=destroy`*[EC; N](ctx: var PrecomputedMSM[EC, N]) {.raises: [].} =

if ctx.table != nil:

freeHeapAligned(cast[pointer](ctx.table))

ctx.table = nil

ctx.tableLen = 0

ctx.t = 0

ctx.b = 0

proc `=copy`*[EC; N](dst: var PrecomputedMSM[EC, N], src: PrecomputedMSM[EC, N]) {.error.}

greptile-apps · 2026-05-05T15:34:02Z

+    case kind*: PolyphaseKind
+    of kNoPrecompute:
+      discard
+    of kPrecompute:
+      precompPoints* {.align: 64.}: array[CELLS_PER_EXT_BLOB, PrecomputedMSM[EC_ShortW_Jac[Fp[BLS12_381], G1], FIELD_ELEMENTS_PER_CELL]]
+
 # On the number of 𝔾2 points:
 #   - In the Deneb specs, https://github.com/ethereum/consensus-specs/blob/v1.3.0/specs/deneb/polynomial-commitments.md


rawPoints is unconditionally populated in both modes

rawPoints (~1.18 MiB: 128 × 64 × 144 B) lives in the non-variant part of PolyphaseSpectrumBank and is always filled during load_ckzg4844. When kPrecompute is active the field is never read — every call to kzg_coset_prove dispatches to the precompPoints overload — so this allocation is pure waste. For a default (t=64, b=12) context the tables already consume ~554 MiB; adding a silent extra MiB undermines the memory-budget documentation. Moving rawPoints into the kNoPrecompute branch of the variant would keep both paths honest.

greptile-apps · 2026-05-05T15:34:03Z


-/** Load trusted setup from path
- *  Currently the only format supported `cttEthTSFormat_ckzg4844`
- *  is from the reference implementation c-kzg-4844 text file
+/** Create a new KZG context from trusted setup file.
+ *  Loads SRS, computes polyphase decomposition as raw affine points,
+ *  and sets the context to kNoPrecompute mode (~1.8 MiB).
 */
-ctt_eth_trusted_setup_status ctt_eth_trusted_setup_load(
+ctt_eth_trusted_setup_status ctt_eth_kzg_context_new(
    ctt_eth_kzg_context** ctx,
    const char* filepath,
    ctt_eth_trusted_setup_format format
-) __attribute__((__warn_unused_result__));
+    ) __attribute__((__warn_unused_result__));

-/** Destroy a trusted setup
+/** Create a new KZG context with precomputed MSM tables.
+ *  Same as ctt_eth_kzg_context_new but also builds PrecomputedMSM lookup
+ *  tables for FK20 proofs (PeerDAS).
+ *
+ *  @param t  base groups (stride between precomputed layers)
+ *  @param b  bits per window (window size = 2^b)
+ *
+ *  SPEED / MEMORY TRADEOFF (Intel i7-265K, FK20 proofs = 128 MSMs per blob):
+ *  - no precompute: ~145 ms/blob, ~1.8 MiB
+ *  - t=64,b=8:     ~109 ms/blob, ~101 MiB per MSM (~12.8 GiB total)
+ *  - t=64,b=12:     ~89 ms/blob, ~8.7 MiB per MSM (~1.1 GiB total)
+ *  - t=128,b=8:     ~105 ms/blob, ~50 MiB per MSM (~6.4 GiB total)
+ *  - t=128,b=12:    ~92 ms/blob, ~4.3 MiB per MSM (~0.6 GiB total)
+ *  - t=256,b=8:     ~105 ms/blob, ~25 MiB per MSM (~3.2 GiB total)
+ *
+ *  Larger b = faster per MSM but exponentially more memory (2^b entries).
+ *  Larger t = fewer doublings but more precomputed layers.
+ *  Default (t=64, b=12): ~89 ms/blob proving, ~1.1 GiB total memory.
 */
-void ctt_eth_trusted_setup_delete(ctt_eth_kzg_context* ctx);
+ctt_eth_trusted_setup_status ctt_eth_kzg_context_new_with_precompute(
+    ctt_eth_kzg_context** ctx,
+    const char* filepath,
+    ctt_eth_trusted_setup_format format,
+    int t,
+    int b
+    ) __attribute__((__warn_unused_result__));

+/** Destroy a KZG context
+ */
+void ctt_eth_kzg_context_delete(ctt_eth_kzg_context* ctx);



Breaking C API rename without backward-compat shims

ctt_eth_trusted_setup_load and ctt_eth_trusted_setup_delete are completely removed and replaced by ctt_eth_kzg_context_new / ctt_eth_kzg_context_delete. Any downstream C consumer that linked against the previous header will get a link error with no migration path. The Go and Rust bindings are updated in this PR, but external C users are silently broken. Consider keeping the old names as deprecated thin wrappers (or at minimum a clearly called-out breaking change note in the changelog / migration guide) so existing consumers can adapt incrementally.

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (5)

benchmarks/bench_elliptic_template.nim (1)
361-371: 💤 Low value

Likely typo on the export marker for t.

t, b*: int exports only b; t is private. Both t and b are part of the same configuration tuple and are referenced together everywhere the context is used (fmt"t={ctx.t}, b={ctx.b}", msmPrecompEstimateOps(EC, N, ctx.t, ctx.b)). Either both should be exported or neither — the current asymmetry is most likely an accidental missing *.
♻️ Suggested fix
-    t, b*: int
+    t*, b*: int
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/bench_elliptic_template.nim` around lines 361 - 371, The struct
PrecompBenchContext has an export typo: the line "t, b*: int" only exports b,
leaving t unexported; update the field declaration in PrecompBenchContext so
both fields are exported (e.g., mark t with a '*' like b) so that ctx.t and
ctx.b are both public and usable in callers such as fmt"t={ctx.t}, b={ctx.b}"
and msmPrecompEstimateOps.
constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim (1)
55-70: 💤 Low value

Defensive guard: msmPrecompEstimateOps divides by b without the same t==0 or b==0 short-circuit as msmPrecompSize.

msmPrecompSize (Line 75) returns 0 for the no-precompute mode (t=0, b=0), but msmPrecompEstimateOps will divide by zero on Line 65 if b == 0 and produce a negative dbl if t == 0. If callers ever pass the no-precompute sentinel through, this asserts/UBs. Mirror the guard for consistency.
🛡️ Suggested fix
 func msmPrecompEstimateOps*[EC](_: typedesc[EC]; N, t, b: int): tuple[add, dbl: int] =
   ## Returns (additions, doublings) for one MSM execution
+  if t == 0 or b == 0:
+    return (0, 0)
   const bits = EC.getScalarField().bits()

   # Doublings: one per t iteration (except first)
   let dbl = t - 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim` around lines 55 -
70, The function msmPrecompEstimateOps should mirror msmPrecompSize's sentinel
handling: if t == 0 or b == 0 return (0, 0) immediately to avoid dividing by
zero and producing a negative dbl; update msmPrecompEstimateOps (the func named
msmPrecompEstimateOps*) to check for the no-precompute case up-front and
early-return (add=0, dbl=0) before computing dbl, totalBitExtractions,
numWindows or add.
constantine/math/matrix/toeplitz.nim (1)
344-360: ⚡ Quick win

Keep the scratchScalars layout guard in the precomputed overload too.

This overload performs the same F → F.getBigInt() reinterpret cast as the raw-points path, but it drops the static doAssert sizeof(F) == sizeof(F.getBigInt()). Without the same guard, a future field type can make the precomputed path compile into UB while the non-precomputed path still fails fast at compile time.
Suggested change
 proc finish*[EC, ECaff, F; N: static int](
   ctx: var ToeplitzAccumulator[EC, ECaff, F],
   output: var openArray[EC],
   polyphaseSpectrumBank: openArray[PrecomputedMSM[EC, N]]
 ): ToeplitzStatus {.raises: [], meter.} =
@@
   let n = ctx.size
   if n == 0 or output.len != n or ctx.offset != ctx.L or polyphaseSpectrumBank.len != n or N != ctx.L:
     return Toeplitz_MismatchedSizes
+  static: doAssert sizeof(F) == sizeof(F.getBigInt()), "scratchScalars cast requires sizeof(F) == sizeof(F.getBigInt())"
   let scalars = cast[ptr UncheckedArray[F.getBigInt()]](ctx.scratchScalars)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/math/matrix/toeplitz.nim` around lines 344 - 360, The precomputed
overload `finish` is missing the layout guard used in the raw-points path; add
the same compile-time assertion (e.g. `static doAssert sizeof(F) ==
sizeof(F.getBigInt())`) near the start of `proc finish*[EC, ECaff, F; N: static
int](ctx: var ToeplitzAccumulator[EC, ECaff, F], ...)` to ensure
`ctx.scratchScalars` can be safely reinterpreted via `cast[ptr
UncheckedArray[F.getBigInt()]](ctx.scratchScalars)` and prevent UB for
incompatible `F` types.
tests/commitments/t_kzg_multiproofs.nim (1)
831-910: ⚡ Quick win

Cover more than one precompute configuration here.

This new test only exercises the runtime-tunable path for (t=64, b=8), so a regression in the parameter plumbing can still pass as long as that single point works. Please make this table-driven over at least a couple of representative (t, b) pairs so the new configurable API is actually covered.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/commitments/t_kzg_multiproofs.nim` around lines 831 - 910, The
testFK20PrecompMSM currently only exercises one (t,b) pair; make it table-driven
by defining a small list/array of representative (t,b) pairs (e.g.
[(64,8),(32,4)] or similar) and loop over them, re-running the precompBank init,
kzg_coset_prove, comparisons and verifications for each pair; update uses of the
local constants t and b (and any derived values) inside the loop so
precompBank.init(polyphaseBasis[pos], t = t, b = b), precompProofs, and
verification calls are executed per (t,b) configuration and assert success for
each configuration within testFK20PrecompMSM.
constantine/commitments_setups/ethereum_kzg_srs.nim (1)
374-381: 💤 Low value

Optional: reuse rawPoints[pos] instead of rebuilding points.

After lines 370-372, ctx.polyphaseSpectrumBank.rawPoints[pos] already contains exactly the same data that the inner loop is copying into the points local. Since PrecomputedMSM.init accepts basis: openArray[affine(EC)] (read-only), you can pass rawPoints[pos] directly and avoid the redundant per-position array copy:
♻️ Proposed simplification
     # Build PrecomputedMSM tables if t > 0 and b > 0.
     if t > 0 and b > 0:
       ctx.polyphaseSpectrumBank.kind = kPrecompute
       for pos in 0 ..< CELLS_PER_EXT_BLOB:
-        var points {.noInit.}: array[FIELD_ELEMENTS_PER_CELL, BLS12_381_G1_Aff]
-        for offset in 0 ..< FIELD_ELEMENTS_PER_CELL:
-          points[offset] = tmp[offset][pos]
-        ctx.polyphaseSpectrumBank.precompPoints[pos].init(points, t = t, b = b)
+        ctx.polyphaseSpectrumBank.precompPoints[pos].init(
+          ctx.polyphaseSpectrumBank.rawPoints[pos], t = t, b = b)
     else:
       ctx.polyphaseSpectrumBank.kind = kNoPrecompute
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 374 - 381,
Replace the per-position rebuild of a temporary array with the already-populated
rawPoints entry: inside the block that sets ctx.polyphaseSpectrumBank.kind =
kPrecompute, stop creating and filling the local var points from tmp; instead
call ctx.polyphaseSpectrumBank.precompPoints[pos].init with
ctx.polyphaseSpectrumBank.rawPoints[pos] (which already contains
FIELD_ELEMENTS_PER_CELL BLS12_381_G1_Aff values matching what points would
contain) and pass the same t and b; update references to tmp/points only for
positions where rawPoints[pos] is not available.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Line 400: The field ctx.precompMemMiB is computed using / 1e6 which gives
decimal megabytes, but the name and output use MiB (mebibytes); change the
divisor to 1024*1024 so ctx.precompMemMiB = float64(msmPrecompSize(EC, N, t, b)
* sizeof(affine(EC))) / (1024*1024) to report true MiB.
- Around line 410-448: The code computes cycles using startClk and stopClk which
are only declared inside when SupportsGetTicks branches (startClk and stopClk in
the msm benchmark loop), causing a compile error when SupportsGetTicks is false;
move the cycles computation into the when SupportsGetTicks: branch (i.e.,
compute cycles = (stopClk - startClk) div iters) so it is only referenced when
startClk/stopClk exist and keep the existing when/else echo branches unchanged.

In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 184-209: The runPrecomp benchmark (proc runPrecomp) is never
executed so the precompute result is missing; either invoke runPrecomp() after
runNoPrecomp() or remove the unused runPrecomp proc to avoid dead code; locate
the procs runNoPrecomp and runPrecomp in this file and add a call to
runPrecomp() (or delete the runPrecomp definition) so the precompute benchmark
actually runs and emits its output.
- Around line 369-376: runConfig currently mutates the discriminant of the case
object polyphaseSpectrumBank by writing into precompPoints before setting
kind=kPrecompute (and runBaseline flips kind=kNoPrecompute), which is undefined;
fix by avoiding in-place discriminant mutation—either create a fresh ctx (or
fresh polyphaseSpectrumBank) per configuration and initialize its precompPoints
there inside runConfig, or allocate a separate precompute-only buffer and
populate that buffer and then atomically swap polyphaseSpectrumBank to point to
the new precompute bank (so you never write into the inactive branch); update
runConfig and any callers (e.g., runBaseline) to construct/swap the new bank
rather than mutating kind on a shared case object.

In `@benchmarks/bench_matrix_toeplitz.nim`:
- Around line 163-167: The allocation for `basis` is too small:
`allocHeapArrayAligned(array[L, BLS12_381_G1_aff], size, 64)` only allocates
`size` arrays but `rng.random_unsafe(basis.toOpenArray(0, size * L - 1))` writes
`size * L` elements; update the allocation to allocate `size * L` elements
(i.e., call `allocHeapArrayAligned` with length `size * L`) so
`basis.toOpenArray(0, size * L - 1)` stays within bounds; ensure you adjust only
the allocation argument and keep the existing `rng.random_unsafe` call and
`toOpenArray` indexing.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 423-424: The exported FFI proc new_with_precompute
(ctt_eth_kzg_context_new_with_precompute) declares parameters t and b as Nim int
which is pointer-sized; change their types to cint to match the C header's int
and avoid ABI mismatches, i.e. update proc new_with_precompute*(ctx: var ptr
EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat, t, b: cint):
TrustedSetupStatus and likewise update the signature of load_from_file (the
method called inside) to accept t, b: cint so both declarations align with the C
ABI.

In `@constantine/ethereum_eip4844_kzg.nim`:
- Around line 25-26: The public constructors new and new_with_precompute expose
an EthereumKZGContext that can remain allocated when f.open(...) fails
(returning tsMissingOrInaccessibleFile), leaking memory and handing callers a
partially initialized ctx; fix by either moving the allocation of ctx until
after the file open succeeds or explicitly freeing and nulling ctx before every
early return on error (e.g., in the code path that calls f.open(...) and returns
tsMissingOrInaccessibleFile), ensuring EthereumKZGContext is only returned
non-nil on success.

In `@include/constantine/protocols/ethereum_eip4844_kzg.h`:
- Around line 205-223: The benchmark table and description in the comment are
inconsistent and misleading; update the memory entries so they increase with b
(e.g., show t=64,b=12 uses more memory than t=64,b=8), remove or correct the
"Default (t=64, b=12)" claim (since ctt_eth_kzg_context_new_with_precompute /
new_with_precompute in C has no defaults), and explicitly state that larger b
increases memory exponentially (2^b) while larger t trades layers for doublings;
reference the function name ctt_eth_kzg_context_new_with_precompute (or
new_with_precompute) so readers know this applies to that API and ensure the
table + summary match actual benchmark numbers and the PR notes.

In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md:
- Line 31: Replace empty alt text in the markdown image embeds with descriptive
alt text: for the image currently written as
![](./images/H23%20-%20Notes%20on%20MSMs%20with%20Precomputation/H23%20-%20Fig%201%20-%20Naive%20algorithm%20binary%20decomposition.png)
(and the other embeds at the same file around lines 44, 67, 74, 100, 135),
change to the pattern ![Descriptive alt text](<same-path>) using brief,
meaningful descriptions (e.g., "Figure 1: Naive algorithm binary decomposition
diagram") so each embedded image in "H23 - Notes on MSMs with Precomputation.md"
has non-empty alt text for accessibility and markdown linting.
- Line 133: Fix the visible misspellings across the note by replacing "verical"
with "vertical", "turns our" with "turns out", "indivudial" with "individual",
"endomorphis" with "endomorphism", and "precompuation" with "precomputation"
(these occur in the paragraph containing "There is a variant of the above using
verical blocks..." and similarly at the other flagged instances); update those
exact tokens in the document so all explanatory sections read correctly while
preserving surrounding phrasing and math.

In `@tests/testutils/eth_consensus_utils.nim`:
- Line 29: The test utility currently calls
ctx.new_with_precompute(TrustedSetupMainnet, kReferenceCKzg4844, 256, 8) which
forces the kPrecompute code path in polyphaseSpectrumBank (building
PrecomputedMSM tables) while the rest of the suite uses ctx.new(...)
(kNoPrecompute); either add a brief comment next to the new_with_precompute call
explaining why t=256, b=8 are required here and that the split intentionally
exercises both precomputed and non-precomputed branches across tests, or replace
the call with ctx.new(...) to match the other tests and avoid extra memory/setup
overhead in CI.

---

Nitpick comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Around line 361-371: The struct PrecompBenchContext has an export typo: the
line "t, b*: int" only exports b, leaving t unexported; update the field
declaration in PrecompBenchContext so both fields are exported (e.g., mark t
with a '*' like b) so that ctx.t and ctx.b are both public and usable in callers
such as fmt"t={ctx.t}, b={ctx.b}" and msmPrecompEstimateOps.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 374-381: Replace the per-position rebuild of a temporary array
with the already-populated rawPoints entry: inside the block that sets
ctx.polyphaseSpectrumBank.kind = kPrecompute, stop creating and filling the
local var points from tmp; instead call
ctx.polyphaseSpectrumBank.precompPoints[pos].init with
ctx.polyphaseSpectrumBank.rawPoints[pos] (which already contains
FIELD_ELEMENTS_PER_CELL BLS12_381_G1_Aff values matching what points would
contain) and pass the same t and b; update references to tmp/points only for
positions where rawPoints[pos] is not available.

In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`:
- Around line 55-70: The function msmPrecompEstimateOps should mirror
msmPrecompSize's sentinel handling: if t == 0 or b == 0 return (0, 0)
immediately to avoid dividing by zero and producing a negative dbl; update
msmPrecompEstimateOps (the func named msmPrecompEstimateOps*) to check for the
no-precompute case up-front and early-return (add=0, dbl=0) before computing
dbl, totalBitExtractions, numWindows or add.

In `@constantine/math/matrix/toeplitz.nim`:
- Around line 344-360: The precomputed overload `finish` is missing the layout
guard used in the raw-points path; add the same compile-time assertion (e.g.
`static doAssert sizeof(F) == sizeof(F.getBigInt())`) near the start of `proc
finish*[EC, ECaff, F; N: static int](ctx: var ToeplitzAccumulator[EC, ECaff, F],
...)` to ensure `ctx.scratchScalars` can be safely reinterpreted via `cast[ptr
UncheckedArray[F.getBigInt()]](ctx.scratchScalars)` and prevent UB for
incompatible `F` types.

In `@tests/commitments/t_kzg_multiproofs.nim`:
- Around line 831-910: The testFK20PrecompMSM currently only exercises one (t,b)
pair; make it table-driven by defining a small list/array of representative
(t,b) pairs (e.g. [(64,8),(32,4)] or similar) and loop over them, re-running the
precompBank init, kzg_coset_prove, comparisons and verifications for each pair;
update uses of the local constants t and b (and any derived values) inside the
loop so precompBank.init(polyphaseBasis[pos], t = t, b = b), precompProofs, and
verification calls are executed per (t,b) configuration and assert success for
each configuration within testFK20PrecompMSM.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 75893922-7362-46fe-8705-d97cda1dad3d

📥 Commits

Reviewing files that changed from the base of the PR and between bdd6c57 and 39dfc6d.

⛔ Files ignored due to path filters (7)

papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 1 - Naive algorithm binary decomposition.png is excluded by !**/*.png
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 2 - Vertical blocks.png is excluded by !**/*.png
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 3 - Arbitrary blocks.png is excluded by !**/*.png
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 4 - Horizontal blocks.png is excluded by !**/*.png
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 5 - Pippenger precomputation.png is excluded by !**/*.png
papers/MSM/images/H23 - Notes on MSMs with Precomputation/H23 - Fig 6 - Horizontal variant vertical blocks.png is excluded by !**/*.png
papers/MSM/images/H24 - Verkle Trees - Another iteration of VKTs MSM/H24 - Fig 1 - Gottfried algorithm visualization.png is excluded by !**/*.png

📒 Files selected for processing (53)

benchmarks/bench_ec_msm_bls12_381_g1.nim
benchmarks/bench_ec_msm_precomp_bandersnatch.nim
benchmarks/bench_ec_msm_precomp_bls12_381_g1.nim
benchmarks/bench_elliptic_parallel_template.nim
benchmarks/bench_elliptic_template.nim
benchmarks/bench_eth_eip4844_kzg.nim
benchmarks/bench_eth_eip7594_peerdas.nim
benchmarks/bench_eth_evm_precompiles.nim
benchmarks/bench_kzg_multiproofs.nim
benchmarks/bench_matrix_toeplitz.nim
benchmarks/eth_eip7594/benchset_generation.nim
benchmarks/eth_eip7594/perf_compute_cells.nim
benchmarks/eth_eip7594/perf_compute_cells_and_kzg_proofs.nim
benchmarks/eth_eip7594/perf_recover_cells_and_kzg_proofs.nim
benchmarks/eth_eip7594/perf_verify_cell_kzg_proof_batch.nim
constantine-go/constantine.go
constantine-rust/constantine-ethereum-kzg/src/lib.rs
constantine-rust/constantine-sys/src/bindings32.rs
constantine-rust/constantine-sys/src/bindings64.rs
constantine.nimble
constantine/commitments/kzg_multiproofs.nim
constantine/commitments_setups/ethereum_kzg_srs.nim
constantine/eth_eip7594_peerdas.nim
constantine/ethereum_eip4844_kzg.nim
constantine/ethereum_evm_precompiles.nim
constantine/math/arithmetic/limbs_crandall.nim
constantine/math/arithmetic/limbs_montgomery.nim
constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim
constantine/math/elliptic/ec_scalar_mul.nim
constantine/math/elliptic/ec_scalar_mul_vartime.nim
constantine/math/elliptic/ec_shortweierstrass_batch_ops.nim
constantine/math/elliptic/ec_twistededwards_batch_ops.nim
constantine/math/extension_fields/exponentiations.nim
constantine/math/matrix/toeplitz.nim
constantine/math_arbitrary_precision/arithmetic/limbs_montgomery.nim
constantine/math_compiler/impl_fields_globals.nim
constantine/named/deriv/precompute.nim
constantine/named/zoo_subgroups.nim
include/constantine/protocols/ethereum_eip4844_kzg.h
metering/m_kzg_multiproofs.nim
papers/MSM/H23 - Notes on MSMs with Precomputation.md
papers/MSM/H24 - Verkle Trees - Another iteration of VKTs MSM.md
tests/commitments/t_kzg_multiproofs.nim
tests/eth_eip7594_peerdas/t_cells_and_kzg_proofs_opt.nim
tests/eth_eip7594_peerdas/t_compute_cells_opt.nim
tests/eth_eip7594_peerdas/t_peerdas_recovery.nim
tests/math_elliptic_curves/t_ec_multi_scalar_mul_precomp.nim
tests/math_matrix/t_toeplitz.nim
tests/t_eth_eip7594_peerdas.nim
tests/t_ethereum_eip4844_deneb_kzg.nim
tests/t_ethereum_eip4844_deneb_kzg_parallel.nim
tests/t_ethereum_evm_precompiles.nim
tests/testutils/eth_consensus_utils.nim

gemini-code-assist

Code Review

This pull request introduces precomputed Multi-Scalar Multiplication (MSM) tables to optimize FK20 proofs for PeerDAS, significantly improving performance for KZG operations. The changes include a new module for precomputed MSM, updates to the KZG context management (replacing trusted_setup_load with new and new_with_precompute), and updates to benchmarks and tests to support these optimizations. I have identified two critical issues in the new precomputation logic: potential processing of uninitialized memory in precomputeWindow and a type safety issue with the scalars parameter in msm_vartime.

gemini-code-assist · 2026-05-05T15:48:47Z

+  scratchspace[0].setNeutral()
+
+  var currentSize = 1
+  for basisIdx in 0 ..< basisSlice.len:
+    let prevSize = currentSize
+    currentSize = min(currentSize * 2, windowSize)
+    for s in 0..<prevSize:
+      if (s + prevSize) < windowSize:
+        scratchspace[s + prevSize].sum_vartime(scratchspace[s], basisSlice[basisIdx])
+
+  tableSlice.batchAffine_vartime(scratchspace)


In precomputeWindow, if basisSlice.len < b (which happens for the last window if the total number of points is not a multiple of b), only the first 2^basisSlice.len elements of scratchspace are initialized. However, batchAffine_vartime is called on the entire scratchspace of size windowSize (2^b). This results in processing uninitialized memory, which could lead to undefined behavior or crashes during the affine conversion if the garbage data represents invalid group elements. The conversion should be restricted to the initialized portion of the buffer.

let numPoints = 1 shl basisSlice.len tableSlice.toOpenArray(0, numPoints - 1).batchAffine_vartime(scratchspace.toOpenArray(0, numPoints - 1))

gemini-code-assist · 2026-05-05T15:48:47Z

+func msm_vartime*[EC; N](
+  ctx: PrecomputedMSM[EC, N],
+  r: var EC,
+  scalars: openArray[BigInt]


The scalars parameter is typed as openArray[BigInt]. Since BigInt is a generic type in Constantine (e.g., BigInt[bits: static int]), this might lead to compilation errors or unexpected behavior if the compiler cannot infer the static bit length. It is safer to make the function generic over the bit length of the scalars.

scalars: openArray[BigInt[FrBits]]

coderabbitai

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

benchmarks/bench_elliptic_parallel_template.nim (1)
111-169: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return body timing here, not wall time around bench(...).

bench already measures the benchmark body and then prints the result. Timing around the whole bench(...) call makes perfNaive / perfMSM* include formatting and stdout overhead, so reportMSMParallel() is no longer comparing just MSM runtime. This skews the small-N speedups the most.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/bench_elliptic_parallel_template.nim` around lines 111 - 169, The
timings are taken around the bench(...) calls which include printing/formatting
overhead; inside proc msmParallelBench* you should measure only the benchmark
body and store those times into result.perf* — move the
getMonotime()/inNanoseconds timing into the closures passed to bench (e.g.
around the loop/body used for naive scalarMul, scalarMul_vartime,
multiScalarMul_reference_vartime, multiScalarMul_vartime and
tp.multiScalarMul_vartime_parallel) and assign result.perfNaive,
result.perfMSMbaseline, result.perfMSMopt, result.perfMSMpara from those
internal measurements (per-iteration) instead of using the outer
startNaive/startMSMbaseline/startMSMopt/startMSMpara and stop* variables; remove
the outer wall-clock timing around bench calls so perf* reflects only the MSM
work.

♻️ Duplicate comments (2)

papers/MSM/H23 - Notes on MSMs with Precomputation.md (1)
20-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix remaining grammar/typo slips in the explanatory text.

There are a few small wording errors that hurt readability in an otherwise strong note.
Proposed edit
- so we can view $a_i$'s as have this many bits.
+ so we can view $a_i$'s as having this many bits.

- The algorithm has two parameter, namely the blocksize $b$ and the number of rows $s$ where we perform precomputation.
+ The algorithm has two parameters, namely the block size $b$ and the number of rows $s$ where we perform precomputation.

- Here, the situation is somewhat different, as 252 doublings would be very significant relative to the total cost. For this part of of the MSMs, we can use a much larger $s$, e.g. $s=64$ or $s=128$ (giving $t=4$ or $t=2$, which means 3 or 1 doublings).
+ Here, the situation is different, as 252 doublings would be very significant relative to the total cost. For this part of the MSMs, we can use a much larger $s$, e.g. $s=64$ or $s=128$ (giving $t=4$ or $t=2$, which means 3 or 1 doublings).
Also applies to: 188-188, 192-192
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md at line 20, The
sentence "so we can view $a_i$'s as have this many bits." is grammatically
incorrect; change it to "so we can view $a_i$'s as having this many bits." and
scan the nearby explanatory text for similar slips (the occurrences noted around
the same paragraph as well as the other instances referenced) to fix
subject-verb agreement and awkward phrasing so the explanations read smoothly
and consistently.
benchmarks/bench_eth_eip7594_peerdas.nim (1)
188-231: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Discriminant-mutation pattern still violates polyphaseSpectrumBank case-object semantics.

runNoPrecomp flips kind = kNoPrecompute (line 190) and runPrecomp writes into precompPoints[pos].init(...) (line 204) before restoring kind = kPrecompute (line 205). On the first runPrecomp invocation after runNoPrecomp, the writes at line 204 land on the inactive branch of the case object, which is undefined under Nim's case-object rules. On subsequent runPrecomp calls inside the PrecompConfigs loop, kind is already kPrecompute, so the writes silently overwrite the previous configuration's PrecomputedMSM tables without invoking =destroy, leaking the earlier allocation each iteration. Per snippet 4 (ethereum_kzg_srs.nim:448-457), delete only frees the currently-active tables, so all but the last config's tables are leaked, and the inactive-branch write before the first kind flip risks ORC/ARC clobbering the freshly initialized data when the discriminant assignment zero-initializes the new branch.

Either build a fresh ctx per (t, b) (preferred — also avoids cross-config state bleed into the verification/recovery benchmarks that follow), or stage a separate array[CELLS_PER_EXT_BLOB, PrecomputedMSM[...]] buffer, fully populate it under the existing discriminant, and then atomically swap it in. Avoid touching kind directly across iterations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/bench_eth_eip7594_peerdas.nim` around lines 188 - 231, The
benchmark flips polyphaseSpectrumBank.kind and writes into precompPoints while
the discriminant may be the inactive branch, causing undefined writes and
leaking prior PrecomputedMSM tables; fix by avoiding in-place discriminant
mutation across configs — either recreate a fresh ctx for each runPrecomp
invocation (preferred) so each (t,b) uses its own polyphaseSpectrumBank, or
allocate a temporary array[CELLS_PER_EXT_BLOB, PrecomputedMSM...] buffer, fully
init it while the existing kind remains unchanged, then atomically swap that
buffer into ctx.polyphaseSpectrumBank and set kind = kPrecompute; ensure you do
not write into precompPoints[] before setting the matching kind and that
previous active tables are properly deleted via the existing delete path.

🧹 Nitpick comments (1)

include/constantine/protocols/ethereum_eip4844_kzg.h (1)

205-238: ⚡ Quick win

Document the valid range and failure behavior for t and b.

The tradeoff table is useful, but the public API contract still leaves callers guessing whether zero, negative, or oversized values are rejected, clamped, or undefined. Since these parameters directly control potentially very large allocations, please spell out the supported domain and what status is returned for invalid inputs.

Suggested doc tweak

 /** Create a new KZG context with precomputed MSM tables.
  *  Same as ctt_eth_kzg_context_new but also builds PrecomputedMSM lookup
  *  tables for FK20 proofs (PeerDAS).
  *
- *  `@param` t  base groups (stride between precomputed layers)
- *  `@param` b  bits per window (window size = 2^b)
+ *  `@param` t  base groups (stride between precomputed layers). Must be positive
+ *            and within the implementation-supported range.
+ *  `@param` b  bits per window (window size = 2^b). Must be positive and within
+ *            the implementation-supported range.
+ *            Invalid values return an error status.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/constantine/protocols/ethereum_eip4844_kzg.h` around lines 205 - 238,
Update the docblock for ctt_eth_kzg_context_new_with_precompute to explicitly
state the valid domains and failure semantics for parameters t and b (e.g., t
must be positive power-of-two or within [min_t,max_t], b must be non-negative
and <= max_b, or whatever the implementation accepts), document what happens for
zero/negative/oversized values (rejected with a specific
ctt_eth_trusted_setup_status error such as
CTT_ETH_TRUSTED_SETUP_INVALID_PARAMETER or CTT_ETH_TRUSTED_SETUP_OOM), and note
that oversized b may cause large allocations and will return the appropriate
error status rather than undefined behavior; reference the returned
ctt_eth_trusted_setup_status values callers should expect on invalid inputs or
allocation failures.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 194-227: benchPrecompMSMInline currently ignores its ctx and
builds a fresh PrecompBenchContext, producing a non-comparable baseline; change
it to reuse the caller's dataset by initializing or populating benchCtx with
ctx.points[0 ..< N] and ctx.coefs[0 ..< N] (respecting N, t, b) before timing
the precomputed MSM. Concretely, instead of new(PrecompBenchContext[EC, N], seed
= 42'u64, t = t, b = b) create/populate benchCtx so its precomp and scalars are
built from ctx.points[0 ..< N] and ctx.coefs[0 ..< N] (or call the existing
PrecompBenchContext initializer/method that accepts these slices), then run
benchCtx.precomp.msm_vartime(resultEC, benchCtx.scalars) as before so
reportPrecompSpeedup compares like-for-like with
ctx.msmParallelBench/perf.perfMSMOpt.

In `@benchmarks/bench_elliptic_template.nim`:
- Around line 475-478: In doBench, the scaled iteration expression can evaluate
to 0 when N is large; ensure the count passed to ctx.benchPrecompMSM is clamped
to at least 1 by computing a scaledIters = max(1, iters div max(1, N div 10))
and passing scaledIters to ctx.benchPrecompMSM; apply the same clamping change
to the other precomp invocation sites (the other occurrences around the
precomp/reference benchmark code) so benchPrecompMSM never receives 0 and
downstream division by benchIters cannot fail.
- Around line 386-391: The reference MSM workload consumes RNG outputs in the
opposite order to PrecompBenchContext.new, causing mismatched inputs; update the
reference MSM blocks so they generate ctx.scalars first using
ctx.rng.random_unsafe(BigInt[bits]) and then generate ctx.basisJac with
ctx.rng.random_unsafe(EC) followed by ctx.basisJac[i].clearCofactor(), matching
the order used by PrecompBenchContext.new; apply this same change to both
occurrences where the reference data is regenerated.

In `@constantine-go/constantine.go`:
- Around line 105-114: EthKzgContextNewWithPrecompute currently casts Go ints t
and b directly to C.int which can truncate or accept negative values; validate
that t and b are within the 32-bit signed range and non-negative (e.g., 0 <= t,b
<= math.MaxInt32) and also enforce any sane upper bounds you expect for
precompute sizes before converting to C.int, returning an error if they are out
of range; then only after validation perform the C.int conversion for the call
to ctt_eth_kzg_context_new_with_precompute and proceed to use ctx.cCtx as
before.
- Around line 128-129: The Delete method currently has a value receiver which
frees the C pointer on a copy; change its receiver to a pointer receiver (func
(ctx *EthKzgContext) Delete()) so the original EthKzgContext.cCtx is freed, and
after calling C.ctt_eth_kzg_context_delete(ctx.cCtx) set ctx.cCtx to nil to
avoid use-after-free or double-free on subsequent calls; locate the method by
the symbol names EthKzgContext, Delete, cCtx and the C call
C.ctt_eth_kzg_context_delete.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 403-415: The function load_from_file allocates ctx via
alloc0HeapAligned before calling ctx.load_ckzg4844; if load_ckzg4844 returns a
failure (e.g., tsInvalidFile) the function currently returns without freeing the
allocated ctx, leaking memory. Fix by freeing and nulling ctx on any non-success
status: after let status = ctx.load_ckzg4844(f, int(t), int(b)) check if status
!= tsSuccess then free the allocated context (use the matching deallocation
function for alloc0HeapAligned, e.g., dealloc or free), set ctx = nil, ensure
fileio.close(f) still runs, and then return status; otherwise proceed to return
the success status as before.

---

Outside diff comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 111-169: The timings are taken around the bench(...) calls which
include printing/formatting overhead; inside proc msmParallelBench* you should
measure only the benchmark body and store those times into result.perf* — move
the getMonotime()/inNanoseconds timing into the closures passed to bench (e.g.
around the loop/body used for naive scalarMul, scalarMul_vartime,
multiScalarMul_reference_vartime, multiScalarMul_vartime and
tp.multiScalarMul_vartime_parallel) and assign result.perfNaive,
result.perfMSMbaseline, result.perfMSMopt, result.perfMSMpara from those
internal measurements (per-iteration) instead of using the outer
startNaive/startMSMbaseline/startMSMopt/startMSMpara and stop* variables; remove
the outer wall-clock timing around bench calls so perf* reflects only the MSM
work.

---

Duplicate comments:
In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 188-231: The benchmark flips polyphaseSpectrumBank.kind and writes
into precompPoints while the discriminant may be the inactive branch, causing
undefined writes and leaking prior PrecomputedMSM tables; fix by avoiding
in-place discriminant mutation across configs — either recreate a fresh ctx for
each runPrecomp invocation (preferred) so each (t,b) uses its own
polyphaseSpectrumBank, or allocate a temporary array[CELLS_PER_EXT_BLOB,
PrecomputedMSM...] buffer, fully init it while the existing kind remains
unchanged, then atomically swap that buffer into ctx.polyphaseSpectrumBank and
set kind = kPrecompute; ensure you do not write into precompPoints[] before
setting the matching kind and that previous active tables are properly deleted
via the existing delete path.

In `@papers/MSM/H23` - Notes on MSMs with Precomputation.md:
- Line 20: The sentence "so we can view $a_i$'s as have this many bits." is
grammatically incorrect; change it to "so we can view $a_i$'s as having this
many bits." and scan the nearby explanatory text for similar slips (the
occurrences noted around the same paragraph as well as the other instances
referenced) to fix subject-verb agreement and awkward phrasing so the
explanations read smoothly and consistently.

---

Nitpick comments:
In `@include/constantine/protocols/ethereum_eip4844_kzg.h`:
- Around line 205-238: Update the docblock for
ctt_eth_kzg_context_new_with_precompute to explicitly state the valid domains
and failure semantics for parameters t and b (e.g., t must be positive
power-of-two or within [min_t,max_t], b must be non-negative and <= max_b, or
whatever the implementation accepts), document what happens for
zero/negative/oversized values (rejected with a specific
ctt_eth_trusted_setup_status error such as
CTT_ETH_TRUSTED_SETUP_INVALID_PARAMETER or CTT_ETH_TRUSTED_SETUP_OOM), and note
that oversized b may cause large allocations and will return the appropriate
error status rather than undefined behavior; reference the returned
ctt_eth_trusted_setup_status values callers should expect on invalid inputs or
allocation failures.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 73119252-bcee-4759-96d7-09c8c5f33df4

📥 Commits

Reviewing files that changed from the base of the PR and between 39dfc6d and b31614c.

📒 Files selected for processing (20)

benchmarks/bench_ec_msm_bandersnatch.nim
benchmarks/bench_ec_msm_bls12_381_g1.nim
benchmarks/bench_ec_msm_bls12_381_g2.nim
benchmarks/bench_ec_msm_bn254_snarks_g1.nim
benchmarks/bench_ec_msm_pasta.nim
benchmarks/bench_elliptic_parallel_template.nim
benchmarks/bench_elliptic_template.nim
benchmarks/bench_eth_eip7594_peerdas.nim
constantine-go/constantine.go
constantine-go/eth_kzg7594_peerdas_test.go
constantine-rust/constantine-ethereum-kzg/src/lib.rs
constantine-rust/constantine-ethereum-kzg/tests/t_eth_kzg7594.rs
constantine/commitments_setups/ethereum_kzg_srs.nim
constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim
include/constantine/protocols/ethereum_eip4844_kzg.h
papers/MSM/H23 - Notes on MSMs with Precomputation.md
tests/eth_eip7594_peerdas/t_cells_and_kzg_proofs_opt.nim
tests/eth_eip7594_peerdas/t_compute_cells_opt.nim
tests/t_eth_eip7594_peerdas.nim
tests/testutils/eth_consensus_utils.nim

🚧 Files skipped from review as they are similar to previous changes (1)

constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim

coderabbitai · 2026-05-05T21:06:05Z

+func EthKzgContextNewWithPrecompute(trustedSetupFile string, t, b int) (ctx EthKzgContext, err error) {
+	cFile := C.CString(trustedSetupFile)
+	defer C.free(unsafe.Pointer(cFile))
+	status := C.ctt_eth_kzg_context_new_with_precompute(
+		&ctx.cCtx,
+		cFile,
+		C.cttEthTSFormat_ckzg4844,
+		C.int(t),
+		C.int(b),
+	)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file fd -t f "constantine.go" --path "*constantine-go*"

Repository: mratsim/constantine

Length of output: 1945

🏁 Script executed:

# Read the specific function mentioned (lines 105-114) sed -n '100,125p' constantine-go/constantine.go

Repository: mratsim/constantine

Length of output: 1945

🏁 Script executed:

# Search for other similar constructors to understand validation patterns rg "Context.*New" constantine-go/ -A 10 -B 2

Repository: mratsim/constantine

Length of output: 17317

🏁 Script executed:

# Check for any existing validation of int parameters in the codebase rg "C\.int" constantine-go/ -B 2 -A 2 | head -60

Repository: mratsim/constantine

Length of output: 617

🏁 Script executed:

# Look for documentation on t and b parameters rg "precompute|parameter|knob" constantine-go/ -i -B 1 -A 1 | head -40

Repository: mratsim/constantine

Length of output: 3151

Validate t and b parameters before converting to C.int to prevent truncation and invalid allocation.

The function directly casts Go int parameters to C.int without validation. On 64-bit systems, Go int exceeds C's 32-bit signed integer range, and truncation happens silently during the cast. Negative values also pass through unchecked. Since this constructor allocates large precomputed tables (documented as MiB to GiB scale), validation in Go prevents invalid states from reaching the C allocator.

Suggested fix

func EthKzgContextNewWithPrecompute(trustedSetupFile string, t, b int) (ctx EthKzgContext, err error) { + const maxCInt = int(^uint32(0) >> 1) + if t <= 0 || t > maxCInt { + return ctx, errors.New("EthKzgContextNewWithPrecompute: t must be > 0 and fit in C.int") + } + if b <= 0 || b > maxCInt { + return ctx, errors.New("EthKzgContextNewWithPrecompute: b must be > 0 and fit in C.int") + } + cFile := C.CString(trustedSetupFile) defer C.free(unsafe.Pointer(cFile)) status := C.ctt_eth_kzg_context_new_with_precompute( &ctx.cCtx, cFile,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine-go/constantine.go` around lines 105 - 114, EthKzgContextNewWithPrecompute currently casts Go ints t and b directly to C.int which can truncate or accept negative values; validate that t and b are within the 32-bit signed range and non-negative (e.g., 0 <= t,b <= math.MaxInt32) and also enforce any sane upper bounds you expect for precompute sizes before converting to C.int, returning an error if they are out of range; then only after validation perform the C.int conversion for the call to ctt_eth_kzg_context_new_with_precompute and proceed to use ctx.cCtx as before.

…ponsibility.

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (2)

constantine/commitments_setups/ethereum_kzg_srs.nim (1)

411-423: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Memory leak on tsInvalidFile still present.

The previous review marked this as addressed, but load_from_file still allocates ctx before parsing (line 419) and returns status without freeing on failure. Downstream wrappers (e.g. constantine-rust/constantine-ethereum-kzg/src/lib.rs) drop the raw pointer on Err, so every failed load leaks the EthereumKZGContext buffer (~5 MiB+ once the SRS arrays are accounted for).

Additionally, the t = 64, b = 12 parameters at line 411 are no longer used inside the body after the refactor — load_ckzg4844(f) is now called without them, and PeerDAS setup is performed by the callers in new* / new_with_precompute*. They can be dropped.

🛡️ Proposed fix

-proc load_from_file(ctx: var ptr EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat, t = 64, b = 12): TrustedSetupStatus =
+proc load_from_file(ctx: var ptr EthereumKZGContext, filepath: cstring, format: TrustedSetupFormat): TrustedSetupStatus =
   ## Load from a trusted setup file.
 
   var f: File
   let ok = f.open(filepath, kRead)
   if not ok:
     return tsMissingOrInaccessibleFile
 
-  ctx = alloc0HeapAligned(EthereumKZGContext, alignment = 64)
+  let tmp = alloc0HeapAligned(EthereumKZGContext, alignment = 64)
 
-  let status = ctx.load_ckzg4844(f)
+  let status = tmp.load_ckzg4844(f)
   fileio.close(f)
-  return status
+  if status != tsSuccess:
+    freeHeapAligned(tmp)
+    ctx = nil
+  else:
+    ctx = tmp
+  return status

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 411 - 423,
The function load_from_file currently allocates ctx before calling
ctx.load_ckzg4844(f), which leaks when load_ckzg4844 returns an error (e.g.,
tsInvalidFile); change the control flow so either (A) defer allocation of the
EthereumKZGContext until after load_ckzg4844 succeeds, or (B) if you must
allocate before parsing, free/dealloc the allocated ctx immediately when status
!= tsOK before returning; also remove the unused parameters t and b from the
load_from_file signature since they are not used (they were removed from
load_ckzg4844 and PeerDAS is handled by callers), and ensure references to
load_from_file and its callers are updated accordingly.

benchmarks/bench_elliptic_template.nim (1)

486-501: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reference MSM still uses a different RNG consumption order than PrecompBenchContext.new.

PrecompBenchContext.new (lines 386–393) consumes the RNG as scalars → basisJac, but the reference block here generates basisJac2 → scalars2. With the same seed they still produce different inputs, so precomp rows and reference rows aren't comparable side-by-side.

Same applies to the clamping concern below: iters div max(1, N div 10) on lines 478 and 501 evaluates to 0 when N > 10 * iters, which makes benchPrecompMSM(0) a no-op and causes division-by-zero in the reference path's (stop-start) div benchIters.

🐛 Proposed fix

-  proc doBench(_: typedesc[EC], N, t, b: static int, iters: int) {.noInline.} =
-    # Wrap in a proc to ensure destruction of the large context
-    let ctx = new(PrecompBenchContext[EC, N], seed = 42'u64, t = t, b = b)
-    ctx.benchPrecompMSM(iters div max(1, N div 10))
+  proc doBench(_: typedesc[EC], N, t, b: static int, iters: int) {.noInline.} =
+    # Wrap in a proc to ensure destruction of the large context
+    let ctx = new(PrecompBenchContext[EC, N], seed = 42'u64, t = t, b = b)
+    let scaledIters = max(1, iters div max(1, N div 10))
+    ctx.benchPrecompMSM(scaledIters)
@@
-  for i in 0 ..< N:
-    basisJac2[i] = rng2.random_unsafe(EC)
-    basisJac2[i].clearCofactor()
-
-  basis2.asUnchecked().batchAffine_vartime(basisJac2.asUnchecked(), N)
-
-  for i in 0 ..< N:
-    scalars2[i] = rng2.random_unsafe(BigInt[bits])
-
-  let benchIters = iters div max(1, N div 10)
+  for i in 0 ..< N:
+    scalars2[i] = rng2.random_unsafe(BigInt[bits])
+
+  for i in 0 ..< N:
+    basisJac2[i] = rng2.random_unsafe(EC)
+    basisJac2[i].clearCofactor()
+
+  basis2.asUnchecked().batchAffine_vartime(basisJac2.asUnchecked(), N)
+
+  let benchIters = max(1, iters div max(1, N div 10))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmarks/bench_elliptic_template.nim` around lines 486 - 501, The reference
MSM block must consume the RNG in the same order as PrecompBenchContext.new
(scalars → basisJac) and ensure benchIters never becomes zero; change the seq
generation to fill scalars2 from rng2 before filling basisJac2 (i.e., move the
scalars2 loop before the basisJac2 loop or swap the two loops that assign
rng2.random_unsafe to scalars2 and basisJac2) so it matches
PrecompBenchContext.new, and replace the benchIters computation with a safe
version that forces a minimum of 1 (e.g., benchIters = max(1, iters div max(1, N
div 10))) to avoid zero iterations and division-by-zero when calling
benchPrecompMSM or computing timings.

🧹 Nitpick comments (2)

constantine/commitments_setups/ethereum_kzg_srs.nim (2)
350-377: ⚖️ Poor tradeoff

Inconsistent treatment of =destroy on the bank between setupPolyphaseSpectrumBank and delete.

delete (lines 462–464) deliberately avoids calling \=destroy`(ctx.polyphaseSpectrumBank)because the destructor was observed to raise/escaperaises: [], and instead destroys each precompPoints[i]individually. YetsetupPolyphaseSpectrumBankat line 357 calls exactly that destructor unconditionally inside the same{.push raises: [], checks: off.}region. If the original concern was real, this call site has the same problem; if it was not, the workaround indelete` is unnecessary.

Please either:

replicate the per-element destruction loop here as well, or

if \=destroy`(bank)is in fact safe underraises: [], remove the workaround in delete` (just call the destructor) and keep this site consistent.

Also worth noting: directly mutating ctx.polyphaseSpectrumBank.kind (lines 367, 374) after =destroy is an unusual pattern. Constructing a fresh value via ctx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute) (or kNoPrecompute) makes the discriminator transition explicit and lets the runtime handle the union reset.

Also applies to: 461-473
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 350 - 377,
The setupPolyphaseSpectrumBank currently calls
`=destroy(ctx.polyphaseSpectrumBank)` while `delete` avoids that call and
instead destroys each `precompPoints[i]` individually; make these two sites
consistent: either replace the `=destroy(ctx.polyphaseSpectrumBank)` in `proc
setupPolyphaseSpectrumBank` with the same per-element destruction loop used in
`delete` (iterate `for i in 0 ..< CELLS_PER_EXT_BLOB:
ctx.polyphaseSpectrumBank.precompPoints[i].destroy()` or equivalent), or change
`delete` to simply call `=destroy(ctx.polyphaseSpectrumBank)` like setup does.
Additionally, instead of mutating `ctx.polyphaseSpectrumBank.kind` after
destruction, reinitialize the whole union explicitly (e.g.,
`ctx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute)` or
`kNoPrecompute`) so the discriminator transition is handled safely by the
runtime; apply the same pattern in both `setupPolyphaseSpectrumBank` and
`delete`.
431-459: ⚡ Quick win

Add explicit conversions for cint → int parameters to match codebase patterns.

At line 459, the cint variables t and b are passed directly to setupKzg7594PeerDAS(ctx, t, b: int). While Nim may permit this implicitly, the codebase precedent (e.g., threadpool.nim:997 using cast[cint](numThreads)) establishes explicit type conversions for all cint ↔ int interactions. Use explicit conversion for clarity and consistency:
result = ctx.load_from_file(filepath, format)
if result == tsSuccess:
  ctx.setupKzg4844ProtoDanksharding()
  ctx.setupKzg7594PeerDAS(int(t), int(b))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim` around lines 431 - 459,
The call in new_with_precompute passes C-int parameters t and b directly to
setupKzg7594PeerDAS; follow the repo pattern and convert them explicitly to Nim
int before calling (e.g. cast or int() on t and b) so update the invocation of
ctx.setupKzg7594PeerDAS to pass converted values (reference symbols:
new_with_precompute, ctx.setupKzg7594PeerDAS, parameters t and b).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/bench_elliptic_parallel_template.nim`:
- Around line 166-169: The perf assignments use uninitialized MonoTime when the
naive/baseline branches are skipped for numInputs > 100000; either wrap the
result.perfNaive and result.perfMSMbaseline assignments with the same numInputs
<= 100000 guard used when setting startNaive/stopNaive and
startMSMbaseline/stopMSMbaseline, or initialize startNaive/stopNaive and
startMSMbaseline/stopMSMbaseline to a known value (or set
result.perfNaive/result.perfMSMbaseline to 0.0) when those branches are skipped
so the computed perf values are deterministic; locate the assignments to
result.perfNaive, result.perfMSMbaseline and the variables startNaive,
stopNaive, startMSMbaseline, stopMSMbaseline to apply the guard or
initialization.

In `@benchmarks/bench_eth_eip7594_peerdas.nim`:
- Around line 463-467: The recovery benchmarks reuse the same
"recover_cells_and_kzg_proofs" labels making output ambiguous; update the bench
labeling to include the precompute config (e.g., append "precompute t=256 b=8"
or "no precompute") so both runs are distinct. Specifically, when calling
ctx.setupPolyphaseSpectrumBank(t = 256, b = 8) before
benchRecoverCellsAndKZGProofs_WorstCase and
benchRecoverCellsAndKZGProofs_VaryingAvailability, pass or construct a unique
label into those bench functions (or into the internal
recover_cells_and_kzg_proofs label generator) that includes the t and b values
or an explicit "precompute" suffix so console output and downstream parsers can
distinguish the two runs.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 219-232: The inline comment in polyphaseSpectrumBank
(constantine/commitments_setups/ethereum_kzg_srs.nim) incorrectly states ~4.33
MiB per table × 128 ≈ 554 MB for the t=64,b=12 default; reconcile it with the
actual calculation used by msmPrecompSize and the benchmark in
new_with_precompute by updating the comment to reflect the correct per-table and
total memory (≈8.25 MiB per table → ~1056 MiB total) or, if the benchmark is
wrong, correct the benchmark math; locate references to pointsPerColumn,
expandedBasisLen, numWindows, windowSize and msmPrecompSize and ensure the
textual numbers match the computed values (pointsPerColumn = ceil(255/64)=4,
expandedBasisLen=256, numWindows=ceil(256/12)=22, windowSize=4096, total affine
points ≈22*4096 and bytes ≈ affine_points*~96) so the comment and
new_with_precompute benchmark agree.

In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim`:
- Around line 142-212: The init procedure for PrecomputedMSM leaks a previous
ctx.table when called twice; modify init (the init*[EC; N] routine) to check if
ctx.table is non-nil at the start and, if so, free the existing allocation
(using freeHeapAligned or the same allocator used) and reset
ctx.tableLen/ctx.t/ctx.b as appropriate before proceeding to allocate a new
ctx.table with allocHeapArrayAligned; ensure this cleanup also handles the
kNoPrecompute (t==0 or b==0) path so repeated init calls do not leak memory.

In `@scripts/parse_peerdas_benchmarks.py`:
- Around line 279-283: The final note currently hardcodes "~34% speedup" via the
lines.append call that adds "Precompute=8 (c-kzg-4844) trades 96 MiB memory for
~34% speedup in FK20 operations"; change this to compute the FK20 speedup from
the parsed benchmark rows (use the FK20 timings you already parse earlier in the
script) and format the computed percentage into the appended string, or if those
values aren't available, remove the numeric claim and leave a generic sentence;
update the lines.append call that contains the hardcoded note accordingly so the
report always reflects derived data rather than a fixed value.

---

Duplicate comments:
In `@benchmarks/bench_elliptic_template.nim`:
- Around line 486-501: The reference MSM block must consume the RNG in the same
order as PrecompBenchContext.new (scalars → basisJac) and ensure benchIters
never becomes zero; change the seq generation to fill scalars2 from rng2 before
filling basisJac2 (i.e., move the scalars2 loop before the basisJac2 loop or
swap the two loops that assign rng2.random_unsafe to scalars2 and basisJac2) so
it matches PrecompBenchContext.new, and replace the benchIters computation with
a safe version that forces a minimum of 1 (e.g., benchIters = max(1, iters div
max(1, N div 10))) to avoid zero iterations and division-by-zero when calling
benchPrecompMSM or computing timings.

In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 411-423: The function load_from_file currently allocates ctx
before calling ctx.load_ckzg4844(f), which leaks when load_ckzg4844 returns an
error (e.g., tsInvalidFile); change the control flow so either (A) defer
allocation of the EthereumKZGContext until after load_ckzg4844 succeeds, or (B)
if you must allocate before parsing, free/dealloc the allocated ctx immediately
when status != tsOK before returning; also remove the unused parameters t and b
from the load_from_file signature since they are not used (they were removed
from load_ckzg4844 and PeerDAS is handled by callers), and ensure references to
load_from_file and its callers are updated accordingly.

---

Nitpick comments:
In `@constantine/commitments_setups/ethereum_kzg_srs.nim`:
- Around line 350-377: The setupPolyphaseSpectrumBank currently calls
`=destroy(ctx.polyphaseSpectrumBank)` while `delete` avoids that call and
instead destroys each `precompPoints[i]` individually; make these two sites
consistent: either replace the `=destroy(ctx.polyphaseSpectrumBank)` in `proc
setupPolyphaseSpectrumBank` with the same per-element destruction loop used in
`delete` (iterate `for i in 0 ..< CELLS_PER_EXT_BLOB:
ctx.polyphaseSpectrumBank.precompPoints[i].destroy()` or equivalent), or change
`delete` to simply call `=destroy(ctx.polyphaseSpectrumBank)` like setup does.
Additionally, instead of mutating `ctx.polyphaseSpectrumBank.kind` after
destruction, reinitialize the whole union explicitly (e.g.,
`ctx.polyphaseSpectrumBank = PolyphaseSpectrumBank(kind: kPrecompute)` or
`kNoPrecompute`) so the discriminator transition is handled safely by the
runtime; apply the same pattern in both `setupPolyphaseSpectrumBank` and
`delete`.
- Around line 431-459: The call in new_with_precompute passes C-int parameters t
and b directly to setupKzg7594PeerDAS; follow the repo pattern and convert them
explicitly to Nim int before calling (e.g. cast or int() on t and b) so update
the invocation of ctx.setupKzg7594PeerDAS to pass converted values (reference
symbols: new_with_precompute, ctx.setupKzg7594PeerDAS, parameters t and b).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30be442e-76e9-4fb4-ac17-df21dafa9af0

📥 Commits

Reviewing files that changed from the base of the PR and between b31614c and 58ff170.

📒 Files selected for processing (11)

benchmarks/bench_elliptic_parallel_template.nim
benchmarks/bench_elliptic_template.nim
benchmarks/bench_eth_eip7594_peerdas.nim
benchmarks/bench_gmp_modexp.nim
benchmarks/bench_gt_parallel_template.nim
benchmarks/zkalc.nim
constantine/commitments_setups/ethereum_kzg_srs.nim
constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim
constantine/math/elliptic/ec_shortweierstrass_batch_ops.nim
constantine/math/elliptic/ec_twistededwards_batch_ops.nim
scripts/parse_peerdas_benchmarks.py

coderabbitai · 2026-05-05T23:15:53Z

+func init*[EC; N](
+    ctx: var PrecomputedMSM[EC, N],
+    basis: openArray[affine(EC)],
+    t, b: int) =
+  ## Build the precomputed lookup table for `basis` points.
+  ##
+  ## **Preconditions**
+  ## - `basis.len == N` (checked at runtime via `doAssert`)
+  ## - `t >= 1` and `b >= 1` (checked at runtime via `doAssert`)
+  ## - Passing `(0, 0)` skips table construction (kNoPrecompute mode).
+  ##
+  ## **Important:** `init` assumes the caller provides *well-formed* basis points:
+  ## - Each point MUST lie on the curve
+  ## - Each point MUST be in the correct prime-order subgroup
+  ## - Each point MUST NOT be the neutral element (infinity)
+  ##
+  ## No on-curve or subgroup validation is performed. Passing invalid points
+  ## silently produces meaningless tables. This is safe in practice because
+  ## callers that parse external data (e.g. SRS files) validate during parsing,
+  ## and hardcoded/test bases are known-correct.
+
+  doAssert basis.len == N
+
+  # (0, 0) means no precompute
+  if t == 0 or b == 0:
+    ctx.t = 0
+    ctx.b = 0
+    return
+
+  ctx.t = t
+  ctx.b = b
+
+  const FrBits = EC.getScalarField().bits()
+  let pointsPerColumn = (FrBits + t - 1) div t
+  let expandedBasisLen = N * pointsPerColumn
+  let expandedBasis = allocHeapArrayAligned(EC, expandedBasisLen, alignment = 64)
+
+  # Fill expanded basis: P, P^(2^t), P^(2^(2t)), ... for each basis point
+  var idx = 0
+  for i in 0 ..< N:
+    expandedBasis[idx].fromAffine(basis[i])
+    idx += 1
+    # Expand: P_j = P_0^(2^(j*t)) via doubling chain
+    for j in 1 ..< pointsPerColumn:
+      expandedBasis[idx].double(expandedBasis[idx - 1])
+      for _ in 1 ..< t:
+        expandedBasis[idx].double()
+      idx += 1
+
+  let numWindows = (expandedBasisLen + b - 1) div b
+  let windowSize = 1 shl b
+  let totalTableSize = msmPrecompSize(EC, N, t, b)
+
+  ctx.table = allocHeapArrayAligned(affine(EC), totalTableSize, alignment = 64)
+  ctx.tableLen = totalTableSize
+
+  let scratchspace = allocHeapArrayAligned(EC, windowSize, alignment = 64)
+
+  for windowIdx in 0 ..< numWindows:
+    let startIdx = windowIdx * b
+    let endIdx = min(startIdx + b, expandedBasisLen)
+
+    precomputeWindow(
+      ctx.table.toOpenArray(windowIdx*windowSize, windowIdx*windowSize + windowSize - 1),
+      expandedBasis.toOpenArray(startIdx, endIdx - 1),
+      scratchspace.toOpenArray(windowSize),
+      b
+    )
+
+  freeHeapAligned(scratchspace)
+  freeHeapAligned(expandedBasis)


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

init leaks the previous ctx.table if called twice.

init allocates ctx.table = allocHeapArrayAligned(...) at line 195 unconditionally. If a caller invokes init on a PrecomputedMSM that already has a non-nil table (e.g. to switch (t, b) parameters), the previous table allocation is leaked. The current in-tree call site (setupPolyphaseSpectrumBank) works around this by =destroy-ing the bank first, but init is exported and should be safe on its own.

Add a defensive cleanup at the top of init:

🛡️ Proposed fix

doAssert basis.len == N + # Free any pre-existing table to make init idempotent / re-usable. + if ctx.table != nil: + freeHeapAligned(cast[pointer](ctx.table)) + ctx.table = nil + ctx.tableLen = 0 + # (0, 0) means no precompute if t == 0 or b == 0: ctx.t = 0 ctx.b = 0 return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@constantine/math/elliptic/ec_multi_scalar_mul_precomp.nim` around lines 142 - 212, The init procedure for PrecomputedMSM leaks a previous ctx.table when called twice; modify init (the init*[EC; N] routine) to check if ctx.table is non-nil at the start and, if so, free the existing allocation (using freeHeapAligned or the same allocator used) and reset ctx.tableLen/ctx.t/ctx.b as appropriate before proceeding to allocate a new ctx.table with allocHeapArrayAligned; ensure this cleanup also handles the kNoPrecompute (t==0 or b==0) path so repeated init calls do not leak memory.

coderabbitai · 2026-05-05T23:15:53Z

+    lines.append("- \u00b9 Recovery: c-kzg-4844 uses precompute=8 (96 MiB); constantine uses t=256, b=8 (24 MiB)")
+    lines.append("- \u00b2 c-kzg-4844 verifies 8192 cells (64 blobs); constantine matches this config")
+    lines.append("- \u0394% shows constantine relative to c-kzg-4844 (negative = faster)")
+    lines.append("- c-kzg-4844 precompute levels and constantine (t, b) configs are not directly comparable")
+    lines.append("- Precompute=8 (c-kzg-4844) trades 96 MiB memory for ~34% speedup in FK20 operations")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't hardcode the FK20 speedup note.

The ~34% speedup statement is fixed text, so rerunning this script on different benchmark inputs can generate a report whose note is wrong even when the table above is correct. Derive that figure from the parsed rows or drop the numeric claim.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/parse_peerdas_benchmarks.py` around lines 279 - 283, The final note currently hardcodes "~34% speedup" via the lines.append call that adds "Precompute=8 (c-kzg-4844) trades 96 MiB memory for ~34% speedup in FK20 operations"; change this to compute the FK20 speedup from the parsed benchmark rows (use the FK20 timings you already parse earlier in the script) and format the computed percentage into the appended string, or if those values aren't available, remove the numeric claim and leave a generic sentence; update the lines.append call that contains the hardcoded note accordingly so the report always reflects derived data rather than a fixed value.

A significant part of c-kzg-4844 backend is constant-time and invariant to inputs.

The part that is variable-time, is that their scalar-mul can get passed a bit-length, either 255-bit for fully constant-time operation (blst_fr is 255-bit) or measured if variable-time is OK, typically for FFT since roots of unity can be really small or even 1.

for the MSM though all inputs are cryptographic in nature and so indistinguishable from random. Hence I would be very surprised if the 34% speedup changes from random input to random input.

@mratsim, that's a fair and well-reasoned point. Since all MSM inputs are cryptographic scalars (indistinguishable from random), the bit-length measurement path in their variable-time scalar-mul doesn't apply here — the measured ~34% figure is effectively stable across runs. I'll withdraw the suggestion.

🐰 ~ curious ears, learning new tricks ~ (\_/) (•ᴗ•) /> 🔬

✏️ Learnings added

Learnt from: mratsim Repo: mratsim/constantine PR: 619 File: scripts/parse_peerdas_benchmarks.py:279-283 Timestamp: 2026-05-06T08:27:20.477Z Learning: In `scripts/parse_peerdas_benchmarks.py` (constantine repo), the hardcoded `~34% speedup` note for c-kzg-4844 precompute=8 FK20 operations is intentional and stable. c-kzg-4844's backend is largely constant-time; the variable-time scalar-mul (bit-length measurement) only applies to FFT roots of unity, not MSM. MSM inputs are cryptographic scalars (indistinguishable from random), so the speedup ratio does not vary across benchmark runs.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

mratsim added 2 commits May 5, 2026 14:49

Sketch of Precomputation accelerated MSM

68cf7cb

Runtime Precomp param

39dfc6d

greptile-apps Bot reviewed May 5, 2026

View reviewed changes

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 5, 2026

View reviewed changes

mratsim changed the title ~~[PeerDAS] Precomputed MSMs - 1.62x acceleration~~ [PeerDAS] Precomputed MSMs - up to 1.62x acceleration May 5, 2026

mratsim added 2 commits May 5, 2026 22:51

Review round 1

b31614c

uppercase/lowercase fixes

9627ea1

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

mratsim added 2 commits May 5, 2026 23:26

PrecomputedMSM init, remove deletion in init, caller/destructor res…

ddb9b83

…ponsibility.

Improve bench

537c4ad

mratsim added the peerdas label May 5, 2026

Up-to-date result parser

58ff170

coderabbitai Bot reviewed May 5, 2026

View reviewed changes

mratsim added 4 commits May 6, 2026 10:32

CodeRabbit review comments

021d616

Debunk internal review

70b3035

Regen speed / memory tradeoff in Rust bindings that is out of date

f5e7ffc

Internal review: nitpicks

e90f1cc

mratsim force-pushed the precomp-MSM-runtime-rebase branch from f9a2d2c to e90f1cc Compare May 6, 2026 08:58

I want my openarrays as values. Fix enthusiastic CodeRabbit fix

6590341

mratsim merged commit e6bee85 into master May 6, 2026
17 checks passed

mratsim deleted the precomp-MSM-runtime-rebase branch May 6, 2026 10:26

mratsim mentioned this pull request May 15, 2026

improve KZG performance by precomputation status-im/nimbus-eth2#8405

Merged

coderabbitai Bot mentioned this pull request Jun 18, 2026

Warning: missing operand; zero assumed spam with BigInt arithmetic on volatile variable #622

Open

Uh oh!

Conversation

mratsim commented May 5, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

greptile-apps Bot commented May 5, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mratsim May 6, 2026

Choose a reason for hiding this comment

mratsim commented May 5, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 5, 2026 •

edited

Loading

coderabbitai Bot May 5, 2026 •

edited

Loading