montgomery: Fast MSM in WebAssembly

by Gregor Mitscha-Baude

2nd place in the Wasm/MSM ZPrize in both 2022 and 2023.

A fast, multi-threaded implementation of elliptic curve multi-scalar multiplication (MSM) in WebAssembly. Works in Node.js and the browser. The Wasm is generated at runtime from TypeScript via wasmati, so adding a new curve is a matter of plugging in its parameters.

Install

npm install montgomery

Runtime requirements

The library uses SharedArrayBuffer and Atomics.waitAsync for the worker pool. Neither can be polyfilled, so they set a hard floor on where montgomery runs.

Runtime	Minimum version
Node.js	24+ (for `using` declarations)
Chrome	90
Firefox	145 (shipped Nov 2025 — earlier Firefox lacks `Atomics.waitAsync`)
Safari	16.4

Browsers additionally need the page to be cross-origin isolated for SharedArrayBuffer to be exposed — serve with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp.

using declarations are used internally but compiled away in the web bundle, so browser support doesn't depend on them. If there is demand, we could do that for the Node.js build as well.

Quick start

import { Pallas, startThreads, stopThreads, type AffinePoint } from "montgomery";

// lazy factory: instantiates the Pallas curve on first call
const pallas = await Pallas();
b
// optional: spin up worker threads to parallelize large MSMs
await startThreads(4);

// inputs — replace with your own points and scalars
const points: AffinePoint[] = /* ... */;
const scalars: bigint[] = /* ... */;

// write inputs to wasm memory
const scalarPtr = pallas.Scalar.fromBigints(scalars);
const pointPtr = pallas.Affine.fromBigints(points);

// compute msm
const { result } = await pallas.Parallel.msm(scalarPtr, pointPtr, scalars.length);

// read back the affine bigint result
const scratch = pallas.Field.local.getPointers(5);
const affinePtr = pallas.Field.getPointer(pallas.Affine.size);
pallas.Projective.toAffine(scratch, affinePtr, result);
const { x, y } = pallas.Affine.toBigint(affinePtr);

await stopThreads();

For performance-sensitive workloads, stream the inputs as raw bytes straight into wasm memory — see Curve.Parallel.scalarsFromBytes / Curve.Parallel.pointsFromBytes. The submission examples in doc/zprize23/ show that pattern.

For best throughput on very large MSMs, Curve.Parallel.msmUnsafe skips the degenerate-addition check and is ~25% faster, but assumes the input points don't collide (the main application of the library is with pseudo-randomly generated points for prover-side computation, where the check is wasteful).

Curves

Lazy factories, all in montgomery:

Export	Curve
`Pallas`	First Pasta curve (short Weierstrass, used in Halo 2 / Mina)
`Vesta`	Pallas' sister curve, base/scalar fields swapped
`BLS12377`	BLS12-377 (short Weierstrass, used in Aleo)
`BLS12381`	BLS12-381 (short Weierstrass, used in Ethereum / Zcash Sapling)
`BN254`	BN254 / alt_bn128 G1 (short Weierstrass, used in Ethereum EIP-196/197 precompiles, Circom, Aztec)
`Secp256k1`	secp256k1 (short Weierstrass, used in Bitcoin / Ethereum signatures)
`Ed377`	Edwards-on-BLS12-377 (twisted Edwards, used in Aleo)
`Ed25519`	Ed25519 (twisted Edwards, birational twin of Curve25519; signatures)

Generic constructors Weierstraß.create(params) and TwistedEdwards.create(params) are also exported if you want to plug in your own curve parameters.

Under the hood, these constructors build a dedicated Wasm module on-the-fly based on the provided parameters, and distribute the module to Workers with memory sharing when parallelism is enabled.

Threads

Parallelism in both Node.js and browsers is supported by exposing methods to start and stop a global pool of workers. Performance-critical library methods like the MSM automatically shard their work across the currently available pool.

startThreads(n) spins up a pool of n workers (defaults to availableParallelism()); stopThreads() terminates it. Safe to call before or after curve creation: the library tracks which curves exist and resegments their memory when the thread count changes.

If you skip startThreads, the library transparently falls back to single-threaded execution on the main thread, without any overhead compared to dedicated single-threaded implementations.

Random points

montgomery has efficient random point and field element generation, suitable for testing:

const n = 1 << 16;

// both return an array of per-thread pointers; the first entry points at
// the full contiguous batch and is what msm wants
const [pointPtr] = await pallas.Parallel.randomPointsFast(n);
const [scalarPtr] = await pallas.Parallel.randomScalars(n);

const { result } = await pallas.Parallel.msmUnsafe(scalarPtr, pointPtr, n);

Warning. randomPointsFast() does not produce a cryptographically independent point basis. Each point is a random linear combination of a small precomputed basis, drawn from ~64 bits of entropy by default, so every output lives in a set of $2^{64}$ possible points. Fine for benchmarks and correctness tests; not safe for commitment schemes.

If you need an independent basis, curve.Affine.randomPoints(ptrs) (or curve.Curve.randomPoints(ptrs) on twisted edwards) samples each point from a random x-coordinate via a square root. Much slower (each point costs a sqrt), but cryptographically sound. It takes a pre-allocated pointer array:

const n = 1 << 10;
const ptrs = pallas.Field.global.getPointers(n, pallas.Affine.size);
pallas.Affine.randomPoints(ptrs);
// ptrs[0] is the pointer to the contiguous batch of n affine points

Bigint reference library

Every curve module ships with a pure-TS bigint reference implementation alongside the fast wasm one, exposed via curve.Bigint:

const pallas = await Pallas();

// bigint prime fields, with add/sub/mul/square/inverse/exp/sqrt/isSquare/random
const { Field, Scalar } = pallas.Bigint.Projective;

// bigint curve ops
pallas.Bigint.Affine.add(P, Q);
pallas.Bigint.Affine.scale(k, P);

// and a pippenger MSM for cross-checking the wasm one
const s = pallas.Bigint.Projective.msm(scalars, points);

For Weierstrass curves curve.Bigint has both Affine and Projective layers (each with add, double, negate, scale, isEqual, isOnCurve, toSubgroup, random, ...). Twisted edwards has its single extended-projective layer at curve.Bigint directly. The bigint implementation is comparatively slow but small, simple, and matches the wasm one in behavior — used internally as the test oracle and handy for small-scale crypto (like signatures), tutorials, or building custom protocols in plain TS.

Low-level arithmetic on Wasm pointers

Underneath the MSM, every curve exposes its full wasm field/scalar/curve arithmetic on raw pointers:

curve.Field / curve.Scalar — add, subtract, multiply, square, inverse, exp, sqrt, isEqual, isZero, reduce, toMontgomery/fromMontgomery, fromPackedBytes/toPackedBytes, writeBigint/readBigint, …
curve.Affine / curve.Projective (Weierstrass) or curve.Curve (twisted edwards) — add, double, negate, scale, isOnCurve, batchNormalize, toBigint/writeBigint, …

These are the same primitives the library's MSMs are built on: msm-batched-affine.ts (~590 lines of pure TS) and msm-basic.ts (~220 lines) touch no handwritten wasm — they compose the operations exposed on curve.Field / curve.Affine / curve.Projective. You can build other curve-level algorithms (pairings, zk-SNARK prover kernels, …) on the same API without leaving TypeScript.

A few highlights:

29×9 limb layout for 256-bit fields. 29-bit limbs packed into 9 i64 lanes let the Montgomery multiplication use i64 multiplies with enough headroom in the upper bits to accumulate partial products before carrying — a sweet spot for wasm, which has no native 64×64→128 multiply. Bain Capital Crypto's Optimizing Montgomery Multiplication in WebAssembly benchmarks several wasm multiplication variants against each other and finds this one (which they call "Mitscha-Baude's method", referencing this repo) the fastest.
Fast modular inverse at < 30 × MUL cost, based on Pornin's "Optimized Binary GCD for Modular Inversion" (eprint 2020/972). Much faster than the usual exp(x, p-2) Fermat trick.
Fast square root via Tonelli–Shanks optimized after Daniel Bernstein's "Faster square roots in annoying finite fields": the discrete-log phase caches roots-of-unity windows so it drops to a handful of multiplications, leaving the x^((t−1)/2) exponentiation as the dominant cost.

Constant-time: not a design goal

montgomery targets high-volume client-side computation (e.g. local SNARK provers) where the threat model does not include timing side channels: an adversary doesn't share the machine or observe wall-clock times of individual operations. Under that assumption we trade constant-time execution for raw throughput, and many core operations branch on their operands:

Field.inverse — Pornin binary GCD with data-dependent branches
Field.sqrt — Tonelli-Shanks with retry and lookup-driven digit extraction
Field.exp — standard square-and-multiply, branches per exponent bit
Field.reduce (and Field.add / Field.subtract through it) — conditional subtraction
Curve.scale, Curve.isOnCurve, all MSM paths — inherit variable timing from the above

If your threat model does include timing side channels (server-side key operations, co-tenant environments, remote attackers), use a library built for constant-time execution instead.

More

Scripts in scripts/ illustrate end-to-end use of each curve: run-msm-pallas.ts, run-msm-377.ts, run-msm-ed-377.ts, plus field-level benchmarks under scripts/field-benchmarks/.
doc/zprize23/ - ZPrize 2023 submission sources (twisted edwards + BLS12-377) and README, preserved as a reference.
doc/zprize22.md - ZPrize 2022 write-up explaining some of the algorithms underlying the MSM.

Name		Name	Last commit message	Last commit date
Latest commit History 1,174 Commits
.github/workflows		.github/workflows
doc		doc
scripts		scripts
src		src
.gitignore		.gitignore
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
build-for-browser		build-for-browser
package-lock.json		package-lock.json
package.json		package.json
run-in-browser		run-in-browser
tsconfig.json		tsconfig.json
tsconfig.web.json		tsconfig.web.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

montgomery: Fast MSM in WebAssembly

Install

Runtime requirements

Quick start

Curves

Threads

Random points

Bigint reference library

Low-level arithmetic on Wasm pointers

Constant-time: not a design goal

More

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

montgomery: Fast MSM in WebAssembly

Install

Runtime requirements

Quick start

Curves

Threads

Random points

Bigint reference library

Low-level arithmetic on Wasm pointers

Constant-time: not a design goal

More

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages