Witness Optimizations for SHA256: Dynamic AND/XOR Bit-Width + Spread-Based Binops#284
Witness Optimizations for SHA256: Dynamic AND/XOR Bit-Width + Spread-Based Binops#284
Conversation
… display - Added a new module for SHA256 chunk decompositions with spread-based representation in . - Introduced functions for computing spread, decomposing packed witnesses, and adding spread table constraints. - Updated the function in to reflect the new spread-based SHA256 cost summary. - Removed redundant calculations for SHA256 batched constraints and witnesses, simplifying the output. - Enhanced the display of optimal atomic width for binary operations in the circuit stats.
|
|
||
| let digest = sha256_var(bytes, 32); | ||
| assert(digest == result); | ||
| // let r = sha256_var(digest, 32); |
There was a problem hiding this comment.
Assert is commented out, this example computes a SHA256 digest but never checks the result. Please uncomment.
| const SPREAD_ALL_ONES: u64 = 0x5555_5555_5555_5555; | ||
|
|
||
| /// Pure function: compute spread(val) — interleave bits with zeros. | ||
| pub(crate) fn compute_spread(val: u64) -> u64 { |
There was a problem hiding this comment.
This function is duplicated identically in provekit/prover/src/witness/witness_builder.rs:423. The compiler uses one copy to build constraints, the prover uses the other to compute witnesses. If either copy is modified without updating the other, proofs break silently. Please move to provekit-common.
| Vec<usize>, | ||
| )>, | ||
| ) { | ||
| ) -> BTreeMap<u32, Vec<usize>> { |
There was a problem hiding this comment.
The spread approach requires p >> 2^64. The 3-way Maj spread sum reaches 3 × 0x5555555555555555 = 2^64 − 1, spread coefficients reach 4^31 = 2^62, and carry range checking needs 255 × 2^32 < p. All of these break silently for small fields (e.g. M31 with p ≈ 2^31). There's no guard anywhere. Consider:
assert!(FieldElement::MODULUS_BIT_SIZE > 64, "Spread-based SHA256 requires field modulus > 2^64");
This PR implements two major witness-count optimizations for the R1CS compiler targeting SHA256 circuits:
1. Dynamic Bit-Width for Combined AND/XOR Lookup Table
Introduces a cost model that searches over candidate atomic widths {2, 4, 8} and selects the width minimizing total witness count (table + decomposition + complementary + overhead)
Currently only applies to 8-bit operands; wider operands fall back to w=8
2. Spread Trick for SHA256 Bitwise Operations
Replaces the generic combined AND/XOR table approach for SHA256 with the spread-based technique from Eli Ben-Sasson et al.
Key idea: "spreading" a value by interleaving zeros between bits (
0b1011→0b01000101) converts bitwise XOR/AND into field addition/multiplication, eliminating per-operation lookup costsWhy ROTR/SHR are free: SHA256 decomposes each 32-bit word into chunks aligned to rotation boundaries. For example,
Sigma_0uses chunks[2, 11, 9, 10] bits exactly matching ROTR2, ROTR13, ROTR22. A rotation is just reading the same chunks in a different order with adjusted coefficients: ROTR2(a) reads chunks starting from index 1 instead of index 0. No new witnesses are allocated — the spread values computed during decomposition are reused with permuted coefficients. Similarly, SHR drops the lowest chunks entirely. The cost of rotation/shift is zero witnesses, zero constraints.SHA256 operations implemented via spread:
sigma_0,sigma_1,Sigma_0,Sigma_1,Ch,Maj, message schedule, compression roundsu32addition with single carry witness per additionStats
SHA256 (35 compression calls )
SHA256 (1 compression call)
Known Soundness Limitation (for M31)
Future Work
Dynamic spread table width: Currently fixed at w=8 (256 entries). A dynamic search over {4, 8, 16} could further reduce witnesses for circuits with fewer SHA256 calls
Shared lookup table: The spread table implicitly range-checks values to [0, 2^w - 1]. Merging with the existing range-check system would eliminate redundant table entries and save additional witnesses.