
Part 3 - pow: remove pyethash C extension, always use pure Python ethash #973

Open
ping-ke wants to merge 17 commits into upgrade/py313-baseline from upgrade/ethash



ping-ke commented Mar 15, 2026

Summary

pyethash is a C extension that is not compatible with Python 3.13. The pure Python implementation in ethereum.pow.ethash can serve as a replacement; however, simply removing pyethash caused a significant regression in synchronization performance (10–20× slower), making block sync impractically slow.

This PR:

  • Fixes the compatibility issue by removing pyethash
  • Introduces a 4-round optimization pipeline (R1–R4) to recover and exceed the original performance
  • Reduces root block sync time from 25.24 s (old pure Python) to 0.86 s, a ~29× improvement that is also faster than pyethash (1.47 s)

Problem

pyethash 0.1.27 crashes with a segmentation fault or a floating-point exception on Python 3.13 when calling hashimoto_light().

Removing pyethash and falling back to the existing pure Python implementation leads to:

  • Heavy overhead in hex encoding/decoding
  • Excessive Python object allocations in hot loops
  • Severe performance degradation in PoW and block synchronization

Root Cause

A bug in src/python/core.c of pyethash:

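PyArg_ParseTuple is called with the "y#" format (via PY_STRING_FORMAT), which writes each length as a Py_ssize_t (8 bytes on 64-bit platforms). pyethash passes pointers to int variables (4 bytes), so every length write overruns its variable and corrupts adjacent stack memory.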

```c
// core.c line 76-77 (pyethash 0.1.27)
int cache_size, header_size;  // BUG: should be Py_ssize_t
if (!PyArg_ParseTuple(args, "k" PY_STRING_FORMAT PY_STRING_FORMAT "K",
    &block_number, &cache_bytes, &cache_size, &header, &header_size, &nonce))
```
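A simplified Python model of the overrun, for illustration only. It assumes (hypothetically) that cache_size and header_size sit next to each other in memory, followed by 4 bytes of unrelated data. Real stack layout is up to the compiler; the point is only that an 8-byte write into a 4-byte slot spills into whatever follows it.

```python
import struct


def simulate_y_hash_writes(cache_len: int, header_len: int) -> tuple:
    """Model two adjacent 4-byte int slots plus a 4-byte neighbor (hypothetical
    layout), then perform the 8-byte Py_ssize_t writes that "y#" does."""
    frame = bytearray(12)
    # Bytes 8..11 stand in for unrelated stack data (another local, a saved register...).
    struct.pack_into("<I", frame, 8, 0xDEADBEEF)
    # "y#" stores each length as an 8-byte little-endian Py_ssize_t on x86-64.
    struct.pack_into("<q", frame, 0, cache_len)   # &cache_size  -> bytes 0..7
    struct.pack_into("<q", frame, 4, header_len)  # &header_size -> bytes 4..11
    cache_size, header_size, neighbor = struct.unpack("<iiI", frame)
    return cache_size, header_size, neighbor


print(simulate_y_hash_writes(1024, 32))  # (1024, 32, 0): the neighbor was overwritten
```

Both lengths happen to read back correctly here because the second write lands after the first, yet the 4 bytes following header_size are silently zeroed. In pyethash those bytes belong to other stack data, which is consistent with the crashes described above.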

Solution

This PR removes pyethash entirely and replaces it with a progressively optimized Ethash implementation, eliminating Python bottlenecks in the PoW hot path while restoring (and exceeding) the original performance.

Optimization Strategy

To systematically eliminate bottlenecks, we applied four incremental optimization rounds, each targeting a different layer:

  • R1–R2 (Python-level optimizations)
    Remove serialization overhead and reduce Python object allocations

  • R3 (Cython hot loop)
    Move the hottest FNV mixing loop into C

  • R4 (Full C pipeline)
    Eliminate Python overhead entirely in the PoW critical path (including Keccak)

See #976 for more details.

Result

Root Block Sync Time (end-to-end)

Syncing one root block with 144 miniblocks (maximum load)

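| impl | sync time | time vs pyethash (lower is better) | speedup vs old | speedup vs R2 |
|---|---|---|---|---|
| pyethash | 1.47 s | | ~17× | |
| old | 25.24 s | ~17× | | |
| R1 | 12.38 s | ~8.4× | ~2.0× | |
| R2 | 8.52 s | ~5.8× | ~3.0× | |
| R3 | 1.39 s | ~0.95× | ~18× | ~6× |
| R4 | 0.86 s | ~0.58× | ~29× | ~10× |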
  • Restores performance after removing pyethash
  • Achieves ~29× speedup vs original Python implementation
  • Achieves better performance than pyethash baseline
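The ratio columns follow directly from the measured sync times. This short script, a sketch that only reuses the numbers reported in the table, reproduces them and could serve as a sanity check when new rows are added.

```python
# Sync times in seconds for one root block with 144 miniblocks (from the results table).
times = {"pyethash": 1.47, "old": 25.24, "R1": 12.38, "R2": 8.52, "R3": 1.39, "R4": 0.86}


def ratios(name: str) -> dict:
    t = times[name]
    return {
        "vs_pyethash": t / times["pyethash"],  # time ratio; < 1 means faster than pyethash
        "speedup_vs_old": times["old"] / t,    # how many times faster than old pure Python
        "speedup_vs_R2": times["R2"] / t,      # how many times faster than the numpy R2 path
    }


for name in times:
    r = ratios(name)
    print(f"{name:8} {r['vs_pyethash']:6.2f} {r['speedup_vs_old']:6.1f} {r['speedup_vs_R2']:6.1f}")
```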

Test plan

  • Mining and PoW verification still pass in existing tests
  • No ImportError on Python 3.13
  • Benchmarks cover old / R1 / R2 / R3 / R4
  • Sync tested end-to-end with profiling and timing logs

pyethash is a C extension that is not compatible with Python 3.13.
The pure Python implementation in ethereum.pow.ethash is sufficient.
Remove the conditional import and always use the Python path, adding
@lru_cache to get_cache_slow for the same performance benefit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ping-ke added 2 commits March 30, 2026 23:22
Add comment explaining why pyethash C++ acceleration was removed
(not supported on Python 3.13) with link to #976
ping-ke changed the base branch from upgrade/py313-baseline to master April 5, 2026 02:37
ping-ke changed the base branch from master to upgrade/py313-baseline April 5, 2026 02:37
ping-ke requested a review from syntrust April 5, 2026 02:38
ping-ke added 7 commits April 9, 2026 15:42
…arrays

ethash_utils.py:
- replace hex-based encode_int/decode_int with struct.pack/unpack for
  serialize_hash and deserialize_hash (~30x faster per call)
- inline ethash_sha3_512/256 to skip intermediate list conversion (~5x faster
  on list input)
- add ethash_sha3_512_np and ethash_sha3_256_np: numpy ndarray variants that
  accept bytes or ndarray and return uint32 ndarray, eliminating tolist()/
  np.array() round-trips in the hot path
- consolidate keccak implementation here; ethash.py no longer duplicates it
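To illustrate the R1 change, here is a hex-based serializer written in the style of the original pure-Python helpers next to a struct-based one. The helper names (serialize_hash_hex, serialize_hash_struct, etc.) are hypothetical labels for this sketch; both produce 16 little-endian 32-bit words, so their output must match byte for byte.

```python
import struct
from binascii import hexlify, unhexlify

WORD_BYTES = 4


# Hex-based helpers (style of the original implementation): per-word string work.
def encode_int(s: int) -> bytes:
    a = "%x" % s
    return b"" if s == 0 else unhexlify("0" * (len(a) % 2) + a)[::-1]


def decode_int(s: bytes) -> int:
    return int(hexlify(s[::-1]), 16) if s else 0


def zpad(s: bytes, length: int) -> bytes:
    return s + b"\x00" * max(0, length - len(s))


def serialize_hash_hex(h):
    return b"".join(zpad(encode_int(x), WORD_BYTES) for x in h)


def deserialize_hash_hex(b):
    return [decode_int(b[i:i + WORD_BYTES]) for i in range(0, len(b), WORD_BYTES)]


# struct-based replacements: one C-level call per hash, little-endian uint32 words.
def serialize_hash_struct(h):
    return struct.pack("<%dI" % len(h), *h)


def deserialize_hash_struct(b):
    return list(struct.unpack("<%dI" % (len(b) // WORD_BYTES), b))
```

The struct version avoids building a hex string, reversing it and padding it for every word, which is where the per-call speedup comes from.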

ethash.py:
- store cache as 2D numpy uint32 ndarray (shape n x 16) via _get_cache
- use ethash_sha3_512_np/256_np throughout to keep data in ndarray form
- vectorize the 16-element mix update in calc_dataset_item and hashimoto
  inner loop using numpy arithmetic instead of list(map(fnv, ...))
- scalar fnv for cache_index uses plain Python int to avoid numpy scalar overhead

test_ethash.py:
- add TestEthashUtils covering serialize_hash, deserialize_hash, fnv,
  ethash_sha3_512/256 directly against reference implementations

Benchmark: hashimoto_light ~23% faster end-to-end vs pure Python baseline;
serialize_hash/deserialize_hash ~30x faster individually
ethash_utils.py:
- remove struct, _FMT_16I, _FMT_8I (only served deleted serialize/deserialize_hash)
- remove fnv (only used in tests, not in production path)
- remove ethash_sha3_512 list variant (replaced by numpy ndarray variant)
- remove serialize_hash, deserialize_hash, hash_words, xor, serialize_cache,
  deserialize_cache and related aliases (all replaced by ndarray.tobytes/frombuffer)
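With the cache held as an (n, 16) uint32 array, serialization reduces to tobytes/frombuffer. The wrapper names below are hypothetical (the PR calls the ndarray methods directly); the explicit '<u4' dtype keeps the byte layout little-endian on any host, as a later commit in this PR does.

```python
import numpy as np


def serialize_cache(cache: np.ndarray) -> bytes:
    # 16 little-endian 32-bit words (64 bytes) per row, on any host byte order.
    return cache.astype("<u4", copy=False).tobytes()


def deserialize_cache(data: bytes) -> np.ndarray:
    # Zero-copy, read-only view over the bytes, reshaped to one row per 64-byte hash.
    return np.frombuffer(data, dtype="<u4").reshape(-1, 16)


cache = np.arange(32, dtype=np.uint32).reshape(2, 16)
blob = serialize_cache(cache)
print(len(blob), deserialize_cache(blob).shape)  # 128 (2, 16)
```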

ethash.py:
- mkcache: use ethash_sha3_256_np(...).tobytes() directly, drop serialize_hash
- drop serialize_hash import (no longer needed)

ethpow.py:
- remove pyethash C-extension dead code paths (get_cache/hashimoto were always
  equal to get_cache_slow/hashimoto_slow after pyethash removal)
- keep get_cache_slow/hashimoto_slow structure as fallback for future Cython ext

test_ethash.py:
- remove test cases for deleted functions (serialize_hash, deserialize_hash, fnv,
  ethash_sha3_256, ethash_sha3_512 list variant)
- cache/dataset hex comparison uses ndarray.tobytes().hex() directly

bench_before_after.py, bench_hashimoto_compare.py:
- add old/mid/new three-way comparison
- old implementations kept inline for regression reference
- new side imports from current ethash module directly
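A minimal sketch of an old/new comparison harness using timeit. The structure of bench_before_after.py is not shown here, so compare, old_encode and new_encode are hypothetical; the harness checks equivalence first, then reports each implementation's speedup relative to a chosen baseline.

```python
import timeit
from typing import Callable, Dict


def compare(impls: Dict[str, Callable[[], object]], baseline: str,
            number: int = 10_000) -> Dict[str, float]:
    """Time each zero-argument callable (best of 3 repeats) and return its
    speedup relative to `baseline` (values > 1 mean faster than the baseline)."""
    per_call = {
        name: min(timeit.repeat(fn, number=number, repeat=3)) / number
        for name, fn in impls.items()
    }
    return {name: per_call[baseline] / t for name, t in per_call.items()}


# Hypothetical stand-ins: hex round-trip ("old") vs int.to_bytes ("new").
def old_encode() -> bytes:
    return bytes.fromhex("%08x" % 0x01000193)[::-1]


def new_encode() -> bytes:
    return (0x01000193).to_bytes(4, "little")


assert old_encode() == new_encode()  # verify equivalence before timing anything
print(compare({"old": old_encode, "new": new_encode}, baseline="old"))
```

Taking the best of several repeats reduces noise from other processes, which matters when comparing implementations that differ by only a few percent.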
…umpy

ethash_cy.pyx: a typed C loop that replaces the 256-iteration FNV parent-mixing
loop in calc_dataset_item. ethash.py imports it automatically when it is built
and falls back to pure Python otherwise. bench_hashimoto_compare.py gains an R3 column.
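For reference, this is the parent-mixing loop that R3 moves into C, sketched in Python. In Ethash, calc_dataset_item hashes a cache row with Keccak-512, runs this 256-iteration loop, then hashes again; the Keccak steps are omitted here. The function names and signatures are hypothetical and not the actual ethash_cy.pyx API.

```python
import numpy as np

FNV_PRIME = 0x01000193
DATASET_PARENTS = 256   # Ethash spec constant
WORDS_PER_HASH = 16     # 64-byte hash / 4-byte words


def fnv(v1: int, v2: int) -> int:
    return ((v1 * FNV_PRIME) ^ v2) % 2**32


def mix_parents_py(cache, i, mix):
    """Reference loop on a list-of-lists cache and a list mix."""
    n = len(cache)
    mix = list(mix)
    for j in range(DATASET_PARENTS):
        cache_index = fnv(i ^ j, mix[j % WORDS_PER_HASH])
        mix = [fnv(a, b) for a, b in zip(mix, cache[cache_index % n])]
    return mix


def mix_parents_np(cache: np.ndarray, i: int, mix: np.ndarray) -> np.ndarray:
    """Same loop on an (n, 16) uint32 cache. Each iteration still pays Python
    overhead, which is what a typed Cython loop removes."""
    n = cache.shape[0]
    prime = np.uint32(FNV_PRIME)
    mix = mix.copy()
    with np.errstate(over="ignore"):
        for j in range(DATASET_PARENTS):
            # Parent index computed on plain Python ints.
            cache_index = fnv(i ^ j, int(mix[j % WORDS_PER_HASH]))
            mix = (mix * prime) ^ cache[cache_index % n]
    return mix
```

Because every iteration depends on the previous mix, the 256 steps cannot be vectorized across j; only the 16-word update inside each step can, which is why R2 helps only modestly and compiling the loop pays off.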
- ethash.py: rewrite with numpy uint32 arrays (R2); add ETHASH_LIB env var
  to select python/cython/auto at runtime
- ethash_cy.pyx: add mix_parents (R3), cy_calc_dataset_item and
  cy_hashimoto_light with C keccak (R4)
- keccak_tiny.c/h: portable C Keccak implementation for Cython R4
- ethpow.py: use ETHASH_LIB-aware hashimoto_light; simplify check_pow/mine
- setup.py: build Cython extension with keccak_tiny.c
- old_ethash.py: extract original hex-based implementation as reference baseline
- bench_hashimoto_compare.py: merge bench_before_after.py; add R3/R4 sections;
  import old impl from old_ethash.py
- test_ethash.py: use old_ethash as baseline for cython correctness test
- remove bench_before_after.py

ping-ke commented Apr 9, 2026

Added performance improvements and related tests/benchmarks. See #976 for more info.


qzhodl commented Apr 14, 2026

I don't see the Rust code in the latest changes here.

ping-ke added 2 commits April 14, 2026 15:41
- Add ethereum/pow/ethash_rs: full Rust implementation of ethash
  (mkcache, hashimoto_light, mix_parents) using PyO3 0.22 + tiny-keccak
- Integrate setuptools-rust into setup.py so build_ext --inplace
  places ethash_rs.so in the source tree alongside ethash_cy.so
- Refactor ethash.py: ETHASH_LIB branches only import functions;
  _get_cache and hashimoto_light defined once with None-check fallback
- Auto-detection order: ethash_rs -> ethash_cy -> pure Python
- Update Dockerfile to install Rust toolchain and build both extensions
- Update README and requirements.txt for Rust/maturin/setuptools-rust
- Add test_rust_matches_python_fallback to verify Rust output
- bench_hashimoto_compare.py: add R5 section covering rs_mkcache,
  rs_calc_dataset_item, and rs_hashimoto_light; skipped gracefully when
  the Rust extension is not available (same pattern as R3/R4)
- ethash.py: fix auto-detect to check for the expected symbol
  (rs_hashimoto_light / cy_hashimoto_light) after import, preventing the
  ethash_rs/ Cargo source directory (a namespace package) from being
  mistaken for a built extension
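A sketch of the backend selection described above: honor ETHASH_LIB, otherwise try Rust, then Cython, and accept a module only if it exposes the expected symbol. The module paths and the "rust" value are assumptions for illustration; only the symbol names and the auto-detection order come from the commit message.

```python
import importlib
import os

# Hypothetical module paths; the symbol names follow the commit message.
_BACKENDS = {
    "rust": ("ethereum.pow.ethash_rs", "rs_hashimoto_light"),
    "cython": ("ethereum.pow.ethash_cy", "cy_hashimoto_light"),
}
_AUTO_ORDER = ("rust", "cython")


def _try_backend(name):
    module_name, symbol = _BACKENDS[name]
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # A Cargo source directory can import as an empty namespace package,
    # so require the expected function instead of trusting the import alone.
    return getattr(module, symbol, None)


def select_hashimoto_light(lib=None):
    """Return (backend_name, function); function is None for the pure Python path."""
    lib = (lib or os.environ.get("ETHASH_LIB", "auto")).lower()
    if lib == "python":
        return "python", None
    if lib == "auto":
        candidates = _AUTO_ORDER
    elif lib in _BACKENDS:
        candidates = (lib,)
    else:
        raise ValueError(f"unknown ETHASH_LIB value: {lib!r}")
    for name in candidates:
        fn = _try_backend(name)
        if fn is not None:
            return name, fn
    return "python", None  # None-check fallback to pure Python
```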
…(10)

- Replace global np.seterr(over='ignore') with np.errstate(over='ignore')
  scoped to the two FNV multiply sites in calc_dataset_item and hashimoto,
  avoiding unintended suppression of overflow warnings elsewhere
- Use dtype='<u4' (explicit little-endian) in ethash_sha3_512/256 instead
  of native-endian np.uint32, matching Ethereum spec on big-endian hosts
- Restore lru_cache(10): 8 shards can span ~6-7 different epochs
  simultaneously, so 2-3 slots would cause frequent cache eviction
- test_cython_matches_python_fallback and test_rust_matches_python_fallback
  now fail on ImportError instead of skipping
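A small simulation of why lru_cache(10) is needed, assuming caches are keyed by epoch and shards are verified round-robin. The cache builder is a hypothetical cheap stand-in for the real, expensive cache generation.

```python
from functools import lru_cache


def make_epoch_cache(maxsize: int):
    """Return (lookup, stats): an LRU-cached per-epoch builder and a build counter."""
    stats = {"builds": 0}

    @lru_cache(maxsize=maxsize)
    def cache_for_epoch(epoch: int) -> bytes:
        stats["builds"] += 1                 # stands in for an expensive mkcache call
        return epoch.to_bytes(8, "little")   # placeholder payload

    return cache_for_epoch, stats


def simulate(maxsize: int, shard_epochs, rounds: int) -> int:
    """Verify one block per shard, round-robin, and count cache builds."""
    lookup, stats = make_epoch_cache(maxsize)
    for _ in range(rounds):
        for epoch in shard_epochs:
            lookup(epoch)
    return stats["builds"]


# 8 shards whose current heights fall into 7 distinct epochs.
SHARD_EPOCHS = [0, 1, 2, 3, 4, 5, 6, 6]
print(simulate(10, SHARD_EPOCHS, rounds=50))  # 7: each epoch built once
print(simulate(3, SHARD_EPOCHS, rounds=50))   # 350: every new epoch access misses
```

Round-robin access over more keys than the cache holds is the worst case for LRU: each lookup evicts exactly the entry that will be needed soonest, so a 3-slot cache rebuilds on almost every call.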
ping-ke requested a review from qzhodl April 15, 2026 10:47

3 participants