…mutation pressure

The val-window service emits a lightweight DELETE on t_validator_rewards_summary every finalized checkpoint event (~6:24 min). On networks with ~1M validators this lowers to a ClickHouse mutation that rewrites the in-window parts (14-55 GiB each on hoodi); each fire takes minutes and queues up faster than it can drain, saturating one merge core and stalling the head until an operator restart.

Make the boundary advance gate the fire: only emit DELETE once the window's lower boundary has advanced by DELETE_CADENCE_EPOCHS since the last successful fire (default 32 epochs ~3.4h, ~70x headroom over the ~150s mutation cost on hoodi). The first event after start always fires to anchor the baseline; subsequent events skip until the cadence is met.

The DELETE statement and boundary calculation are unchanged - the only observable difference is up to (cadence-1) extra epochs retained beyond the strict window (0.16% overshoot vs. the 20250-epoch window). The per-epoch surgical delete used by reorg recovery (DeleteStateMetrics) is untouched.

Set DELETE_CADENCE_EPOCHS=1 for legacy behaviour.
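The gating rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `deleteGate` type, its fields, and the `countFires` helper are hypothetical names, and only the DELETE_CADENCE_EPOCHS setting comes from the message itself.

```go
package main

import "fmt"

// deleteGate is a hypothetical helper (not taken from the goteth source).
// It decides whether a window DELETE should be emitted for a given event.
type deleteGate struct {
	cadence   uint64 // value of DELETE_CADENCE_EPOCHS
	lastFired uint64 // window lower boundary (epoch) at the last successful fire
	anchored  bool   // false until the first successful fire after start
}

// shouldFire reports whether a DELETE should be emitted now that the
// window's lower boundary sits at the given epoch.
func (g *deleteGate) shouldFire(boundary uint64) bool {
	if !g.anchored {
		return true // the first event after start always fires
	}
	if boundary < g.lastFired {
		return false // boundary did not advance; nothing new to delete
	}
	return boundary-g.lastFired >= g.cadence
}

// markFired records a successful DELETE so the next fire is measured from it.
func (g *deleteGate) markFired(boundary uint64) {
	g.lastFired = boundary
	g.anchored = true
}

// countFires simulates `events` consecutive finalized events, each advancing
// the boundary by one epoch from `first`, and counts how many DELETEs fire.
func countFires(cadence, first, events uint64) int {
	g := &deleteGate{cadence: cadence}
	fires := 0
	for i := uint64(0); i < events; i++ {
		b := first + i
		if g.shouldFire(b) {
			g.markFired(b)
			fires++
		}
	}
	return fires
}

func main() {
	fmt.Println(countFires(32, 1000, 65)) // cadence 32: fires at 1000, 1032, 1064
	fmt.Println(countFires(1, 1000, 65))  // cadence 1: fires on every event
}
```

In this sketch the baseline only moves on a successful fire, so a failed DELETE would be retried on the next event rather than delayed by a full cadence.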
…UpTo race

Each FinalizedCheckpointEvent in head mode launches a new `go AdvanceFinalized(...)`. When a previous invocation is still running (common when ProcessStateTransitionMetrics takes longer than the ~6:24 min finalized interval: networks with ~1M validators, or any catch-up scenario), two goroutines race over the same StateHistory: the newer one runs CleanUpTo at the end of its loop and evicts entries that the older one is still blocked on inside StateHistory.Wait / BlockHistory.Wait. The blocked goroutine then waits forever holding a processerBook slot, and successive races leak the whole 32-slot pool, surfacing as floods of "Waiting for too long to acquire page" warnings and a stuck head.

Observed on goteth-hoodi this morning: a single dependency state at epoch 93105 was evicted while a ProcessStateTransitionMetrics goroutine held a Wait on it, blocking that slot for 30+ minutes; the analyzer stopped advancing past dbHeadEpoch 93110 even with ClickHouse healthy.

Skip overlapping invocations via TryLock. The skipped one would have iterated a subset of the state keys the next invocation will see, and its CleanUpTo would have been a subset of what the next one performs, so dropping it is monotonically safe: no work is lost.

The historical-mode synchronous call site (routines.go:208) is unaffected: head mode only starts after historical completes, so TryLock always succeeds there.
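A minimal sketch of the skip-on-overlap pattern, assuming Go 1.18+ for `sync.Mutex.TryLock`. The `analyzer` type, its fields, and the `work` callback are hypothetical stand-ins; the real AdvanceFinalized signature is not shown in this commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// analyzer is a hypothetical stand-in for the component that owns
// AdvanceFinalized; field names are illustrative only.
type analyzer struct {
	advanceMu sync.Mutex
	runs      int // completed invocations, guarded by advanceMu
}

// advanceFinalized runs work unless another invocation is still in flight,
// in which case it returns false immediately instead of queueing behind it.
func (a *analyzer) advanceFinalized(work func()) bool {
	if !a.advanceMu.TryLock() {
		return false // overlapping call: skip, the next event covers this range
	}
	defer a.advanceMu.Unlock()
	work()
	a.runs++
	return true
}

// demo starts one slow invocation, attempts a second while the first is
// still running, then attempts a third after the first has finished.
func demo() (overlapRan, firstRan, nextRan bool) {
	a := &analyzer{}
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan bool)

	go func() {
		done <- a.advanceFinalized(func() {
			close(started) // signal that the lock is held
			<-release      // simulate long-running processing
		})
	}()

	<-started
	overlapRan = a.advanceFinalized(func() {}) // lock is held, so this is skipped
	close(release)
	firstRan = <-done                       // first call has returned and unlocked
	nextRan = a.advanceFinalized(func() {}) // lock is free again
	return
}

func main() {
	overlapRan, firstRan, nextRan := demo()
	fmt.Println(overlapRan, firstRan, nextRan) // false true true
}
```

A plain `Lock` here would serialize the invocations rather than skip them, so a backlog of finalized events would pile up behind a slow run; `TryLock` drops the overlapping call outright, which is the behaviour the message argues is safe.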
…tatus

Validators between deposit and activation can be in one of two spec-defined sub-states: pending_initialized (eligibility epoch is FAR_FUTURE_EPOCH) or pending_queued (eligibility epoch is set). goteth read ActivationEligibilityEpoch from the beacon state into local memory but never persisted it, so downstream consumers could not split the two sub-states.

This commit:
- adds f_activation_eligibility_epoch (UInt64, default FAR_FUTURE_EPOCH) to t_validator_last_status via migration 000036
- extends the ValidatorLastStatus struct, ToArray, and the ClickHouse INSERT to carry the new field
- reads validator.ActivationEligibilityEpoch in processValLastStatus
- adds three invariant tests in tests/db_validator_test.py
- documents the column in docs/tables.md

Fixes #266
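With the column persisted, a downstream consumer could split the two sub-states along these lines. This is a sketch under assumptions: `pendingSubStatus` is a hypothetical helper, and it presumes the caller has already established that the validator is pending (deposited but not yet activated). FAR_FUTURE_EPOCH is 2^64 - 1 in the consensus spec.

```go
package main

import (
	"fmt"
	"math"
)

// farFutureEpoch mirrors the consensus-spec constant FAR_FUTURE_EPOCH (2^64 - 1).
const farFutureEpoch uint64 = math.MaxUint64

// pendingSubStatus is a hypothetical helper: given the value stored in
// f_activation_eligibility_epoch for a validator already known to be pending,
// it returns which of the two pending sub-states applies.
func pendingSubStatus(eligibilityEpoch uint64) string {
	if eligibilityEpoch == farFutureEpoch {
		return "pending_initialized" // eligibility epoch not set yet
	}
	return "pending_queued" // eligibility epoch set, waiting in the queue
}

func main() {
	fmt.Println(pendingSubStatus(farFutureEpoch))
	fmt.Println(pendingSubStatus(93105))
}
```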
Fix advance finalized redownload
* fix: block rewards overflow
* use helper for bigInt conversion
* use uint256 instead of string
* update docs

---------

Co-authored-by: Zyra-V21 <zyrav21@proton.me>
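The commits above do not show where the overflow occurred, so the following is only a generic illustration of why 64-bit arithmetic can wrap where `*big.Int` does not. The function names and sample values are hypothetical and are not taken from the goteth code.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// demoRewards is made-up sample data chosen so that the sum exceeds 2^64 - 1.
var demoRewards = []uint64{math.MaxUint64, 2}

// sumUint64 adds values in fixed-width arithmetic; Go's unsigned addition
// wraps silently modulo 2^64 on overflow.
func sumUint64(vals []uint64) uint64 {
	var s uint64
	for _, v := range vals {
		s += v
	}
	return s
}

// sumBig adds the same values using arbitrary-precision integers,
// so the result cannot wrap.
func sumBig(vals []uint64) *big.Int {
	s := new(big.Int)
	for _, v := range vals {
		s.Add(s, new(big.Int).SetUint64(v))
	}
	return s
}

func main() {
	fmt.Println(sumUint64(demoRewards)) // 1 (wrapped)
	fmt.Println(sumBig(demoRewards))    // 18446744073709551617
}
```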
PR #259 (block-rewards overflow fix) merged into dev today and claimed migration number 036 for alter_block_rewards_uint256. Renumber this PR's migration to 037 to avoid the collision and keep numerical ordering deterministic on rebase.

No content change; pure file rename.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…letes fix: prevent goteth stalls on networks with large validator sets
fix base fee byte order
fix: persist activation_eligibility_epoch
Summary
Promotes the current `dev` line to `master` as v3.8.2. Five changes across data completeness, arithmetic correctness, and indexer stability.

Data / schema
- `f_activation_eligibility_epoch` on `t_validator_last_status` (migration `000037`), enabling downstream `pending_initialized` vs `pending_queued` distinction. (fix: persist activation_eligibility_epoch #268)

Bug fixes
- `t_block_rewards` to `UInt256` / `*big.Int` so large values no longer wrap (migration `000036`). (Fix block rewards overflow #259)
- `AdvanceFinalized` to prevent a concurrent `CleanUpTo` race in steady-state head mode. (Fix advance finalized redownload #263)

Closes

Known issues (not addressed here)
- … (`HandleReorg` leaves stale rewards in `t_block_rewards` for reorged slots) is a separate reorg path from Fix advance finalized redownload #263 and remains open.

Deploy notes
- `000036_alter_block_rewards_uint256`: `MODIFY COLUMN` on the three reward columns of `t_block_rewards` (UInt64 → UInt256). This triggers a background ClickHouse mutation that rewrites those columns across all parts; on large tables it needs free disk and is not instantaneous. Monitor `system.mutations` for completion.
- `000037_add_activation_eligibility_epoch`: additive `ADD COLUMN` on `t_validator_last_status`, non-blocking / instant.
- `f_activation_eligibility_epoch` at its FAR_FUTURE_EPOCH default for freshly written rows.

Test plan
- `000036` and `000037` apply cleanly on a master-versioned DB
- `system.mutations` for the `t_block_rewards` rewrite completes without error
- `f_activation_eligibility_epoch` populated for active/exited validators