Skip to content

fix(actpool): pick eviction victim across all worker shards#4848

Open
CoderZhi wants to merge 1 commit into
masterfrom
fix_actpool_xshard_eviction
Open

fix(actpool): pick eviction victim across all worker shards#4848
CoderZhi wants to merge 1 commit into
masterfrom
fix_actpool_xshard_eviction

Conversation

@CoderZhi

@CoderZhi CoderZhi commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

queueWorker.Handle evicted via worker.accountActs.PopPeek() — popping the lowest-priority head action from only the calling worker's shard. With _numWorker = 16, each shard holds ~6% of accounts. An attacker who saturates one shard can force honest senders in the other ~94% of shards to evict each other's actions; the attacker's own actions are never the per-shard minimum and never get popped.

This change replaces per-shard eviction with popLowestPriorityAcrossWorkers (actpool/actpool.go), which locks every worker mu in increasing index order (no other path holds more than one worker mu, so deadlock-free) and picks the global minimum across all shards using headLess — a helper mirroring accountPriorityQueue.Less (nil-next > settled > gas price) so the cross-shard victim matches what a single unified per-account heap would have evicted.

Handle is restructured so worker.mu is no longer held across the global pop (would invert lock order); putAction / accountActs take their own locks, so the change is safe.

This was originally bundled into #4844 alongside several unrelated panic-hardening fixes. Splitting it out for separate review since the change is non-trivial (multi-lock acquisition order, eviction semantics).

Tests (all in actpool/actpool_cross_shard_test.go)

  • TestActPool_popLowestPriorityAcrossWorkers — direct unit test, two senders in different shards, asserts the global lowest is evicted regardless of caller shard.
  • TestActPool_popLowestPriorityAcrossWorkers_empty — empty-pool no-op, must return nil rather than panic on empty priorityQueue.
  • TestHeadLess — truth table mirroring accountPriorityQueue.Less so a future "simplification" of either function can't silently diverge from global heap semantics.
  • TestActPool_crossShardEviction_e2e — full Add path: low-fee shard A fills the pool, high-fee shard B newcomer; cheap A action must be the victim.
  • TestActPool_crossShardEviction_rejectsLowerFeeNewcomerErrTxPoolOverflow branch: newcomer is itself the global minimum, must be rejected; incumbents survive.
  • TestActPool_crossShardEviction_concurrent — concurrent multi-shard stress test for deadlock detection under -race.

Test plan

  • go build ./actpool/...
  • go test ./actpool/ -run "TestActPool_popLowestPriorityAcrossWorkers|TestHeadLess|TestActPool_crossShardEviction" -count=1
  • CI: full ./actpool/... suite with -race

🤖 Generated with Claude Code

When the action pool hits MaxNumActsPerPool, queueWorker.Handle previously
called worker.accountActs.PopPeek(), which pops the lowest-priority head
action from only the calling worker's shard. With _numWorker = 16 each
shard holds ~6% of accounts, so an attacker who saturates one shard can
force every honest sender in the other ~94% of shards to evict each
other's actions — the attacker's own actions are never the per-shard
minimum and never get popped.

Replace per-shard eviction with popLowestPriorityAcrossWorkers, which
locks every worker mu in increasing index order (no other path holds
more than one worker mu, so deadlock-free) and picks the global minimum
across all shards using headLess — a helper that mirrors
accountPriorityQueue.Less (nil-next > settled > gas price) so the
cross-shard victim matches what a single unified per-account heap would
have evicted. Restructure Handle so worker.mu is no longer held across
the global pop (would invert lock order).

Tests:
- direct unit tests for the global pop (selection across shards, empty
  pool no-op).
- headLess truth table mirroring accountPriorityQueue.Less so a future
  "simplification" of either function can't silently diverge.
- end-to-end Add path: low-fee shard A fills the pool, high-fee shard B
  newcomer; the cheap A action must be the victim.
- ErrTxPoolOverflow branch: newcomer is itself the global minimum, must
  be rejected; incumbents survive.
- concurrent multi-shard stress test for deadlock detection under -race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CoderZhi CoderZhi requested a review from a team as a code owner June 3, 2026 05:14
@sonarqubecloud

sonarqubecloud Bot commented Jun 3, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
15.9% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant