perf(erpc:PLA-651): harden eth_getLogs concurrency and cache pressure #50

Open
0x666c6f wants to merge 4 commits into morpho-main from feature/pla-651-erpc-harden-eth_getlogs-against-oom-improve-split-throughput

@0x666c6f
Collaborator

Summary

  • bound eth_getLogs subrequest fanout and dedupe identical finalized subranges across concurrent requests
  • isolate eth_getLogs and eth_call with separate network runtime budgets and dedicated getLogs cache pressure guards
  • reduce oversized finalized eth_getLogs cache-write cost and dedupe async state-poller refresh triggers

Changes

  • add network-scoped shared semaphore + finalized-only singleflight in architecture/evm/eth_getLogs.go
  • add method-class execution budgets and dedicated getLogs cache read/write semaphores in erpc/networks.go
  • bypass envelope wrapping and skip oversized finalized Postgres writes in architecture/evm/json_rpc_cache.go
  • dedupe latest/finalized async poll triggers in architecture/evm/evm_state_poller.go
  • add focused regression coverage for getLogs pressure paths and budget isolation

Linear

0x666c6f self-assigned this on Mar 10, 2026

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5b449dfac7

@0x666c6f
Collaborator Author

Automated PR Review

Reviewed commit: 5b449df

Severity Count
Critical 0
High 3
Medium 6
Low 4

HIGH

1. architecture/evm/eth_getLogs.go:1058 — Logging nil srq before error check

BuildGetLogsRequest can return (nil, err), but logger.Debug().Object("request", srq) is called before the if err != nil guard on line 1062. If srq is nil, this may panic or log misleading output.

Move the log statement after the error check.


2. architecture/evm/eth_getLogs.go:1078 — Singleflight coalescing shares the leader's context

The singleflight leader's ctx is captured in the closure. If the leader's context is cancelled (client disconnect/timeout), all coalesced followers receive the cancellation error even though their own contexts may still be valid. This is a known singleflight anti-pattern.

Consider using a detached context (e.g., derived from appCtx) for the singleflight leader, or using singleflight.DoChan with per-caller context checks.


3. architecture/evm/evm_state_poller.go:511,530 — Async poll trigger errors silently discarded

TriggerLatestPollAsync and TriggerFinalizedPollAsync goroutines discard errors with _, _ = e.pollLatestForAsyncTrigger(ctx). If the poll function fails, there is zero logging. The package-level wrapper functions (lines 545, 560) have the same issue.

At minimum, add a debug/warn log when the error is non-nil.


MEDIUM

4. erpc/networks.go:102-113 — Redundant networkMaxInt/networkMinInt helpers

The module targets Go 1.25, and min()/max() have been builtins since Go 1.21. The codebase already uses the builtin min in eth_getLogs.go:369.

Replace with max(a, b) / min(a, b) and delete the helpers.
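Assuming networkMaxInt/networkMinInt are plain two-argument int helpers, the builtins replace them directly, including when nested to clamp a value. clampWorkers is an illustrative name, not a function from the PR.

```go
package main

// clampWorkers shows the builtin min/max (Go 1.21+) replacing hand-written
// helpers such as networkMaxInt/networkMinInt: the result is requested,
// raised to at least floor and capped at ceiling.
func clampWorkers(requested, floor, ceiling int) int {
	return min(max(requested, floor), ceiling)
}
```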


5. architecture/evm/eth_getLogs.go:61 — Unreachable second limit <= 0 guard

By line 51, limit is already guaranteed >= 10. Config values on lines 55-58 can only increase it. The second guard is dead code.

Remove the redundant check.


6. architecture/evm/eth_getLogs.go:1104 — Singleflight followers share serialized bytes without copy

ParseFromBytes references (rather than copies) the underlying byte slice for results >64KB. All coalesced callers share the same serialized slice; downstream mutation could corrupt other callers' responses.

Add an explicit bytes.Clone() in deserializeGetLogsSubRequestExecution, or verify all downstream paths treat the bytes as immutable.


7. architecture/evm/eth_getLogs.go:70 — shouldCoalesceGetLogsSubRequest lacks unit tests

This function gates correctness-critical coalescing behavior. Missing negative test cases: unfinalized requests must NOT coalesce, and requests with UseUpstream directive must NOT coalesce.
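A table-driven sketch of the missing cases. The subRequest type and shouldCoalesce function are hypothetical simplifications that encode only the two gates named above (finalized range, no UseUpstream directive); the PR's actual function takes its own request type and may check more.

```go
package main

// subRequest is a hypothetical, pared-down view of an eth_getLogs
// subrequest; the real type and field names in the PR may differ.
type subRequest struct {
	toBlock        int64
	finalizedBlock int64  // 0 when finality is unknown
	useUpstream    string // UseUpstream directive; empty when unset
}

// shouldCoalesce encodes the two gates: only fully finalized ranges with
// no UseUpstream directive may be shared across requests.
func shouldCoalesce(r subRequest) bool {
	return r.useUpstream == "" && r.finalizedBlock > 0 && r.toBlock <= r.finalizedBlock
}

// coalesceCases holds the positive and negative cases; in a _test.go file
// each entry would run under t.Run(tc.name, ...).
var coalesceCases = []struct {
	name string
	req  subRequest
	want bool
}{
	{"finalized range coalesces", subRequest{toBlock: 100, finalizedBlock: 200}, true},
	{"unfinalized range does not coalesce", subRequest{toBlock: 300, finalizedBlock: 200}, false},
	{"UseUpstream directive does not coalesce", subRequest{toBlock: 100, finalizedBlock: 200, useUpstream: "upstream-a"}, false},
	{"unknown finality does not coalesce", subRequest{toBlock: 100}, false},
}
```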


8. erpc/networks.go:1218 — resp.JsonRpcResponse(ctx) error silently discarded

Force-materialization for the async cache writer discards the error. If materialization fails, a corrupt response may be cached.

Log the error at warn level.


9. erpc/networks.go:1215 — Misleading comment on shouldForceCacheMaterialization

The comment says "Keep common/smaller methods on the current fast path" but the function forces materialization work onto the foreground.

Reword to: "Force-materialize non-getLogs responses now so the cache-write goroutine reads stable data; skip getLogs to avoid blocking the response path."


LOW

10. architecture/evm/eth_getLogs.go:124 — Unnecessarily complex byte copy

The code calls append(exec.serialized[:0], buf.Bytes()...) even though exec.serialized is nil at that point. Simplify to bytes.Clone(buf.Bytes()).


11. architecture/evm/evm_state_poller.go:500,519 — No panic recovery in async trigger goroutines

Unlike other background goroutines in the codebase, these lack defer recover().


12. architecture/evm/evm_state_poller.go:550 — TriggerFinalizedPollAsync (package-level) is dead code

Defined but never called from production code.


13. architecture/evm/json_rpc_cache.go:57-58 — Magic constants lack documentation

largeFinalizedGetLogsEnvelopeBypassBytes (1 MiB) and largeFinalizedGetLogsPostgresSkipBytes (4 MiB) have no comments explaining the rationale.


This is an automated review by Claude.

@0x666c6f
Collaborator Author

Addressed the latest automated review wave in e9e7ff0.

Fixed:

  • eth_getLogs coalescing now uses singleflight.DoChan with per-caller ctx.Done() handling
  • moved unsafe nil request logging behind the error check
  • cloned serialized coalesced payload bytes defensively before parse/share
  • added negative tests for finalized-only / UseUpstream coalescing gates
  • added async poll trigger error logging + panic recovery
  • stopped silently discarding non-getLogs materialization errors before async cache-set

@0x666c6f
Collaborator Author

Automated PR Re-Review

Reviewed commit: 4b98942 (fix: address review follow-ups)

Severity Count
Critical 0
High 2
Medium 8
Low 3

Previously flagged issues — now FIXED

  • Logging nil srq before error check — error check now precedes log
  • Singleflight context sharing anti-pattern — uses context.WithoutCancel + serialize/deserialize
  • Missing shouldCoalesceGetLogsSubRequest tests — test added at line 1279
  • Misleading shouldForceCacheMaterialization comment — reworded accurately

HIGH

1. architecture/evm/eth_getLogs.go:203 — Inaccurate doc comment

Comment says "It does not modify the request or short-circuit; always returns (false, nil, nil)" but line 224 returns (true, nil, ErrInvalidRequest) when extractGetLogsMaxDataBytes fails.

Fix: update to "Returns (true, nil, err) only when maxDataBytes validation fails; otherwise returns (false, nil, nil)."


2. architecture/evm/eth_getLogs.go:361 — Misleading chunk alignment comment

Comment says "Align first chunk start to chunkSize boundary" but the code keeps the start at fromBlock — it's the chunk END that is aligned. The example is correct but contradicts the leading summary.

Fix: "Align chunk ends to chunkSize boundaries for deterministic cache keys"
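A sketch of end alignment, assuming "aligned" means each chunk end satisfies (end+1) % chunkSize == 0. alignedChunks is a hypothetical helper, not the PR's splitting code. For fromBlock=105, toBlock=350, and chunkSize=100 it yields [105,199], [200,299], [300,350]: the first chunk keeps its unaligned start, while every interior chunk lands on the same boundaries regardless of fromBlock, which is what keeps cache keys deterministic.

```go
package main

// alignedChunks splits [from, to] into ranges whose ends fall on chunkSize
// boundaries (end+1 is a multiple of chunkSize), except the last chunk,
// which ends at to. The first chunk starts at from, so only ends are
// aligned. chunkSize must be positive and block numbers non-negative.
func alignedChunks(from, to, chunkSize int64) [][2]int64 {
	var chunks [][2]int64
	for start := from; start <= to; {
		end := min((start/chunkSize+1)*chunkSize-1, to)
		chunks = append(chunks, [2]int64{start, end})
		start = end + 1
	}
	return chunks
}
```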


MEDIUM

3. erpc/networks.go:102-113 — Redundant networkMaxInt/networkMinInt helpers (unfixed from v1)

The module targets Go 1.25, and min()/max() have been builtins since Go 1.21. The builtin min is already used in eth_getLogs.go:369.

Replace with builtins and delete the helpers.


4. architecture/evm/eth_getLogs.go:61 — Unreachable dead code guard (unfixed from v1)

By line 51, limit is guaranteed >= 10. Config values can only increase it. The second if limit <= 0 is unreachable.

Remove the dead guard.


5. architecture/evm/eth_getLogs.go:1304-1326 — Redundant applySubMeta closure

applySubMeta writes fromCacheFlags[i] and cacheAts[i], then lines 1330-1331 immediately overwrite them with identical values.

Remove the applySubMeta closure definition and its invocation.


6. architecture/evm/evm_state_poller.go:511,530 — Async poll errors silently discarded (unfixed from v1)

_, _ = e.pollLatestForAsyncTrigger(ctx) and the finalized counterpart discard errors. The underlying functions log some errors internally, but the pattern is fragile if the test hook functions are ever used in production.

Add a debug-level error log in the goroutine body.


7. erpc/networks.go:1228 — Force-materialization error silently discarded (unfixed from v1)

_, _ = resp.JsonRpcResponse(ctx) discards the error. A failed materialization still passes the response to the async cache writer.

Log at debug level; consider skipping cache write on failure.


8. architecture/evm/json_rpc_cache.go:57-58 — Magic constants undocumented (unfixed from v1)

largeFinalizedGetLogsEnvelopeBypassBytes (1 MiB) and largeFinalizedGetLogsPostgresSkipBytes (4 MiB) lack rationale comments.


9. erpc/networks.go:141 — deriveGetLogsCacheWriteBudget untested

No direct test. A miscalculation could starve cache writes or overwhelm Postgres. Add a table-driven test.


10. architecture/evm/eth_getLogs.go:1122 — Sub-request timeout magic numbers undocumented

Four constants (10s default, 0.75 factor, 3s floor, 25s ceiling) with no comments explaining the timeout strategy.


LOW

11. architecture/evm/evm_state_poller.go:564 — TriggerFinalizedPollAsync (package-level) is dead code

Exported but has zero non-test callers.


12. architecture/evm/eth_getLogs.go:1083-1097 — Duplicate cleanup in singleflight leader

Both the error path and the success path release holder and free jrr; this could be consolidated into a single defer.


13. architecture/evm/eth_getLogs.go:133 — Explicit zero-value holder: nil in struct literal

Can be omitted per Go idioms.


This is an automated re-review by Claude. Previous review findings that were fixed are marked with strikethrough.

@0x666c6f
Collaborator Author

Addressed the actionable items from the automated re-review in the latest push (e9e7ff0).

Covered there:

  • eth_getLogs request logging moved behind the request-build error check
  • coalesced payload serialization/deserialization now clones bytes defensively
  • added explicit negative tests for shouldCoalesceGetLogsSubRequest
  • added async poll trigger error logging + panic recovery
  • materialization errors before async cache-set are now logged
  • removed the redundant limit <= 0 guard in the shared semaphore setup

@0x666c6f
Collaborator Author

Automated PR Re-Review (v3)

Reviewed commit: e9e7ff0 (fix: address automated review wave)

Severity Count
Critical 0
High 1
Medium 7
Low 3

Fixed since v2 review

  • Force-materialization error discarded — now logged at Warn level
  • Method-level async poll error discarding — now has Debug logging + panic recovery

HIGH

1. architecture/evm/eth_getLogs.go:203 — Inaccurate doc comment (unfixed from v2)

Comment says "always returns (false, nil, nil)" but line 224 returns (true, nil, ErrInvalidRequest).

Update to: "Returns (true, nil, err) if the request contains an invalid maxDataBytes directive; otherwise returns (false, nil, nil)."


MEDIUM

2. architecture/evm/eth_getLogs.go:361 — Misleading chunk alignment comment (unfixed from v2)

Says "Align first chunk start to chunkSize boundary" but it's the chunk END that is aligned. The code example is correct but contradicts the summary.

Fix: "Align chunk end to chunkSize boundary for deterministic cache keys"


3. architecture/evm/eth_getLogs.go:1304-1326 — Redundant applySubMeta closure (unfixed from v2)

Sets fromCacheFlags[i] and cacheAts[i], which are immediately overwritten with identical values on lines 1330-1331. The closure definition and call are dead code.


4. erpc/networks.go:102-113 — Redundant networkMaxInt/networkMinInt (unfixed from v2)

Use the builtins max()/min() (available since Go 1.21; the module targets Go 1.25) instead. 12 call sites need updating.


5. architecture/evm/json_rpc_cache.go:57-58 — Magic constants undocumented (unfixed from v2)

largeFinalizedGetLogsEnvelopeBypassBytes (1 MiB) and largeFinalizedGetLogsPostgresSkipBytes (4 MiB) lack rationale comments at their definition site.


6. architecture/evm/evm_state_poller.go:559,574 — Package-level async poll fallback errors discarded (unfixed from v2)

_, _ = sp.PollLatestBlockNumber(ctx) in the package-level fallback paths has no logging, unlike the method-level versions which now properly log.

Add matching error logging + panic recovery to the fallback paths.


7. erpc/networks.go:141 — deriveGetLogsCacheWriteBudget untested (unfixed from v2)

No direct test. Budget miscalculation could starve cache writes or overwhelm Postgres.


8. architecture/evm/eth_getLogs.go:1084-1097 — Duplicate singleflight cleanup (unfixed from v2)

holder.Release + jrr.Free is copy-pasted for both error and success paths. Consolidate with a single defer.


LOW

9. architecture/evm/eth_getLogs.go:93 — Dead useUpstream field in singleflight key (new)

buildGetLogsSubRequestFlightKey includes useUpstream in the key, but it's only called when shouldCoalesceGetLogsSubRequest is true, which requires UseUpstream == "". So useUpstream is always empty in the key.


10. architecture/evm/evm_state_poller.go:549-581 — Package-level trigger functions untested

The type-assertion + fallback goroutine logic has no direct test coverage.


11. architecture/evm/eth_getLogs.go:1054 — maxSplitDepth comment lost during refactor

The explanatory comment ("bounds recursive binary splitting... depth 16 allows up to 2^16") was removed during extraction to executeSingleGetLogsSubRequest.


This is an automated re-review (v3) by Claude. Items marked "unfixed from v2" were flagged in the previous review.
