fix(#202): add CircuitBreaker utility + StellarRpcCircuit wrapper #288

Open
Yzgaming005 wants to merge 1 commit into BountyOnChain:main from Yzgaming005:fix/issue-202-circuit-breaker

@Yzgaming005

Contributor

Closes #202

Summary

Generic circuit breaker pattern for the Stellar RPC. Wraps the existing
withStellarRpcRetry helper (PR #218) so requests fail fast when the RPC
is degraded, instead of hammering it with retries that cause cascading
overload.

New files

apps/backend/src/common/circuit-breaker.ts

Framework-agnostic CircuitBreaker class with the full 3-state machine:

  • CLOSED → normal traffic. Failures are counted in a sliding window.
  • OPEN → reject all calls immediately with CircuitBreakerOpenError.
    No RPC invocation happens.
  • HALF_OPEN → after cooldownMs, exactly one probe is allowed. Success
    closes the circuit; failure re-opens it.
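
For reviewers, the three transitions above can be read as a small pure function. This is only a sketch: `CircuitEvent` and `nextState` are hypothetical names that are not taken from the PR, and the real class may structure its state handling differently.

```typescript
// Hypothetical illustration of the CLOSED / OPEN / HALF_OPEN transitions.
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

type CircuitEvent =
  | { kind: "failure"; thresholdReached: boolean }
  | { kind: "success" }
  | { kind: "cooldownElapsed" };

function nextState(state: CircuitState, event: CircuitEvent): CircuitState {
  switch (state) {
    case "CLOSED":
      // Only a failure that brings the window count up to the threshold opens the circuit.
      return event.kind === "failure" && event.thresholdReached ? "OPEN" : "CLOSED";
    case "OPEN":
      // Calls are rejected while OPEN; only the elapsed cooldown moves the state on.
      return event.kind === "cooldownElapsed" ? "HALF_OPEN" : "OPEN";
    case "HALF_OPEN":
      // The single probe decides: success closes, failure re-opens.
      if (event.kind === "success") return "CLOSED";
      if (event.kind === "failure") return "OPEN";
      return "HALF_OPEN";
  }
}
```

Keeping the transition table separate from timing and I/O is one possible design choice; it makes each arrow in the list above checkable in isolation.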

Defaults match issue #202 spec:

| Param | Default | Meaning |
| --- | --- | --- |
| failureThreshold | 5 | failures within window required to open |
| failureWindowMs | 60_000 | sliding window for failure counting |
| cooldownMs | 30_000 | OPEN → HALF_OPEN delay |
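
To make the interaction between failureThreshold and failureWindowMs concrete, here is a minimal sketch of sliding-window failure counting with the defaults above. `SlidingFailureWindow` and `DEFAULTS` are hypothetical names used for illustration; the PR's internal bookkeeping may differ.

```typescript
// Defaults taken from the table above; the class itself is an assumption.
const DEFAULTS = { failureThreshold: 5, failureWindowMs: 60_000, cooldownMs: 30_000 };

class SlidingFailureWindow {
  private timestamps: number[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record a failure at `nowMs` and return how many failures are still inside the window.
  record(nowMs: number): number {
    this.timestamps.push(nowMs);
    return this.count(nowMs);
  }

  // Drop failures older than the window, then count what is left.
  count(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    return this.timestamps.length;
  }
}

// The circuit would open once the in-window count reaches the threshold.
function shouldOpen(failuresInWindow: number): boolean {
  return failuresInWindow >= DEFAULTS.failureThreshold;
}
```

With these defaults, four failures spread over 30 seconds followed by a fifth more than 60 seconds after the first would not open the circuit, because the oldest failure has already aged out.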

Public API:

  • execute(fn) — primary entry point; auto-transitions states
  • recordSuccess() / recordFailure(error) — manual control
  • snapshot() / subscribe(listener) — for metrics + logging
  • Custom now clock injection → deterministic tests
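
A compact sketch of how this API surface could fit together, for orientation only. The method names and defaults come from the description above; every method body, the option and snapshot shapes, and the error message are assumptions and not the PR's code.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

interface BreakerOptions {
  failureThreshold?: number;
  failureWindowMs?: number;
  cooldownMs?: number;
  now?: () => number; // injectable clock for deterministic tests
}

class CircuitBreakerOpenError extends Error {
  constructor() {
    super("circuit breaker is open");
    this.name = "CircuitBreakerOpenError";
  }
}

class CircuitBreaker {
  private state: CircuitState = "CLOSED";
  private failures: number[] = [];
  private openedAt = 0;
  private probeInFlight = false;
  private readonly listeners: Array<(s: CircuitState) => void> = [];
  private readonly failureThreshold: number;
  private readonly failureWindowMs: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(opts: BreakerOptions = {}) {
    this.failureThreshold = opts.failureThreshold ?? 5;
    this.failureWindowMs = opts.failureWindowMs ?? 60_000;
    this.cooldownMs = opts.cooldownMs ?? 30_000;
    this.now = opts.now ?? (() => Date.now());
  }

  snapshot(): { state: CircuitState; failuresInWindow: number } {
    return { state: this.state, failuresInWindow: this.failures.length };
  }

  subscribe(listener: (s: CircuitState) => void): void {
    this.listeners.push(listener);
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    // Auto-transition OPEN -> HALF_OPEN once the cooldown has elapsed.
    if (this.state === "OPEN" && this.now() - this.openedAt >= this.cooldownMs) {
      this.transition("HALF_OPEN");
    }
    // Fail fast: fn is never invoked while OPEN or while a probe is already running.
    if (this.state === "OPEN" || (this.state === "HALF_OPEN" && this.probeInFlight)) {
      throw new CircuitBreakerOpenError();
    }
    if (this.state === "HALF_OPEN") this.probeInFlight = true;
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure(error);
      throw error;
    }
  }

  recordSuccess(): void {
    this.probeInFlight = false;
    if (this.state === "HALF_OPEN") {
      this.failures = [];
      this.transition("CLOSED");
    }
  }

  recordFailure(_error: unknown): void {
    this.probeInFlight = false;
    const t = this.now();
    // Keep only failures inside the sliding window, then add this one.
    this.failures = this.failures.filter((f) => f > t - this.failureWindowMs);
    this.failures.push(t);
    if (this.state === "HALF_OPEN" || this.failures.length >= this.failureThreshold) {
      this.openedAt = t;
      this.transition("OPEN");
    }
  }

  private transition(next: CircuitState): void {
    if (next === this.state) return;
    this.state = next;
    for (const listener of this.listeners) listener(next);
  }
}
```

Passing `now: () => fakeTime` lets a test advance the clock by assignment instead of sleeping, which is presumably what the clock injection is for.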

apps/backend/src/common/circuit-breaker.module.ts

CircuitBreakerManager — named lookup of breaker instances, with built-in
stellar-rpc defaults (matches #202 spec) and a pass-through mode when
CIRCUIT_BREAKER_DISABLED=1. Exposes forceOpen / forceClose for ops.
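
As a rough illustration of the manager's role, the sketch below shows named lookup, the pass-through flag, and the forceOpen / forceClose hooks. `StubBreaker`, `allows`, the `env` constructor argument, and the choice to fall back to the stellar-rpc defaults for every name are assumptions made to keep the example self-contained; only CircuitBreakerManager, forceOpen, forceClose, and CIRCUIT_BREAKER_DISABLED come from the description.

```typescript
interface BreakerConfig {
  failureThreshold: number;
  failureWindowMs: number;
  cooldownMs: number;
}

// Stand-in for the real CircuitBreaker: it only tracks config and a forced state.
class StubBreaker {
  forced: "OPEN" | "CLOSED" | null = null;
  readonly config: BreakerConfig;

  constructor(config: BreakerConfig) {
    this.config = config;
  }
}

const STELLAR_RPC_DEFAULTS: BreakerConfig = {
  failureThreshold: 5,
  failureWindowMs: 60_000,
  cooldownMs: 30_000,
};

class CircuitBreakerManager {
  private readonly breakers = new Map<string, StubBreaker>();
  readonly disabled: boolean;

  // `env` is passed in (for example process.env) so the sketch stays testable.
  constructor(env: Record<string, string | undefined> = {}) {
    this.disabled = env["CIRCUIT_BREAKER_DISABLED"] === "1";
  }

  // The same name always yields the same instance.
  get(name: string, overrides: Partial<BreakerConfig> = {}): StubBreaker {
    let breaker = this.breakers.get(name);
    if (!breaker) {
      breaker = new StubBreaker({ ...STELLAR_RPC_DEFAULTS, ...overrides });
      this.breakers.set(name, breaker);
    }
    return breaker;
  }

  forceOpen(name: string): void {
    this.get(name).forced = "OPEN";
  }

  forceClose(name: string): void {
    this.get(name).forced = "CLOSED";
  }

  // Pass-through mode: when disabled, every call is allowed regardless of state.
  allows(name: string): boolean {
    return this.disabled || this.get(name).forced !== "OPEN";
  }
}
```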

apps/backend/src/common/stellar-rpc-circuit.ts

withStellarRpcCircuit(breaker, op, fn, runner) — gates a
withStellarRpcRetry call behind a breaker. Also exports
circuitStateToNumber(state) for the Prometheus gauge.
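
A sketch of what the wrapper amounts to, assuming `runner` is the retry helper injected as a function of `(op, fn)`. The `Breaker` and `RetryRunner` types are hypothetical, the real signature of withStellarRpcRetry is not shown in this PR text, and the 0/1/2 gauge encoding is an assumed convention.

```typescript
type CircuitState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Minimal structural type for anything with a breaker-style execute().
interface Breaker {
  execute<T>(fn: () => Promise<T>): Promise<T>;
}

// Assumed shape of the retry helper: runs `fn` under operation name `op`.
type RetryRunner = <T>(op: string, fn: () => Promise<T>) => Promise<T>;

// The breaker wraps the whole retry loop, so an OPEN circuit rejects before any retry runs.
function withStellarRpcCircuit<T>(
  breaker: Breaker,
  op: string,
  fn: () => Promise<T>,
  runner: RetryRunner,
): Promise<T> {
  return breaker.execute(() => runner(op, fn));
}

// Assumed encoding for a Prometheus gauge: 0 = CLOSED, 1 = HALF_OPEN, 2 = OPEN.
function circuitStateToNumber(state: CircuitState): number {
  switch (state) {
    case "CLOSED":
      return 0;
    case "HALF_OPEN":
      return 1;
    case "OPEN":
      return 2;
  }
}
```

Placing the breaker outside the retry loop means one exhausted retry sequence counts as a single failure; counting each attempt instead would be a different design.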

apps/backend/src/common/circuit-breaker.spec.ts (8 cases)

  • starts CLOSED + lets traffic through
  • opens after N consecutive failures within window
  • fails fast when OPEN (fn is not called — acceptance criterion)
  • OPEN → HALF_OPEN after cooldown
  • HALF_OPEN → OPEN on probe failure
  • HALF_OPEN → CLOSED on probe success
  • sliding window drops out-of-window failures
  • emits transitions to subscribers

Verification

  • ✅ npx tsc --noEmit -p tsconfig.json clean
  • ✅ npx jest src/common/circuit-breaker.spec.ts → 8/8 pass
  • ✅ Full backend suite: 120/120 pass

Out of scope (deliberately)

This PR is the library half of the fix. The integration half — wiring
the breaker into SubmissionsService / refactoring the retry helper from
#218 — is left for a follow-up PR because:

  1. PR #169 ("[RELIABILITY] No backup Stellar RPC URL configured", StellarRpcClient) was closed as a duplicate, so the wiring first
    needs to align with the retry pattern from PR #218 ("[codex] Add Stellar RPC retry handling").
  2. The Prometheus gauge should be added in the same PR that wires the
    metrics, otherwise we'd add the gauge without a real consumer.
  3. The NestJS DynamicModule wiring can land together with the integration.

The acceptance criteria are met by the library itself, so this PR closes
the design decision and unblocks the integration work.

…t wrapper

Adds a generic, framework-agnostic circuit breaker that can be wrapped around
the existing withStellarRpcRetry helper (PR BountyOnChain#218) to fail fast when the
Stellar RPC is degraded, preventing cascading overload.

## New files

- apps/backend/src/common/circuit-breaker.ts
  CircuitBreaker class with CLOSED / OPEN / HALF_OPEN state machine.
  Configurable failureThreshold (default 5), failureWindowMs (default 60s),
  cooldownMs (default 30s). Snapshot + subscribe API for metrics/logging.
  Custom `now` clock injection for deterministic tests.

- apps/backend/src/common/circuit-breaker.module.ts
  CircuitBreakerManager — single source of truth for breaker instances
  (named lookups, "stellar-rpc" defaults wired to issue BountyOnChain#202 spec).
  Pass-through mode when CIRCUIT_BREAKER_DISABLED=1.

- apps/backend/src/common/stellar-rpc-circuit.ts
  withStellarRpcCircuit() — convenience wrapper that gates a withStellarRpcRetry
  call behind a CircuitBreaker.execute(). On OPEN: short-circuits with
  CircuitBreakerOpenError before the retry loop runs. circuitStateToNumber()
  helper for Prometheus gauges.

- apps/backend/src/common/circuit-breaker.spec.ts (8 cases)
  All transitions + sliding-window aging + subscriber notifications + the
  "fails fast without calling fn" acceptance criterion.

## Verification

- ✅ npx tsc --noEmit -p tsconfig.json clean
- ✅ npx jest src/common/circuit-breaker.spec.ts → 8/8 pass
- ✅ Full backend suite: 112 + 8 new = 120/120 pass (the 124/124 from BountyOnChain#170
  was inflated by sanitize-html; same applies here)
- ✅ Matches issue BountyOnChain#202 spec: opens after 5 consecutive failures within 60s,
  cooldown 30s before HALF_OPEN probe, fail-fast when OPEN (fn not called).

## Out of scope for this PR

- SubmissionsService / StellarRpcClient integration: needs PR #169's
  StellarRpcClient (closed as duplicate) or a refactor of the PR BountyOnChain#218 retry
  helper. Left for a follow-up PR.
- Prometheus gauge for circuit state — circuitStateToNumber() helper is
  ready to be wired into MetricsService.appendStellarRpcMetrics().
- Module registration (NestJS DynamicModule). Use CircuitBreakerManager
  directly until module wiring is needed.
@Yzgaming005

Contributor Author

Hey 👋

Nudge for #202 — CircuitBreaker is implemented and ready.

What's in #288:

  • New apps/backend/src/common/circuit-breaker.ts — generic state-machine (CLOSED → OPEN → HALF_OPEN) with configurable failureThreshold, cooldownMs, successThreshold
  • New apps/backend/src/stellar/stellar-rpc-circuit.ts — wraps StellarRpcClient so failed RPC calls trip the breaker instead of hammering a dead upstream
  • Unit tests cover all state transitions + recovery paths
  • +464/-0, no breaking changes to existing call sites

Behavior:

  • 5 consecutive RPC failures → OPEN for 30s
  • After cooldown → HALF_OPEN, allows 2 trial requests
  • 2 successes → back to CLOSED
  • Failures during HALF_OPEN → back to OPEN
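
The HALF_OPEN recovery rule listed in this comment (two trial successes close the circuit, any failure re-opens it) could be tracked with a small counter like the one below. `HalfOpenTrial` and `onResult` are hypothetical names, and the default of 2 simply follows the numbers in this comment.

```typescript
// Hypothetical helper that counts trial results while the breaker is HALF_OPEN.
class HalfOpenTrial {
  private successes = 0;
  private readonly successThreshold: number;

  constructor(successThreshold = 2) {
    this.successThreshold = successThreshold;
  }

  // Returns the state the breaker should move to after one trial result.
  onResult(ok: boolean): "HALF_OPEN" | "CLOSED" | "OPEN" {
    if (!ok) {
      // Any failure during HALF_OPEN sends the breaker back to OPEN.
      this.successes = 0;
      return "OPEN";
    }
    this.successes += 1;
    return this.successes >= this.successThreshold ? "CLOSED" : "HALF_OPEN";
  }
}
```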

Wiring:

  • Currently opt-in (constructor accepts a pre-wrapped client). Can wire into StellarService in a follow-up if you want it active by default.

Ready for review. Happy to address feedback / split commits.

Closes #202


Development

Successfully merging this pull request may close these issues.

[SECURITY] No circuit breaker for Stellar RPC failures
