You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Follow-up from the PR #278 review thread on apps/provider/src/actions/Keys.ts:259.
Today MigrateKeysToAddressGroup only console.warns when TriggerAddressGroupMigrationSchedule() fails and still returns success: true. The recurring migration schedule (every 30s) self-discovers the migrated keys via remediationHistory and re-stakes them, so a failed trigger self-heals — but the operator has no visibility into whether the re-stake actually progressed. Throwing on trigger failure would wrongly report a failed migration (the DB migration did commit), so the right fix is observability, not a hard failure.
Proposal — track and surface the re-stake lifecycle per key
Replace the opaque "fire-and-forget trigger" with an explicit, operator-visible state:
Scheduled — migrated, queued for the recurring schedule to re-stake.
Retrying (#N) — last re-stake attempt failed; the schedule will try again. The operator sees both "it's scheduled" and "attempt N failed".
Succeeded — re-staked on-chain.
Failed — exhausted the attempt limit; stops auto-retrying, surfaced prominently + provider notification. A manual "retry" resets the counter and re-queues.
This reframes the trigger as a pure optimization (an early kick): the source of truth for "did the re-stake happen" is the recurring schedule's work, not the trigger call.
Open design questions (for team triage)
State storage — dedicated indexed migrationStatus + migrationAttempts columns on keys (easy to query "show me stuck keys") vs. encoding in remediationHistory. Leaning: status column for UI/queryability, remediationHistory for the per-attempt audit trail (error + timestamp).
Attempt limit N + cadence — fixed 30s vs. backoff. Each attempt costs gas; too low marks Failed on a transient blip, too high alerts late.
Terminal Failed handling — manual retry only, or auto-reset after a cooldown?
Context
Follow-up from the PR #278 review thread on
apps/provider/src/actions/Keys.ts:259.Today
MigrateKeysToAddressGrouponlyconsole.warns whenTriggerAddressGroupMigrationSchedule()fails and still returnssuccess: true. The recurring migration schedule (every 30s) self-discovers the migrated keys viaremediationHistoryand re-stakes them, so a failed trigger self-heals — but the operator has no visibility into whether the re-stake actually progressed. Throwing on trigger failure would wrongly report a failed migration (the DB migration did commit), so the right fix is observability, not a hard failure.Proposal — track and surface the re-stake lifecycle per key
Replace the opaque "fire-and-forget trigger" with an explicit, operator-visible state:
Scheduled— migrated, queued for the recurring schedule to re-stake.Retrying (#N)— last re-stake attempt failed; the schedule will try again. The operator sees both "it's scheduled" and "attempt N failed".Succeeded— re-staked on-chain.Failed— exhausted the attempt limit; stops auto-retrying, surfaced prominently + provider notification. A manual "retry" resets the counter and re-queues.This reframes the trigger as a pure optimization (an early kick): the source of truth for "did the re-stake happen" is the recurring schedule's work, not the trigger call.
Open design questions (for team triage)
migrationStatus+migrationAttemptscolumns onkeys(easy to query "show me stuck keys") vs. encoding inremediationHistory. Leaning: status column for UI/queryability,remediationHistoryfor the per-attempt audit trail (error + timestamp).N+ cadence — fixed 30s vs. backoff. Each attempt costs gas; too low marksFailedon a transient blip, too high alerts late.Failedhandling — manual retry only, or auto-reset after a cooldown?FOR UPDATE, ref PR fix: mark stake transactions correctly when they land more than one block after broadcast #278).Non-goal
Not blocking PR #278 — the current
console.warn+ self-heal is acceptable interim behavior.