Context
Follow-up to #2945 / PR #2986, surfaced by the pr-review-toolkit pass on that PR.
#2945 added per-tick withEffectSpan("atlas.scheduler.<label>", ...) to the 8 named periodic cleanup fibers (oauth_state_cleanup, rate_limit_cleanup, demo_rate_limit_cleanup, contact_rate_limit_cleanup, abuse_cleanup, dashboard_rate_limit_cleanup, conversation_rate_sweep, share_token_cleanup) in packages/api/src/lib/effect/layers.ts, matching the BYOT span landed in #2949.
But makeSchedulerLive (and the surrounding startup DAG) has other withFiberDeathLog-only periodic fibers with the identical observability gap — a hung-but-not-crashed fiber is invisible in traces (no per-tick span, and the catchAllCause death log never fires on a hang). The reviewers flagged:
sub_processor_publisher (~layers.ts:1565) — same shape as the 8, no per-tick span. Most direct consistency gap.
settings_refresh, onboarding_email, expert_scheduler — periodic fibers, also span-less.
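A rough sketch of what closing the gap for the flagged fibers could look like, assuming the same naming convention as #2945. This is not the code in layers.ts: FLAGGED_FIBER_LABELS, spanNameFor, runTickWithSpan and the in-memory sink are hypothetical stand-ins for the real withEffectSpan helper and tracer, and the tick is synchronous only to keep the example short.

```typescript
// Hypothetical stand-ins; none of these names come from layers.ts.
type SpanRecord = { name: string; ok: boolean };

const SCHEDULER_SPAN_PREFIX = "atlas.scheduler.";

// One list of labels, so span names and the tests that pin them share a source.
const FLAGGED_FIBER_LABELS = [
  "sub_processor_publisher",
  "settings_refresh",
  "onboarding_email",
  "expert_scheduler",
] as const;

type FiberLabel = (typeof FLAGGED_FIBER_LABELS)[number];

// Builds the span name for a fiber label, e.g. "atlas.scheduler.settings_refresh".
function spanNameFor(label: FiberLabel): string {
  return `${SCHEDULER_SPAN_PREFIX}${label}`;
}

// Runs one tick and records a span-like entry for it.
// A real tracer would export the span; here it is pushed onto an array.
function runTickWithSpan(
  label: FiberLabel,
  tick: () => void,
  sink: SpanRecord[],
): void {
  const name = spanNameFor(label);
  try {
    tick();
    sink.push({ name, ok: true });
  } catch (err) {
    // A failed tick still leaves a record, then the error propagates
    // so a separate death-log wrapper can still see it.
    sink.push({ name, ok: false });
    throw err;
  }
}
```

In this sketch every completed tick leaves exactly one record, so a fiber that stops ticking would show up as a gap in its span stream rather than as silence that looks the same as health.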
Note: the CRM/email outbox flushers + their stall watchdogs (lead_outbox_flusher/lead_outbox_watchdog, email_outbox_flusher/email_outbox_watchdog) already have a heartbeat-count + stall-watchdog liveness treatment (layers.ts:1655+), so they are observable by a different mechanism — assess whether a span adds value there or is redundant.
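For comparison while assessing the outbox flushers, a heartbeat-count plus stall-watchdog arrangement can be pictured roughly as below. This is a generic illustration of the idea, not the implementation at layers.ts:1655+; makeHeartbeat and makeStallWatchdog are invented names.

```typescript
// Hypothetical illustration; not the code in layers.ts.
type Heartbeat = { beat: () => void; read: () => number };

// The flusher calls beat() once per completed tick.
function makeHeartbeat(): Heartbeat {
  let count = 0;
  return {
    beat: () => {
      count += 1;
    },
    read: () => count,
  };
}

// The watchdog returns a check function. Each call reports whether the
// heartbeat count has stayed the same since the previous check.
function makeStallWatchdog(heartbeat: Heartbeat): () => boolean {
  let lastSeen = heartbeat.read();
  return () => {
    const current = heartbeat.read();
    const stalled = current === lastSeen;
    lastSeen = current;
    return stalled;
  };
}
```

A watchdog of this shape reports a hang directly, because a hung flusher stops incrementing the count. That is the liveness signal a per-tick span would have to improve on to be worth adding there.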
Acceptance criteria
sub_processor_publisher emits a per-tick span under the atlas.scheduler.<label> convention (additive to withFiberDeathLog)
Decide whether settings_refresh / onboarding_email / expert_scheduler should also get spans (likely yes), and wire them if so
New spans follow the SCHEDULER_CLEANUP_SPAN_NAMES-style single-source-of-truth + test-pin pattern from PR #2986
Span names use the atlas.scheduler. prefix
Dependencies