From 295f5aeb06b64e815ce6495bf064e8d32959d1c2 Mon Sep 17 00:00:00 2001
From: Xingfei Xu
Date: Wed, 6 May 2026 01:40:13 +0000
Subject: [PATCH 01/29] docs: add step interceptor design spec

Co-Authored-By: Claude Sonnet 4.5
---
 .../2026-05-06-step-interceptor-design.md | 369 ++++++++++++++++++
 1 file changed, 369 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-06-step-interceptor-design.md

diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
new file mode 100644
index 0000000..409879d
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
@@ -0,0 +1,369 @@
+# Step Interceptor Design
+
+**Date:** 2026-05-06
+**Status:** Draft
+**Scope:** go-workflow structured observability via two-layer interceptor system
+
+---
+
+## Why
+
+Currently, observability in go-workflow requires users to wire `BeforeStep`/`AfterStep` callbacks
+manually on individual steps. There is no structured way to observe all steps globally — no
+lifecycle events, no attempt count, no timing, no retry visibility.
+
+In production, you need to answer: which step is running right now? How many retries has step X
+done? How long did step Y take? None of these are answerable today without bespoke instrumentation.
+
+This design introduces a two-layer interceptor system that:
+- Provides global, structured observability across all steps
+- Subsumes and extends `BeforeStep`/`AfterStep` without replacing them
+- Propagates automatically into nested `SubWorkflow`s
+- Ships with built-in `EventSink` adapters for slog, OTel, Prometheus
+
+---
+
+## Concepts
+
+### StepStatus vs EventType
+
+These are deliberately separate types serving different consumers.
+
+**`StepStatus`** is the state machine used by the orchestration engine. It is persistent and
+queryable. The `Condition` system reads it to decide whether to run downstream steps.
+
+**`EventType`** is a stream of instantaneous observations for external consumers (logs, traces,
+metrics). It is fire-and-forget.
+
+The key difference: `Running` is a single `StepStatus` that spans the entire retry loop, but
+within it multiple `Started` and `Retrying` events occur. They cannot be merged without breaking
+the `Condition` system.
+
+```
+StepStatus: Pending ──────────────────────────────► Running ──────────────► Succeeded
+                                                            └──► Failed
+                                                            └──► Canceled
+              └─────────────────────────────────────────────────────────────► Skipped
+              └─────────────────────────────────────────────────────────────► Canceled
+
+EventType:  Scheduled  Started      Retrying  Started      Retrying  Started      Succeeded/Failed/Canceled
+                       [attempt 0]            [attempt 1]            [attempt 2]
+```
+
+Mapping of EventType to where it is emitted:
+
+| EventType   | StepStatus transition         | Emitted in              |
+|-------------|-------------------------------|-------------------------|
+| `Scheduled` | `Pending → scheduled`         | StepInterceptor entry   |
+| `Started`   | status stays `Running`        | AttemptInterceptor entry|
+| `Retrying`  | status stays `Running`        | `RetryOption.Notify`    |
+| `Succeeded` | `Running → Succeeded`         | StepInterceptor exit    |
+| `Failed`    | `Running → Failed`            | StepInterceptor exit    |
+| `Canceled`  | `Running/Pending → Canceled`  | StepInterceptor exit    |
+| `Skipped`   | `Pending → Skipped`           | StepInterceptor exit    |
+
+`Failed` is **only** a terminal event. It is never emitted for a single failed attempt inside a
+retry loop — that is covered by `Retrying`.
+
+---
+
+## Architecture
+
+### Two-Layer Interceptor Stack
+
+```
+StepInterceptor[0]
+  └── StepInterceptor[1]
+        └── [retry loop — Notify wired here]
+              └── AttemptInterceptor[0]
+                    └── AttemptInterceptor[1]
+                          └── [per-step BeforeStep callbacks]   ← from StepConfig
+                          └── step.Do(ctx)
+                          └── [per-step AfterStep callbacks]
+```
+
+**StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It sees
+the step exactly once: entry on `Scheduled`, exit on terminal status. It is the right place for
+OTel spans (one span per step, not per attempt) and step-level metrics.
+
+**AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It sees every
+attempt, including retried ones. It is the right place for per-attempt logging and attempt-level
+tracing.
+
+**BeforeStep/AfterStep** (existing) remain unchanged. They are implicitly the innermost layer of
+the AttemptInterceptor stack — always present, always closest to `Do`. Users do not need to
+change how they use them.
+
+### stepExecution (internal)
+
+The current anonymous goroutine in `tick()` is replaced by a `stepExecution` struct that owns
+the full step lifecycle:
+
+```go
+type stepExecution struct {
+	w       *Workflow
+	step    Steper
+	state   *State
+	name    string              // precomputed flow.String(step)
+	attempt uint64              // single source of truth for attempt count
+	onRetry func(WorkflowEvent) // assembled during chain build
+}
+```
+
+`attempt` is the single source of truth shared between `AttemptInfo` and `RetryOption.Notify`.
+It is incremented inside `wireNotify` after each failed attempt, once `Retrying` has been emitted.
+
+### tick() simplification
+
+`tick()` is reduced to a single responsibility: **atomically claiming a step** to prevent
+double-spawning. All other logic moves into `stepExecution.run()`.
+
+```go
+// tick() — before
+if w.lease() {
+	state.SetStatus(Running) // claim + status in one
+	go func() { ... runStep ... }()
+}
+
+// tick() — after
+if w.lease() {
+	state.SetStatus(scheduled) // claim only (private sentinel)
+	w.waitGroup.Add(1)
+	go (&stepExecution{...}).run(ctx)
+}
+```
+
+`scheduled` is a private `StepStatus` sentinel. It is never exposed to users or visible in
+`Condition` evaluation. Its only purpose is to prevent `tick()` from spawning the same step
+twice.
+
+Condition evaluation moves into `stepExecution.run()`. This is safe because by the time a step
+is eligible to run, all its upstreams are terminated — their status cannot change.
+
+---
+
+## API
+
+### New Types
+
+```go
+// StepInterceptor intercepts the full lifecycle of a step (all retry attempts).
+// info.TerminalReason is Pending for steps that will execute normally.
+// For Skipped or Canceled steps, TerminalReason is set and next must not be called.
+type StepInterceptor interface {
+	InterceptStep(ctx context.Context, info StepInfo, next func(ctx context.Context) error) error
+}
+
+// AttemptInterceptor intercepts each individual attempt (Before → Do → After).
+type AttemptInterceptor interface {
+	InterceptAttempt(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error
+}
+
+// StepInterceptorFunc is a function adapter for StepInterceptor.
+type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(ctx context.Context) error) error
+
+// AttemptInterceptorFunc is a function adapter for AttemptInterceptor.
+type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error
+
+// StepInfo is passed to StepInterceptor.
+type StepInfo struct {
+	Step           Steper
+	Name           string     // precomputed flow.String(step)
+	TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute
+}
+
+// AttemptInfo is passed to AttemptInterceptor.
+type AttemptInfo struct {
+	StepInfo
+	Attempt uint64
+	Start   time.Time
+}
+
+// EventType identifies a step lifecycle event.
+type EventType string
+
+const (
+	Scheduled EventType = "Scheduled"
+	Started   EventType = "Started"
+	Retrying  EventType = "Retrying"
+	Succeeded EventType = "Succeeded"
+	Failed    EventType = "Failed"
+	Canceled  EventType = "Canceled"
+	Skipped   EventType = "Skipped"
+)
+
+// WorkflowEvent carries information about a step lifecycle event.
+type WorkflowEvent struct {
+	Step            Steper
+	Name            string
+	Type            EventType
+	Attempt         uint64
+	Err             error
+	Duration        time.Duration
+	BackoffDuration time.Duration // non-zero only for Retrying
+}
+
+// InterceptorReceiver is implemented by steps that contain a sub-workflow
+// and need to receive interceptors from the parent workflow.
+type InterceptorReceiver interface {
+	PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor)
+}
+```
+
+### Workflow struct additions
+
+```go
+type Workflow struct {
+	// ... existing fields unchanged ...
+
+	// StepInterceptors are called once per step, wrapping the full retry lifecycle.
+	// Executed in order: [0] is outermost, [len-1] is innermost.
+	StepInterceptors []StepInterceptor
+
+	// AttemptInterceptors are called once per attempt, inside the retry loop.
+	// Executed in order: [0] is outermost, [len-1] is innermost.
+	// BeforeStep/AfterStep callbacks are always the innermost layer.
+	AttemptInterceptors []AttemptInterceptor
+}
+```
+
+### Built-in EventSink adapters
+
+```go
+// NewStepEventSink returns a StepInterceptor that emits Scheduled, Succeeded,
+// Failed, Canceled, Skipped, and Retrying events to sink.
+// It also implements onRetry so Retrying events (from wireNotify) reach sink.
+func NewStepEventSink(sink func(WorkflowEvent)) *StepEventSinkInterceptor
+
+// NewAttemptEventSink returns an AttemptInterceptor that emits Started events to sink.
+func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor
+```
+
+Usage examples:
+
+```go
+// Structured logging only
+w := &flow.Workflow{
+	StepInterceptors: []flow.StepInterceptor{
+		flow.NewStepEventSink(func(e flow.WorkflowEvent) {
+			slog.Info("step event",
+				"step", e.Name, "type", e.Type,
+				"attempt", e.Attempt, "err", e.Err, "duration", e.Duration,
+			)
+		}),
+	},
+}
+
+// OTel span per step + per-attempt detail
+w := &flow.Workflow{
+	StepInterceptors:    []flow.StepInterceptor{myOtelStepInterceptor},
+	AttemptInterceptors: []flow.AttemptInterceptor{flow.NewAttemptEventSink(mySink)},
+}
+
+// Fan-out: multiple sinks via closure
+w := &flow.Workflow{
+	StepInterceptors: []flow.StepInterceptor{
+		flow.NewStepEventSink(func(e flow.WorkflowEvent) {
+			promSink(e)
+			slogSink(e)
+		}),
+	},
+}
+```
+
+---
+
+## SubWorkflow Propagation
+
+`SubWorkflow` implements `InterceptorReceiver`. Before each call to `step.Do()`, `stepExecution`
+checks whether the step implements this interface and injects the parent's interceptors:
+
+```go
+// in stepExecution.runAttempt(), before step.Do()
+if recv, ok := ex.step.(InterceptorReceiver); ok {
+	recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors)
+}
+```
+
+`SubWorkflow.PrependInterceptors` prepends parent interceptors before its own, so the execution
+stack for inner steps is:
+
+```
+[parent StepInterceptors] → [child StepInterceptors] → retry → [parent AttemptInterceptors] → [child AttemptInterceptors] → Before → Do → After
+```
+
+This is injected on every attempt because `SubWorkflow.Reset()` clears the inner workflow before
+each `BuildStep()` call.
+
+---
+
+## Retrying Event: Why It Bypasses the Interceptor Chain
+
+`Retrying` fires inside `backoff.RetryNotifyWithTimer`'s Notify callback, which sits between two
+consecutive `next()` calls. At that point the interceptor chain's call stack has unwound (the
+previous `next()` returned an error) and the next `next()` hasn't been called yet. There is no
+natural place to insert it into the chain.
+
+The solution: `stepExecution.wireNotify()` wraps `RetryOption.Notify` and calls `ex.onRetry`
+directly. `ex.onRetry` is assembled during chain construction by collecting the `sink` function
+from any `*StepEventSinkInterceptor` in `StepInterceptors`.
+
+```
+attempt N fails → backoff.Notify fires → ex.onRetry(Retrying{attempt=N}) → ex.attempt++
+```
+
+This keeps `Retrying` aligned with the same `attempt` counter used by `AttemptInfo`.
+
+---
+
+## Skipped / Canceled in StepInterceptor
+
+Steps that are Skipped or Canceled by their `Condition` still enter the `StepInterceptor` chain.
+`StepInfo.TerminalReason` carries the reason. The contract is:
+
+- If `TerminalReason != Pending`, the interceptor **must not** call `next`.
+- The interceptor should emit `Scheduled` then `Skipped`/`Canceled` and return nil.
+- The built-in `NewStepEventSink` handles this correctly.
+
+Custom interceptors that call `next` when `TerminalReason != Pending` will cause a panic (the
+`next` function asserts this precondition).
+
+---
+
+## What Does Not Change
+
+- `BeforeStep` / `AfterStep` / `Input` / `Output` — API and behavior unchanged
+- `StepConfig`, `StepOption`, `RetryOption` — unchanged
+- `StepStatus` — no new exported values; `scheduled` is private
+- `Condition` system — unchanged
+- `SubWorkflow` embedding pattern — unchanged, just gains `PrependInterceptors`
+- No breaking changes to existing workflow definitions
+
+---
+
+## Files Affected
+
+| File | Change |
+|------|--------|
+| `workflow.go` | Add `StepInterceptors`, `AttemptInterceptors` fields; simplify `tick()`; add `stepExecution` |
+| `step.go` | Add interceptor interfaces, info types, `InterceptorReceiver` |
+| `event.go` | New file: `EventType`, `WorkflowEvent`, `NewStepEventSink`, `NewAttemptEventSink` |
+| `wrap.go` | `SubWorkflow` implements `InterceptorReceiver` |
+| `retry.go` | Minor: expose `attempt` increment so `stepExecution` can own it |
+
+---
+
+## Open Questions
+
+None. All questions from the brainstorm have been resolved:
+
+| Question | Resolution |
+|----------|------------|
+| EventSink vs Interceptor | Interceptor; EventSink becomes a built-in adapter |
+| Per-step vs per-attempt | Both layers; different use cases |
+| Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason |
+| SubWorkflow propagation | PrependInterceptors on InterceptorReceiver |
+| Retrying event delivery | wireNotify + onRetry, bypasses chain by design |
+| attempt counter ownership | stepExecution owns it; single source of truth |
+| BeforeStep/AfterStep fate | Unchanged; implicit innermost AttemptInterceptor layer |
+| Breaking changes | None |

From c18c5b679886633b040d055a5e13eacf6d18abb3 Mon Sep 17 00:00:00 2001
From: Xingfei Xu
Date: Wed, 6 May 2026 02:03:22 +0000
Subject: [PATCH 02/29] docs: revise interceptor design spec based on review

- Interceptor and BeforeStep/AfterStep are orthogonal (workflow-level vs step-level)
- Remove precomputed Name from StepInfo/AttemptInfo/WorkflowEvent; Step
  pointer is the identifier
- NewStepEventSink returns StepInterceptor (interface); retryNotifier is package-private
- Remove retry.go from Files Affected (no changes needed)

Co-Authored-By: Claude Sonnet 4.5
---
 .../2026-05-06-step-interceptor-design.md | 48 ++++++++++++-------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
index 409879d..1f44324 100644
--- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
+++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
@@ -17,7 +17,7 @@ done? How long did step Y take? None of these are answerable today without bespo
 
 This design introduces a two-layer interceptor system that:
 - Provides global, structured observability across all steps
-- Subsumes and extends `BeforeStep`/`AfterStep` without replacing them
+- Is orthogonal to `BeforeStep`/`AfterStep` — they serve different scopes and both are preserved
 - Propagates automatically into nested `SubWorkflow`s
 - Ships with built-in `EventSink` adapters for slog, OTel, Prometheus
 
@@ -90,9 +90,11 @@ OTel spans (one span per step, not per attempt) and step-level metrics.
 attempt, including retried ones. It is the right place for per-attempt logging and attempt-level
 tracing.
 
-**BeforeStep/AfterStep** (existing) remain unchanged. They are implicitly the innermost layer of
-the AttemptInterceptor stack — always present, always closest to `Do`. Users do not need to
-change how they use them.
+**BeforeStep/AfterStep** (existing) are a different mechanism from Interceptors. Interceptors are
+workflow-level and apply globally to all steps. BeforeStep/AfterStep are step-level and are
+configured per-step via `StepConfig`. They are orthogonal: in the execution stack, Interceptors
+execute on the outside, BeforeStep/AfterStep execute on the inside — but conceptually they belong
+to different layers of the system and serve different purposes. Users configure them independently.
 
 ### stepExecution (internal)
 
@@ -101,12 +103,11 @@ the full step lifecycle:
 
 ```go
 type stepExecution struct {
-	w       *Workflow
-	step    Steper
-	state   *State
-	name    string              // precomputed flow.String(step)
-	attempt uint64              // single source of truth for attempt count
-	onRetry func(WorkflowEvent) // assembled during chain build
+	w       *Workflow
+	step    Steper
+	state   *State
+	attempt uint64              // single source of truth for attempt count
+	onRetry func(WorkflowEvent) // assembled during chain build
 }
 ```
 
@@ -166,9 +167,13 @@ type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(ctx
 type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error
 
 // StepInfo is passed to StepInterceptor.
+// Step is the canonical identifier — it is the same pointer used as the map key
+// in Workflow, stable for the lifetime of the workflow definition.
+// Callers that need a human-readable name can call flow.String(info.Step).
+// No name is precomputed by the framework; different sinks may have different
+// naming preferences (short name, fully-qualified type, etc.).
 type StepInfo struct {
 	Step           Steper
-	Name           string     // precomputed flow.String(step)
 	TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute
 }
 
@@ -195,7 +200,6 @@ const (
 // WorkflowEvent carries information about a step lifecycle event.
 type WorkflowEvent struct {
 	Step            Steper
-	Name            string
 	Type            EventType
 	Attempt         uint64
 	Err             error
@@ -232,8 +236,10 @@ type Workflow struct {
 ```go
 // NewStepEventSink returns a StepInterceptor that emits Scheduled, Succeeded,
 // Failed, Canceled, Skipped, and Retrying events to sink.
-// It also implements onRetry so Retrying events (from wireNotify) reach sink.
-func NewStepEventSink(sink func(WorkflowEvent)) *StepEventSinkInterceptor
+// The returned value also implements a package-private retryNotifier interface
+// so that stepExecution can deliver Retrying events (which bypass the chain) to sink.
+// This implementation detail is not visible to callers.
+func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor
 
 // NewAttemptEventSink returns an AttemptInterceptor that emits Started events to sink.
 func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor
@@ -247,7 +253,7 @@ w := &flow.Workflow{
 	StepInterceptors: []flow.StepInterceptor{
 		flow.NewStepEventSink(func(e flow.WorkflowEvent) {
 			slog.Info("step event",
-				"step", e.Name, "type", e.Type,
+				"step", flow.String(e.Step), "type", e.Type,
 				"attempt", e.Attempt, "err", e.Err, "duration", e.Duration,
 			)
 		}),
@@ -312,6 +318,10 @@ from any `*StepEventSinkInterceptor` in `StepInterceptors`.
 attempt N fails → backoff.Notify fires → ex.onRetry(Retrying{attempt=N}) → ex.attempt++
 ```
 
+`ex.onRetry` is assembled during chain construction by collecting the package-private `retryNotifier`
+interface from any interceptor in `StepInterceptors` that implements it. The concrete type returned
+by `NewStepEventSink` implements this interface; custom interceptors do not need to.
+
 This keeps `Retrying` aligned with the same `attempt` counter used by `AttemptInfo`.
 
 ---
@@ -349,7 +359,6 @@ Custom interceptors that call `next` when `TerminalReason != Pending` will cause
 | `step.go` | Add interceptor interfaces, info types, `InterceptorReceiver` |
 | `event.go` | New file: `EventType`, `WorkflowEvent`, `NewStepEventSink`, `NewAttemptEventSink` |
 | `wrap.go` | `SubWorkflow` implements `InterceptorReceiver` |
-| `retry.go` | Minor: expose `attempt` increment so `stepExecution` can own it |
 
 ---
 
@@ -363,7 +372,10 @@ None. All questions from the brainstorm have been resolved:
 | Per-step vs per-attempt | Both layers; different use cases |
 | Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason |
 | SubWorkflow propagation | PrependInterceptors on InterceptorReceiver |
-| Retrying event delivery | wireNotify + onRetry, bypasses chain by design |
+| Retrying event delivery | wireNotify + onRetry (private retryNotifier), bypasses chain by design |
 | attempt counter ownership | stepExecution owns it; single source of truth |
-| BeforeStep/AfterStep fate | Unchanged; implicit innermost AttemptInterceptor layer |
+| BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors (step-level vs workflow-level) |
+| Step identifier / name | No precomputed name; Step pointer is the identifier; callers call flow.String() |
+| NewStepEventSink return type | Returns StepInterceptor (interface); retryNotifier is package-private |
+| retry.go changes | None needed; stepExecution.attempt is independent |
 | Breaking changes | None |

From 24898ecca71336bc810521c8ce03099bd8514bf5 Mon Sep 17 00:00:00 2001
From: Xingfei Xu
Date: Wed, 6 May 2026 04:19:55 +0000
Subject: [PATCH 03/29] docs: remove Start from AttemptInfo

Interceptors that need timing should call time.Now() themselves.
Start had ambiguous semantics (chain entry vs Do entry) and added
no value that a one-liner couldn't provide.

Co-Authored-By: Claude Sonnet 4.5
---
 docs/superpowers/specs/2026-05-06-step-interceptor-design.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
index 1f44324..f2ba1f8 100644
--- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
+++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md
@@ -178,10 +178,10 @@ type StepInfo struct {
 }
 
 // AttemptInfo is passed to AttemptInterceptor.
+// Interceptors that need timing should record time.Now() at the top of InterceptAttempt.
 type AttemptInfo struct {
 	StepInfo
 	Attempt uint64
-	Start   time.Time
 }
 
 // EventType identifies a step lifecycle event.

From c1ee7abc43a56fd90df9b867e3b6a5762487ea3a Mon Sep 17 00:00:00 2001
From: Xingfei Xu
Date: Wed, 6 May 2026 04:26:25 +0000
Subject: [PATCH 04/29] docs: add step interceptor implementation plan

Co-Authored-By: Claude Sonnet 4.5
---
 .../plans/2026-05-06-step-interceptor.md | 1059 +++++++++++++++++
 1 file changed, 1059 insertions(+)
 create mode 100644 docs/superpowers/plans/2026-05-06-step-interceptor.md

diff --git a/docs/superpowers/plans/2026-05-06-step-interceptor.md b/docs/superpowers/plans/2026-05-06-step-interceptor.md
new file mode 100644
index 0000000..4571d4e
--- /dev/null
+++ b/docs/superpowers/plans/2026-05-06-step-interceptor.md
@@ -0,0 +1,1059 @@
+# Step Interceptor Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Add a two-layer interceptor system (`StepInterceptor` + `AttemptInterceptor`) to go-workflow, enabling structured global observability with built-in `EventSink` adapters.
+
+**Architecture:** Introduce `event.go` for public types, refactor `workflow.go` to extract `stepExecution` (replacing the anonymous goroutine in `tick()`), and add `InterceptorReceiver` to `SubWorkflow` for nested propagation. `BeforeStep`/`AfterStep` remain unchanged as step-level configuration; interceptors are workflow-level and orthogonal.
+
+**Tech Stack:** Go 1.23, `github.com/stretchr/testify`, `github.com/benbjohnson/clock`
+
+---
+
+## File Map
+
+| File | Action | Responsibility |
+|------|--------|----------------|
+| `event.go` | **Create** | `EventType`, `WorkflowEvent`, `StepInterceptor`, `AttemptInterceptor`, `StepInterceptorFunc`, `AttemptInterceptorFunc`, `StepInfo`, `AttemptInfo`, `InterceptorReceiver`, `NewStepEventSink`, `NewAttemptEventSink`, private `retryNotifier` |
+| `event_test.go` | **Create** | Tests for `NewStepEventSink` and `NewAttemptEventSink` |
+| `workflow.go` | **Modify** | Add `StepInterceptors`/`AttemptInterceptors` fields; introduce `stepExecution`; simplify `tick()`; add `wireNotify` |
+| `workflow_test.go` | **Modify** | Integration tests for interceptor ordering, SubWorkflow propagation, Retrying events |
+| `wrap.go` | **Modify** | `SubWorkflow` implements `InterceptorReceiver` |
+| `wrap_test.go` | **Modify** | Tests for interceptor propagation through SubWorkflow |
+
+---
+
+## Task 1: Define public types in `event.go`
+
+**Files:**
+- Create: `event.go`
+- Create: `event_test.go`
+
+- [ ] **Step 1: Write the failing test**
+
+```go
+// event_test.go
+package flow
+
+import (
+	"context"
+	"testing"
+
+	"github.com/stretchr/testify/assert"
+)
+
+func TestEventTypeConstants(t *testing.T) {
+	// Verify all constants exist and are distinct
+	types := []EventType{Scheduled, Started, Retrying, Succeeded, Failed, Canceled, Skipped}
+	seen := map[EventType]bool{}
+	for _, et := range types {
+		assert.False(t, seen[et], "duplicate EventType: %q", et)
+		seen[et] = true
+	}
+}
+
+func TestStepInterceptorFunc(t *testing.T) {
+	called := false
+	var ic StepInterceptor = StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error {
+		called = true
+		return next(ctx)
+	})
+	_ = ic.InterceptStep(context.Background(), StepInfo{}, func(ctx context.Context) error { return nil })
+	assert.True(t, called)
+}
+
+func TestAttemptInterceptorFunc(t *testing.T) {
+	called := false
+	var ic AttemptInterceptor = AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error {
+		called = true
+		return next(ctx)
+	})
+	_ = ic.InterceptAttempt(context.Background(), AttemptInfo{}, func(ctx context.Context) error { return nil })
+	assert.True(t, called)
+}
+```
+
+- [ ] **Step 2: Run test to verify it fails**
+
+```bash
+go test ./... -run "TestEventType|TestStepInterceptorFunc|TestAttemptInterceptorFunc" -v
+```
+
+Expected: FAIL — types not defined.
+
+- [ ] **Step 3: Write `event.go`**
+
+```go
+package flow
+
+import (
+	"context"
+	"time"
+)
+
+// EventType identifies a step lifecycle event.
+type EventType string
+
+const (
+	Scheduled EventType = "Scheduled"
+	Started   EventType = "Started"
+	Retrying  EventType = "Retrying"
+	Succeeded EventType = "Succeeded"
+	Failed    EventType = "Failed"
+	Canceled  EventType = "Canceled"
+	Skipped   EventType = "Skipped"
+)
+
+// WorkflowEvent carries information about a step lifecycle event.
+type WorkflowEvent struct {
+	Step            Steper
+	Type            EventType
+	Attempt         uint64
+	Err             error
+	Duration        time.Duration
+	BackoffDuration time.Duration // non-zero only for Retrying
+}
+
+// StepInfo is passed to StepInterceptor.
+// Step is the canonical identifier — same pointer used as map key in Workflow.
+// Callers that need a human-readable name can call flow.String(info.Step).
+type StepInfo struct {
+	Step           Steper
+	TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute
+}
+
+// AttemptInfo is passed to AttemptInterceptor.
+// Interceptors that need timing should record time.Now() at the top of InterceptAttempt.
+type AttemptInfo struct {
+	StepInfo
+	Attempt uint64
+}
+
+// StepInterceptor intercepts the full lifecycle of a step (all retry attempts).
+// If info.TerminalReason != Pending, next must not be called — the step will not execute.
+// Return nil in that case after observing the event.
+type StepInterceptor interface {
+	InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error
+}
+
+// AttemptInterceptor intercepts each individual attempt (Before → Do → After).
+type AttemptInterceptor interface {
+	InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error
+}
+
+// StepInterceptorFunc is a function adapter for StepInterceptor.
+type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error
+
+func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error {
+	return f(ctx, info, next)
+}
+
+// AttemptInterceptorFunc is a function adapter for AttemptInterceptor.
+type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error
+
+func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error {
+	return f(ctx, info, next)
+}
+
+// InterceptorReceiver is implemented by steps that contain a sub-workflow.
+// stepExecution calls PrependInterceptors before each attempt so that
+// parent interceptors wrap child interceptors.
+type InterceptorReceiver interface {
+	PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor)
+}
+
+// retryNotifier is a package-private interface implemented by the concrete
+// type returned by NewStepEventSink. stepExecution uses it to deliver
+// Retrying events (which bypass the interceptor chain) to the sink.
+type retryNotifier interface {
+	onRetry(WorkflowEvent)
+}
+```
+
+Note: the import block above already includes `"time"`, which `WorkflowEvent` needs. If the package
+already declares `StepStatus` constants named `Succeeded`, `Failed`, `Canceled`, and `Skipped` (the spec
+compares `StepInfo.TerminalReason` against `Pending`, `Skipped`, and `Canceled`), the same-named
+`EventType` constants above are redeclarations and will not compile; give them distinct names
+(for example an `Event` prefix) before running Step 4.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+```bash
+go test ./... -run "TestEventType|TestStepInterceptorFunc|TestAttemptInterceptorFunc" -v
+```
+
+Expected: PASS.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add event.go event_test.go
+git commit -m "feat: add interceptor public types and EventType constants"
+```
+
+---
+
+## Task 2: Built-in `NewStepEventSink` and `NewAttemptEventSink`
+
+**Files:**
+- Modify: `event.go`
+- Modify: `event_test.go`
+
+- [ ] **Step 1: Write failing tests**
+
+```go
+// event_test.go — add these tests, and add "errors" and "time" to the import block
+
+func TestNewStepEventSink_SucceededStep(t *testing.T) {
+	var events []WorkflowEvent
+	sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) })
+
+	step := NoOp("a")
+	info := StepInfo{Step: step, TerminalReason: Pending}
+	err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error {
+		return nil
+	})
+
+	assert.NoError(t, err)
+	assert.Len(t, events, 2)
+	assert.Equal(t, Scheduled, events[0].Type)
+	assert.Equal(t, step, events[0].Step)
+	assert.Equal(t, Succeeded, events[1].Type)
+	assert.NotZero(t, events[1].Duration)
+}
+
+func TestNewStepEventSink_FailedStep(t *testing.T) {
+	var events []WorkflowEvent
+	sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) })
+
+	step := NoOp("a")
+	boom := errors.New("boom")
+	info := StepInfo{Step: step, TerminalReason: Pending}
+	err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error {
+		return boom
+	})
+
+	assert.Equal(t, boom, err)
+	assert.Len(t, events, 2)
+	assert.Equal(t, Scheduled, events[0].Type)
+	assert.Equal(t, Failed, events[1].Type)
+	assert.Equal(t, boom, events[1].Err)
+}
+
+func TestNewStepEventSink_SkippedStep(t *testing.T) {
+	var events []WorkflowEvent
+	sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) })
+
+	step := NoOp("a")
+	info := StepInfo{Step: step, TerminalReason: Skipped}
+	nextCalled := false
+	err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error {
+		nextCalled = true
+		return nil
+	})
+
+	assert.NoError(t, err)
+	assert.False(t, nextCalled, "next must not be called for Skipped")
+	assert.Len(t, events, 2)
+	assert.Equal(t, Scheduled, events[0].Type)
+	assert.Equal(t, Skipped, events[1].Type)
+}
+
+func TestNewStepEventSink_OnRetry(t *testing.T) {
+	var events []WorkflowEvent
+	sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) })
+
+	rn, ok := sink.(retryNotifier)
+	if !assert.True(t, ok, "NewStepEventSink should implement retryNotifier") {
+		return // avoid calling onRetry on a nil interface
+	}
+
+	boom := errors.New("boom")
+	rn.onRetry(WorkflowEvent{Type: Retrying, Attempt: 0, Err: boom, BackoffDuration: time.Second})
+
+	assert.Len(t, events, 1)
+	assert.Equal(t, Retrying, events[0].Type)
+	assert.Equal(t, boom, events[0].Err)
+}
+
+func TestNewAttemptEventSink_EmitsStarted(t *testing.T) {
+	var events []WorkflowEvent
+	sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) })
+
+	step := NoOp("a")
+	info := AttemptInfo{StepInfo: StepInfo{Step: step}, Attempt: 2}
+	err := sink.InterceptAttempt(context.Background(), info, func(ctx context.Context) error {
+		return nil
+	})
+
+	assert.NoError(t, err)
+	assert.Len(t, events, 1)
+	assert.Equal(t, Started, events[0].Type)
+	assert.Equal(t, uint64(2), events[0].Attempt)
+	assert.Equal(t, step, events[0].Step)
+}
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+```bash
+go test ./... -run "TestNewStepEventSink|TestNewAttemptEventSink" -v
+```
+
+Expected: FAIL — functions not defined.
+
+- [ ] **Step 3: Implement `NewStepEventSink` and `NewAttemptEventSink` in `event.go`**
+
+```go
+// Add to event.go:
+
+// terminalEventType maps an error to the corresponding terminal EventType.
+func terminalEventType(err error) EventType {
+	if err == nil {
+		return Succeeded
+	}
+	switch StatusFromError(err) {
+	case Canceled:
+		return Canceled
+	case Skipped:
+		return Skipped
+	default:
+		return Failed
+	}
+}
+
+// stepEventSink is the concrete type returned by NewStepEventSink.
+// It implements both StepInterceptor and the package-private retryNotifier.
+type stepEventSink struct { + sink func(WorkflowEvent) +} + +// NewStepEventSink returns a StepInterceptor that emits Scheduled, then a terminal +// event (Succeeded/Failed/Canceled/Skipped) for every step. It also receives +// Retrying events via the package-private retryNotifier interface. +func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { + return &stepEventSink{sink: sink} +} + +func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { + s.sink(WorkflowEvent{Step: info.Step, Type: Scheduled}) + + if info.TerminalReason != Pending { + s.sink(WorkflowEvent{Step: info.Step, Type: EventType(info.TerminalReason)}) + return nil + } + + start := time.Now() + err := next(ctx) + s.sink(WorkflowEvent{ + Step: info.Step, + Type: terminalEventType(err), + Err: err, + Duration: time.Since(start), + }) + return err +} + +func (s *stepEventSink) onRetry(e WorkflowEvent) { s.sink(e) } + +// NewAttemptEventSink returns an AttemptInterceptor that emits a Started event +// for each attempt. +func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { + return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + sink(WorkflowEvent{ + Step: info.Step, + Type: Started, + Attempt: info.Attempt, + }) + return next(ctx) + }) +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +go test ./... -run "TestNewStepEventSink|TestNewAttemptEventSink" -v +``` + +Expected: PASS. + +- [ ] **Step 5: Commit** + +```bash +git add event.go event_test.go +git commit -m "feat: add NewStepEventSink and NewAttemptEventSink" +``` + +--- + +## Task 3: Introduce `stepExecution` and refactor `tick()` + +This is the largest refactor. We replace the anonymous goroutine in `tick()` with a +`stepExecution` struct. `makeDoForStep` is deleted; its logic moves into +`stepExecution.runAttempt`. 
+ +**Files:** +- Modify: `workflow.go` +- Modify: `workflow_test.go` + +- [ ] **Step 1: Write failing tests for `stepExecution` behavior** + +```go +// workflow_test.go — add these tests + +func TestStepExecution_BasicSuccess(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + step := NoOp("a") + w := &Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + } + w.Add(Step(step)) + err := w.Do(context.Background()) + assert.NoError(t, err) + assert.Equal(t, []EventType{Scheduled, Succeeded}, eventTypes(events)) +} + +func TestStepExecution_StepInterceptorOrder(t *testing.T) { + t.Parallel() + var order []string + makeIC := func(name string) StepInterceptor { + return StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + order = append(order, name+":before") + err := next(ctx) + order = append(order, name+":after") + return err + }) + } + w := &Workflow{ + StepInterceptors: []StepInterceptor{makeIC("A"), makeIC("B")}, + } + w.Add(Step(NoOp("s"))) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []string{"A:before", "B:before", "B:after", "A:after"}, order) +} + +func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { + t.Parallel() + var order []string + makeIC := func(name string) AttemptInterceptor { + return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + order = append(order, name+":before") + err := next(ctx) + order = append(order, name+":after") + return err + }) + } + w := &Workflow{ + AttemptInterceptors: []AttemptInterceptor{makeIC("X"), makeIC("Y")}, + } + w.Add(Step(NoOp("s"))) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []string{"X:before", "Y:before", "Y:after", "X:after"}, order) +} + +func TestStepExecution_SkippedStep(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + step := NoOp("a") + w := 
&Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + } + w.Add(Step(step).When(func(_ context.Context, _ map[Steper]StepResult) StepStatus { + return Skipped + })) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []EventType{Scheduled, Skipped}, eventTypes(events)) +} + +func TestStepExecution_RetryingEvent(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + boom := errors.New("boom") + attempts := 0 + step := Func("s", func(ctx context.Context) error { + attempts++ + if attempts < 3 { + return boom + } + return nil + }) + w := &Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + AttemptInterceptors: []AttemptInterceptor{ + NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + } + w.Add(Step(step).Retry(func(o *RetryOption) { + o.Attempts = 3 + o.Backoff = &backoff.ZeroBackOff{} + })) + assert.NoError(t, w.Do(context.Background())) + types := eventTypes(events) + // Scheduled, Started(0), Retrying(0), Started(1), Retrying(1), Started(2), Succeeded + assert.Equal(t, []EventType{ + Scheduled, + Started, Retrying, + Started, Retrying, + Started, Succeeded, + }, types) +} + +// helper +func eventTypes(events []WorkflowEvent) []EventType { + types := make([]EventType, len(events)) + for i, e := range events { + types[i] = e.Type + } + return types +} +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +go test ./... -run "TestStepExecution" -v +``` + +Expected: FAIL (`StepInterceptors` field not defined).
+ +- [ ] **Step 3: Add fields to `Workflow` struct** + +In `workflow.go`, add two fields to the `Workflow` struct: + +```go +type Workflow struct { + MaxConcurrency int + DontPanic bool + SkipAsError bool + Clock clock.Clock + DefaultOption *StepOption + StepInterceptors []StepInterceptor // per-step global interceptors + AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors + + StepBuilder + steps map[Steper]*State + statusChange *sync.Cond + leaseBucket chan struct{} + waitGroup sync.WaitGroup + isRunning sync.Mutex +} +``` + +- [ ] **Step 4: Add `stepExecution` struct and `scheduled` sentinel** + +Add after the `Workflow` struct in `workflow.go`: + +```go +// scheduled is a private StepStatus sentinel used by tick() to atomically +// claim a step and prevent double-spawning. It is never exposed to users. +const scheduled StepStatus = "scheduled" + +// stepExecution owns the full lifecycle of a single step run. +type stepExecution struct { + w *Workflow + step Steper + state *State + attempt uint64 // single source of truth shared by AttemptInfo and wireNotify + onRetry func(WorkflowEvent) // assembled from StepInterceptors that implement retryNotifier +} +``` + +- [ ] **Step 5: Implement `stepExecution.run()`** + +Add to `workflow.go`: + +```go +func (ex *stepExecution) run(ctx context.Context) { + defer ex.w.waitGroup.Done() + + // Evaluate condition now (safe: all upstreams are terminated at this point). + ups := ex.w.UpstreamOf(ex.step) + option := ex.state.Option() + cond := DefaultCondition + if option != nil && option.Condition != nil { + cond = option.Condition + } + + terminalReason := Pending + if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + terminalReason = nextStatus + } + + info := StepInfo{Step: ex.step, TerminalReason: terminalReason} + + // Build StepInterceptor chain; also collect retryNotifiers for wireNotify. 
+ var retrySinks []func(WorkflowEvent) + // The innermost link is executeWithRetry only when the step will run. + // For Skipped/Canceled steps it is a no-op, so the step body cannot run + // even if an interceptor calls next despite TerminalReason != Pending. + var stepNext func(context.Context) error + if terminalReason == Pending { + stepNext = ex.executeWithRetry + } else { + stepNext = func(context.Context) error { return nil } + } + for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { + ic := ex.w.StepInterceptors[i] + if rn, ok := ic.(retryNotifier); ok { + retrySinks = append(retrySinks, rn.onRetry) + } + next := stepNext + icLocal := ic + stepNext = func(ctx context.Context) error { + return icLocal.InterceptStep(ctx, info, next) + } + } + ex.onRetry = func(e WorkflowEvent) { + for _, s := range retrySinks { + s(e) + } + } + + var status StepStatus + var err error + + if terminalReason != Pending { + // Skipped or Canceled: run the chain so interceptors can observe the + // terminal reason. The innermost link is a no-op, so the chain returns + // nil unless an interceptor itself returns an error. + err = stepNext(ctx) + status = terminalReason + } else { + ex.state.SetStatus(Running) + err = stepNext(ctx) + status = statusFromError(err) + if status == Failed { + switch { + case DefaultIsCanceled(err), + errors.Is(err, context.Canceled), + errors.Is(err, context.DeadlineExceeded): + status = Canceled + } + } + } + + ex.state.SetStatus(status) + ex.state.SetError(err) + ex.w.unlease() + ex.w.signalStatusChange() +} +``` + +- [ ] **Step 6: Implement `stepExecution.executeWithRetry` and `stepExecution.runAttempt`** + +```go +// executeWithRetry is the bottom of the StepInterceptor chain. +// It wires Retrying events and drives the retry loop. +func (ex *stepExecution) executeWithRetry(ctx context.Context) error { + option := ex.state.Option() + ex.wireNotify(option) + + // Build AttemptInterceptor chain; innermost is runAttempt (Before→Do→After).
+ attemptChain := ex.buildAttemptChain() + + var notAfter time.Time + if option != nil && option.Timeout != nil { + notAfter = ex.w.Clock.Now().Add(*option.Timeout) + var cancel func() + ctx, cancel = ex.w.Clock.WithDeadline(ctx, notAfter) + defer cancel() + } + + return ex.w.retry(option.RetryOption)(ctx, attemptChain, notAfter) +} + +func (ex *stepExecution) buildAttemptChain() func(context.Context) error { + // Innermost: per-step Before callbacks → Do → After callbacks. + chain := func(ctx context.Context) error { + return ex.runAttempt(ctx) + } + for i := len(ex.w.AttemptInterceptors) - 1; i >= 0; i-- { + ic := ex.w.AttemptInterceptors[i] + next := chain + icLocal := ic + chain = func(ctx context.Context) error { + info := AttemptInfo{ + StepInfo: StepInfo{Step: ex.step, TerminalReason: Pending}, + Attempt: ex.attempt, + } + return icLocal.InterceptAttempt(ctx, info, next) + } + } + return chain +} + +func (ex *stepExecution) runAttempt(ctx context.Context) error { + // Do not advance ex.attempt here. wireNotify is its only writer, so the + // Started and Retrying events of one attempt carry the same number. + + // Propagate interceptors to SubWorkflow if applicable.
+ if recv, ok := ex.step.(InterceptorReceiver); ok { + recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) + } + + do := func(fn func() error) error { return fn() } + if ex.w.DontPanic { + do = catchPanicAsError + } + + var ctxStep context.Context + err := do(func() error { + ctxBefore, errBefore := ex.state.Before(ctx, ex.step, do) + ctxStep = ctxBefore + return errBefore + }) + if err != nil { + return ErrBeforeStep{err} + } + err = do(func() error { return ex.step.Do(ctxStep) }) + return do(func() error { return ex.state.After(ctxStep, ex.step, err) }) +} + +func (ex *stepExecution) wireNotify(option *StepOption) { + if option == nil || option.RetryOption == nil { + return + } + userNotify := option.RetryOption.Notify + option.RetryOption.Notify = func(err error, d time.Duration) { + e := WorkflowEvent{ + Step: ex.step, + Type: Retrying, + Attempt: ex.attempt, + Err: err, + BackoffDuration: d, + } + ex.attempt++ + if ex.onRetry != nil { + ex.onRetry(e) + } + if userNotify != nil { + userNotify(err, d) + } + } +} +``` + +Note: add a `statusFromError` helper in `workflow.go` (replaces the inline `StatusFromError` call): + +```go +func statusFromError(err error) StepStatus { + if err == nil { + return Succeeded + } + if s := StatusFromError(err); s != Failed { + return s + } + return Failed +} +``` + +- [ ] **Step 7: Simplify `tick()`** + +Replace the entire goroutine block in `tick()`: + +```go +// Before (remove this block): +state.SetStatus(Running) +w.waitGroup.Add(1) +go func(ctx context.Context, step Steper, state *State) { + // ... entire anonymous goroutine body ... +}(ctx, step, state) + +// After: +state.SetStatus(scheduled) +w.waitGroup.Add(1) +ex := &stepExecution{w: w, step: step, state: state} +go ex.run(ctx) +``` + +Also remove the `makeDoForStep` method from `workflow.go` entirely (its logic is now in `stepExecution.runAttempt`). + +And update `runStep` — it is now unused; remove it. 
Its timeout and retry logic moved into `executeWithRetry`. + +- [ ] **Step 8: Run all tests** + +```bash +go test ./... -v 2>&1 | tail -30 +``` + +Expected: All existing tests PASS, new `TestStepExecution_*` tests PASS. + +- [ ] **Step 9: Commit** + +```bash +git add workflow.go workflow_test.go event.go event_test.go +git commit -m "feat: introduce stepExecution, add StepInterceptors/AttemptInterceptors to Workflow" +``` + +--- + +## Task 4: `SubWorkflow` implements `InterceptorReceiver` + +**Files:** +- Modify: `wrap.go` +- Modify: `wrap_test.go` + +- [ ] **Step 1: Write failing test** + +```go +// wrap_test.go — add this test + +func TestSubWorkflow_InterceptorPropagation(t *testing.T) { + t.Parallel() + + var events []WorkflowEvent + sink := NewStepEventSink(func(e WorkflowEvent) { + events = append(events, e) + }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + + w := &Workflow{ + StepInterceptors: []StepInterceptor{sink}, + } + w.Add(Step(sub)) + + assert.NoError(t, w.Do(context.Background())) + + // Expect events for both outer step (sub) and inner step (innerStep) + types := eventTypes(events) + assert.Contains(t, types, Scheduled) + assert.Contains(t, types, Succeeded) + // There should be at least 4 events: Scheduled+Succeeded for sub, Scheduled+Succeeded for innerStep + assert.GreaterOrEqual(t, len(events), 4) + // All events should have a non-nil Step + for _, e := range events { + assert.NotNil(t, e.Step) + } +} + +func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { + t.Parallel() + + var parentEvents []WorkflowEvent + var childEvents []WorkflowEvent + + parentSink := NewStepEventSink(func(e WorkflowEvent) { parentEvents = append(parentEvents, e) }) + childSink := NewStepEventSink(func(e WorkflowEvent) { childEvents = append(childEvents, e) }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + // 
child-only interceptor + sub.w.StepInterceptors = []StepInterceptor{childSink} + + w := &Workflow{ + StepInterceptors: []StepInterceptor{parentSink}, + } + w.Add(Step(sub)) + + assert.NoError(t, w.Do(context.Background())) + + // Parent sees outer step + inner step (propagated) + assert.GreaterOrEqual(t, len(parentEvents), 4) + // Child sees inner step only + assert.GreaterOrEqual(t, len(childEvents), 2) +} +``` + +- [ ] **Step 2: Run tests to verify they fail** + +```bash +go test ./... -run "TestSubWorkflow_.*Interceptor" -v +``` + +Expected: FAIL (`SubWorkflow` does not implement `InterceptorReceiver`). + +- [ ] **Step 3: Implement `PrependInterceptors` on `SubWorkflow`** + +Add to `wrap.go`: + +```go +// PrependInterceptors implements InterceptorReceiver. +// Parent interceptors are prepended so they execute outside child interceptors. +// Fresh slices are allocated because append(step, ...) could write into the +// parent's backing array, which every other step of the parent also receives. +func (s *SubWorkflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { + steps := make([]StepInterceptor, 0, len(step)+len(s.w.StepInterceptors)) + s.w.StepInterceptors = append(append(steps, step...), s.w.StepInterceptors...) + attempts := make([]AttemptInterceptor, 0, len(attempt)+len(s.w.AttemptInterceptors)) + s.w.AttemptInterceptors = append(append(attempts, attempt...), s.w.AttemptInterceptors...) +} +``` + +- [ ] **Step 4: Run tests to verify they pass** + +```bash +go test ./... -run "TestSubWorkflow_.*Interceptor" -v +``` + +Expected: PASS. + +- [ ] **Step 5: Run full test suite** + +```bash +go test ./... +``` + +Expected: All PASS. + +- [ ] **Step 6: Commit** + +```bash +git add wrap.go wrap_test.go +git commit -m "feat: SubWorkflow implements InterceptorReceiver for interceptor propagation" +``` + +--- + +## Task 5: Verify `retry()` integration with `wireNotify` + +This task tests the full retry and `Retrying` event pipeline end-to-end through the real `retry()` implementation, using a zero backoff so the test does not sleep.
+ +**Files:** +- Modify: `workflow_test.go` + +- [ ] **Step 1: Write the test** + +```go +// workflow_test.go: add this test + +func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { + t.Parallel() + + var events []WorkflowEvent + mu := sync.Mutex{} + record := func(e WorkflowEvent) { + mu.Lock() + events = append(events, e) + mu.Unlock() + } + + callCount := 0 + step := Func("flaky", func(ctx context.Context) error { + callCount++ + if callCount < 3 { + return errors.New("not yet") + } + return nil + }) + + w := &Workflow{ + StepInterceptors: []StepInterceptor{NewStepEventSink(record)}, + AttemptInterceptors: []AttemptInterceptor{NewAttemptEventSink(record)}, + } + w.Add(Step(step).Retry(func(o *RetryOption) { + o.Attempts = 5 + o.Backoff = &backoff.ZeroBackOff{} + })) + + assert.NoError(t, w.Do(context.Background())) + + types := eventTypes(events) + assert.Equal(t, []EventType{ + Scheduled, // StepInterceptor entry + Started, // attempt 0 + Retrying, // attempt 0 failed + Started, // attempt 1 + Retrying, // attempt 1 failed + Started, // attempt 2 succeeds + Succeeded, // StepInterceptor exit + }, types) + + // Verify attempt numbers in Retrying events + retryingEvents := filterEvents(events, Retrying) + assert.Equal(t, uint64(0), retryingEvents[0].Attempt) + assert.Equal(t, uint64(1), retryingEvents[1].Attempt) + + // Verify attempt numbers in Started events + startedEvents := filterEvents(events, Started) + assert.Equal(t, uint64(0), startedEvents[0].Attempt) + assert.Equal(t, uint64(1), startedEvents[1].Attempt) + assert.Equal(t, uint64(2), startedEvents[2].Attempt) +} + +// helpers (add once, reuse across tests) +func filterEvents(events []WorkflowEvent, typ EventType) []WorkflowEvent { + var rv []WorkflowEvent + for _, e := range events { + if e.Type == typ { + rv = append(rv, e) + } + } + return rv +} +``` + +- [ ] **Step 2: Run the test** + +```bash +go test ./... -run "TestStepExecution_RetryingEventAttemptNumbers" -v +``` + +Expected: PASS. Task 3 already implements this behavior, so unlike the earlier tasks this test is not expected to fail first. + +- [ ] **Step 3: If it fails, debug the attempt counter (no new code expected)** + +A failure here means a bug in `wireNotify` or in the ordering of attempt-counter updates. `ex.attempt` must advance exactly once per failed attempt, inside the wrapped `Notify` and after the `Retrying` event has been built. Debug `stepExecution.wireNotify`, then rerun: + +```bash +go test ./... -run "TestStepExecution_RetryingEventAttemptNumbers" -v +``` + +Expected: PASS. + +- [ ] **Step 4: Commit** + +```bash +git add workflow_test.go +git commit -m "test: verify Retrying event attempt numbers are correctly sequenced" +``` + +--- + +## Task 6: Final integration and cleanup + +**Files:** +- Modify: `workflow_test.go` +- Modify: `event_test.go` + +- [ ] **Step 1: Run full test suite including example package** + +```bash +go test ./... +``` + +Expected: All PASS. + +- [ ] **Step 2: Run with race detector** + +```bash +go test -race ./... +``` + +Expected: All PASS, no data race detected. + +- [ ] **Step 3: Verify behavior is unchanged when no interceptors are set** + +```go +// workflow_test.go: add this test +func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { + t.Parallel() + // Workflows without interceptors must not regress existing behavior. + step := NoOp("a") + w := &Workflow{} + w.Add(Step(step)) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) +} +``` + +```bash +go test ./... -run "TestWorkflow_NoInterceptors_NoRegression" -v +``` + +Expected: PASS.
+ +- [ ] **Step 4: Final commit** + +```bash +git add -u +git commit -m "test: final integration tests and race detector clean" +``` From 11a1b439219b15ba88e59d657db55a09b0c8e5a2 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:30:19 +0000 Subject: [PATCH 05/29] feat: add interceptor public types and EventType constants --- event.go | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++ event_test.go | 38 +++++++++++++++++++++++ 2 files changed, 121 insertions(+) create mode 100644 event.go create mode 100644 event_test.go diff --git a/event.go b/event.go new file mode 100644 index 0000000..970e22c --- /dev/null +++ b/event.go @@ -0,0 +1,83 @@ +package flow + +import ( + "context" + "time" +) + +// EventType identifies a step lifecycle event. +// It reuses the same underlying type as StepStatus so that StepStatus constants +// (Succeeded, Failed, Canceled, Skipped) are directly usable as EventType values. +type EventType = StepStatus + +const ( + Scheduled EventType = "Scheduled" + Started EventType = "Started" + Retrying EventType = "Retrying" + // Succeeded, Failed, Canceled, Skipped are inherited from StepStatus in condition.go +) + +// WorkflowEvent carries information about a step lifecycle event. +type WorkflowEvent struct { + Step Steper + Type EventType + Attempt uint64 + Err error + Duration time.Duration + BackoffDuration time.Duration // non-zero only for Retrying +} + +// StepInfo is passed to StepInterceptor. +// Step is the canonical identifier — same pointer used as map key in Workflow. +// Callers that need a human-readable name can call flow.String(info.Step). +type StepInfo struct { + Step Steper + TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute +} + +// AttemptInfo is passed to AttemptInterceptor. +// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. 
+type AttemptInfo struct { + StepInfo + Attempt uint64 +} + +// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). +// If info.TerminalReason != Pending, next must not be called — the step will not execute. +// Return nil in that case after observing the event. +type StepInterceptor interface { + InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error +} + +// AttemptInterceptor intercepts each individual attempt (Before → Do → After). +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +} + +// StepInterceptorFunc is a function adapter for StepInterceptor. +type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error + +func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { + return f(ctx, info, next) +} + +// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. +type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + +func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + return f(ctx, info, next) +} + +// InterceptorReceiver is implemented by steps that contain a sub-workflow. +// stepExecution calls PrependInterceptors before each attempt so that +// parent interceptors wrap child interceptors. +type InterceptorReceiver interface { + PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) +} + +// retryNotifier is a package-private interface implemented by the concrete +// type returned by NewStepEventSink. stepExecution uses it to deliver +// Retrying events (which bypass the interceptor chain) to the sink. 
+type retryNotifier interface { + onRetry(WorkflowEvent) +} diff --git a/event_test.go b/event_test.go new file mode 100644 index 0000000..db06698 --- /dev/null +++ b/event_test.go @@ -0,0 +1,38 @@ +package flow + +import ( + "context" + "testing" + + "github.com/stretchr/testify/assert" +) + +func TestEventTypeConstants(t *testing.T) { + // Verify all constants exist and are distinct + types := []EventType{Scheduled, Started, Retrying, Succeeded, Failed, Canceled, Skipped} + seen := map[EventType]bool{} + for _, et := range types { + assert.False(t, seen[et], "duplicate EventType: %q", et) + seen[et] = true + } +} + +func TestStepInterceptorFunc(t *testing.T) { + called := false + var ic StepInterceptor = StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + called = true + return next(ctx) + }) + _ = ic.InterceptStep(context.Background(), StepInfo{}, func(ctx context.Context) error { return nil }) + assert.True(t, called) +} + +func TestAttemptInterceptorFunc(t *testing.T) { + called := false + var ic AttemptInterceptor = AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + called = true + return next(ctx) + }) + _ = ic.InterceptAttempt(context.Background(), AttemptInfo{}, func(ctx context.Context) error { return nil }) + assert.True(t, called) +} From be8f7c85d9394a74fc504cb23bc0a8dde4db7ebb Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:34:32 +0000 Subject: [PATCH 06/29] feat: add NewStepEventSink and NewAttemptEventSink --- event.go | 60 ++++++++++++++++++++++++++++++++++ event_test.go | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 149 insertions(+) diff --git a/event.go b/event.go index 970e22c..d704d76 100644 --- a/event.go +++ b/event.go @@ -81,3 +81,63 @@ type InterceptorReceiver interface { type retryNotifier interface { onRetry(WorkflowEvent) } + +// terminalEventType maps an error to the 
corresponding terminal EventType. +func terminalEventType(err error) EventType { + if err == nil { + return Succeeded + } + switch StatusFromError(err) { + case Canceled: + return Canceled + case Skipped: + return Skipped + default: + return Failed + } +} + +// stepEventSink is the concrete type returned by NewStepEventSink. +type stepEventSink struct { + sink func(WorkflowEvent) +} + +// NewStepEventSink returns a StepInterceptor that emits Scheduled then a terminal +// event (Succeeded/Failed/Canceled/Skipped) for every step. +func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { + return &stepEventSink{sink: sink} +} + +func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { + s.sink(WorkflowEvent{Step: info.Step, Type: Scheduled}) + + if info.TerminalReason != Pending { + s.sink(WorkflowEvent{Step: info.Step, Type: info.TerminalReason}) + return nil + } + + start := time.Now() + err := next(ctx) + s.sink(WorkflowEvent{ + Step: info.Step, + Type: terminalEventType(err), + Err: err, + Duration: time.Since(start), + }) + return err +} + +func (s *stepEventSink) onRetry(e WorkflowEvent) { s.sink(e) } + +// NewAttemptEventSink returns an AttemptInterceptor that emits a Started event +// for each attempt. 
+func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { + return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + sink(WorkflowEvent{ + Step: info.Step, + Type: Started, + Attempt: info.Attempt, + }) + return next(ctx) + }) +} diff --git a/event_test.go b/event_test.go index db06698..89c6f19 100644 --- a/event_test.go +++ b/event_test.go @@ -2,7 +2,9 @@ package flow import ( "context" + "errors" "testing" + "time" "github.com/stretchr/testify/assert" ) @@ -36,3 +38,90 @@ func TestAttemptInterceptorFunc(t *testing.T) { _ = ic.InterceptAttempt(context.Background(), AttemptInfo{}, func(ctx context.Context) error { return nil }) assert.True(t, called) } + +func TestNewStepEventSink_SucceededStep(t *testing.T) { + var events []WorkflowEvent + sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) + + step := NoOp("a") + info := StepInfo{Step: step, TerminalReason: Pending} + err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { + return nil + }) + + assert.NoError(t, err) + assert.Len(t, events, 2) + assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, step, events[0].Step) + assert.Equal(t, Succeeded, events[1].Type) + assert.NotZero(t, events[1].Duration) +} + +func TestNewStepEventSink_FailedStep(t *testing.T) { + var events []WorkflowEvent + sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) + + step := NoOp("a") + boom := errors.New("boom") + info := StepInfo{Step: step, TerminalReason: Pending} + err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { + return boom + }) + + assert.Equal(t, boom, err) + assert.Len(t, events, 2) + assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, Failed, events[1].Type) + assert.Equal(t, boom, events[1].Err) +} + +func TestNewStepEventSink_SkippedStep(t *testing.T) { + var events []WorkflowEvent + sink := 
NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) + + step := NoOp("a") + info := StepInfo{Step: step, TerminalReason: Skipped} + nextCalled := false + err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { + nextCalled = true + return nil + }) + + assert.NoError(t, err) + assert.False(t, nextCalled, "next must not be called for Skipped") + assert.Len(t, events, 2) + assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, Skipped, events[1].Type) +} + +func TestNewStepEventSink_OnRetry(t *testing.T) { + var events []WorkflowEvent + sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) + + rn, ok := sink.(retryNotifier) + assert.True(t, ok, "NewStepEventSink should implement retryNotifier") + + boom := errors.New("boom") + rn.onRetry(WorkflowEvent{Type: Retrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) + + assert.Len(t, events, 1) + assert.Equal(t, Retrying, events[0].Type) + assert.Equal(t, boom, events[0].Err) +} + +func TestNewAttemptEventSink_EmitsStarted(t *testing.T) { + var events []WorkflowEvent + sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) + + step := NoOp("a") + info := AttemptInfo{StepInfo: StepInfo{Step: step}, Attempt: 2} + err := sink.InterceptAttempt(context.Background(), info, func(ctx context.Context) error { + return nil + }) + + assert.NoError(t, err) + assert.Len(t, events, 1) + assert.Equal(t, Started, events[0].Type) + assert.Equal(t, uint64(2), events[0].Attempt) + assert.Equal(t, step, events[0].Step) +} From 8c34a166028545f8ed356457646a73bb4501376c Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:40:30 +0000 Subject: [PATCH 07/29] feat: introduce stepExecution, add StepInterceptors/AttemptInterceptors to Workflow Co-Authored-By: Claude Sonnet 4.6 --- workflow.go | 248 ++++++++++++++++++++++++++++++++--------------- workflow_test.go | 118 ++++++++++++++++++++++ 2 files changed, 290 
insertions(+), 76 deletions(-) diff --git a/workflow.go b/workflow.go index a455a25..bb2b0c1 100644 --- a/workflow.go +++ b/workflow.go @@ -44,6 +44,9 @@ type Workflow struct { Clock clock.Clock // Clock for retry and unit test DefaultOption *StepOption // DefaultOption is the default option for all Steps + StepInterceptors []StepInterceptor // per-step global interceptors + AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors + StepBuilder // StepBuilder to call BuildStep() for Steps steps map[Steper]*State // the internal states of Steps @@ -303,6 +306,19 @@ func (w *Workflow) Do(ctx context.Context) error { } const scanned StepStatus = "scanned" // a private status for preflight + +// scheduled is a private StepStatus sentinel used by tick() to atomically +// claim a step and prevent double-spawning. Never exposed to users. +const scheduled StepStatus = "scheduled" + +type stepExecution struct { + w *Workflow + step Steper + state *State + attempt uint64 + onRetry func(WorkflowEvent) +} + func isAllUpstreamScanned(ups map[Steper]StepResult) bool { for _, up := range ups { if up.Status != scanned { @@ -377,58 +393,12 @@ func (w *Workflow) tick(ctx context.Context) bool { if isAnyUpstreamNotTerminated(ups) { continue } - option := state.Option() - cond := DefaultCondition - if option != nil && option.Condition != nil { - cond = option.Condition - } - // if condition is evaluated to terminate - if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - state.SetStatus(nextStatus) - w.waitGroup.Add(1) - go func() { - defer w.waitGroup.Done() - w.signalStatusChange() // it locks w.statusChange.L, so we need another goroutine - }() - continue - } // kick off the Step if w.lease() { - state.SetStatus(Running) + state.SetStatus(scheduled) w.waitGroup.Add(1) - go func(ctx context.Context, step Steper, state *State) { - defer w.waitGroup.Done() - - var err error - status := Failed - defer func() { - state.SetStatus(status) - state.SetError(err) 
- // Release the lease BEFORE signalling, so that when the main - // loop wakes up in tick() it can immediately acquire a new lease. - // Previously unlease() was a separate earlier defer (LIFO), meaning - // signal fired first → tick() saw a full bucket → went back to - // Wait() → deadlock when MaxConcurrency=1 with chained steps. - w.unlease() - w.signalStatusChange() - }() - - err = w.runStep(ctx, step, state) - if err == nil { - status = Succeeded - return - } - status = StatusFromError(err) - if status == Failed { // do some extra checks - switch { - case - DefaultIsCanceled(err), - errors.Is(err, context.Canceled), - errors.Is(err, context.DeadlineExceeded): - status = Canceled - } - } - }(ctx, step, state) + ex := &stepExecution{w: w, step: step, state: state} + go ex.run(ctx) } } return false @@ -440,43 +410,169 @@ func (w *Workflow) signalStatusChange() { w.statusChange.Signal() } -func (w *Workflow) runStep(ctx context.Context, step Steper, state *State) error { - // set Step-level timeout for the Step +func (ex *stepExecution) run(ctx context.Context) { + defer ex.w.waitGroup.Done() + + ups := ex.w.UpstreamOf(ex.step) + option := ex.state.Option() + cond := DefaultCondition + if option != nil && option.Condition != nil { + cond = option.Condition + } + + terminalReason := Pending + if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + terminalReason = nextStatus + } + + info := StepInfo{Step: ex.step, TerminalReason: terminalReason} + + // Build StepInterceptor chain; collect retryNotifiers. + // The innermost next is executeWithRetry for normal steps; a no-op for terminal steps + // (interceptors that observe terminalReason should not call next). 
+ var retrySinks []func(WorkflowEvent) + var stepNext func(context.Context) error + if terminalReason == Pending { + stepNext = ex.executeWithRetry + } else { + stepNext = func(ctx context.Context) error { return nil } + } + for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { + ic := ex.w.StepInterceptors[i] + if rn, ok := ic.(retryNotifier); ok { + retrySinks = append(retrySinks, rn.onRetry) + } + next := stepNext + icLocal := ic + stepNext = func(ctx context.Context) error { + return icLocal.InterceptStep(ctx, info, next) + } + } + ex.onRetry = func(e WorkflowEvent) { + for _, s := range retrySinks { + s(e) + } + } + + var status StepStatus + var err error + + if terminalReason != Pending { + err = stepNext(ctx) + status = terminalReason + } else { + ex.state.SetStatus(Running) + err = stepNext(ctx) + status = statusFromError(err) + if status == Failed { + switch { + case DefaultIsCanceled(err), + errors.Is(err, context.Canceled), + errors.Is(err, context.DeadlineExceeded): + status = Canceled + } + } + } + + ex.state.SetStatus(status) + ex.state.SetError(err) + ex.w.unlease() + ex.w.signalStatusChange() +} + +func (ex *stepExecution) executeWithRetry(ctx context.Context) error { + option := ex.state.Option() + ex.wireNotify(option) + + attemptChain := ex.buildAttemptChain() + var notAfter time.Time - option := state.Option() if option != nil && option.Timeout != nil { - notAfter = w.Clock.Now().Add(*option.Timeout) + notAfter = ex.w.Clock.Now().Add(*option.Timeout) var cancel func() - ctx, cancel = w.Clock.WithDeadline(ctx, notAfter) + ctx, cancel = ex.w.Clock.WithDeadline(ctx, notAfter) defer cancel() } - // run the Step with or without retry - do := w.makeDoForStep(step, state) - return w.retry(option.RetryOption)(ctx, do, notAfter) + + return ex.w.retry(option.RetryOption)(ctx, attemptChain, notAfter) } -// makeDoForStep is panic-free from Step's Do and Input. 
-func (w *Workflow) makeDoForStep(step Steper, state *State) func(ctx context.Context) error { - return func(root context.Context) error { - do := func(fn func() error) error { return fn() } - if w.DontPanic { - do = catchPanicAsError +func (ex *stepExecution) buildAttemptChain() func(context.Context) error { + chain := func(ctx context.Context) error { + return ex.runAttempt(ctx) + } + for i := len(ex.w.AttemptInterceptors) - 1; i >= 0; i-- { + ic := ex.w.AttemptInterceptors[i] + next := chain + icLocal := ic + chain = func(ctx context.Context) error { + info := AttemptInfo{ + StepInfo: StepInfo{Step: ex.step}, + Attempt: ex.attempt, + } + return icLocal.InterceptAttempt(ctx, info, next) } - // call Before callbacks - var ctxStep context.Context - err := do(func() error { - ctxBefore, errBefore := state.Before(root, step, do) // pass do to Before to guard each Before callback - ctxStep = ctxBefore // use the context returned by Before for the following Do - return errBefore - }) - if err != nil { - err = ErrBeforeStep{err} - } else { // only call step.Do if all Before callbacks succeed - err = do(func() error { return step.Do(ctxStep) }) // step.Do will not change ctxStep + } + return chain +} + +func (ex *stepExecution) runAttempt(ctx context.Context) error { + defer func() { ex.attempt++ }() + + if recv, ok := ex.step.(InterceptorReceiver); ok { + recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) + } + + do := func(fn func() error) error { return fn() } + if ex.w.DontPanic { + do = catchPanicAsError + } + + var ctxStep context.Context + err := do(func() error { + ctxBefore, errBefore := ex.state.Before(ctx, ex.step, do) + ctxStep = ctxBefore + return errBefore + }) + if err != nil { + err = ErrBeforeStep{err} + } else { + err = do(func() error { return ex.step.Do(ctxStep) }) + } + return do(func() error { return ex.state.After(ctxStep, ex.step, err) }) +} + +func (ex *stepExecution) wireNotify(option *StepOption) { + if option == nil || 
option.RetryOption == nil { + return + } + userNotify := option.RetryOption.Notify + option.RetryOption.Notify = func(err error, d time.Duration) { + e := WorkflowEvent{ + Step: ex.step, + Type: Retrying, + Attempt: ex.attempt, + Err: err, + BackoffDuration: d, } - // call After callbacks, will use the ctxStep for After callbacks - return do(func() error { return state.After(ctxStep, step, err) }) + ex.attempt++ + if ex.onRetry != nil { + ex.onRetry(e) + } + if userNotify != nil { + userNotify(err, d) + } + } +} + +func statusFromError(err error) StepStatus { + if err == nil { + return Succeeded + } + if s := StatusFromError(err); s != Failed { + return s } + return Failed } func (w *Workflow) lease() bool { diff --git a/workflow_test.go b/workflow_test.go index ae5168f..ca073b8 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -2,12 +2,14 @@ package flow import ( "context" + "errors" "fmt" "sync" "sync/atomic" "testing" "time" + "github.com/cenkalti/backoff/v4" "github.com/stretchr/testify/assert" ) @@ -303,3 +305,119 @@ func TestMaxConcurrencyDeadlockStress(t *testing.T) { } wg.Wait() } + +func TestStepExecution_BasicSuccess(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + step := NoOp("a") + w := &Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + } + w.Add(Step(step)) + err := w.Do(context.Background()) + assert.NoError(t, err) + assert.Equal(t, []EventType{Scheduled, Succeeded}, eventTypes(events)) +} + +func TestStepExecution_StepInterceptorOrder(t *testing.T) { + t.Parallel() + var order []string + makeIC := func(name string) StepInterceptor { + return StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + order = append(order, name+":before") + err := next(ctx) + order = append(order, name+":after") + return err + }) + } + w := &Workflow{ + StepInterceptors: []StepInterceptor{makeIC("A"), makeIC("B")}, + } + 
w.Add(Step(NoOp("s"))) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []string{"A:before", "B:before", "B:after", "A:after"}, order) +} + +func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { + t.Parallel() + var order []string + makeIC := func(name string) AttemptInterceptor { + return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + order = append(order, name+":before") + err := next(ctx) + order = append(order, name+":after") + return err + }) + } + w := &Workflow{ + AttemptInterceptors: []AttemptInterceptor{makeIC("X"), makeIC("Y")}, + } + w.Add(Step(NoOp("s"))) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []string{"X:before", "Y:before", "Y:after", "X:after"}, order) +} + +func TestStepExecution_SkippedStep(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + step := NoOp("a") + w := &Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + }, + } + w.Add(Step(step).When(func(_ context.Context, _ map[Steper]StepResult) StepStatus { + return Skipped + })) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []EventType{Scheduled, Skipped}, eventTypes(events)) +} + +func TestStepExecution_RetryingEvent(t *testing.T) { + t.Parallel() + var events []WorkflowEvent + mu := sync.Mutex{} + record := func(e WorkflowEvent) { + mu.Lock() + events = append(events, e) + mu.Unlock() + } + boom := errors.New("boom") + attempts := 0 + step := Func("s", func(ctx context.Context) error { + attempts++ + if attempts < 3 { + return boom + } + return nil + }) + w := &Workflow{ + StepInterceptors: []StepInterceptor{ + NewStepEventSink(record), + }, + AttemptInterceptors: []AttemptInterceptor{ + NewAttemptEventSink(record), + }, + } + w.Add(Step(step).Retry(func(o *RetryOption) { + o.Attempts = 3 + o.Backoff = &backoff.ZeroBackOff{} + })) + assert.NoError(t, w.Do(context.Background())) 
+ assert.Equal(t, []EventType{ + Scheduled, + Started, Retrying, + Started, Retrying, + Started, Succeeded, + }, eventTypes(events)) +} + +func eventTypes(events []WorkflowEvent) []EventType { + types := make([]EventType, len(events)) + for i, e := range events { + types[i] = e.Type + } + return types +} From 15133134177047ec11d7022b6ed8fd29692112f1 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:42:21 +0000 Subject: [PATCH 08/29] feat: SubWorkflow implements InterceptorReceiver for interceptor propagation --- workflow.go | 7 +++++ wrap_test.go | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 82 insertions(+) diff --git a/workflow.go b/workflow.go index bb2b0c1..0bb78fb 100644 --- a/workflow.go +++ b/workflow.go @@ -646,3 +646,10 @@ func (s *SubWorkflow) Do(ctx context.Context) error { return s.w.Do(ctx) } // Reset resets the sub-workflow to ready for BuildStep() func (s *SubWorkflow) Reset() { s.w = Workflow{} } + +// PrependInterceptors implements InterceptorReceiver. +// Parent workflow interceptors are prepended so they execute outside child interceptors. +func (s *SubWorkflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { + s.w.StepInterceptors = append(step, s.w.StepInterceptors...) + s.w.AttemptInterceptors = append(attempt, s.w.AttemptInterceptors...) 
+} diff --git a/wrap_test.go b/wrap_test.go index df695bb..07bc1a2 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -3,6 +3,7 @@ package flow import ( "context" "strings" + "sync" "testing" "github.com/stretchr/testify/assert" @@ -150,3 +151,77 @@ func TestBuildStep(t *testing.T) { assert.Equal(t, []string{"Reset", "BuildStep"}, s.calls) }) } + +func TestSubWorkflow_InterceptorPropagation(t *testing.T) { + t.Parallel() + + var events []WorkflowEvent + mu := sync.Mutex{} + sink := NewStepEventSink(func(e WorkflowEvent) { + mu.Lock() + events = append(events, e) + mu.Unlock() + }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + + w := &Workflow{ + StepInterceptors: []StepInterceptor{sink}, + } + w.Add(Step(sub)) + + assert.NoError(t, w.Do(context.Background())) + + types := make([]EventType, len(events)) + for i, e := range events { + types[i] = e.Type + } + // At least 4 events: Scheduled+Succeeded for sub, Scheduled+Succeeded for innerStep + assert.GreaterOrEqual(t, len(events), 4) + assert.Contains(t, types, Scheduled) + assert.Contains(t, types, Succeeded) + for _, e := range events { + assert.NotNil(t, e.Step) + } +} + +func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { + t.Parallel() + + var parentEvents []WorkflowEvent + var childEvents []WorkflowEvent + pmu := sync.Mutex{} + cmu := sync.Mutex{} + + parentSink := NewStepEventSink(func(e WorkflowEvent) { + pmu.Lock() + parentEvents = append(parentEvents, e) + pmu.Unlock() + }) + childSink := NewStepEventSink(func(e WorkflowEvent) { + cmu.Lock() + childEvents = append(childEvents, e) + cmu.Unlock() + }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + sub.w.StepInterceptors = []StepInterceptor{childSink} + + w := &Workflow{ + StepInterceptors: []StepInterceptor{parentSink}, + } + w.Add(Step(sub)) + + assert.NoError(t, w.Do(context.Background())) + + // Parent 
sees outer step (sub) + inner step (propagated) = at least 4 events + assert.GreaterOrEqual(t, len(parentEvents), 4) + // Child sees inner step only = at least 2 events + assert.GreaterOrEqual(t, len(childEvents), 2) +} From b6a24c18cf0cd94b8e668b60ef504cf9f10ce62d Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:46:44 +0000 Subject: [PATCH 09/29] test: verify Retrying event attempt numbers are correctly sequenced Also fix wireNotify to use ex.attempt-1 for Retrying events (since runAttempt's defer has already incremented ex.attempt by the time Notify fires) and remove the now-duplicate ex.attempt++ from wireNotify. Co-Authored-By: Claude Sonnet 4.6 --- workflow.go | 5 ++-- workflow_test.go | 61 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 2 deletions(-) diff --git a/workflow.go b/workflow.go index 0bb78fb..8430b9b 100644 --- a/workflow.go +++ b/workflow.go @@ -548,14 +548,15 @@ func (ex *stepExecution) wireNotify(option *StepOption) { } userNotify := option.RetryOption.Notify option.RetryOption.Notify = func(err error, d time.Duration) { + // ex.attempt has already been incremented by runAttempt's defer, + // so subtract 1 to get the attempt number that just failed. 
e := WorkflowEvent{ Step: ex.step, Type: Retrying, - Attempt: ex.attempt, + Attempt: ex.attempt - 1, Err: err, BackoffDuration: d, } - ex.attempt++ if ex.onRetry != nil { ex.onRetry(e) } diff --git a/workflow_test.go b/workflow_test.go index ca073b8..9ffab16 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -421,3 +421,64 @@ func eventTypes(events []WorkflowEvent) []EventType { } return types } + +func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { + t.Parallel() + + var events []WorkflowEvent + mu := sync.Mutex{} + record := func(e WorkflowEvent) { + mu.Lock() + events = append(events, e) + mu.Unlock() + } + + callCount := 0 + step := Func("flaky", func(ctx context.Context) error { + callCount++ + if callCount < 3 { + return errors.New("not yet") + } + return nil + }) + + w := &Workflow{ + StepInterceptors: []StepInterceptor{NewStepEventSink(record)}, + AttemptInterceptors: []AttemptInterceptor{NewAttemptEventSink(record)}, + } + w.Add(Step(step).Retry(func(o *RetryOption) { + o.Attempts = 5 + o.Backoff = &backoff.ZeroBackOff{} + })) + + assert.NoError(t, w.Do(context.Background())) + + assert.Equal(t, []EventType{ + Scheduled, + Started, // attempt 0 + Retrying, // attempt 0 failed + Started, // attempt 1 + Retrying, // attempt 1 failed + Started, // attempt 2 succeeds + Succeeded, + }, eventTypes(events)) + + retryingEvents := filterEvents(events, Retrying) + assert.Equal(t, uint64(0), retryingEvents[0].Attempt) + assert.Equal(t, uint64(1), retryingEvents[1].Attempt) + + startedEvents := filterEvents(events, Started) + assert.Equal(t, uint64(0), startedEvents[0].Attempt) + assert.Equal(t, uint64(1), startedEvents[1].Attempt) + assert.Equal(t, uint64(2), startedEvents[2].Attempt) +} + +func filterEvents(events []WorkflowEvent, et EventType) []WorkflowEvent { + var rv []WorkflowEvent + for _, e := range events { + if e.Type == et { + rv = append(rv, e) + } + } + return rv +} From d44b88b36f28400c1f73c071e2155f31bfdbabd6 Mon Sep 17 00:00:00 
2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:47:53 +0000 Subject: [PATCH 10/29] test: add no-interceptor regression test and final race check --- workflow_test.go | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/workflow_test.go b/workflow_test.go index 9ffab16..d47afe1 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -473,6 +473,16 @@ func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { assert.Equal(t, uint64(2), startedEvents[2].Attempt) } +func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { + t.Parallel() + // Workflows without interceptors must not regress existing behaviour. + step := NoOp("a") + w := &Workflow{} + w.Add(Step(step)) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) +} + func filterEvents(events []WorkflowEvent, et EventType) []WorkflowEvent { var rv []WorkflowEvent for _, e := range events { From 685d5ab7d8aba50689065ee35263325eaf2e335d Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:52:08 +0000 Subject: [PATCH 11/29] fix: make EventType a distinct named type, use prefixed terminal constants --- event.go | 41 +++++++++++++++++++++++++++++------------ event_test.go | 8 ++++---- workflow_test.go | 8 ++++---- wrap_test.go | 2 +- 4 files changed, 38 insertions(+), 21 deletions(-) diff --git a/event.go b/event.go index d704d76..afc901a 100644 --- a/event.go +++ b/event.go @@ -6,15 +6,16 @@ import ( ) // EventType identifies a step lifecycle event. -// It reuses the same underlying type as StepStatus so that StepStatus constants -// (Succeeded, Failed, Canceled, Skipped) are directly usable as EventType values. 
-type EventType = StepStatus +type EventType string const ( - Scheduled EventType = "Scheduled" - Started EventType = "Started" - Retrying EventType = "Retrying" - // Succeeded, Failed, Canceled, Skipped are inherited from StepStatus in condition.go + Scheduled EventType = "Scheduled" + Started EventType = "Started" + Retrying EventType = "Retrying" + EventSucceeded EventType = "Succeeded" + EventFailed EventType = "Failed" + EventCanceled EventType = "Canceled" + EventSkipped EventType = "Skipped" ) // WorkflowEvent carries information about a step lifecycle event. @@ -85,15 +86,31 @@ type retryNotifier interface { // terminalEventType maps an error to the corresponding terminal EventType. func terminalEventType(err error) EventType { if err == nil { - return Succeeded + return EventSucceeded } switch StatusFromError(err) { case Canceled: - return Canceled + return EventCanceled case Skipped: - return Skipped + return EventSkipped default: - return Failed + return EventFailed + } +} + +// terminalStepStatusToEventType converts a terminal StepStatus to its EventType counterpart. 
+func terminalStepStatusToEventType(s StepStatus) EventType { + switch s { + case Succeeded: + return EventSucceeded + case Failed: + return EventFailed + case Canceled: + return EventCanceled + case Skipped: + return EventSkipped + default: + return EventFailed } } @@ -112,7 +129,7 @@ func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next f s.sink(WorkflowEvent{Step: info.Step, Type: Scheduled}) if info.TerminalReason != Pending { - s.sink(WorkflowEvent{Step: info.Step, Type: info.TerminalReason}) + s.sink(WorkflowEvent{Step: info.Step, Type: terminalStepStatusToEventType(info.TerminalReason)}) return nil } diff --git a/event_test.go b/event_test.go index 89c6f19..b628f1f 100644 --- a/event_test.go +++ b/event_test.go @@ -11,7 +11,7 @@ import ( func TestEventTypeConstants(t *testing.T) { // Verify all constants exist and are distinct - types := []EventType{Scheduled, Started, Retrying, Succeeded, Failed, Canceled, Skipped} + types := []EventType{Scheduled, Started, Retrying, EventSucceeded, EventFailed, EventCanceled, EventSkipped} seen := map[EventType]bool{} for _, et := range types { assert.False(t, seen[et], "duplicate EventType: %q", et) @@ -53,7 +53,7 @@ func TestNewStepEventSink_SucceededStep(t *testing.T) { assert.Len(t, events, 2) assert.Equal(t, Scheduled, events[0].Type) assert.Equal(t, step, events[0].Step) - assert.Equal(t, Succeeded, events[1].Type) + assert.Equal(t, EventSucceeded, events[1].Type) assert.NotZero(t, events[1].Duration) } @@ -71,7 +71,7 @@ func TestNewStepEventSink_FailedStep(t *testing.T) { assert.Equal(t, boom, err) assert.Len(t, events, 2) assert.Equal(t, Scheduled, events[0].Type) - assert.Equal(t, Failed, events[1].Type) + assert.Equal(t, EventFailed, events[1].Type) assert.Equal(t, boom, events[1].Err) } @@ -91,7 +91,7 @@ func TestNewStepEventSink_SkippedStep(t *testing.T) { assert.False(t, nextCalled, "next must not be called for Skipped") assert.Len(t, events, 2) assert.Equal(t, Scheduled, 
events[0].Type) - assert.Equal(t, Skipped, events[1].Type) + assert.Equal(t, EventSkipped, events[1].Type) } func TestNewStepEventSink_OnRetry(t *testing.T) { diff --git a/workflow_test.go b/workflow_test.go index d47afe1..7a9948e 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -318,7 +318,7 @@ func TestStepExecution_BasicSuccess(t *testing.T) { w.Add(Step(step)) err := w.Do(context.Background()) assert.NoError(t, err) - assert.Equal(t, []EventType{Scheduled, Succeeded}, eventTypes(events)) + assert.Equal(t, []EventType{Scheduled, EventSucceeded}, eventTypes(events)) } func TestStepExecution_StepInterceptorOrder(t *testing.T) { @@ -372,7 +372,7 @@ func TestStepExecution_SkippedStep(t *testing.T) { return Skipped })) assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []EventType{Scheduled, Skipped}, eventTypes(events)) + assert.Equal(t, []EventType{Scheduled, EventSkipped}, eventTypes(events)) } func TestStepExecution_RetryingEvent(t *testing.T) { @@ -410,7 +410,7 @@ func TestStepExecution_RetryingEvent(t *testing.T) { Scheduled, Started, Retrying, Started, Retrying, - Started, Succeeded, + Started, EventSucceeded, }, eventTypes(events)) } @@ -460,7 +460,7 @@ func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { Started, // attempt 1 Retrying, // attempt 1 failed Started, // attempt 2 succeeds - Succeeded, + EventSucceeded, }, eventTypes(events)) retryingEvents := filterEvents(events, Retrying) diff --git a/wrap_test.go b/wrap_test.go index 07bc1a2..dbecd91 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -182,7 +182,7 @@ func TestSubWorkflow_InterceptorPropagation(t *testing.T) { // At least 4 events: Scheduled+Succeeded for sub, Scheduled+Succeeded for innerStep assert.GreaterOrEqual(t, len(events), 4) assert.Contains(t, types, Scheduled) - assert.Contains(t, types, Succeeded) + assert.Contains(t, types, EventSucceeded) for _, e := range events { assert.NotNil(t, e.Step) } From 3329239d46d2bedd4cd78c3897eeacf5fd0cc693 Mon Sep 17 
00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 04:56:39 +0000 Subject: [PATCH 12/29] fix: PrependInterceptors once per step, not per attempt; add regression test Co-Authored-By: Claude Sonnet 4.6 --- workflow.go | 12 ++++++++---- wrap_test.go | 37 +++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+), 4 deletions(-) diff --git a/workflow.go b/workflow.go index 8430b9b..b4c9ee9 100644 --- a/workflow.go +++ b/workflow.go @@ -482,6 +482,12 @@ func (ex *stepExecution) run(ctx context.Context) { func (ex *stepExecution) executeWithRetry(ctx context.Context) error { option := ex.state.Option() + + // Propagate interceptors to SubWorkflow once — before the retry loop starts. + if recv, ok := ex.step.(InterceptorReceiver); ok { + recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) + } + ex.wireNotify(option) attemptChain := ex.buildAttemptChain() @@ -519,10 +525,6 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { func (ex *stepExecution) runAttempt(ctx context.Context) error { defer func() { ex.attempt++ }() - if recv, ok := ex.step.(InterceptorReceiver); ok { - recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) - } - do := func(fn func() error) error { return fn() } if ex.w.DontPanic { do = catchPanicAsError @@ -546,6 +548,8 @@ func (ex *stepExecution) wireNotify(option *StepOption) { if option == nil || option.RetryOption == nil { return } + // option is a fresh copy from State.Option() each run — safe to mutate. + // State.Option() allocates new StepOption and RetryOption on every call. 
userNotify := option.RetryOption.Notify option.RetryOption.Notify = func(err error, d time.Duration) { // ex.attempt has already been incremented by runAttempt's defer, diff --git a/wrap_test.go b/wrap_test.go index dbecd91..fc2824f 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -2,10 +2,13 @@ package flow import ( "context" + "errors" "strings" "sync" + "sync/atomic" "testing" + "github.com/cenkalti/backoff/v4" "github.com/stretchr/testify/assert" ) @@ -225,3 +228,37 @@ func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { // Child sees inner step only = at least 2 events assert.GreaterOrEqual(t, len(childEvents), 2) } + +func TestSubWorkflow_InterceptorNotDuplicatedOnRetry(t *testing.T) { + t.Parallel() + + var count atomic.Int32 + sink := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + count.Add(1) + return next(ctx) + }) + + attempts := 0 + inner := Func("inner", func(ctx context.Context) error { + attempts++ + if attempts < 2 { + return errors.New("fail once") + } + return nil + }) + + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(inner).Retry(func(o *RetryOption) { + o.Attempts = 3 + o.Backoff = &backoff.ZeroBackOff{} + })) + + w := &Workflow{StepInterceptors: []StepInterceptor{sink}} + w.Add(Step(sub)) + assert.NoError(t, w.Do(context.Background())) + + // parent interceptor must fire exactly twice: + // once for the outer sub step, once for the inner step (regardless of retry count). 
+ assert.Equal(t, int32(2), count.Load()) +} From 90740847c1b069b3e6807d446cf88d45ea0011c6 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 05:11:48 +0000 Subject: [PATCH 13/29] chore: fill and archive openspec change structured-event-sink Co-Authored-By: Claude Sonnet 4.5 --- .../.openspec.yaml | 2 + .../design.md | 109 ++++++++++++++ .../proposal.md | 75 ++++++++++ .../specs/step-interceptor/spec.md | 137 ++++++++++++++++++ .../2026-05-06-structured-event-sink/tasks.md | 33 +++++ 5 files changed, 356 insertions(+) create mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml create mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/design.md create mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md create mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md create mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml b/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml new file mode 100644 index 0000000..905325f --- /dev/null +++ b/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-05-04 diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/design.md b/openspec/changes/archive/2026-05-06-structured-event-sink/design.md new file mode 100644 index 0000000..e7885a8 --- /dev/null +++ b/openspec/changes/archive/2026-05-06-structured-event-sink/design.md @@ -0,0 +1,109 @@ +# Step Interceptor Design + +## Summary + +The original proposal called for a simple `EventSink func(WorkflowEvent)` field on `Workflow`. +During design exploration, this evolved into a two-layer interceptor system that is strictly more +powerful: EventSink becomes a built-in adapter on top of the interceptor API. 
+ +The key insight from studying Temporal's Go SDK: a global observability hook is most useful when +it wraps the full execution lifecycle (like a `WorkerInterceptor`) rather than merely firing events. This +allows users to implement OTel spans, Prometheus histograms, and structured logging with a single +consistent API. + +## Design Decisions + +### Interceptor vs EventSink + +`StepInterceptor` and `AttemptInterceptor` replace the proposed `EventSink func(WorkflowEvent)`. +`NewStepEventSink` and `NewAttemptEventSink` are built-in adapters that implement these interfaces +and emit `WorkflowEvent`s — users who only want structured events use these adapters and never +interact with the interceptor interfaces directly. + +### Two Layers + +- **`StepInterceptor`**: wraps the full step lifecycle (all retry attempts). One invocation per step. + Right place for OTel spans, step-level metrics. +- **`AttemptInterceptor`**: wraps each individual attempt (`Before → Do → After`). Right place + for per-attempt logging, attempt-level tracing. + +### BeforeStep/AfterStep are orthogonal + +Interceptors are workflow-level; `BeforeStep`/`AfterStep` are step-level (per-step `StepConfig`). +They execute in different layers of the stack and are configured independently. No changes to the +existing `BeforeStep`/`AfterStep` API. + +### StepStatus vs EventType + +These are deliberately separate types: +- `StepStatus` is the orchestration engine's state machine, used by `Condition` evaluation. +- `EventType` is an observation stream for external consumers. `Running` has no `EventType` + equivalent — within it, multiple `Started` and `Retrying` events fire. + +### Retrying event delivery + +`Retrying` fires inside `backoff.RetryNotifyWithTimer`'s Notify callback, between two consecutive +attempts: inside the `StepInterceptor` chain's `next`, but outside the `AttemptInterceptor` chain. It is delivered via `wireNotify`, a side-channel +that assembles `ex.onRetry` from interceptors implementing the package-private `retryNotifier` +interface.
The concrete type returned by `NewStepEventSink` implements this interface. + +### SubWorkflow propagation + +`SubWorkflow` implements `InterceptorReceiver`. `stepExecution` calls `PrependInterceptors` in +`executeWithRetry` (once per step, not per attempt) so parent interceptors wrap child interceptors. + +### stepExecution refactor + +The anonymous goroutine in `tick()` is extracted into a `stepExecution` struct. `tick()` becomes +a single-responsibility function: atomically claim a step with a private `scheduled` sentinel. +All lifecycle logic (condition evaluation, interceptor chain assembly, retry, event delivery) +moves into `stepExecution.run()`. + +## API Surface + +```go +// New on Workflow struct +StepInterceptors []StepInterceptor +AttemptInterceptors []AttemptInterceptor + +// New interfaces +type StepInterceptor interface { + InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error +} +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +} + +// Function adapters +type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error +type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + +// Info types +type StepInfo struct { + Step Steper + TerminalReason StepStatus // Pending = will execute normally +} +type AttemptInfo struct { + StepInfo + Attempt uint64 +} + +// Event types +type EventType string // Scheduled / Started / Retrying / EventSucceeded / EventFailed / EventCanceled / EventSkipped +type WorkflowEvent struct { Step Steper; Type EventType; Attempt uint64; Err error; Duration, BackoffDuration time.Duration } + +// Built-in adapters +func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor +func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor + +// SubWorkflow propagation +type InterceptorReceiver interface { + 
PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) +} +``` + +## No Breaking Changes + +All existing APIs (`BeforeStep`, `AfterStep`, `Condition`, `RetryOption`, `SubWorkflow` embedding) +are unchanged. The new fields on `Workflow` are zero-value safe — workflows without interceptors +behave identically to before. diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md b/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md new file mode 100644 index 0000000..89e3412 --- /dev/null +++ b/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md @@ -0,0 +1,75 @@ +## Why + +Currently, observability in go-workflow relies on users wiring up `BeforeStep`/`AfterStep` +callbacks manually on every step they care about. There is no structured way to observe +all steps globally — no lifecycle events, no attempt count, no timing, no retry visibility. + +In production, you need to answer: which step is running right now? How many retries has step X +done? How long did step Y take? None of these are answerable today without bespoke instrumentation. + +Temporal exposes a full Event History for every workflow execution. go-workflow should offer a +lightweight equivalent: a global `EventSink` that receives structured events for every step +lifecycle transition. + +## What Changes + +- A `WorkflowEvent` struct capturing step identity, event type, attempt, error, and timestamp. +- An `EventSink` interface (or a simple `func`) that the `Workflow` calls on every transition. +- A field on `Workflow` to register the sink. 
+ +## Capabilities + +### New Capabilities + +- **Structured events**: every meaningful transition emits a `WorkflowEvent`: + - `Scheduled` — step is ready to run (all upstreams terminated, condition evaluated to Running) + - `Started` — goroutine launched, `Do()` about to be called + - `Retrying` — `Do()` returned an error, backoff is sleeping before next attempt + - `Succeeded` / `Failed` / `Canceled` / `Skipped` — terminal transitions + - `HeartbeatReceived` — if heartbeat feature lands (see heartbeat-and-liveness change) + +- **EventSink integration**: `Workflow.EventSink` is a function `func(WorkflowEvent)` (or an + interface). Simple function type avoids an extra abstraction layer and is trivially composable + (fan-out = call multiple funcs). + +- **Zero-cost when unset**: if `EventSink` is nil, no allocations occur on the hot path. + +- **Out-of-box adapters (separate package or examples)**: + - `slog` adapter — logs each event as a structured log line + - OpenTelemetry span adapter — wraps each step attempt in a trace span + - Prometheus metrics adapter — increments counters and records histograms + +### Example sketch + +```go +w := &flow.Workflow{ + EventSink: func(e flow.WorkflowEvent) { + slog.Info("step event", + "step", flow.String(e.Step), + "event", e.Type, + "attempt", e.Attempt, + "err", e.Err, + "duration", e.Duration, + ) + }, +} +``` + +### Open Questions + +- `WorkflowEvent.Step` is a `Steper` (interface/pointer). For logging we need a stable string + name. Should `WorkflowEvent` also carry a pre-computed `StepName string` (from `flow.String()`)? + Probably yes, to avoid callers doing it themselves. + +- Should `Retrying` carry the backoff duration so callers can log "retrying in 2s"? + +- Should the sink be called synchronously on the step goroutine, or dispatched async? + Synchronous is simpler and predictable; async risks hiding slow sinks. + +## Impact + +- New `WorkflowEvent` struct and `EventType` constants. 
+- `Workflow` struct — add `EventSink func(WorkflowEvent)` field. +- `workflow.go` — call sink at each status transition (in `tick` and `runStep`). +- New spec: `openspec/specs/event-sink/spec.md`. +- No breaking changes. diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md b/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md new file mode 100644 index 0000000..0eb3e9d --- /dev/null +++ b/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md @@ -0,0 +1,137 @@ +# Step Interceptor Spec + +## Overview + +Two-layer interceptor system for global structured observability in go-workflow. +Registered on `Workflow`; applies to all steps automatically. + +## Types + +### StepInterceptor + +Wraps the full lifecycle of one step execution (all retry attempts). + +```go +type StepInterceptor interface { + InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error +} +type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error +``` + +- Called once per step regardless of retry count +- `info.TerminalReason != Pending` means step is Skipped/Canceled; **must not** call `next` +- `next` calls into the retry loop → AttemptInterceptors → BeforeStep → Do → AfterStep + +### AttemptInterceptor + +Wraps each individual attempt (`BeforeStep → Do → AfterStep`). 
+ +```go +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +} +type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +``` + +- Called once per attempt (including retried attempts) +- `info.Attempt` is 0-indexed; increments after each attempt + +### StepInfo / AttemptInfo + +```go +type StepInfo struct { + Step Steper // canonical identifier (same pointer as Workflow map key) + TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not +} +type AttemptInfo struct { + StepInfo + Attempt uint64 +} +``` + +Callers wanting a human-readable name call `flow.String(info.Step)`. + +### EventType / WorkflowEvent + +```go +type EventType string +const ( + Scheduled EventType = "Scheduled" + Started EventType = "Started" + Retrying EventType = "Retrying" + EventSucceeded EventType = "Succeeded" + EventFailed EventType = "Failed" + EventCanceled EventType = "Canceled" + EventSkipped EventType = "Skipped" +) + +type WorkflowEvent struct { + Step Steper + Type EventType + Attempt uint64 + Err error + Duration time.Duration + BackoffDuration time.Duration // non-zero only for Retrying +} +``` + +`EventType` is a distinct named type from `StepStatus`. Terminal `EventType` constants are +prefixed with `Event` to avoid redeclaration conflicts with `StepStatus` constants. + +### InterceptorReceiver + +```go +type InterceptorReceiver interface { + PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) +} +``` + +Steps embedding `SubWorkflow` implement this interface. `stepExecution` calls it in +`executeWithRetry` (once per step, before the retry loop) to propagate parent interceptors. + +## Workflow Integration + +```go +type Workflow struct { + // ... existing fields ... 
+ StepInterceptors []StepInterceptor // [0] outermost, [len-1] innermost + AttemptInterceptors []AttemptInterceptor // [0] outermost; BeforeStep/AfterStep always innermost +} +``` + +Zero-value safe: nil slices mean no interceptors; existing behaviour is unchanged. + +## Built-in Adapters + +```go +func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor +func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor +``` + +`NewStepEventSink` emits: `Scheduled` → (Retrying events via side-channel) → terminal event. +`NewAttemptEventSink` emits: `Started` per attempt. + +`NewStepEventSink` also implements the package-private `retryNotifier` interface so `wireNotify` +can deliver `Retrying` events (which bypass the chain) to the sink. + +## Execution Stack + +``` +StepInterceptor[0] + └── StepInterceptor[1] + └── [retry loop — Notify wired to onRetry] + └── AttemptInterceptor[0] + └── AttemptInterceptor[1] + └── BeforeStep callbacks (from StepConfig) + └── step.Do(ctx) + └── AfterStep callbacks +``` + +## Invariants + +- `StepInterceptor` fires exactly once per step execution +- `AttemptInterceptor` fires exactly once per attempt +- `Retrying` event `Attempt` field matches the attempt that just failed (0-indexed) +- `SubWorkflow` parent interceptors execute outside child interceptors +- `PrependInterceptors` called once per step (in `executeWithRetry`), not per attempt +- `State.Option()` allocates a fresh `*StepOption` + `*RetryOption` each call — `wireNotify` mutations are safe and do not persist across `Reset()`+`Do()` runs diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md b/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md new file mode 100644 index 0000000..b5b3bed --- /dev/null +++ b/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md @@ -0,0 +1,33 @@ +# Tasks: structured-event-sink + +## Implementation + +- [x] Define public types in `event.go` (`EventType`, `WorkflowEvent`, 
`StepInterceptor`, `AttemptInterceptor`, `StepInterceptorFunc`, `AttemptInterceptorFunc`, `StepInfo`, `AttemptInfo`, `InterceptorReceiver`, `retryNotifier`) +- [x] Implement `NewStepEventSink` and `NewAttemptEventSink` in `event.go` +- [x] Add `StepInterceptors`/`AttemptInterceptors` fields to `Workflow` struct +- [x] Introduce `stepExecution` struct; simplify `tick()` to only claim step via `scheduled` sentinel +- [x] Implement `stepExecution.run()`, `executeWithRetry()`, `buildAttemptChain()`, `runAttempt()`, `wireNotify()` +- [x] Delete `makeDoForStep()` and `runStep()` from `workflow.go` +- [x] Implement `SubWorkflow.PrependInterceptors` in `wrap.go` + +## Tests + +- [x] Unit tests for `EventType` constants and `StepInterceptorFunc`/`AttemptInterceptorFunc` adapters +- [x] Unit tests for `NewStepEventSink` (Succeeded, Failed, Skipped, OnRetry) +- [x] Unit tests for `NewAttemptEventSink` (Started event) +- [x] Integration test: basic step success with StepInterceptor +- [x] Integration test: StepInterceptor chain ordering (A→B→B→A) +- [x] Integration test: AttemptInterceptor chain ordering (X→Y→Y→X) +- [x] Integration test: Skipped step enters interceptor chain with TerminalReason +- [x] Integration test: Retrying events with correct attempt numbers +- [x] Integration test: SubWorkflow interceptor propagation +- [x] Integration test: child interceptor preserved alongside parent +- [x] Integration test: `PrependInterceptors` not duplicated on retry (`TestSubWorkflow_InterceptorNotDuplicatedOnRetry`) +- [x] Regression test: zero-interceptor workflow unchanged +- [x] Race detector clean (`go test -race ./...`) + +## Bug Fixes (found during review) + +- [x] Fix C1: `PrependInterceptors` moved from `runAttempt` (per-attempt) to `executeWithRetry` (once per step) +- [x] Fix wireNotify timing: `Retrying.Attempt` uses `ex.attempt - 1` (defer in `runAttempt` fires before Notify) +- [x] Fix `EventType` to be a distinct named type (`type EventType string`), not a type 
alias From bfcd1d7f87d2499bb2b3b9c2c36954a456dcc58d Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 05:46:51 +0000 Subject: [PATCH 14/29] fix: address Codex review issues in interceptor implementation - wireNotify: deep-copy RetryOption before mutating Notify to prevent shared-pointer mutation when Workflow.DefaultOption carries a RetryOption - buildAttemptChain: move attempt++ to wrapper around full interceptor chain so counter always advances even if AttemptInterceptor short-circuits, preventing underflow (uint64 wrap) in wireNotify's attempt-1 expression - PrependInterceptors: use make+copy instead of append to avoid aliasing parent's backing array and to ensure idempotency across Reset+Do cycles - event.go: update InterceptorReceiver comment to reflect once-per-step (not per-attempt) injection point Co-Authored-By: Claude Sonnet 4.6 --- event.go | 4 ++-- workflow.go | 34 +++++++++++++++++++++++++--------- 2 files changed, 27 insertions(+), 11 deletions(-) diff --git a/event.go b/event.go index afc901a..b32bff4 100644 --- a/event.go +++ b/event.go @@ -70,8 +70,8 @@ func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info Attem } // InterceptorReceiver is implemented by steps that contain a sub-workflow. -// stepExecution calls PrependInterceptors before each attempt so that -// parent interceptors wrap child interceptors. +// stepExecution calls PrependInterceptors once (in executeWithRetry, before the retry loop) +// so that parent interceptors wrap child interceptors for the entire step lifetime. 
type InterceptorReceiver interface { PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) } diff --git a/workflow.go b/workflow.go index b4c9ee9..0f585a6 100644 --- a/workflow.go +++ b/workflow.go @@ -519,12 +519,16 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { return icLocal.InterceptAttempt(ctx, info, next) } } - return chain + // Wrap the full attempt chain (including interceptors) so ex.attempt is always + // incremented after each attempt regardless of whether interceptors short-circuit. + inner := chain + return func(ctx context.Context) error { + defer func() { ex.attempt++ }() + return inner(ctx) + } } func (ex *stepExecution) runAttempt(ctx context.Context) error { - defer func() { ex.attempt++ }() - do := func(fn func() error) error { return fn() } if ex.w.DontPanic { do = catchPanicAsError @@ -548,12 +552,17 @@ func (ex *stepExecution) wireNotify(option *StepOption) { if option == nil || option.RetryOption == nil { return } - // option is a fresh copy from State.Option() each run — safe to mutate. - // State.Option() allocates new StepOption and RetryOption on every call. + // Deep-copy RetryOption before mutating its Notify field. + // option is a fresh StepOption from State.Option(), but its RetryOption pointer + // may be shared (e.g. when Workflow.DefaultOption carries a RetryOption) — a + // shallow copy of StepOption does not copy the pointed-to RetryOption. + ro := *option.RetryOption + option.RetryOption = &ro + userNotify := option.RetryOption.Notify option.RetryOption.Notify = func(err error, d time.Duration) { - // ex.attempt has already been incremented by runAttempt's defer, - // so subtract 1 to get the attempt number that just failed. + // ex.attempt has already been incremented by the buildAttemptChain wrapper's + // defer, so subtract 1 to get the attempt number that just failed. 
e := WorkflowEvent{ Step: ex.step, Type: Retrying, @@ -655,6 +664,13 @@ func (s *SubWorkflow) Reset() { s.w = Workflow{} } // PrependInterceptors implements InterceptorReceiver. // Parent workflow interceptors are prepended so they execute outside child interceptors. func (s *SubWorkflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { - s.w.StepInterceptors = append(step, s.w.StepInterceptors...) - s.w.AttemptInterceptors = append(attempt, s.w.AttemptInterceptors...) + combined := make([]StepInterceptor, len(step)+len(s.w.StepInterceptors)) + copy(combined, step) + copy(combined[len(step):], s.w.StepInterceptors) + s.w.StepInterceptors = combined + + combinedA := make([]AttemptInterceptor, len(attempt)+len(s.w.AttemptInterceptors)) + copy(combinedA, attempt) + copy(combinedA[len(attempt):], s.w.AttemptInterceptors) + s.w.AttemptInterceptors = combinedA } From 7fe98f1d2a55df52fc3dc70e6fee635e8bfed1b1 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 06:28:33 +0000 Subject: [PATCH 15/29] refactor: move Retrying event ownership from StepInterceptor to AttemptInterceptor MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit StepInterceptor should be unaware of retry internals — it sees only Scheduled and a terminal event. Retrying conceptually belongs to the attempt layer alongside Started. 
- stepEventSink: remove onRetry; no longer implements retryNotifier - attemptEventSink: new concrete type backing NewAttemptEventSink; implements both AttemptInterceptor and retryNotifier (Started + Retrying to same sink) - workflow.go run(): collect retryNotifiers from AttemptInterceptors, not StepInterceptors - event_test.go: assert StepEventSink does NOT implement retryNotifier; assert AttemptEventSink does - spec: update event ownership table, layer descriptions, Retrying section, usage examples, and open questions resolution Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 104 ++++++++++-------- event.go | 31 ++++-- event_test.go | 12 +- workflow.go | 17 ++- 4 files changed, 102 insertions(+), 62 deletions(-) diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index f2ba1f8..517f485 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -52,18 +52,25 @@ EventType: Scheduled Started Retrying Started Retrying Started S Mapping of EventType to where it is emitted: -| EventType | StepStatus transition | Emitted in | -|-------------|-------------------------------|-------------------------| -| `Scheduled` | `Pending → scheduled` | StepInterceptor entry | -| `Started` | status stays `Running` | AttemptInterceptor entry| -| `Retrying` | status stays `Running` | `RetryOption.Notify` | -| `Succeeded` | `Running → Succeeded` | StepInterceptor exit | -| `Failed` | `Running → Failed` | StepInterceptor exit | -| `Canceled` | `Running/Pending → Canceled` | StepInterceptor exit | -| `Skipped` | `Pending → Skipped` | StepInterceptor exit | +| EventType | StepStatus transition | Emitted in | +|-------------|-------------------------------|-------------------------------------| +| `Scheduled` | `Pending → scheduled` | StepInterceptor entry | +| `Started` | status stays 
`Running` | AttemptInterceptor entry | +| `Retrying` | status stays `Running` | `RetryOption.Notify` side-channel → AttemptInterceptor sink | +| `Succeeded` | `Running → Succeeded` | StepInterceptor exit | +| `Failed` | `Running → Failed` | StepInterceptor exit | +| `Canceled` | `Running/Pending → Canceled` | StepInterceptor exit | +| `Skipped` | `Pending → Skipped` | StepInterceptor exit | + +**Ownership of events by layer:** + +- `StepInterceptor` sees only: `Scheduled` + one terminal (`Succeeded`/`Failed`/`Canceled`/`Skipped`). + It is not aware of how many retries occurred. +- `AttemptInterceptor` sees: `Started` (per attempt) + `Retrying` (per failed attempt, via side-channel). + It owns the full picture of attempt-level activity. `Failed` is **only** a terminal event. It is never emitted for a single failed attempt inside a -retry loop — that is covered by `Retrying`. +retry loop — that is covered by `Retrying`, which belongs to the attempt layer. --- @@ -83,12 +90,16 @@ StepInterceptor[0] ``` **StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It sees -the step exactly once: entry on `Scheduled`, exit on terminal status. It is the right place for -OTel spans (one span per step, not per attempt) and step-level metrics. +the step exactly once: entry on `Scheduled`, exit on terminal status. It has no visibility into +individual retry attempts — it does not receive `Retrying` or `Started` events. It is the right +place for OTel spans (one span per step, not per attempt) and step-level metrics. **AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It sees every -attempt, including retried ones. It is the right place for per-attempt logging and attempt-level -tracing. +attempt, including retried ones. 
It also receives `Retrying` events (via the `retryNotifier` +side-channel) so it has the complete picture of attempt-level activity: when each attempt starts +(`Started`), why it failed and how long to wait (`Retrying`), and the final outcome (via the +enclosing `StepInterceptor`). It is the right place for per-attempt logging, attempt-level +tracing, and retry observability. **BeforeStep/AfterStep** (existing) are a different mechanism from Interceptors. Interceptors are workflow-level and apply globally to all steps. BeforeStep/AfterStep are step-level and are @@ -234,46 +245,48 @@ type Workflow struct { ### Built-in EventSink adapters ```go -// NewStepEventSink returns a StepInterceptor that emits Scheduled, Succeeded, -// Failed, Canceled, Skipped, and Retrying events to sink. +// NewStepEventSink returns a StepInterceptor that emits Scheduled and a terminal +// event (Succeeded/Failed/Canceled/Skipped) for every step. +// It is not aware of individual retry attempts. +func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor + +// NewAttemptEventSink returns an AttemptInterceptor that emits Started events per +// attempt and Retrying events after each failed attempt (before backoff). // The returned value also implements a package-private retryNotifier interface // so that stepExecution can deliver Retrying events (which bypass the chain) to sink. // This implementation detail is not visible to callers. -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor - -// NewAttemptEventSink returns an AttemptInterceptor that emits Started events to sink. 
func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor ``` Usage examples: ```go -// Structured logging only +// Step-level only (no retry detail) w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{ flow.NewStepEventSink(func(e flow.WorkflowEvent) { slog.Info("step event", "step", flow.String(e.Step), "type", e.Type, - "attempt", e.Attempt, "err", e.Err, "duration", e.Duration, + "err", e.Err, "duration", e.Duration, ) }), }, } -// OTel span per step + per-attempt detail +// Full observability: step-level spans + attempt-level detail (Started + Retrying) w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{myOtelStepInterceptor}, AttemptInterceptors: []flow.AttemptInterceptor{flow.NewAttemptEventSink(mySink)}, } // Fan-out: multiple sinks via closure +fanOut := func(e flow.WorkflowEvent) { + promSink(e) + slogSink(e) +} w := &flow.Workflow{ - StepInterceptors: []flow.StepInterceptor{ - flow.NewStepEventSink(func(e flow.WorkflowEvent) { - promSink(e) - slogSink(e) - }), - }, + StepInterceptors: []flow.StepInterceptor{flow.NewStepEventSink(fanOut)}, + AttemptInterceptors: []flow.AttemptInterceptor{flow.NewAttemptEventSink(fanOut)}, } ``` @@ -281,11 +294,12 @@ w := &flow.Workflow{ ## SubWorkflow Propagation -`SubWorkflow` implements `InterceptorReceiver`. Before each call to `step.Do()`, `stepExecution` -checks whether the step implements this interface and injects the parent's interceptors: +`SubWorkflow` implements `InterceptorReceiver`.
Once in `executeWithRetry` (before the retry loop +starts), `stepExecution` checks whether the step implements this interface and injects the parent's +interceptors: ```go -// in stepExecution.runAttempt(), before step.Do() +// in stepExecution.executeWithRetry(), once before the retry loop if recv, ok := ex.step.(InterceptorReceiver); ok { recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) } @@ -298,8 +312,8 @@ stack for inner steps is: [parent StepInterceptors] → [child StepInterceptors] → retry → [parent AttemptInterceptors] → [child AttemptInterceptors] → Before → Do → After ``` -This is injected on every attempt because `SubWorkflow.Reset()` clears the inner workflow before -each `BuildStep()` call. +This is injected once per step execution (not per attempt) because `executeWithRetry` runs once +per step, outside the retry loop. --- @@ -311,18 +325,19 @@ previous `next()` returned an error) and the next `next()` hasn't been called ye natural place to insert it into the chain. The solution: `stepExecution.wireNotify()` wraps `RetryOption.Notify` and calls `ex.onRetry` -directly. `ex.onRetry` is assembled during chain construction by collecting the `sink` function -from any `*StepEventSinkInterceptor` in `StepInterceptors`. +directly. `ex.onRetry` is assembled during chain construction by collecting the package-private +`retryNotifier` interface from any interceptor in `AttemptInterceptors` that implements it. The +concrete type returned by `NewAttemptEventSink` implements this interface; custom interceptors do +not need to. 
``` -attempt N fails → backoff.Notify fires → ex.onRetry(Retrying{attempt=N}) → ex.attempt++ +attempt N fails → buildAttemptChain wrapper: ex.attempt++ + → backoff.Notify fires → ex.onRetry(Retrying{attempt=N}) + → AttemptInterceptor sink receives Retrying ``` -`ex.onRetry` is assembled during chain construction by collecting the package-private `retryNotifier` -interface from any interceptor in `StepInterceptors` that implements it. The concrete type returned -by `NewStepEventSink` implements this interface; custom interceptors do not need to. - -This keeps `Retrying` aligned with the same `attempt` counter used by `AttemptInfo`. +This keeps `Retrying` semantically co-located with `Started` — both belong to the attempt layer +and reach the same sink. --- @@ -372,10 +387,11 @@ None. All questions from the brainstorm have been resolved: | Per-step vs per-attempt | Both layers; different use cases | | Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason | | SubWorkflow propagation | PrependInterceptors on InterceptorReceiver | -| Retrying event delivery | wireNotify + onRetry (private retryNotifier), bypasses chain by design | -| attempt counter ownership | stepExecution owns it; single source of truth | +| Retrying event ownership | Belongs to AttemptInterceptor layer (not StepInterceptor); delivered via wireNotify + retryNotifier side-channel | +| attempt counter ownership | stepExecution owns it; single source of truth; incremented in buildAttemptChain wrapper | | BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors (step-level vs workflow-level) | | Step identifier / name | No precomputed name; Step pointer is the identifier; callers call flow.String() | -| NewStepEventSink return type | Returns StepInterceptor (interface); retryNotifier is package-private | +| NewAttemptEventSink return type | Returns AttemptInterceptor (interface); retryNotifier is package-private | +| NewStepEventSink return type | Returns StepInterceptor 
(interface); does not implement retryNotifier | | retry.go changes | None needed; stepExecution.attempt is independent | | Breaking changes | None | diff --git a/event.go b/event.go index b32bff4..2d827f8 100644 --- a/event.go +++ b/event.go @@ -77,7 +77,7 @@ type InterceptorReceiver interface { } // retryNotifier is a package-private interface implemented by the concrete -// type returned by NewStepEventSink. stepExecution uses it to deliver +// type returned by NewAttemptEventSink. stepExecution uses it to deliver // Retrying events (which bypass the interceptor chain) to the sink. type retryNotifier interface { onRetry(WorkflowEvent) @@ -121,6 +121,7 @@ type stepEventSink struct { // NewStepEventSink returns a StepInterceptor that emits Scheduled then a terminal // event (Succeeded/Failed/Canceled/Skipped) for every step. +// It is not aware of individual retry attempts; use NewAttemptEventSink for that. func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { return &stepEventSink{sink: sink} } @@ -144,17 +145,27 @@ func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next f return err } -func (s *stepEventSink) onRetry(e WorkflowEvent) { s.sink(e) } +// attemptEventSink is the concrete type returned by NewAttemptEventSink. +// It implements both AttemptInterceptor and retryNotifier so that Started and +// Retrying events are delivered to the same sink function. +type attemptEventSink struct { + sink func(WorkflowEvent) +} // NewAttemptEventSink returns an AttemptInterceptor that emits a Started event -// for each attempt. +// for each attempt and a Retrying event after each failed attempt (before backoff). +// Retrying carries the failure error and the backoff duration. 
func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { - return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - sink(WorkflowEvent{ - Step: info.Step, - Type: Started, - Attempt: info.Attempt, - }) - return next(ctx) + return &attemptEventSink{sink: sink} +} + +func (s *attemptEventSink) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + s.sink(WorkflowEvent{ + Step: info.Step, + Type: Started, + Attempt: info.Attempt, }) + return next(ctx) } + +func (s *attemptEventSink) onRetry(e WorkflowEvent) { s.sink(e) } diff --git a/event_test.go b/event_test.go index b628f1f..205877f 100644 --- a/event_test.go +++ b/event_test.go @@ -94,12 +94,18 @@ func TestNewStepEventSink_SkippedStep(t *testing.T) { assert.Equal(t, EventSkipped, events[1].Type) } -func TestNewStepEventSink_OnRetry(t *testing.T) { +func TestNewStepEventSink_OnRetry_NotImplemented(t *testing.T) { + sink := NewStepEventSink(func(e WorkflowEvent) {}) + _, ok := sink.(retryNotifier) + assert.False(t, ok, "NewStepEventSink must NOT implement retryNotifier") +} + +func TestNewAttemptEventSink_OnRetry(t *testing.T) { var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) + sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) rn, ok := sink.(retryNotifier) - assert.True(t, ok, "NewStepEventSink should implement retryNotifier") + assert.True(t, ok, "NewAttemptEventSink should implement retryNotifier") boom := errors.New("boom") rn.onRetry(WorkflowEvent{Type: Retrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) diff --git a/workflow.go b/workflow.go index ae31490..0138957 100644 --- a/workflow.go +++ b/workflow.go @@ -427,10 +427,9 @@ func (ex *stepExecution) run(ctx context.Context) { info := StepInfo{Step: ex.step, TerminalReason: terminalReason} - // Build StepInterceptor chain; collect 
retryNotifiers. + // Build StepInterceptor chain. // The innermost next is executeWithRetry for normal steps; a no-op for terminal steps // (interceptors that observe terminalReason should not call next). - var retrySinks []func(WorkflowEvent) var stepNext func(context.Context) error if terminalReason == Pending { stepNext = ex.executeWithRetry @@ -439,15 +438,23 @@ func (ex *stepExecution) run(ctx context.Context) { } for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { ic := ex.w.StepInterceptors[i] - if rn, ok := ic.(retryNotifier); ok { - retrySinks = append(retrySinks, rn.onRetry) - } next := stepNext icLocal := ic stepNext = func(ctx context.Context) error { return icLocal.InterceptStep(ctx, info, next) } } + + // Collect retryNotifiers from AttemptInterceptors. + // Retrying events fire inside RetryOption.Notify (between two next() calls), + // where the interceptor chain is unwound — they are delivered via a side-channel + // to any AttemptInterceptor that implements retryNotifier. + var retrySinks []func(WorkflowEvent) + for _, ic := range ex.w.AttemptInterceptors { + if rn, ok := ic.(retryNotifier); ok { + retrySinks = append(retrySinks, rn.onRetry) + } + } ex.onRetry = func(e WorkflowEvent) { for _, s := range retrySinks { s(e) From 46ffc0d3dd5a41ae66b91c6731e0584f1f9f3ec4 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 06:51:50 +0000 Subject: [PATCH 16/29] refactor: remove redundant private statusFromError, use public StatusFromError The private statusFromError was a thin wrapper around the public StatusFromError with identical behavior. Remove it and call StatusFromError directly. 
Co-Authored-By: Claude Sonnet 4.6 --- workflow.go | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/workflow.go b/workflow.go index 0138957..ab4d438 100644 --- a/workflow.go +++ b/workflow.go @@ -470,7 +470,7 @@ func (ex *stepExecution) run(ctx context.Context) { } else { ex.state.SetStatus(Running) err = stepNext(ctx) - status = statusFromError(err) + status = StatusFromError(err) if status == Failed { switch { case DefaultIsCanceled(err), @@ -591,16 +591,6 @@ func (ex *stepExecution) wireNotify(option *StepOption) { } } -func statusFromError(err error) StepStatus { - if err == nil { - return Succeeded - } - if s := StatusFromError(err); s != Failed { - return s - } - return Failed -} - func (w *Workflow) lease() bool { if w.leaseBucket == nil { return true From c62a6e994edd66bec3430b3ab4104205545c9ae2 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 12:50:51 +0000 Subject: [PATCH 17/29] refactor: add Event prefix to all EventType constants for consistency Scheduled/Started/Retrying had no prefix while EventSucceeded/EventFailed/ EventCanceled/EventSkipped did (to avoid collision with StepStatus constants). Rename to EventScheduled/EventStarted/EventRetrying for uniform naming. 
Co-Authored-By: Claude Sonnet 4.6 --- event.go | 10 +++++----- event_test.go | 14 +++++++------- workflow.go | 2 +- workflow_test.go | 28 ++++++++++++++-------------- wrap_test.go | 4 ++-- 5 files changed, 29 insertions(+), 29 deletions(-) diff --git a/event.go b/event.go index 2d827f8..c9cf991 100644 --- a/event.go +++ b/event.go @@ -9,9 +9,9 @@ import ( type EventType string const ( - Scheduled EventType = "Scheduled" - Started EventType = "Started" - Retrying EventType = "Retrying" + EventScheduled EventType = "Scheduled" + EventStarted EventType = "Started" + EventRetrying EventType = "Retrying" EventSucceeded EventType = "Succeeded" EventFailed EventType = "Failed" EventCanceled EventType = "Canceled" @@ -127,7 +127,7 @@ func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { } func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - s.sink(WorkflowEvent{Step: info.Step, Type: Scheduled}) + s.sink(WorkflowEvent{Step: info.Step, Type: EventScheduled}) if info.TerminalReason != Pending { s.sink(WorkflowEvent{Step: info.Step, Type: terminalStepStatusToEventType(info.TerminalReason)}) @@ -162,7 +162,7 @@ func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { func (s *attemptEventSink) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { s.sink(WorkflowEvent{ Step: info.Step, - Type: Started, + Type: EventStarted, Attempt: info.Attempt, }) return next(ctx) diff --git a/event_test.go b/event_test.go index 205877f..eae4fc9 100644 --- a/event_test.go +++ b/event_test.go @@ -11,7 +11,7 @@ import ( func TestEventTypeConstants(t *testing.T) { // Verify all constants exist and are distinct - types := []EventType{Scheduled, Started, Retrying, EventSucceeded, EventFailed, EventCanceled, EventSkipped} + types := []EventType{EventScheduled, EventStarted, EventRetrying, EventSucceeded, EventFailed, EventCanceled, EventSkipped} seen := 
map[EventType]bool{} for _, et := range types { assert.False(t, seen[et], "duplicate EventType: %q", et) @@ -51,7 +51,7 @@ func TestNewStepEventSink_SucceededStep(t *testing.T) { assert.NoError(t, err) assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, EventScheduled, events[0].Type) assert.Equal(t, step, events[0].Step) assert.Equal(t, EventSucceeded, events[1].Type) assert.NotZero(t, events[1].Duration) @@ -70,7 +70,7 @@ func TestNewStepEventSink_FailedStep(t *testing.T) { assert.Equal(t, boom, err) assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, EventScheduled, events[0].Type) assert.Equal(t, EventFailed, events[1].Type) assert.Equal(t, boom, events[1].Err) } @@ -90,7 +90,7 @@ func TestNewStepEventSink_SkippedStep(t *testing.T) { assert.NoError(t, err) assert.False(t, nextCalled, "next must not be called for Skipped") assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) + assert.Equal(t, EventScheduled, events[0].Type) assert.Equal(t, EventSkipped, events[1].Type) } @@ -108,10 +108,10 @@ func TestNewAttemptEventSink_OnRetry(t *testing.T) { assert.True(t, ok, "NewAttemptEventSink should implement retryNotifier") boom := errors.New("boom") - rn.onRetry(WorkflowEvent{Type: Retrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) + rn.onRetry(WorkflowEvent{Type: EventRetrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) assert.Len(t, events, 1) - assert.Equal(t, Retrying, events[0].Type) + assert.Equal(t, EventRetrying, events[0].Type) assert.Equal(t, boom, events[0].Err) } @@ -127,7 +127,7 @@ func TestNewAttemptEventSink_EmitsStarted(t *testing.T) { assert.NoError(t, err) assert.Len(t, events, 1) - assert.Equal(t, Started, events[0].Type) + assert.Equal(t, EventStarted, events[0].Type) assert.Equal(t, uint64(2), events[0].Attempt) assert.Equal(t, step, events[0].Step) } diff --git a/workflow.go b/workflow.go index ab4d438..7735d77 100644 --- a/workflow.go 
+++ b/workflow.go @@ -577,7 +577,7 @@ func (ex *stepExecution) wireNotify(option *StepOption) { // defer, so subtract 1 to get the attempt number that just failed. e := WorkflowEvent{ Step: ex.step, - Type: Retrying, + Type: EventRetrying, Attempt: ex.attempt - 1, Err: err, BackoffDuration: d, diff --git a/workflow_test.go b/workflow_test.go index 7a9948e..7f1c117 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -318,7 +318,7 @@ func TestStepExecution_BasicSuccess(t *testing.T) { w.Add(Step(step)) err := w.Do(context.Background()) assert.NoError(t, err) - assert.Equal(t, []EventType{Scheduled, EventSucceeded}, eventTypes(events)) + assert.Equal(t, []EventType{EventScheduled, EventSucceeded}, eventTypes(events)) } func TestStepExecution_StepInterceptorOrder(t *testing.T) { @@ -372,7 +372,7 @@ func TestStepExecution_SkippedStep(t *testing.T) { return Skipped })) assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []EventType{Scheduled, EventSkipped}, eventTypes(events)) + assert.Equal(t, []EventType{EventScheduled, EventSkipped}, eventTypes(events)) } func TestStepExecution_RetryingEvent(t *testing.T) { @@ -407,10 +407,10 @@ func TestStepExecution_RetryingEvent(t *testing.T) { })) assert.NoError(t, w.Do(context.Background())) assert.Equal(t, []EventType{ - Scheduled, - Started, Retrying, - Started, Retrying, - Started, EventSucceeded, + EventScheduled, + EventStarted, EventRetrying, + EventStarted, EventRetrying, + EventStarted, EventSucceeded, }, eventTypes(events)) } @@ -454,20 +454,20 @@ func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { assert.NoError(t, w.Do(context.Background())) assert.Equal(t, []EventType{ - Scheduled, - Started, // attempt 0 - Retrying, // attempt 0 failed - Started, // attempt 1 - Retrying, // attempt 1 failed - Started, // attempt 2 succeeds + EventScheduled, + EventStarted, // attempt 0 + EventRetrying, // attempt 0 failed + EventStarted, // attempt 1 + EventRetrying, // attempt 1 failed + EventStarted, 
// attempt 2 succeeds EventSucceeded, }, eventTypes(events)) - retryingEvents := filterEvents(events, Retrying) + retryingEvents := filterEvents(events, EventRetrying) assert.Equal(t, uint64(0), retryingEvents[0].Attempt) assert.Equal(t, uint64(1), retryingEvents[1].Attempt) - startedEvents := filterEvents(events, Started) + startedEvents := filterEvents(events, EventStarted) assert.Equal(t, uint64(0), startedEvents[0].Attempt) assert.Equal(t, uint64(1), startedEvents[1].Attempt) assert.Equal(t, uint64(2), startedEvents[2].Attempt) diff --git a/wrap_test.go b/wrap_test.go index fc2824f..a6368b1 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -182,9 +182,9 @@ func TestSubWorkflow_InterceptorPropagation(t *testing.T) { for i, e := range events { types[i] = e.Type } - // At least 4 events: Scheduled+Succeeded for sub, Scheduled+Succeeded for innerStep + // At least 4 events: EventScheduled+Succeeded for sub, EventScheduled+Succeeded for innerStep assert.GreaterOrEqual(t, len(events), 4) - assert.Contains(t, types, Scheduled) + assert.Contains(t, types, EventScheduled) assert.Contains(t, types, EventSucceeded) for _, e := range events { assert.NotNil(t, e.Step) From ffd0ee1f4c271aa27d6f71be9abafa2516bc7a11 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 13:07:26 +0000 Subject: [PATCH 18/29] refactor: remove Retrying event and retryNotifier side-channel MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit AttemptInterceptor already receives the failure error when InterceptAttempt returns — the only unique information Retrying added was BackoffDuration, which is of limited value (derivable from timestamps, or known from static config). 
Removing it simplifies the design considerably: - Drop EventRetrying constant and BackoffDuration field from WorkflowEvent - Remove retryNotifier interface, wireNotify, onRetry from stepExecution - NewAttemptEventSink reverts to a simple AttemptInterceptorFunc - No side-channel machinery needed Update spec to reflect clean two-layer model with no side-channels. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 160 ++++++------------ event.go | 50 ++---- event_test.go | 24 +-- workflow.go | 49 ------ workflow_test.go | 77 ++------- 5 files changed, 84 insertions(+), 276 deletions(-) diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index 517f485..485eddd 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -10,7 +10,7 @@ Currently, observability in go-workflow requires users to wire `BeforeStep`/`AfterStep` callbacks manually on individual steps. There is no structured way to observe all steps globally — no -lifecycle events, no attempt count, no timing, no retry visibility. +lifecycle events, no attempt count, no timing. In production, you need to answer: which step is running right now? How many retries has step X done? How long did step Y take? None of these are answerable today without bespoke instrumentation. @@ -36,8 +36,8 @@ queryable. The `Condition` system reads it to decide whether to run downstream s metrics). It is fire-and-forget. The key difference: `Running` is a single `StepStatus` that spans the entire retry loop, but -within it multiple `Started` and `Retrying` events occur. They cannot be merged without breaking -the `Condition` system. +within it multiple `EventStarted` events occur. They cannot be merged without breaking the +`Condition` system. 
``` StepStatus: Pending ──────────────────────────────► Running ──────────────► Succeeded @@ -46,31 +46,30 @@ StepStatus: Pending ─────────────────── └─────────────────────────────────────────────────────────────► Skipped └─────────────────────────────────────────────────────────────► Canceled -EventType: Scheduled Started Retrying Started Retrying Started Succeeded/Failed/Canceled - [attempt 0] [attempt 1] [attempt 2] +EventType: EventScheduled EventStarted EventStarted EventStarted EventSucceeded/EventFailed/EventCanceled + [attempt 0] [attempt 1] [attempt 2] ``` Mapping of EventType to where it is emitted: -| EventType | StepStatus transition | Emitted in | -|-------------|-------------------------------|-------------------------------------| -| `Scheduled` | `Pending → scheduled` | StepInterceptor entry | -| `Started` | status stays `Running` | AttemptInterceptor entry | -| `Retrying` | status stays `Running` | `RetryOption.Notify` side-channel → AttemptInterceptor sink | -| `Succeeded` | `Running → Succeeded` | StepInterceptor exit | -| `Failed` | `Running → Failed` | StepInterceptor exit | -| `Canceled` | `Running/Pending → Canceled` | StepInterceptor exit | -| `Skipped` | `Pending → Skipped` | StepInterceptor exit | +| EventType | StepStatus transition | Emitted in | +|------------------|-------------------------------|--------------------------| +| `EventScheduled` | `Pending → scheduled` | StepInterceptor entry | +| `EventStarted` | status stays `Running` | AttemptInterceptor entry | +| `EventSucceeded` | `Running → Succeeded` | StepInterceptor exit | +| `EventFailed` | `Running → Failed` | StepInterceptor exit | +| `EventCanceled` | `Running/Pending → Canceled` | StepInterceptor exit | +| `EventSkipped` | `Pending → Skipped` | StepInterceptor exit | **Ownership of events by layer:** -- `StepInterceptor` sees only: `Scheduled` + one terminal (`Succeeded`/`Failed`/`Canceled`/`Skipped`). +- `StepInterceptor` sees only: `EventScheduled` + one terminal event. 
It is not aware of how many retries occurred. -- `AttemptInterceptor` sees: `Started` (per attempt) + `Retrying` (per failed attempt, via side-channel). - It owns the full picture of attempt-level activity. +- `AttemptInterceptor` sees: `EventStarted` per attempt. The failure error for each + attempt is available when `InterceptAttempt` returns. -`Failed` is **only** a terminal event. It is never emitted for a single failed attempt inside a -retry loop — that is covered by `Retrying`, which belongs to the attempt layer. +`EventFailed` is **only** a terminal event. Individual attempt failures within a retry loop +are not separately named — they are observable via the error returned from `InterceptAttempt`. --- @@ -81,7 +80,7 @@ retry loop — that is covered by `Retrying`, which belongs to the attempt layer ``` StepInterceptor[0] └── StepInterceptor[1] - └── [retry loop — Notify wired here] + └── [retry loop] └── AttemptInterceptor[0] └── AttemptInterceptor[1] └── [per-step BeforeStep callbacks] ← from StepConfig @@ -90,16 +89,13 @@ StepInterceptor[0] ``` **StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It sees -the step exactly once: entry on `Scheduled`, exit on terminal status. It has no visibility into -individual retry attempts — it does not receive `Retrying` or `Started` events. It is the right -place for OTel spans (one span per step, not per attempt) and step-level metrics. +the step exactly once: entry on `EventScheduled`, exit on terminal status. It has no visibility +into individual retry attempts. It is the right place for OTel spans (one span per step, not per +attempt) and step-level metrics. **AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It sees every -attempt, including retried ones. 
It also receives `Retrying` events (via the `retryNotifier` -side-channel) so it has the complete picture of attempt-level activity: when each attempt starts -(`Started`), why it failed and how long to wait (`Retrying`), and the final outcome (via the -enclosing `StepInterceptor`). It is the right place for per-attempt logging, attempt-level -tracing, and retry observability. +attempt, including retried ones. The failure error for each attempt is available on return. +It is the right place for per-attempt logging, attempt-level tracing, and retry observability. **BeforeStep/AfterStep** (existing) are a different mechanism from Interceptors. Interceptors are workflow-level and apply globally to all steps. BeforeStep/AfterStep are step-level and are @@ -114,16 +110,15 @@ the full step lifecycle: ```go type stepExecution struct { - w *Workflow - step Steper - state *State - attempt uint64 // single source of truth for attempt count - onRetry func(WorkflowEvent) // assembled during chain build + w *Workflow + step Steper + state *State + attempt uint64 // single source of truth for attempt count } ``` -`attempt` is the single source of truth shared between `AttemptInfo` and `RetryOption.Notify`. -It is incremented inside `wireNotify` after each failed attempt, before `Retrying` is emitted. +`attempt` is incremented in a wrapper inside `buildAttemptChain` that surrounds the full +interceptor chain, so it always advances regardless of whether interceptors short-circuit. ### tick() simplification @@ -178,11 +173,6 @@ type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(ctx type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error // StepInfo is passed to StepInterceptor. -// Step is the canonical identifier — it is the same pointer used as the map key -// in Workflow, stable for the lifetime of the workflow definition. 
-// Callers that need a human-readable name can call flow.String(info.Step). -// No name is precomputed by the framework; different sinks may have different -// naming preferences (short name, fully-qualified type, etc.). type StepInfo struct { Step Steper TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute @@ -199,23 +189,21 @@ type AttemptInfo struct { type EventType string const ( - Scheduled EventType = "Scheduled" - Started EventType = "Started" - Retrying EventType = "Retrying" - Succeeded EventType = "Succeeded" - Failed EventType = "Failed" - Canceled EventType = "Canceled" - Skipped EventType = "Skipped" + EventScheduled EventType = "Scheduled" + EventStarted EventType = "Started" + EventSucceeded EventType = "Succeeded" + EventFailed EventType = "Failed" + EventCanceled EventType = "Canceled" + EventSkipped EventType = "Skipped" ) // WorkflowEvent carries information about a step lifecycle event. type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration - BackoffDuration time.Duration // non-zero only for Retrying + Step Steper + Type EventType + Attempt uint64 + Err error + Duration time.Duration } // InterceptorReceiver is implemented by steps that contain a sub-workflow @@ -245,23 +233,20 @@ type Workflow struct { ### Built-in EventSink adapters ```go -// NewStepEventSink returns a StepInterceptor that emits Scheduled and a terminal -// event (Succeeded/Failed/Canceled/Skipped) for every step. +// NewStepEventSink returns a StepInterceptor that emits EventScheduled and a terminal +// event (EventSucceeded/EventFailed/EventCanceled/EventSkipped) for every step. // It is not aware of individual retry attempts. func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor -// NewAttemptEventSink returns an AttemptInterceptor that emits Started events per -// attempt and Retrying events after each failed attempt (before backoff). 
-// The returned value also implements a package-private retryNotifier interface -// so that stepExecution can deliver Retrying events (which bypass the chain) to sink. -// This implementation detail is not visible to callers. +// NewAttemptEventSink returns an AttemptInterceptor that emits an EventStarted event +// for each attempt. The failure error (if any) is available when InterceptAttempt returns. func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor ``` Usage examples: ```go -// Step-level only (no retry detail) +// Step-level only w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{ flow.NewStepEventSink(func(e flow.WorkflowEvent) { @@ -273,21 +258,11 @@ w := &flow.Workflow{ }, } -// Full observability: step-level spans + attempt-level detail (Started + Retrying) +// Full observability: step-level spans + per-attempt detail w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{myOtelStepInterceptor}, AttemptInterceptors: []flow.AttemptInterceptor{flow.NewAttemptEventSink(mySink)}, } - -// Fan-out: multiple sinks via closure -sink := flow.NewAttemptEventSink(func(e flow.WorkflowEvent) { - promSink(e) - slogSink(e) -}) -w := &flow.Workflow{ - StepInterceptors: []flow.StepInterceptor{flow.NewStepEventSink(mySink)}, - AttemptInterceptors: []flow.AttemptInterceptor{sink}, -} ``` --- @@ -317,42 +292,15 @@ per step, outside the retry loop. --- -## Retrying Event: Why It Bypasses the Interceptor Chain - -`Retrying` fires inside `backoff.RetryNotifyWithTimer`'s Notify callback, which sits between two -consecutive `next()` calls. At that point the interceptor chain's call stack has unwound (the -previous `next()` returned an error) and the next `next()` hasn't been called yet. There is no -natural place to insert it into the chain. - -The solution: `stepExecution.wireNotify()` wraps `RetryOption.Notify` and calls `ex.onRetry` -directly. 
`ex.onRetry` is assembled during chain construction by collecting the package-private -`retryNotifier` interface from any interceptor in `AttemptInterceptors` that implements it. The -concrete type returned by `NewAttemptEventSink` implements this interface; custom interceptors do -not need to. - -``` -attempt N fails → buildAttemptChain wrapper: ex.attempt++ - → backoff.Notify fires → ex.onRetry(Retrying{attempt=N}) - → AttemptInterceptor sink receives Retrying -``` - -This keeps `Retrying` semantically co-located with `Started` — both belong to the attempt layer -and reach the same sink. - ---- - ## Skipped / Canceled in StepInterceptor Steps that are Skipped or Canceled by their `Condition` still enter the `StepInterceptor` chain. `StepInfo.TerminalReason` carries the reason. The contract is: - If `TerminalReason != Pending`, the interceptor **must not** call `next`. -- The interceptor should emit `Scheduled` then `Skipped`/`Canceled` and return nil. +- The interceptor should emit `EventScheduled` then `EventSkipped`/`EventCanceled` and return nil. - The built-in `NewStepEventSink` handles this correctly. -Custom interceptors that call `next` when `TerminalReason != Pending` will cause a panic (the -`next` function asserts this precondition). - --- ## What Does Not Change @@ -371,8 +319,7 @@ Custom interceptors that call `next` when `TerminalReason != Pending` will cause | File | Change | |------|--------| | `workflow.go` | Add `StepInterceptors`, `AttemptInterceptors` fields; simplify `tick()`; add `stepExecution` | -| `step.go` | Add interceptor interfaces, info types, `InterceptorReceiver` | -| `event.go` | New file: `EventType`, `WorkflowEvent`, `NewStepEventSink`, `NewAttemptEventSink` | +| `event.go` | New file: `EventType`, `WorkflowEvent`, interceptor interfaces, `NewStepEventSink`, `NewAttemptEventSink` | | `wrap.go` | `SubWorkflow` implements `InterceptorReceiver` | --- @@ -387,11 +334,10 @@ None. 
All questions from the brainstorm have been resolved: | Per-step vs per-attempt | Both layers; different use cases | | Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason | | SubWorkflow propagation | PrependInterceptors on InterceptorReceiver | -| Retrying event ownership | Belongs to AttemptInterceptor layer (not StepInterceptor); delivered via wireNotify + retryNotifier side-channel | -| attempt counter ownership | stepExecution owns it; single source of truth; incremented in buildAttemptChain wrapper | +| Retrying event | Removed; individual attempt failures observable via error returned from InterceptAttempt | +| attempt counter ownership | stepExecution owns it; incremented in buildAttemptChain wrapper | | BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors (step-level vs workflow-level) | | Step identifier / name | No precomputed name; Step pointer is the identifier; callers call flow.String() | -| NewAttemptEventSink return type | Returns AttemptInterceptor (interface); retryNotifier is package-private | -| NewStepEventSink return type | Returns StepInterceptor (interface); does not implement retryNotifier | -| retry.go changes | None needed; stepExecution.attempt is independent | +| EventType naming | All constants prefixed with `Event` for consistency | +| retry.go changes | None needed | | Breaking changes | None | diff --git a/event.go b/event.go index c9cf991..04c32eb 100644 --- a/event.go +++ b/event.go @@ -11,7 +11,6 @@ type EventType string const ( EventScheduled EventType = "Scheduled" EventStarted EventType = "Started" - EventRetrying EventType = "Retrying" EventSucceeded EventType = "Succeeded" EventFailed EventType = "Failed" EventCanceled EventType = "Canceled" @@ -20,12 +19,11 @@ const ( // WorkflowEvent carries information about a step lifecycle event. 
type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration - BackoffDuration time.Duration // non-zero only for Retrying + Step Steper + Type EventType + Attempt uint64 + Err error + Duration time.Duration } // StepInfo is passed to StepInterceptor. @@ -76,13 +74,6 @@ type InterceptorReceiver interface { PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) } -// retryNotifier is a package-private interface implemented by the concrete -// type returned by NewAttemptEventSink. stepExecution uses it to deliver -// Retrying events (which bypass the interceptor chain) to the sink. -type retryNotifier interface { - onRetry(WorkflowEvent) -} - // terminalEventType maps an error to the corresponding terminal EventType. func terminalEventType(err error) EventType { if err == nil { @@ -145,27 +136,16 @@ func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next f return err } -// attemptEventSink is the concrete type returned by NewAttemptEventSink. -// It implements both AttemptInterceptor and retryNotifier so that Started and -// Retrying events are delivered to the same sink function. -type attemptEventSink struct { - sink func(WorkflowEvent) -} - -// NewAttemptEventSink returns an AttemptInterceptor that emits a Started event -// for each attempt and a Retrying event after each failed attempt (before backoff). -// Retrying carries the failure error and the backoff duration. +// NewAttemptEventSink returns an AttemptInterceptor that emits an EventStarted +// event for each attempt. The attempt's error (if any) is available when +// InterceptAttempt returns. 
func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { - return &attemptEventSink{sink: sink} -} - -func (s *attemptEventSink) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - s.sink(WorkflowEvent{ - Step: info.Step, - Type: EventStarted, - Attempt: info.Attempt, + return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + sink(WorkflowEvent{ + Step: info.Step, + Type: EventStarted, + Attempt: info.Attempt, + }) + return next(ctx) }) - return next(ctx) } - -func (s *attemptEventSink) onRetry(e WorkflowEvent) { s.sink(e) } diff --git a/event_test.go b/event_test.go index eae4fc9..bf24184 100644 --- a/event_test.go +++ b/event_test.go @@ -4,14 +4,13 @@ import ( "context" "errors" "testing" - "time" "github.com/stretchr/testify/assert" ) func TestEventTypeConstants(t *testing.T) { // Verify all constants exist and are distinct - types := []EventType{EventScheduled, EventStarted, EventRetrying, EventSucceeded, EventFailed, EventCanceled, EventSkipped} + types := []EventType{EventScheduled, EventStarted, EventSucceeded, EventFailed, EventCanceled, EventSkipped} seen := map[EventType]bool{} for _, et := range types { assert.False(t, seen[et], "duplicate EventType: %q", et) @@ -94,27 +93,6 @@ func TestNewStepEventSink_SkippedStep(t *testing.T) { assert.Equal(t, EventSkipped, events[1].Type) } -func TestNewStepEventSink_OnRetry_NotImplemented(t *testing.T) { - sink := NewStepEventSink(func(e WorkflowEvent) {}) - _, ok := sink.(retryNotifier) - assert.False(t, ok, "NewStepEventSink must NOT implement retryNotifier") -} - -func TestNewAttemptEventSink_OnRetry(t *testing.T) { - var events []WorkflowEvent - sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - rn, ok := sink.(retryNotifier) - assert.True(t, ok, "NewAttemptEventSink should implement retryNotifier") - - boom := errors.New("boom") - 
rn.onRetry(WorkflowEvent{Type: EventRetrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) - - assert.Len(t, events, 1) - assert.Equal(t, EventRetrying, events[0].Type) - assert.Equal(t, boom, events[0].Err) -} - func TestNewAttemptEventSink_EmitsStarted(t *testing.T) { var events []WorkflowEvent sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) diff --git a/workflow.go b/workflow.go index 7735d77..8cdbe59 100644 --- a/workflow.go +++ b/workflow.go @@ -316,7 +316,6 @@ type stepExecution struct { step Steper state *State attempt uint64 - onRetry func(WorkflowEvent) } func isAllUpstreamScanned(ups map[Steper]StepResult) bool { @@ -445,22 +444,6 @@ func (ex *stepExecution) run(ctx context.Context) { } } - // Collect retryNotifiers from AttemptInterceptors. - // Retrying events fire inside RetryOption.Notify (between two next() calls), - // where the interceptor chain is unwound — they are delivered via a side-channel - // to any AttemptInterceptor that implements retryNotifier. - var retrySinks []func(WorkflowEvent) - for _, ic := range ex.w.AttemptInterceptors { - if rn, ok := ic.(retryNotifier); ok { - retrySinks = append(retrySinks, rn.onRetry) - } - } - ex.onRetry = func(e WorkflowEvent) { - for _, s := range retrySinks { - s(e) - } - } - var status StepStatus var err error @@ -500,8 +483,6 @@ func (ex *stepExecution) executeWithRetry(ctx context.Context) error { recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) } - ex.wireNotify(option) - attemptChain := ex.buildAttemptChain() var notAfter time.Time @@ -560,36 +541,6 @@ func (ex *stepExecution) runAttempt(ctx context.Context) error { return do(func() error { return ex.state.After(ctxStep, ex.step, err) }) } -func (ex *stepExecution) wireNotify(option *StepOption) { - if option == nil || option.RetryOption == nil { - return - } - // Deep-copy RetryOption before mutating its Notify field. 
- // option is a fresh StepOption from State.Option(), but its RetryOption pointer - // may be shared (e.g. when Workflow.DefaultOption carries a RetryOption) — a - // shallow copy of StepOption does not copy the pointed-to RetryOption. - ro := *option.RetryOption - option.RetryOption = &ro - - userNotify := option.RetryOption.Notify - option.RetryOption.Notify = func(err error, d time.Duration) { - // ex.attempt has already been incremented by the buildAttemptChain wrapper's - // defer, so subtract 1 to get the attempt number that just failed. - e := WorkflowEvent{ - Step: ex.step, - Type: EventRetrying, - Attempt: ex.attempt - 1, - Err: err, - BackoffDuration: d, - } - if ex.onRetry != nil { - ex.onRetry(e) - } - if userNotify != nil { - userNotify(err, d) - } - } -} func (w *Workflow) lease() bool { if w.leaseBucket == nil { diff --git a/workflow_test.go b/workflow_test.go index 7f1c117..a72668e 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -375,7 +375,7 @@ func TestStepExecution_SkippedStep(t *testing.T) { assert.Equal(t, []EventType{EventScheduled, EventSkipped}, eventTypes(events)) } -func TestStepExecution_RetryingEvent(t *testing.T) { +func TestStepExecution_RetryingStep(t *testing.T) { t.Parallel() var events []WorkflowEvent mu := sync.Mutex{} @@ -393,80 +393,25 @@ func TestStepExecution_RetryingEvent(t *testing.T) { } return nil }) - w := &Workflow{ - StepInterceptors: []StepInterceptor{ - NewStepEventSink(record), - }, - AttemptInterceptors: []AttemptInterceptor{ - NewAttemptEventSink(record), - }, - } - w.Add(Step(step).Retry(func(o *RetryOption) { - o.Attempts = 3 - o.Backoff = &backoff.ZeroBackOff{} - })) - assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []EventType{ - EventScheduled, - EventStarted, EventRetrying, - EventStarted, EventRetrying, - EventStarted, EventSucceeded, - }, eventTypes(events)) -} - -func eventTypes(events []WorkflowEvent) []EventType { - types := make([]EventType, len(events)) - for i, e := range 
events { - types[i] = e.Type - } - return types -} - -func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { - t.Parallel() - - var events []WorkflowEvent - mu := sync.Mutex{} - record := func(e WorkflowEvent) { - mu.Lock() - events = append(events, e) - mu.Unlock() - } - - callCount := 0 - step := Func("flaky", func(ctx context.Context) error { - callCount++ - if callCount < 3 { - return errors.New("not yet") - } - return nil - }) - w := &Workflow{ StepInterceptors: []StepInterceptor{NewStepEventSink(record)}, AttemptInterceptors: []AttemptInterceptor{NewAttemptEventSink(record)}, } w.Add(Step(step).Retry(func(o *RetryOption) { - o.Attempts = 5 + o.Attempts = 3 o.Backoff = &backoff.ZeroBackOff{} })) - assert.NoError(t, w.Do(context.Background())) - + // StepInterceptor sees Scheduled + terminal only; AttemptInterceptor sees + // one EventStarted per attempt; no Retrying events are emitted. assert.Equal(t, []EventType{ EventScheduled, - EventStarted, // attempt 0 - EventRetrying, // attempt 0 failed - EventStarted, // attempt 1 - EventRetrying, // attempt 1 failed - EventStarted, // attempt 2 succeeds + EventStarted, // attempt 0 (fails, retried) + EventStarted, // attempt 1 (fails, retried) + EventStarted, // attempt 2 (succeeds) EventSucceeded, }, eventTypes(events)) - retryingEvents := filterEvents(events, EventRetrying) - assert.Equal(t, uint64(0), retryingEvents[0].Attempt) - assert.Equal(t, uint64(1), retryingEvents[1].Attempt) - startedEvents := filterEvents(events, EventStarted) assert.Equal(t, uint64(0), startedEvents[0].Attempt) assert.Equal(t, uint64(1), startedEvents[1].Attempt) @@ -483,6 +428,14 @@ func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) } +func eventTypes(events []WorkflowEvent) []EventType { + types := make([]EventType, len(events)) + for i, e := range events { + types[i] = e.Type + } + return types +} + func filterEvents(events []WorkflowEvent, et EventType) 
[]WorkflowEvent { var rv []WorkflowEvent for _, e := range events { From d6020576a0e329d5accc7d4dbd9675284d9dfac3 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 13:18:06 +0000 Subject: [PATCH 19/29] refactor: remove EventSink/WorkflowEvent; replace event.go with interceptor.go MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Users can implement whatever event system they want on top of the interceptor interfaces — the framework does not need to prescribe WorkflowEvent, EventType constants, or NewStepEventSink/NewAttemptEventSink helpers. - Delete event.go (EventType, WorkflowEvent, NewStepEventSink, NewAttemptEventSink, terminalEventType, terminalStepStatusToEventType, stepEventSink) - Delete event_test.go - Add interceptor.go: StepInfo, AttemptInfo, StepInterceptor, AttemptInterceptor, StepInterceptorFunc, AttemptInterceptorFunc, InterceptorReceiver - Rewrite workflow_test.go and wrap_test.go to use interceptors directly - Update spec to reflect the clean minimal API Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 235 ++++++------------ event.go | 151 ----------- event_test.go | 111 --------- interceptor.go | 53 ++++ workflow_test.go | 80 +++--- wrap_test.go | 67 ++--- 6 files changed, 197 insertions(+), 500 deletions(-) delete mode 100644 event.go delete mode 100644 event_test.go create mode 100644 interceptor.go diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index 485eddd..ea0e05b 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -10,7 +10,7 @@ Currently, observability in go-workflow requires users to wire `BeforeStep`/`AfterStep` callbacks manually on individual steps. There is no structured way to observe all steps globally — no -lifecycle events, no attempt count, no timing. 
+lifecycle hooks, no attempt count, no timing. In production, you need to answer: which step is running right now? How many retries has step X done? How long did step Y take? None of these are answerable today without bespoke instrumentation. @@ -19,25 +19,22 @@ This design introduces a two-layer interceptor system that: - Provides global, structured observability across all steps - Is orthogonal to `BeforeStep`/`AfterStep` — they serve different scopes and both are preserved - Propagates automatically into nested `SubWorkflow`s -- Ships with built-in `EventSink` adapters for slog, OTel, Prometheus --- ## Concepts -### StepStatus vs EventType - -These are deliberately separate types serving different consumers. +### StepStatus vs the interceptor layers **`StepStatus`** is the state machine used by the orchestration engine. It is persistent and queryable. The `Condition` system reads it to decide whether to run downstream steps. -**`EventType`** is a stream of instantaneous observations for external consumers (logs, traces, -metrics). It is fire-and-forget. +The interceptors are a separate, orthogonal observability mechanism. They do not replace or alter +`StepStatus` — they wrap execution to give users structured hooks. -The key difference: `Running` is a single `StepStatus` that spans the entire retry loop, but -within it multiple `EventStarted` events occur. They cannot be merged without breaking the -`Condition` system. +The key difference: `Running` is a single `StepStatus` that spans the entire retry loop. Within +it, `AttemptInterceptor` fires multiple times (once per attempt). These cannot be merged without +breaking the `Condition` system. 
``` StepStatus: Pending ──────────────────────────────► Running ──────────────► Succeeded @@ -46,31 +43,13 @@ StepStatus: Pending ─────────────────── └─────────────────────────────────────────────────────────────► Skipped └─────────────────────────────────────────────────────────────► Canceled -EventType: EventScheduled EventStarted EventStarted EventStarted EventSucceeded/EventFailed/EventCanceled - [attempt 0] [attempt 1] [attempt 2] +Interceptors: StepInterceptor.entry + AttemptInterceptor[attempt=0] + AttemptInterceptor[attempt=1] + AttemptInterceptor[attempt=2] + StepInterceptor.exit (err=nil → Succeeded) ``` -Mapping of EventType to where it is emitted: - -| EventType | StepStatus transition | Emitted in | -|------------------|-------------------------------|--------------------------| -| `EventScheduled` | `Pending → scheduled` | StepInterceptor entry | -| `EventStarted` | status stays `Running` | AttemptInterceptor entry | -| `EventSucceeded` | `Running → Succeeded` | StepInterceptor exit | -| `EventFailed` | `Running → Failed` | StepInterceptor exit | -| `EventCanceled` | `Running/Pending → Canceled` | StepInterceptor exit | -| `EventSkipped` | `Pending → Skipped` | StepInterceptor exit | - -**Ownership of events by layer:** - -- `StepInterceptor` sees only: `EventScheduled` + one terminal event. - It is not aware of how many retries occurred. -- `AttemptInterceptor` sees: `EventStarted` per attempt. The failure error for each - attempt is available when `InterceptAttempt` returns. - -`EventFailed` is **only** a terminal event. Individual attempt failures within a retry loop -are not separately named — they are observable via the error returned from `InterceptAttempt`. - --- ## Architecture @@ -88,65 +67,39 @@ StepInterceptor[0] └── [per-step AfterStep callbacks] ``` -**StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It sees -the step exactly once: entry on `EventScheduled`, exit on terminal status. 
It has no visibility -into individual retry attempts. It is the right place for OTel spans (one span per step, not per -attempt) and step-level metrics. +**StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It is +called exactly once per step: on entry `info.TerminalReason` tells you whether the step will +execute (`Pending`) or has already been determined terminal (`Skipped`/`Canceled`). On exit the +returned error reflects the final outcome. Right place for OTel spans (one span per step) and +step-level metrics. -**AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It sees every -attempt, including retried ones. The failure error for each attempt is available on return. -It is the right place for per-attempt logging, attempt-level tracing, and retry observability. +**AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It fires once per +attempt, including retried ones. The error returned by `next` is the attempt's failure (if any) +— the interceptor can inspect it before returning. Right place for per-attempt logging and +attempt-level tracing. -**BeforeStep/AfterStep** (existing) are a different mechanism from Interceptors. Interceptors are -workflow-level and apply globally to all steps. BeforeStep/AfterStep are step-level and are -configured per-step via `StepConfig`. They are orthogonal: in the execution stack, Interceptors -execute on the outside, BeforeStep/AfterStep execute on the inside — but conceptually they belong -to different layers of the system and serve different purposes. Users configure them independently. +**BeforeStep/AfterStep** (existing) are step-level callbacks configured per-step via `StepConfig`. +Interceptors are workflow-level and apply globally. They are orthogonal — interceptors execute on +the outside, BeforeStep/AfterStep execute on the inside. 
### stepExecution (internal) -The current anonymous goroutine in `tick()` is replaced by a `stepExecution` struct that owns -the full step lifecycle: +The current anonymous goroutine in `tick()` is replaced by a `stepExecution` struct: ```go type stepExecution struct { w *Workflow step Steper state *State - attempt uint64 // single source of truth for attempt count + attempt uint64 // single source of truth; incremented in buildAttemptChain wrapper } ``` -`attempt` is incremented in a wrapper inside `buildAttemptChain` that surrounds the full -interceptor chain, so it always advances regardless of whether interceptors short-circuit. - ### tick() simplification -`tick()` is reduced to a single responsibility: **atomically claiming a step** to prevent +`tick()` is reduced to atomically claiming a step (private `scheduled` sentinel) to prevent double-spawning. All other logic moves into `stepExecution.run()`. -```go -// tick() — before -if w.lease() { - state.SetStatus(Running) // claim + status in one - go func() { ... runStep ... }() -} - -// tick() — after -if w.lease() { - state.SetStatus(scheduled) // claim only (private sentinel) - w.waitGroup.Add(1) - go (&stepExecution{...}).run(ctx) -} -``` - -`scheduled` is a private `StepStatus` sentinel. It is never exposed to users or visible in -`Condition` evaluation. Its only purpose is to prevent `tick()` from spawning the same step -twice. - -Condition evaluation moves into `stepExecution.run()`. This is safe because by the time a step -is eligible to run, all its upstreams are terminated — their status cannot change. - --- ## API @@ -154,24 +107,6 @@ is eligible to run, all its upstreams are terminated — their status cannot cha ### New Types ```go -// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// info.TerminalReason is Pending for steps that will execute normally. -// For Skipped or Canceled steps, TerminalReason is set and next must not be called. 
-type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(ctx context.Context) error) error -} - -// AttemptInterceptor intercepts each individual attempt (Before → Do → After). -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error -} - -// StepInterceptorFunc is a function adapter for StepInterceptor. -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(ctx context.Context) error) error - -// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(ctx context.Context) error) error - // StepInfo is passed to StepInterceptor. type StepInfo struct { Step Steper @@ -185,29 +120,26 @@ type AttemptInfo struct { Attempt uint64 } -// EventType identifies a step lifecycle event. -type EventType string - -const ( - EventScheduled EventType = "Scheduled" - EventStarted EventType = "Started" - EventSucceeded EventType = "Succeeded" - EventFailed EventType = "Failed" - EventCanceled EventType = "Canceled" - EventSkipped EventType = "Skipped" -) - -// WorkflowEvent carries information about a step lifecycle event. -type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration +// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). +// If info.TerminalReason != Pending, next must not be called — the step will not execute. +// Return nil in that case after observing the event. +type StepInterceptor interface { + InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error } -// InterceptorReceiver is implemented by steps that contain a sub-workflow -// and need to receive interceptors from the parent workflow. +// AttemptInterceptor intercepts each individual attempt (Before → Do → After). 
+// The error returned by next (if any) is the attempt's failure. +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +} + +// StepInterceptorFunc is a function adapter for StepInterceptor. +type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error + +// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. +type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + +// InterceptorReceiver is implemented by steps that contain a sub-workflow. type InterceptorReceiver interface { PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) } @@ -225,43 +157,40 @@ type Workflow struct { // AttemptInterceptors are called once per attempt, inside the retry loop. // Executed in order: [0] is outermost, [len-1] is innermost. - // BeforeStep/AfterStep callbacks are always the innermost layer. AttemptInterceptors []AttemptInterceptor } ``` -### Built-in EventSink adapters +### Usage examples ```go -// NewStepEventSink returns a StepInterceptor that emits EventScheduled and a terminal -// event (EventSucceeded/EventFailed/EventCanceled/EventSkipped) for every step. -// It is not aware of individual retry attempts. -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor - -// NewAttemptEventSink returns an AttemptInterceptor that emits an EventStarted event -// for each attempt. The failure error (if any) is available when InterceptAttempt returns. 
-func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor -``` - -Usage examples: - -```go -// Step-level only +// OTel: one span per step w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{ - flow.NewStepEventSink(func(e flow.WorkflowEvent) { - slog.Info("step event", - "step", flow.String(e.Step), "type", e.Type, - "err", e.Err, "duration", e.Duration, - ) + flow.StepInterceptorFunc(func(ctx context.Context, info flow.StepInfo, next func(context.Context) error) error { + ctx, span := tracer.Start(ctx, flow.String(info.Step)) + defer span.End() + if info.TerminalReason != flow.Pending { + return nil // step will not execute + } + err := next(ctx) + if err != nil { + span.RecordError(err) + } + return err }), }, } -// Full observability: step-level spans + per-attempt detail +// Per-attempt logging with attempt number and error w := &flow.Workflow{ - StepInterceptors: []flow.StepInterceptor{myOtelStepInterceptor}, - AttemptInterceptors: []flow.AttemptInterceptor{flow.NewAttemptEventSink(mySink)}, + AttemptInterceptors: []flow.AttemptInterceptor{ + flow.AttemptInterceptorFunc(func(ctx context.Context, info flow.AttemptInfo, next func(context.Context) error) error { + err := next(ctx) + slog.Info("attempt", "step", flow.String(info.Step), "attempt", info.Attempt, "err", err) + return err + }), + }, } ``` @@ -269,27 +198,25 @@ w := &flow.Workflow{ ## SubWorkflow Propagation -`SubWorkflow` implements `InterceptorReceiver`. Once in `executeWithRetry` (before the retry loop -starts), `stepExecution` checks whether the step implements this interface and injects the parent's -interceptors: +`SubWorkflow` implements `InterceptorReceiver`. 
Once in `executeWithRetry` (before the retry loop), +`stepExecution` injects the parent's interceptors into the child workflow: ```go -// in stepExecution.executeWithRetry(), once before the retry loop if recv, ok := ex.step.(InterceptorReceiver); ok { recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) } ``` -`SubWorkflow.PrependInterceptors` prepends parent interceptors before its own, so the execution -stack for inner steps is: +`PrependInterceptors` uses `make`+`copy` to build fresh slices, so parent interceptors are +prepended without aliasing the parent's backing array and without accumulating across `Reset()` +cycles. + +Execution stack for inner steps: ``` [parent StepInterceptors] → [child StepInterceptors] → retry → [parent AttemptInterceptors] → [child AttemptInterceptors] → Before → Do → After ``` -This is injected once per step execution (not per attempt) because `executeWithRetry` runs once -per step, outside the retry loop. - --- ## Skipped / Canceled in StepInterceptor @@ -298,8 +225,7 @@ Steps that are Skipped or Canceled by their `Condition` still enter the `StepInt `StepInfo.TerminalReason` carries the reason. The contract is: - If `TerminalReason != Pending`, the interceptor **must not** call `next`. -- The interceptor should emit `EventScheduled` then `EventSkipped`/`EventCanceled` and return nil. -- The built-in `NewStepEventSink` handles this correctly. +- Return nil after observing the terminal reason. 
--- @@ -319,7 +245,7 @@ Steps that are Skipped or Canceled by their `Condition` still enter the `StepInt | File | Change | |------|--------| | `workflow.go` | Add `StepInterceptors`, `AttemptInterceptors` fields; simplify `tick()`; add `stepExecution` | -| `event.go` | New file: `EventType`, `WorkflowEvent`, interceptor interfaces, `NewStepEventSink`, `NewAttemptEventSink` | +| `interceptor.go` | New file: interceptor interfaces, info types, func adapters, `InterceptorReceiver` | | `wrap.go` | `SubWorkflow` implements `InterceptorReceiver` | --- @@ -330,14 +256,13 @@ None. All questions from the brainstorm have been resolved: | Question | Resolution | |----------|------------| -| EventSink vs Interceptor | Interceptor; EventSink becomes a built-in adapter | +| EventSink vs Interceptor | Pure interceptor; no built-in EventSink adapter — users bring their own event types | | Per-step vs per-attempt | Both layers; different use cases | | Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason | -| SubWorkflow propagation | PrependInterceptors on InterceptorReceiver | -| Retrying event | Removed; individual attempt failures observable via error returned from InterceptAttempt | +| SubWorkflow propagation | PrependInterceptors on InterceptorReceiver; once per step, make+copy | +| Retrying / BackoffDuration event | Removed; not worth the side-channel complexity; failure error available from InterceptAttempt | | attempt counter ownership | stepExecution owns it; incremented in buildAttemptChain wrapper | -| BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors (step-level vs workflow-level) | +| BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors | | Step identifier / name | No precomputed name; Step pointer is the identifier; callers call flow.String() | -| EventType naming | All constants prefixed with `Event` for consistency | -| retry.go changes | None needed | +| EventType / WorkflowEvent | Removed; users define their own 
event types | | Breaking changes | None | diff --git a/event.go b/event.go deleted file mode 100644 index 04c32eb..0000000 --- a/event.go +++ /dev/null @@ -1,151 +0,0 @@ -package flow - -import ( - "context" - "time" -) - -// EventType identifies a step lifecycle event. -type EventType string - -const ( - EventScheduled EventType = "Scheduled" - EventStarted EventType = "Started" - EventSucceeded EventType = "Succeeded" - EventFailed EventType = "Failed" - EventCanceled EventType = "Canceled" - EventSkipped EventType = "Skipped" -) - -// WorkflowEvent carries information about a step lifecycle event. -type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration -} - -// StepInfo is passed to StepInterceptor. -// Step is the canonical identifier — same pointer used as map key in Workflow. -// Callers that need a human-readable name can call flow.String(info.Step). -type StepInfo struct { - Step Steper - TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute -} - -// AttemptInfo is passed to AttemptInterceptor. -// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. -type AttemptInfo struct { - StepInfo - Attempt uint64 -} - -// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// If info.TerminalReason != Pending, next must not be called — the step will not execute. -// Return nil in that case after observing the event. -type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error -} - -// AttemptInterceptor intercepts each individual attempt (Before → Do → After). -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error -} - -// StepInterceptorFunc is a function adapter for StepInterceptor. 
-type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error - -func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - return f(ctx, info, next) -} - -// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error - -func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - return f(ctx, info, next) -} - -// InterceptorReceiver is implemented by steps that contain a sub-workflow. -// stepExecution calls PrependInterceptors once (in executeWithRetry, before the retry loop) -// so that parent interceptors wrap child interceptors for the entire step lifetime. -type InterceptorReceiver interface { - PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) -} - -// terminalEventType maps an error to the corresponding terminal EventType. -func terminalEventType(err error) EventType { - if err == nil { - return EventSucceeded - } - switch StatusFromError(err) { - case Canceled: - return EventCanceled - case Skipped: - return EventSkipped - default: - return EventFailed - } -} - -// terminalStepStatusToEventType converts a terminal StepStatus to its EventType counterpart. -func terminalStepStatusToEventType(s StepStatus) EventType { - switch s { - case Succeeded: - return EventSucceeded - case Failed: - return EventFailed - case Canceled: - return EventCanceled - case Skipped: - return EventSkipped - default: - return EventFailed - } -} - -// stepEventSink is the concrete type returned by NewStepEventSink. -type stepEventSink struct { - sink func(WorkflowEvent) -} - -// NewStepEventSink returns a StepInterceptor that emits Scheduled then a terminal -// event (Succeeded/Failed/Canceled/Skipped) for every step. 
-// It is not aware of individual retry attempts; use NewAttemptEventSink for that. -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { - return &stepEventSink{sink: sink} -} - -func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - s.sink(WorkflowEvent{Step: info.Step, Type: EventScheduled}) - - if info.TerminalReason != Pending { - s.sink(WorkflowEvent{Step: info.Step, Type: terminalStepStatusToEventType(info.TerminalReason)}) - return nil - } - - start := time.Now() - err := next(ctx) - s.sink(WorkflowEvent{ - Step: info.Step, - Type: terminalEventType(err), - Err: err, - Duration: time.Since(start), - }) - return err -} - -// NewAttemptEventSink returns an AttemptInterceptor that emits an EventStarted -// event for each attempt. The attempt's error (if any) is available when -// InterceptAttempt returns. -func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { - return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - sink(WorkflowEvent{ - Step: info.Step, - Type: EventStarted, - Attempt: info.Attempt, - }) - return next(ctx) - }) -} diff --git a/event_test.go b/event_test.go deleted file mode 100644 index bf24184..0000000 --- a/event_test.go +++ /dev/null @@ -1,111 +0,0 @@ -package flow - -import ( - "context" - "errors" - "testing" - - "github.com/stretchr/testify/assert" -) - -func TestEventTypeConstants(t *testing.T) { - // Verify all constants exist and are distinct - types := []EventType{EventScheduled, EventStarted, EventSucceeded, EventFailed, EventCanceled, EventSkipped} - seen := map[EventType]bool{} - for _, et := range types { - assert.False(t, seen[et], "duplicate EventType: %q", et) - seen[et] = true - } -} - -func TestStepInterceptorFunc(t *testing.T) { - called := false - var ic StepInterceptor = StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) 
error { - called = true - return next(ctx) - }) - _ = ic.InterceptStep(context.Background(), StepInfo{}, func(ctx context.Context) error { return nil }) - assert.True(t, called) -} - -func TestAttemptInterceptorFunc(t *testing.T) { - called := false - var ic AttemptInterceptor = AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - called = true - return next(ctx) - }) - _ = ic.InterceptAttempt(context.Background(), AttemptInfo{}, func(ctx context.Context) error { return nil }) - assert.True(t, called) -} - -func TestNewStepEventSink_SucceededStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - info := StepInfo{Step: step, TerminalReason: Pending} - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - return nil - }) - - assert.NoError(t, err) - assert.Len(t, events, 2) - assert.Equal(t, EventScheduled, events[0].Type) - assert.Equal(t, step, events[0].Step) - assert.Equal(t, EventSucceeded, events[1].Type) - assert.NotZero(t, events[1].Duration) -} - -func TestNewStepEventSink_FailedStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - boom := errors.New("boom") - info := StepInfo{Step: step, TerminalReason: Pending} - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - return boom - }) - - assert.Equal(t, boom, err) - assert.Len(t, events, 2) - assert.Equal(t, EventScheduled, events[0].Type) - assert.Equal(t, EventFailed, events[1].Type) - assert.Equal(t, boom, events[1].Err) -} - -func TestNewStepEventSink_SkippedStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - info := StepInfo{Step: step, TerminalReason: Skipped} - nextCalled := 
false - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - nextCalled = true - return nil - }) - - assert.NoError(t, err) - assert.False(t, nextCalled, "next must not be called for Skipped") - assert.Len(t, events, 2) - assert.Equal(t, EventScheduled, events[0].Type) - assert.Equal(t, EventSkipped, events[1].Type) -} - -func TestNewAttemptEventSink_EmitsStarted(t *testing.T) { - var events []WorkflowEvent - sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - info := AttemptInfo{StepInfo: StepInfo{Step: step}, Attempt: 2} - err := sink.InterceptAttempt(context.Background(), info, func(ctx context.Context) error { - return nil - }) - - assert.NoError(t, err) - assert.Len(t, events, 1) - assert.Equal(t, EventStarted, events[0].Type) - assert.Equal(t, uint64(2), events[0].Attempt) - assert.Equal(t, step, events[0].Step) -} diff --git a/interceptor.go b/interceptor.go new file mode 100644 index 0000000..ce0df9e --- /dev/null +++ b/interceptor.go @@ -0,0 +1,53 @@ +package flow + +import "context" + +// StepInfo is passed to StepInterceptor. +// Step is the canonical identifier — same pointer used as map key in Workflow. +// Callers that need a human-readable name can call flow.String(info.Step). +type StepInfo struct { + Step Steper + TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute +} + +// AttemptInfo is passed to AttemptInterceptor. +// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. +type AttemptInfo struct { + StepInfo + Attempt uint64 +} + +// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). +// If info.TerminalReason != Pending, next must not be called — the step will not execute. +// Return nil in that case after observing the event. 
+type StepInterceptor interface { + InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error +} + +// AttemptInterceptor intercepts each individual attempt (Before → Do → After). +// The error returned by next (if any) is the attempt's failure — it is available +// for inspection before being returned. +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +} + +// StepInterceptorFunc is a function adapter for StepInterceptor. +type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error + +func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { + return f(ctx, info, next) +} + +// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. +type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + +func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + return f(ctx, info, next) +} + +// InterceptorReceiver is implemented by steps that contain a sub-workflow. +// stepExecution calls PrependInterceptors once (in executeWithRetry, before the retry loop) +// so that parent interceptors wrap child interceptors for the entire step lifetime. 
+type InterceptorReceiver interface { + PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) +} diff --git a/workflow_test.go b/workflow_test.go index a72668e..2c4897d 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -308,17 +308,19 @@ func TestMaxConcurrencyDeadlockStress(t *testing.T) { func TestStepExecution_BasicSuccess(t *testing.T) { t.Parallel() - var events []WorkflowEvent + var stepped []Steper step := NoOp("a") w := &Workflow{ StepInterceptors: []StepInterceptor{ - NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + stepped = append(stepped, info.Step) + return next(ctx) + }), }, } w.Add(Step(step)) - err := w.Do(context.Background()) - assert.NoError(t, err) - assert.Equal(t, []EventType{EventScheduled, EventSucceeded}, eventTypes(events)) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []Steper{step}, stepped) } func TestStepExecution_StepInterceptorOrder(t *testing.T) { @@ -361,61 +363,55 @@ func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { func TestStepExecution_SkippedStep(t *testing.T) { t.Parallel() - var events []WorkflowEvent + var terminalReason StepStatus step := NoOp("a") w := &Workflow{ StepInterceptors: []StepInterceptor{ - NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), + StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + terminalReason = info.TerminalReason + if info.TerminalReason != Pending { + return nil + } + return next(ctx) + }), }, } w.Add(Step(step).When(func(_ context.Context, _ map[Steper]StepResult) StepStatus { return Skipped })) assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []EventType{EventScheduled, EventSkipped}, eventTypes(events)) + assert.Equal(t, Skipped, terminalReason) } func TestStepExecution_RetryingStep(t *testing.T) { t.Parallel() - var 
events []WorkflowEvent + var attempts []uint64 mu := sync.Mutex{} - record := func(e WorkflowEvent) { - mu.Lock() - events = append(events, e) - mu.Unlock() - } boom := errors.New("boom") - attempts := 0 + callCount := 0 step := Func("s", func(ctx context.Context) error { - attempts++ - if attempts < 3 { + callCount++ + if callCount < 3 { return boom } return nil }) w := &Workflow{ - StepInterceptors: []StepInterceptor{NewStepEventSink(record)}, - AttemptInterceptors: []AttemptInterceptor{NewAttemptEventSink(record)}, + AttemptInterceptors: []AttemptInterceptor{ + AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + mu.Lock() + attempts = append(attempts, info.Attempt) + mu.Unlock() + return next(ctx) + }), + }, } w.Add(Step(step).Retry(func(o *RetryOption) { o.Attempts = 3 o.Backoff = &backoff.ZeroBackOff{} })) assert.NoError(t, w.Do(context.Background())) - // StepInterceptor sees Scheduled + terminal only; AttemptInterceptor sees - // one EventStarted per attempt; no Retrying events are emitted. 
- assert.Equal(t, []EventType{ - EventScheduled, - EventStarted, // attempt 0 (fails, retried) - EventStarted, // attempt 1 (fails, retried) - EventStarted, // attempt 2 (succeeds) - EventSucceeded, - }, eventTypes(events)) - - startedEvents := filterEvents(events, EventStarted) - assert.Equal(t, uint64(0), startedEvents[0].Attempt) - assert.Equal(t, uint64(1), startedEvents[1].Attempt) - assert.Equal(t, uint64(2), startedEvents[2].Attempt) + assert.Equal(t, []uint64{0, 1, 2}, attempts) } func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { @@ -427,21 +423,3 @@ func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { assert.NoError(t, w.Do(context.Background())) assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) } - -func eventTypes(events []WorkflowEvent) []EventType { - types := make([]EventType, len(events)) - for i, e := range events { - types[i] = e.Type - } - return types -} - -func filterEvents(events []WorkflowEvent, et EventType) []WorkflowEvent { - var rv []WorkflowEvent - for _, e := range events { - if e.Type == et { - rv = append(rv, e) - } - } - return rv -} diff --git a/wrap_test.go b/wrap_test.go index a6368b1..a0487e7 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -158,12 +158,16 @@ func TestBuildStep(t *testing.T) { func TestSubWorkflow_InterceptorPropagation(t *testing.T) { t.Parallel() - var events []WorkflowEvent + var stepped []Steper mu := sync.Mutex{} - sink := NewStepEventSink(func(e WorkflowEvent) { + ic := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { mu.Lock() - events = append(events, e) + stepped = append(stepped, info.Step) mu.Unlock() + if info.TerminalReason != Pending { + return nil + } + return next(ctx) }) innerStep := NoOp("inner") @@ -171,62 +175,61 @@ func TestSubWorkflow_InterceptorPropagation(t *testing.T) { sub := &mySubStep{} sub.Add(Step(innerStep)) - w := &Workflow{ - StepInterceptors: []StepInterceptor{sink}, - } + w := 
&Workflow{StepInterceptors: []StepInterceptor{ic}} w.Add(Step(sub)) assert.NoError(t, w.Do(context.Background())) - types := make([]EventType, len(events)) - for i, e := range events { - types[i] = e.Type - } - // At least 4 events: EventScheduled+Succeeded for sub, EventScheduled+Succeeded for innerStep - assert.GreaterOrEqual(t, len(events), 4) - assert.Contains(t, types, EventScheduled) - assert.Contains(t, types, EventSucceeded) - for _, e := range events { - assert.NotNil(t, e.Step) + // Parent interceptor must have seen both the sub step and the inner step. + assert.GreaterOrEqual(t, len(stepped), 2) + found := false + for _, s := range stepped { + if s == innerStep { + found = true + } } + assert.True(t, found, "parent interceptor should see inner step via propagation") } func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { t.Parallel() - var parentEvents []WorkflowEvent - var childEvents []WorkflowEvent - pmu := sync.Mutex{} - cmu := sync.Mutex{} + var parentStepped, childStepped []Steper + pmu, cmu := sync.Mutex{}, sync.Mutex{} - parentSink := NewStepEventSink(func(e WorkflowEvent) { + parentIC := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { pmu.Lock() - parentEvents = append(parentEvents, e) + parentStepped = append(parentStepped, info.Step) pmu.Unlock() + if info.TerminalReason != Pending { + return nil + } + return next(ctx) }) - childSink := NewStepEventSink(func(e WorkflowEvent) { + childIC := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { cmu.Lock() - childEvents = append(childEvents, e) + childStepped = append(childStepped, info.Step) cmu.Unlock() + if info.TerminalReason != Pending { + return nil + } + return next(ctx) }) innerStep := NoOp("inner") type mySubStep struct{ SubWorkflow } sub := &mySubStep{} sub.Add(Step(innerStep)) - sub.w.StepInterceptors = []StepInterceptor{childSink} + sub.w.StepInterceptors = 
[]StepInterceptor{childIC} - w := &Workflow{ - StepInterceptors: []StepInterceptor{parentSink}, - } + w := &Workflow{StepInterceptors: []StepInterceptor{parentIC}} w.Add(Step(sub)) assert.NoError(t, w.Do(context.Background())) - // Parent sees outer step (sub) + inner step (propagated) = at least 4 events - assert.GreaterOrEqual(t, len(parentEvents), 4) - // Child sees inner step only = at least 2 events - assert.GreaterOrEqual(t, len(childEvents), 2) + // Parent sees sub + inner (propagated); child sees inner only. + assert.GreaterOrEqual(t, len(parentStepped), 2) + assert.GreaterOrEqual(t, len(childStepped), 1) } func TestSubWorkflow_InterceptorNotDuplicatedOnRetry(t *testing.T) { From 329816560615f5fde2a91e3967c8546150ba80da Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 15:43:20 +0000 Subject: [PATCH 20/29] simplify interceptor API: remove StepInfo/AttemptInfo wrappers Replace struct parameters with direct values: - StepInterceptor.InterceptStep(ctx, step Steper, next) - AttemptInterceptor.InterceptAttempt(ctx, step Steper, attempt uint64, next) Skipped/Canceled steps now bypass the interceptor chain entirely instead of entering it with a TerminalReason field. This removes the footgun of calling next on a non-Pending step and simplifies the mental model. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 48 ++++++------------- interceptor.go | 34 ++++--------- workflow.go | 48 ++++++------------- workflow_test.go | 23 ++++----- wrap_test.go | 23 +++------ 5 files changed, 56 insertions(+), 120 deletions(-) diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index ea0e05b..5655bf2 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -107,37 +107,23 @@ double-spawning. All other logic moves into `stepExecution.run()`. 
### New Types ```go -// StepInfo is passed to StepInterceptor. -type StepInfo struct { - Step Steper - TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute -} - -// AttemptInfo is passed to AttemptInterceptor. -// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. -type AttemptInfo struct { - StepInfo - Attempt uint64 -} - // StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// If info.TerminalReason != Pending, next must not be called — the step will not execute. -// Return nil in that case after observing the event. +// Skipped and Canceled steps do not enter the interceptor chain. type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error + InterceptStep(ctx context.Context, step Steper, next func(context.Context) error) error } // AttemptInterceptor intercepts each individual attempt (Before → Do → After). // The error returned by next (if any) is the attempt's failure. type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + InterceptAttempt(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error } // StepInterceptorFunc is a function adapter for StepInterceptor. -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error +type StepInterceptorFunc func(ctx context.Context, step Steper, next func(context.Context) error) error // AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +type AttemptInterceptorFunc func(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error // InterceptorReceiver is implemented by steps that contain a sub-workflow. 
type InterceptorReceiver interface { @@ -167,12 +153,9 @@ type Workflow struct { // OTel: one span per step w := &flow.Workflow{ StepInterceptors: []flow.StepInterceptor{ - flow.StepInterceptorFunc(func(ctx context.Context, info flow.StepInfo, next func(context.Context) error) error { - ctx, span := tracer.Start(ctx, flow.String(info.Step)) + flow.StepInterceptorFunc(func(ctx context.Context, step flow.Steper, next func(context.Context) error) error { + ctx, span := tracer.Start(ctx, flow.String(step)) defer span.End() - if info.TerminalReason != flow.Pending { - return nil // step will not execute - } err := next(ctx) if err != nil { span.RecordError(err) @@ -185,9 +168,9 @@ w := &flow.Workflow{ // Per-attempt logging with attempt number and error w := &flow.Workflow{ AttemptInterceptors: []flow.AttemptInterceptor{ - flow.AttemptInterceptorFunc(func(ctx context.Context, info flow.AttemptInfo, next func(context.Context) error) error { + flow.AttemptInterceptorFunc(func(ctx context.Context, step flow.Steper, attempt uint64, next func(context.Context) error) error { err := next(ctx) - slog.Info("attempt", "step", flow.String(info.Step), "attempt", info.Attempt, "err", err) + slog.Info("attempt", "step", flow.String(step), "attempt", attempt, "err", err) return err }), }, @@ -219,13 +202,11 @@ Execution stack for inner steps: --- -## Skipped / Canceled in StepInterceptor - -Steps that are Skipped or Canceled by their `Condition` still enter the `StepInterceptor` chain. -`StepInfo.TerminalReason` carries the reason. The contract is: +## Skipped / Canceled steps -- If `TerminalReason != Pending`, the interceptor **must not** call `next`. -- Return nil after observing the terminal reason. +Steps that are Skipped or Canceled by their `Condition` do **not** enter the interceptor chain. +Their final status is set directly and the interceptors are never invoked. Post-run status is +queryable via `workflow.StateOf(step).GetStatus()`. --- @@ -258,7 +239,8 @@ None. 
All questions from the brainstorm have been resolved: |----------|------------| | EventSink vs Interceptor | Pure interceptor; no built-in EventSink adapter — users bring their own event types | | Per-step vs per-attempt | Both layers; different use cases | -| Skipped/Canceled visibility | Enter StepInterceptor chain via TerminalReason | +| Skipped/Canceled visibility | Skipped/Canceled steps bypass interceptor chain entirely; query post-run via StateOf | +| StepInfo / AttemptInfo wrappers | Removed; step passed as Steper directly; attempt as uint64 directly | | SubWorkflow propagation | PrependInterceptors on InterceptorReceiver; once per step, make+copy | | Retrying / BackoffDuration event | Removed; not worth the side-channel complexity; failure error available from InterceptAttempt | | attempt counter ownership | stepExecution owns it; incremented in buildAttemptChain wrapper | diff --git a/interceptor.go b/interceptor.go index ce0df9e..5be4dce 100644 --- a/interceptor.go +++ b/interceptor.go @@ -2,47 +2,31 @@ package flow import "context" -// StepInfo is passed to StepInterceptor. -// Step is the canonical identifier — same pointer used as map key in Workflow. -// Callers that need a human-readable name can call flow.String(info.Step). -type StepInfo struct { - Step Steper - TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute -} - -// AttemptInfo is passed to AttemptInterceptor. -// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. -type AttemptInfo struct { - StepInfo - Attempt uint64 -} - // StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// If info.TerminalReason != Pending, next must not be called — the step will not execute. -// Return nil in that case after observing the event. +// Skipped and Canceled steps do not enter the interceptor chain. 
type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error + InterceptStep(ctx context.Context, step Steper, next func(context.Context) error) error } // AttemptInterceptor intercepts each individual attempt (Before → Do → After). // The error returned by next (if any) is the attempt's failure — it is available // for inspection before being returned. type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error + InterceptAttempt(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error } // StepInterceptorFunc is a function adapter for StepInterceptor. -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error +type StepInterceptorFunc func(ctx context.Context, step Steper, next func(context.Context) error) error -func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - return f(ctx, info, next) +func (f StepInterceptorFunc) InterceptStep(ctx context.Context, step Steper, next func(context.Context) error) error { + return f(ctx, step, next) } // AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error +type AttemptInterceptorFunc func(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error -func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - return f(ctx, info, next) +func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error { + return f(ctx, step, attempt, next) } // InterceptorReceiver is implemented by steps that contain a sub-workflow. 
diff --git a/workflow.go b/workflow.go index 8cdbe59..a171c38 100644 --- a/workflow.go +++ b/workflow.go @@ -419,39 +419,25 @@ func (ex *stepExecution) run(ctx context.Context) { cond = option.Condition } - terminalReason := Pending - if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - terminalReason = nextStatus - } - - info := StepInfo{Step: ex.step, TerminalReason: terminalReason} - - // Build StepInterceptor chain. - // The innermost next is executeWithRetry for normal steps; a no-op for terminal steps - // (interceptors that observe terminalReason should not call next). - var stepNext func(context.Context) error - if terminalReason == Pending { - stepNext = ex.executeWithRetry - } else { - stepNext = func(ctx context.Context) error { return nil } - } - for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { - ic := ex.w.StepInterceptors[i] - next := stepNext - icLocal := ic - stepNext = func(ctx context.Context) error { - return icLocal.InterceptStep(ctx, info, next) - } - } - var status StepStatus var err error - if terminalReason != Pending { - err = stepNext(ctx) - status = terminalReason + if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + // Skipped/Canceled steps do not enter the interceptor chain. + status = nextStatus } else { ex.state.SetStatus(Running) + + // Build StepInterceptor chain; innermost next is executeWithRetry. 
+ stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } + for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { + ic := ex.w.StepInterceptors[i] + next := stepNext + stepNext = func(ctx context.Context) error { + return ic.InterceptStep(ctx, ex.step, next) + } + } + err = stepNext(ctx) status = StatusFromError(err) if status == Failed { @@ -505,11 +491,7 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { next := chain icLocal := ic chain = func(ctx context.Context) error { - info := AttemptInfo{ - StepInfo: StepInfo{Step: ex.step}, - Attempt: ex.attempt, - } - return icLocal.InterceptAttempt(ctx, info, next) + return icLocal.InterceptAttempt(ctx, ex.step, ex.attempt, next) } } // Wrap the full attempt chain (including interceptors) so ex.attempt is always diff --git a/workflow_test.go b/workflow_test.go index 2c4897d..d278ea9 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -312,8 +312,8 @@ func TestStepExecution_BasicSuccess(t *testing.T) { step := NoOp("a") w := &Workflow{ StepInterceptors: []StepInterceptor{ - StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { - stepped = append(stepped, info.Step) + StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + stepped = append(stepped, s) return next(ctx) }), }, @@ -327,7 +327,7 @@ func TestStepExecution_StepInterceptorOrder(t *testing.T) { t.Parallel() var order []string makeIC := func(name string) StepInterceptor { - return StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + return StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { order = append(order, name+":before") err := next(ctx) order = append(order, name+":after") @@ -346,7 +346,7 @@ func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { t.Parallel() var order []string makeIC := func(name string) 
AttemptInterceptor { - return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + return AttemptInterceptorFunc(func(ctx context.Context, s Steper, attempt uint64, next func(context.Context) error) error { order = append(order, name+":before") err := next(ctx) order = append(order, name+":after") @@ -363,15 +363,12 @@ func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { func TestStepExecution_SkippedStep(t *testing.T) { t.Parallel() - var terminalReason StepStatus + interceptorCalled := false step := NoOp("a") w := &Workflow{ StepInterceptors: []StepInterceptor{ - StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { - terminalReason = info.TerminalReason - if info.TerminalReason != Pending { - return nil - } + StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + interceptorCalled = true return next(ctx) }), }, @@ -380,7 +377,7 @@ func TestStepExecution_SkippedStep(t *testing.T) { return Skipped })) assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, Skipped, terminalReason) + assert.False(t, interceptorCalled, "interceptor must not be called for skipped steps") } func TestStepExecution_RetryingStep(t *testing.T) { @@ -398,9 +395,9 @@ func TestStepExecution_RetryingStep(t *testing.T) { }) w := &Workflow{ AttemptInterceptors: []AttemptInterceptor{ - AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { + AttemptInterceptorFunc(func(ctx context.Context, s Steper, attempt uint64, next func(context.Context) error) error { mu.Lock() - attempts = append(attempts, info.Attempt) + attempts = append(attempts, attempt) mu.Unlock() return next(ctx) }), diff --git a/wrap_test.go b/wrap_test.go index a0487e7..c6619f0 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -160,13 +160,10 @@ func TestSubWorkflow_InterceptorPropagation(t *testing.T) { var 
stepped []Steper mu := sync.Mutex{} - ic := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + ic := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { mu.Lock() - stepped = append(stepped, info.Step) + stepped = append(stepped, s) mu.Unlock() - if info.TerminalReason != Pending { - return nil - } return next(ctx) }) @@ -197,22 +194,16 @@ func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { var parentStepped, childStepped []Steper pmu, cmu := sync.Mutex{}, sync.Mutex{} - parentIC := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + parentIC := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { pmu.Lock() - parentStepped = append(parentStepped, info.Step) + parentStepped = append(parentStepped, s) pmu.Unlock() - if info.TerminalReason != Pending { - return nil - } return next(ctx) }) - childIC := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + childIC := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { cmu.Lock() - childStepped = append(childStepped, info.Step) + childStepped = append(childStepped, s) cmu.Unlock() - if info.TerminalReason != Pending { - return nil - } return next(ctx) }) @@ -236,7 +227,7 @@ func TestSubWorkflow_InterceptorNotDuplicatedOnRetry(t *testing.T) { t.Parallel() var count atomic.Int32 - sink := StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { + sink := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { count.Add(1) return next(ctx) }) From 33f204cb8389e498c512d563b34ea0ab602acf68 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 23:47:42 +0000 Subject: [PATCH 21/29] move PrependInterceptors to Workflow + add 
IsolateInterceptors opt-out - Workflow itself now implements InterceptorReceiver, so any nested *Workflow used as a step inherits parent interceptors (not just SubWorkflow). - SubWorkflow.PrependInterceptors delegates to embedded Workflow. - New Workflow.IsolateInterceptors bool: when true, PrependInterceptors is a no-op and the child runs only its own interceptor stack. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 29 +++++++-- workflow.go | 35 +++++++---- wrap_test.go | 60 +++++++++++++++++++ 3 files changed, 108 insertions(+), 16 deletions(-) diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index 5655bf2..a4a67b0 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -144,6 +144,10 @@ type Workflow struct { // AttemptInterceptors are called once per attempt, inside the retry loop. // Executed in order: [0] is outermost, [len-1] is innermost. AttemptInterceptors []AttemptInterceptor + + // IsolateInterceptors disables inheriting interceptors from a parent workflow. + // When true, PrependInterceptors is a no-op for this workflow. + IsolateInterceptors bool } ``` @@ -181,8 +185,12 @@ w := &flow.Workflow{ ## SubWorkflow Propagation -`SubWorkflow` implements `InterceptorReceiver`. Once in `executeWithRetry` (before the retry loop), -`stepExecution` injects the parent's interceptors into the child workflow: +`Workflow` itself implements `InterceptorReceiver` via `PrependInterceptors`, so any nested +workflow — whether embedded via `SubWorkflow` or used directly as a step — inherits its parent's +interceptors. `SubWorkflow.PrependInterceptors` simply delegates to the inner `Workflow`. 
+ +Once in `executeWithRetry` (before the retry loop), `stepExecution` injects the parent's +interceptors into the child: ```go if recv, ok := ex.step.(InterceptorReceiver); ok { @@ -194,12 +202,25 @@ if recv, ok := ex.step.(InterceptorReceiver); ok { prepended without aliasing the parent's backing array and without accumulating across `Reset()` cycles. -Execution stack for inner steps: +### Opting out: `IsolateInterceptors` + +Set `Workflow.IsolateInterceptors = true` on a child to disable inheritance. `PrependInterceptors` +becomes a no-op and the child runs with only its own interceptor stack. Useful when the child +defines a self-contained observability pipeline (e.g., its own tracer / event sink) that must +not be wrapped by the parent. + +Execution stack for inner steps (default, inheritance enabled): ``` [parent StepInterceptors] → [child StepInterceptors] → retry → [parent AttemptInterceptors] → [child AttemptInterceptors] → Before → Do → After ``` +With `IsolateInterceptors = true`: + +``` +[child StepInterceptors] → retry → [child AttemptInterceptors] → Before → Do → After +``` + --- ## Skipped / Canceled steps @@ -227,7 +248,7 @@ queryable via `workflow.StateOf(step).GetStatus()`. 
|------|--------| | `workflow.go` | Add `StepInterceptors`, `AttemptInterceptors` fields; simplify `tick()`; add `stepExecution` | | `interceptor.go` | New file: interceptor interfaces, info types, func adapters, `InterceptorReceiver` | -| `wrap.go` | `SubWorkflow` implements `InterceptorReceiver` | +| `wrap.go` | `SubWorkflow.PrependInterceptors` delegates to embedded `Workflow.PrependInterceptors` | --- diff --git a/workflow.go b/workflow.go index a171c38..f19ecb5 100644 --- a/workflow.go +++ b/workflow.go @@ -46,6 +46,7 @@ type Workflow struct { StepInterceptors []StepInterceptor // per-step global interceptors AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors + IsolateInterceptors bool // if true, do not inherit interceptors from a parent workflow StepBuilder // StepBuilder to call BuildStep() for Steps @@ -261,6 +262,25 @@ func (w *Workflow) reset() { } } +// PrependInterceptors implements InterceptorReceiver on Workflow itself, +// so a Workflow used directly as a step (or embedded via SubWorkflow) can +// inherit interceptors from its parent. If IsolateInterceptors is true, +// the call is a no-op and the workflow uses only its own interceptors. +func (w *Workflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { + if w.IsolateInterceptors { + return + } + combined := make([]StepInterceptor, len(step)+len(w.StepInterceptors)) + copy(combined, step) + copy(combined[len(step):], w.StepInterceptors) + w.StepInterceptors = combined + + combinedA := make([]AttemptInterceptor, len(attempt)+len(w.AttemptInterceptors)) + copy(combinedA, attempt) + copy(combinedA[len(attempt):], w.AttemptInterceptors) + w.AttemptInterceptors = combinedA +} + // Do starts the Step execution in topological order, // and waits until all Steps terminated. 
// @@ -523,7 +543,6 @@ func (ex *stepExecution) runAttempt(ctx context.Context) error { return do(func() error { return ex.state.After(ctxStep, ex.step, err) }) } - func (w *Workflow) lease() bool { if w.leaseBucket == nil { return true @@ -596,16 +615,8 @@ func (s *SubWorkflow) Do(ctx context.Context) error { return s.w.Do(ctx) } // Reset resets the sub-workflow to ready for BuildStep() func (s *SubWorkflow) Reset() { s.w = Workflow{} } -// PrependInterceptors implements InterceptorReceiver. -// Parent workflow interceptors are prepended so they execute outside child interceptors. +// PrependInterceptors implements InterceptorReceiver by delegating to the +// embedded Workflow. func (s *SubWorkflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { - combined := make([]StepInterceptor, len(step)+len(s.w.StepInterceptors)) - copy(combined, step) - copy(combined[len(step):], s.w.StepInterceptors) - s.w.StepInterceptors = combined - - combinedA := make([]AttemptInterceptor, len(attempt)+len(s.w.AttemptInterceptors)) - copy(combinedA, attempt) - copy(combinedA[len(attempt):], s.w.AttemptInterceptors) - s.w.AttemptInterceptors = combinedA + s.w.PrependInterceptors(step, attempt) } diff --git a/wrap_test.go b/wrap_test.go index c6619f0..48cefad 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -256,3 +256,63 @@ func TestSubWorkflow_InterceptorNotDuplicatedOnRetry(t *testing.T) { // once for the outer sub step, once for the inner step (regardless of retry count). 
assert.Equal(t, int32(2), count.Load()) } + +func TestWorkflow_AsStep_InheritsInterceptors(t *testing.T) { + t.Parallel() + + var stepped []Steper + mu := sync.Mutex{} + ic := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + mu.Lock() + stepped = append(stepped, s) + mu.Unlock() + return next(ctx) + }) + + innerStep := NoOp("inner") + child := &Workflow{} + child.Add(Step(innerStep)) + + parent := &Workflow{StepInterceptors: []StepInterceptor{ic}} + parent.Add(Step(child)) + assert.NoError(t, parent.Do(context.Background())) + + // parent's interceptor should see both the child workflow step and the inner step + found := false + for _, s := range stepped { + if s == innerStep { + found = true + } + } + assert.True(t, found, "parent interceptor should see inner step via Workflow.PrependInterceptors") +} + +func TestSubWorkflow_IsolateInterceptors(t *testing.T) { + t.Parallel() + + var parentCount, childCount atomic.Int32 + parentIC := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + parentCount.Add(1) + return next(ctx) + }) + childIC := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + childCount.Add(1) + return next(ctx) + }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + sub.w.StepInterceptors = []StepInterceptor{childIC} + sub.w.IsolateInterceptors = true + + w := &Workflow{StepInterceptors: []StepInterceptor{parentIC}} + w.Add(Step(sub)) + assert.NoError(t, w.Do(context.Background())) + + // parent only sees the outer sub step (1), not the inner step (since isolated) + assert.Equal(t, int32(1), parentCount.Load()) + // child only sees the inner step + assert.Equal(t, int32(1), childCount.Load()) +} From 71155c48f48d68c25b3d6238dee14f2524c5561a Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Wed, 6 May 2026 23:50:58 +0000 Subject: [PATCH 
22/29] add openspec/specs/step-interceptor reflecting current API The 2026-05-06-structured-event-sink change was archived without syncing its delta to a main spec, and its delta still described the old EventSink/StepInfo/AttemptInfo/TerminalReason API. Create a fresh main spec that documents the current implementation: - StepInterceptor / AttemptInterceptor with direct Steper / uint64 parameters (no wrapper types). - Skipped/Canceled steps bypass the interceptor chain entirely. - Workflow itself implements InterceptorReceiver; SubWorkflow delegates. - Workflow.IsolateInterceptors opts out of inheritance from a parent. - Attempt counter ownership and short-circuit-safe increment. Co-Authored-By: Claude Sonnet 4.6 --- openspec/specs/step-interceptor/spec.md | 195 ++++++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 openspec/specs/step-interceptor/spec.md diff --git a/openspec/specs/step-interceptor/spec.md b/openspec/specs/step-interceptor/spec.md new file mode 100644 index 0000000..ea8aaac --- /dev/null +++ b/openspec/specs/step-interceptor/spec.md @@ -0,0 +1,195 @@ +## ADDED Requirements + +### Requirement: Two-layer interceptor types + +go-workflow SHALL provide two orthogonal interceptor interfaces for global, structured +observability across all Steps in a Workflow: + +- `StepInterceptor` wraps the **full lifecycle** of a Step (all retry attempts, called once + per Step). +- `AttemptInterceptor` wraps **each individual attempt** (called once per attempt, including + retried attempts). + +```go +type StepInterceptor interface { + InterceptStep(ctx context.Context, step Steper, next func(context.Context) error) error +} +type AttemptInterceptor interface { + InterceptAttempt(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error +} +``` + +Function adapters `StepInterceptorFunc` and `AttemptInterceptorFunc` are provided so callers +can pass plain functions. 
+ +The `Steper` value passed to interceptors is the canonical Step identifier — the same +pointer used as the map key inside `Workflow`. Callers needing a human-readable name SHALL +call `flow.String(step)`. + +#### Scenario: StepInterceptor fires exactly once per step +- **WHEN** a Step executes (succeeds, fails, or retries any number of times) +- **THEN** each registered `StepInterceptor.InterceptStep` is invoked exactly once + +#### Scenario: AttemptInterceptor fires once per attempt +- **WHEN** a Step is retried N times (i.e. N+1 attempts total) +- **THEN** each registered `AttemptInterceptor.InterceptAttempt` is invoked N+1 times, + with `attempt` taking values `0, 1, ..., N` + +#### Scenario: Attempt error is observable +- **WHEN** an `AttemptInterceptor` calls `next(ctx)` and the attempt fails +- **THEN** `next` returns the attempt's error and the interceptor MAY inspect it before + returning + +--- + +### Requirement: Skipped and Canceled steps bypass the interceptor chain + +Steps whose `Condition` evaluates to a terminal status (`Skipped` or `Canceled`) before +execution SHALL NOT enter the `StepInterceptor` chain. Their final status is set directly +on the `State` and no interceptor is invoked. The post-run status remains queryable via +`workflow.StateOf(step).GetStatus()`. + +This avoids the footgun of forcing every interceptor to check whether the step "will +actually execute" before calling `next`. 
+ +#### Scenario: Skipped step does not invoke interceptors +- **WHEN** a Step's Condition returns `Skipped` +- **THEN** no `StepInterceptor` or `AttemptInterceptor` is invoked for that step +- **AND** `workflow.StateOf(step).GetStatus()` returns `Skipped` + +#### Scenario: Canceled-by-condition step does not invoke interceptors +- **WHEN** a Step's Condition returns `Canceled` +- **THEN** no `StepInterceptor` or `AttemptInterceptor` is invoked for that step + +--- + +### Requirement: Workflow registration of interceptors + +`Workflow` SHALL expose two slice fields for global interceptor registration: + +```go +type Workflow struct { + StepInterceptors []StepInterceptor // [0] outermost, [len-1] innermost + AttemptInterceptors []AttemptInterceptor // [0] outermost, [len-1] innermost + IsolateInterceptors bool // if true, do not inherit from a parent workflow +} +``` + +Nil/empty slices mean no interceptors. Existing workflows without interceptors SHALL behave +identically to before this feature was added (zero-value safe, no allocations on the hot +path). + +#### Scenario: Outer-to-inner ordering +- **WHEN** `StepInterceptors = [A, B]` are registered +- **THEN** the execution order is `A:before → B:before → step → B:after → A:after` + +#### Scenario: No interceptors means no behavioural change +- **WHEN** a Workflow is constructed without `StepInterceptors` or `AttemptInterceptors` +- **THEN** all existing semantics (retries, conditions, BeforeStep/AfterStep) are unchanged + +--- + +### Requirement: BeforeStep / AfterStep are orthogonal to interceptors + +`BeforeStep` and `AfterStep` callbacks (configured per-step via `StepConfig`) execute +**inside** the `AttemptInterceptor` chain — they wrap a single `Do` call. Interceptors are +workflow-level and apply globally; `BeforeStep`/`AfterStep` are step-level and configured +per-step. Both mechanisms are preserved and complementary. + +The full execution stack for a single attempt is: + +``` +StepInterceptor[0] → ... 
→ StepInterceptor[N-1] + → retry loop + → AttemptInterceptor[0] → ... → AttemptInterceptor[M-1] + → BeforeStep callbacks + → step.Do(ctx) + → AfterStep callbacks +``` + +#### Scenario: BeforeStep runs inside AttemptInterceptor +- **WHEN** an `AttemptInterceptor` calls `next(ctx)` +- **THEN** the chain reaches the per-step `BeforeStep` callbacks before `step.Do` runs + +--- + +### Requirement: Interceptor propagation to nested workflows + +`Workflow` SHALL implement the `InterceptorReceiver` interface so that when a `*Workflow` +(or a step embedding `SubWorkflow`) is used as a Step inside another Workflow, the parent's +interceptors are prepended to the child's interceptor stack. + +```go +type InterceptorReceiver interface { + PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) +} +``` + +`stepExecution` calls `PrependInterceptors` exactly once per step, in `executeWithRetry` +before the retry loop begins. The implementation SHALL use `make`+`copy` to construct fresh +slices, so: + +- Parent backing arrays are never aliased. +- Repeated `Do()` runs (across `Reset()` cycles) do not accumulate prepended interceptors. + +`SubWorkflow.PrependInterceptors` SHALL delegate to the embedded `Workflow.PrependInterceptors`. 
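This simplified sketch shows the aliasing concern behind the `make`+`copy` requirement. Plain string slices stand in for interceptors, and `prependCopy` is a hypothetical helper, not the library function. When the parent slice has spare capacity, `append` writes into the parent's backing array. Two children built from the same parent then overwrite each other's entries.

```go
package main

import "fmt"

// prependCopy returns [prefix..., base...] in a freshly allocated slice, so
// the result never shares a backing array with either input.
func prependCopy(prefix, base []string) []string {
	out := make([]string, len(prefix)+len(base))
	copy(out, prefix)
	copy(out[len(prefix):], base)
	return out
}

func main() {
	// Strings stand in for interceptor values. The parent slice has spare
	// capacity, which is what makes aliasing possible.
	parent := make([]string, 1, 4)
	parent[0] = "P"

	// Naive append writes into parent's spare capacity, so two children
	// built from the same parent share (and clobber) index 1.
	a := append(parent, "childA")
	b := append(parent, "childB")
	fmt.Println(a, b) // [P childB] [P childB]

	// make+copy gives each child its own backing array.
	c := prependCopy(parent, []string{"childA"})
	d := prependCopy(parent, []string{"childB"})
	fmt.Println(c, d) // [P childA] [P childB]
}
```

`make`+`copy` only addresses aliasing. It does not stop a receiver from accumulating prefixes across runs if the merged slice is written back into the receiver's own field.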
+ +#### Scenario: Nested *Workflow inherits parent interceptors +- **GIVEN** a parent Workflow with a `StepInterceptor` X, and a child `*Workflow` containing + step `S` added as a step in the parent +- **WHEN** the parent runs +- **THEN** X is invoked for both the child workflow step and the inner step S + +#### Scenario: SubWorkflow inherits parent interceptors +- **GIVEN** a parent Workflow with a `StepInterceptor` X, and a step embedding `SubWorkflow` + containing step `S` +- **WHEN** the parent runs +- **THEN** X is invoked for both the outer step and the inner step S + +#### Scenario: PrependInterceptors does not duplicate across retries +- **WHEN** a sub-workflow step is retried N times +- **THEN** parent interceptors are prepended exactly once, not N times + +#### Scenario: PrependInterceptors does not accumulate across Reset +- **WHEN** a workflow that received prepended interceptors is reset and run again as a child +- **THEN** the parent's interceptors are present exactly once, not duplicated + +--- + +### Requirement: Opting out of inheritance via IsolateInterceptors + +A nested `Workflow` MAY set `IsolateInterceptors = true` to opt out of inheriting +interceptors from its parent. When true, `Workflow.PrependInterceptors` SHALL be a no-op +and the workflow runs only with its own registered interceptors. + +This is intended for self-contained sub-workflows that define their own observability +pipeline (e.g., their own tracer or event sink) that must not be wrapped by parent +interceptors. 
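Here is a toy model of the isolation rule, with interceptors reduced to names. The hypothetical `workflow` type keeps its own interceptors (`base`) apart from a per-run inherited prefix, and `prepend` becomes a no-op when `isolate` is set. `runNested` assumes that the parent's chain wraps the child step and that propagation happens before the child runs. It is a sketch, not go-workflow's implementation.

```go
package main

import "fmt"

// workflow is a hypothetical, heavily simplified model of a nested Workflow.
type workflow struct {
	base      []string // the workflow's own interceptors
	inherited []string // prefix received from a parent for the current run
	isolate   bool     // mirrors IsolateInterceptors
}

// prepend records a parent's chain, unless this workflow opted out.
func (w *workflow) prepend(prefix []string) {
	if w.isolate {
		return // keep only our own interceptors
	}
	w.inherited = append(append([]string(nil), prefix...), w.inherited...)
}

// effective returns [inherited..., base...] without mutating either field.
func (w *workflow) effective() []string {
	return append(append([]string(nil), w.inherited...), w.base...)
}

// runNested records which interceptors wrap the child step itself and which
// wrap the child's single inner step "S".
func runNested(parent, child *workflow) map[string][]string {
	fired := map[string][]string{}
	fired["child"] = parent.effective() // parent chain wraps the child step
	child.prepend(parent.effective())   // propagation before the child runs
	fired["S"] = child.effective()
	child.inherited = nil // cleared when the child's run ends
	return fired
}

func main() {
	parent := &workflow{base: []string{"X"}}
	isolated := &workflow{base: []string{"Y"}, isolate: true}
	shared := &workflow{base: []string{"Y"}}

	fmt.Println(runNested(parent, isolated)) // map[S:[Y] child:[X]]
	fmt.Println(runNested(parent, shared))   // map[S:[X Y] child:[X]]
}
```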
+ +#### Scenario: Isolated child does not see parent interceptors +- **GIVEN** a parent Workflow with `StepInterceptor` X and a child Workflow with + `IsolateInterceptors = true` and its own `StepInterceptor` Y, containing inner step S +- **WHEN** the parent runs the child as a step +- **THEN** X is invoked exactly once (for the child workflow step itself) +- **AND** Y is invoked for inner step S +- **AND** X is NOT invoked for inner step S + +--- + +### Requirement: Attempt counter ownership and increment timing + +The internal `stepExecution` SHALL own the attempt counter (`uint64`), exposed to +`AttemptInterceptor` as the `attempt` parameter. The counter is incremented after each +attempt completes — including attempts that are short-circuited by an +`AttemptInterceptor` (e.g., one that returns without calling `next`). + +This guarantees the value passed as `attempt` is monotonically increasing and zero-indexed, +regardless of interceptor behaviour. + +#### Scenario: Attempt counter starts at zero +- **WHEN** a Step's first attempt runs +- **THEN** the `attempt` argument to `AttemptInterceptor.InterceptAttempt` is `0` + +#### Scenario: Attempt counter increments even when interceptor short-circuits +- **WHEN** an `AttemptInterceptor` returns without calling `next` +- **THEN** the next attempt (if retried) still receives `attempt = previous + 1` From 028db92b973ef4936c99931680d140f61d525fa6 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 00:36:04 +0000 Subject: [PATCH 23/29] fix: PrependInterceptors must not accumulate across Do() runs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously, PrependInterceptors wrote the combined slice back into the child's StepInterceptors / AttemptInterceptors. The make+copy only avoided backing-array aliasing — it did not prevent permanent mutation of the receiver. 
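Two hypothetical toy types reproduce the difference between writing the merged slice back and keeping the inherited prefix separate. Strings stand in for interceptors, and `run` only reports the chain length a run would use. If the same parent/child pair runs three times, the write-back variant grows by one entry per run while the separate-prefix variant stays constant.

```go
package main

import "fmt"

// writeBack models the pre-fix behaviour: make+copy avoids aliasing, but the
// merged slice replaces the receiver's own field, so each run prepends again.
type writeBack struct{ interceptors []string }

func (w *writeBack) prepend(p []string) {
	merged := make([]string, len(p)+len(w.interceptors))
	copy(merged, p)
	copy(merged[len(p):], w.interceptors)
	w.interceptors = merged
}

// run reports the chain length this run would use.
func (w *writeBack) run() int { return len(w.interceptors) }

// separate models the fix: base stays immutable, and the inherited prefix is
// stored on its own and cleared when the run ends.
type separate struct{ base, inherited []string }

func (w *separate) prepend(p []string) {
	w.inherited = append(append([]string(nil), p...), w.inherited...)
}

func (w *separate) run() int {
	n := len(w.inherited) + len(w.base)
	w.inherited = nil // mirrors clearing the inherited prefix at the end of Do()
	return n
}

func main() {
	parent := []string{"P"}
	wb := &writeBack{interceptors: []string{"C"}}
	sp := &separate{base: []string{"C"}}
	for i := 0; i < 3; i++ {
		wb.prepend(parent)
		sp.prepend(parent)
		fmt.Println(wb.run(), sp.run())
	}
	// 2 2
	// 3 2
	// 4 2
}
```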
As a result, running the same parent multiple times made the parent's interceptors stack up on the child (N+1 invocations on the Nth run). Fix: keep StepInterceptors / AttemptInterceptors immutable after construction. PrependInterceptors writes to private inheritedStep / inheritedAttempt slices instead. Run-time chain construction goes through effectiveStepInterceptors / effectiveAttemptInterceptors which build an ephemeral [inherited..., base...] slice. The inherited slices are cleared after waitGroup.Wait() at the end of each Do() so the next run starts clean. Also add TestSubWorkflow_PrependInterceptorsIdempotentAcrossDo covering 3 sequential runs of the same parent+child pair. Co-Authored-By: Claude Sonnet 4.6 --- workflow.go | 79 +++++++++++++++++++++++++++++++++++++++++----------- wrap_test.go | 33 ++++++++++++++++++++++ 2 files changed, 96 insertions(+), 16 deletions(-) diff --git a/workflow.go b/workflow.go index f19ecb5..fb070c0 100644 --- a/workflow.go +++ b/workflow.go @@ -44,14 +44,21 @@ type Workflow struct { Clock clock.Clock // Clock for retry and unit test DefaultOption *StepOption // DefaultOption is the default option for all Steps - StepInterceptors []StepInterceptor // per-step global interceptors - AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors + StepInterceptors []StepInterceptor // per-step global interceptors (immutable base) + AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors (immutable base) IsolateInterceptors bool // if true, do not inherit interceptors from a parent workflow StepBuilder // StepBuilder to call BuildStep() for Steps steps map[Steper]*State // the internal states of Steps + // inheritedStep / inheritedAttempt are populated by PrependInterceptors at the + // start of each Do() (parent → child) and cleared by reset(). They are never + // merged into StepInterceptors / AttemptInterceptors so the user-supplied base + // stays untouched and repeated runs do not accumulate. 
+ inheritedStep []StepInterceptor + inheritedAttempt []AttemptInterceptor + statusChange *sync.Cond // a condition to signal the status change to proceed tick leaseBucket chan struct{} // constraint max concurrency of running Steps, nil means no limit waitGroup sync.WaitGroup // to prevent goroutine leak @@ -266,19 +273,50 @@ func (w *Workflow) reset() { // so a Workflow used directly as a step (or embedded via SubWorkflow) can // inherit interceptors from its parent. If IsolateInterceptors is true, // the call is a no-op and the workflow uses only its own interceptors. +// +// The inherited slices are stored separately from StepInterceptors / +// AttemptInterceptors so the user-supplied base is never mutated and +// repeated runs do not accumulate. func (w *Workflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { if w.IsolateInterceptors { return } - combined := make([]StepInterceptor, len(step)+len(w.StepInterceptors)) - copy(combined, step) - copy(combined[len(step):], w.StepInterceptors) - w.StepInterceptors = combined + if len(step) > 0 { + merged := make([]StepInterceptor, 0, len(step)+len(w.inheritedStep)) + merged = append(merged, step...) + merged = append(merged, w.inheritedStep...) + w.inheritedStep = merged + } + if len(attempt) > 0 { + mergedA := make([]AttemptInterceptor, 0, len(attempt)+len(w.inheritedAttempt)) + mergedA = append(mergedA, attempt...) + mergedA = append(mergedA, w.inheritedAttempt...) + w.inheritedAttempt = mergedA + } +} + +// effectiveStepInterceptors returns the chain to invoke for this run: +// inherited (from parent, if any) prepended to the user-configured base. +// The result is never written back to either field. +func (w *Workflow) effectiveStepInterceptors() []StepInterceptor { + if len(w.inheritedStep) == 0 { + return w.StepInterceptors + } + out := make([]StepInterceptor, 0, len(w.inheritedStep)+len(w.StepInterceptors)) + out = append(out, w.inheritedStep...) 
+ out = append(out, w.StepInterceptors...) + return out +} - combinedA := make([]AttemptInterceptor, len(attempt)+len(w.AttemptInterceptors)) - copy(combinedA, attempt) - copy(combinedA[len(attempt):], w.AttemptInterceptors) - w.AttemptInterceptors = combinedA +// effectiveAttemptInterceptors mirrors effectiveStepInterceptors for AttemptInterceptors. +func (w *Workflow) effectiveAttemptInterceptors() []AttemptInterceptor { + if len(w.inheritedAttempt) == 0 { + return w.AttemptInterceptors + } + out := make([]AttemptInterceptor, 0, len(w.inheritedAttempt)+len(w.AttemptInterceptors)) + out = append(out, w.inheritedAttempt...) + out = append(out, w.AttemptInterceptors...) + return out } // Do starts the Step execution in topological order, @@ -311,6 +349,11 @@ func (w *Workflow) Do(ctx context.Context) error { w.statusChange.L.Unlock() // ensure all goroutines are exited w.waitGroup.Wait() + // Clear inherited interceptors set by a parent during this run so that the + // next time this workflow runs (under any parent, or standalone) it starts + // fresh and PrependInterceptors does not accumulate across runs. + w.inheritedStep = nil + w.inheritedAttempt = nil // return the error err := make(ErrWorkflow) for step, state := range w.steps { @@ -450,8 +493,9 @@ func (ex *stepExecution) run(ctx context.Context) { // Build StepInterceptor chain; innermost next is executeWithRetry. 
stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } - for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { - ic := ex.w.StepInterceptors[i] + stepICs := ex.w.effectiveStepInterceptors() + for i := len(stepICs) - 1; i >= 0; i-- { + ic := stepICs[i] next := stepNext stepNext = func(ctx context.Context) error { return ic.InterceptStep(ctx, ex.step, next) @@ -484,9 +528,11 @@ func (ex *stepExecution) run(ctx context.Context) { func (ex *stepExecution) executeWithRetry(ctx context.Context) error { option := ex.state.Option() - // Propagate interceptors to SubWorkflow once — before the retry loop starts. + // Propagate the effective chain (inherited prefix + this workflow's own base) + // so multi-level nesting (grandparent → parent → child) accumulates correctly + // within one run, while the user-supplied base on each workflow stays untouched. if recv, ok := ex.step.(InterceptorReceiver); ok { - recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) + recv.PrependInterceptors(ex.w.effectiveStepInterceptors(), ex.w.effectiveAttemptInterceptors()) } attemptChain := ex.buildAttemptChain() @@ -506,8 +552,9 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { chain := func(ctx context.Context) error { return ex.runAttempt(ctx) } - for i := len(ex.w.AttemptInterceptors) - 1; i >= 0; i-- { - ic := ex.w.AttemptInterceptors[i] + attemptICs := ex.w.effectiveAttemptInterceptors() + for i := len(attemptICs) - 1; i >= 0; i-- { + ic := attemptICs[i] next := chain icLocal := ic chain = func(ctx context.Context) error { diff --git a/wrap_test.go b/wrap_test.go index 48cefad..4f1d45f 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -287,6 +287,39 @@ func TestWorkflow_AsStep_InheritsInterceptors(t *testing.T) { assert.True(t, found, "parent interceptor should see inner step via Workflow.PrependInterceptors") } +// TestSubWorkflow_PrependInterceptorsIdempotentAcrossDo ensures that running the +// same parent (with a 
sub-workflow child) multiple times does NOT accumulate +// prepended interceptors on the child. The parent's interceptor should fire +// exactly twice per run (outer sub step + inner step), regardless of how many +// times Do() is called. +func TestSubWorkflow_PrependInterceptorsIdempotentAcrossDo(t *testing.T) { + t.Parallel() + + var count atomic.Int32 + ic := StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + count.Add(1) + return next(ctx) + }) + + innerStep := NoOp("inner") + type mySubStep struct{ SubWorkflow } + sub := &mySubStep{} + sub.Add(Step(innerStep)) + + parent := &Workflow{StepInterceptors: []StepInterceptor{ic}} + parent.Add(Step(sub)) + + const runs = 3 + for i := 0; i < runs; i++ { + count.Store(0) + // reset both parent and child step states so the workflow is re-runnable + assert.NoError(t, parent.Reset()) + assert.NoError(t, parent.Do(context.Background())) + assert.Equal(t, int32(2), count.Load(), + "run %d: parent interceptor must fire exactly 2 times (outer sub + inner), accumulation detected", i) + } +} + func TestSubWorkflow_IsolateInterceptors(t *testing.T) { t.Parallel() From df41a9be2cd3cd07dcbf9c1122c0ec6345813491 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 00:58:56 +0000 Subject: [PATCH 24/29] evaluate Condition inline in tick(), remove scheduled sentinel MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two coupled fixes addressing Copilot review: 1. Skipped/Canceled steps no longer consume a concurrency lease or spawn a worker goroutine. tick() evaluates each runnable step's Condition inline; terminal results are settled directly on State without going through the worker path. Previously, every step — including those about to be Skipped — took a lease, spawned a goroutine, evaluated condition, then released the lease. 
Under MaxConcurrency=1 with many condition-skipped steps this serialized work that could be settled synchronously. 2. The private "scheduled" StepStatus sentinel is removed. Its sole purpose was preventing tick() from double-spawning a step before the worker set Running. Since tick() now sets Running itself (under statusChange.L, before spawning the worker), no sentinel is needed. As a side effect, StateOf(step).GetStatus() can no longer return the undocumented "scheduled" string — only the public StepStatus values. tick() now re-iterates within a single call when it inline-settles any step, so newly-unblocked downstream steps are picked up without waiting for a signalStatusChange (which is never fired when no worker is spawned). Add TestSkippedStep_DoesNotConsumeLease (MaxConcurrency=1, a→b(skip) →c) asserting b never enters the AttemptInterceptor path while a and c do. Update openspec spec and design doc. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-06-step-interceptor-design.md | 20 ++- openspec/specs/step-interceptor/spec.md | 16 ++- workflow.go | 134 ++++++++++-------- workflow_test.go | 52 +++++++ 4 files changed, 159 insertions(+), 63 deletions(-) diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md index a4a67b0..878f409 100644 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md @@ -97,8 +97,22 @@ type stepExecution struct { ### tick() simplification -`tick()` is reduced to atomically claiming a step (private `scheduled` sentinel) to prevent -double-spawning. All other logic moves into `stepExecution.run()`. +`tick()` evaluates each runnable Step's `Condition` inline: + +- If the Condition returns a terminal status (`Skipped` / `Canceled`), the Step's + `StepResult` is set directly and execution moves on. 
No goroutine is spawned, no + `MaxConcurrency` lease is consumed, no interceptor runs. +- Otherwise, `tick()` takes a lease, sets the status to `Running`, and spawns a worker + that runs the interceptor chain. + +Because `tick()` flips the status to `Running` under `statusChange.L` *before* the worker +goroutine is spawned, a subsequent `tick()` cannot see the Step as `Pending` and +double-spawn it. No `scheduled` sentinel is needed, and `StateOf(step).GetStatus()` only +ever returns documented public `StepStatus` values. + +When a Step is settled inline, `tick()` re-iterates within the same call so +newly unblocked downstream Steps are picked up immediately (no signal would otherwise +wake the main loop). --- @@ -235,7 +249,7 @@ queryable via `workflow.StateOf(step).GetStatus()`. - `BeforeStep` / `AfterStep` / `Input` / `Output` — API and behavior unchanged - `StepConfig`, `StepOption`, `RetryOption` — unchanged -- `StepStatus` — no new exported values; `scheduled` is private +- `StepStatus` — no new values; only documented public statuses are ever observable - `Condition` system — unchanged - `SubWorkflow` embedding pattern — unchanged, just gains `PrependInterceptors` - No breaking changes to existing workflow definitions diff --git a/openspec/specs/step-interceptor/spec.md b/openspec/specs/step-interceptor/spec.md index ea8aaac..669ce45 100644 --- a/openspec/specs/step-interceptor/spec.md +++ b/openspec/specs/step-interceptor/spec.md @@ -45,12 +45,15 @@ call `flow.String(step)`. ### Requirement: Skipped and Canceled steps bypass the interceptor chain Steps whose `Condition` evaluates to a terminal status (`Skipped` or `Canceled`) before -execution SHALL NOT enter the `StepInterceptor` chain. Their final status is set directly -on the `State` and no interceptor is invoked. The post-run status remains queryable via +execution SHALL NOT enter the `StepInterceptor` chain.
The Workflow SHALL evaluate the +Condition inline in the scheduling loop (`tick()`) and settle the step's terminal +`StepResult` directly — without spawning a worker goroutine and without consuming a +`MaxConcurrency` lease. The post-run status remains queryable via `workflow.StateOf(step).GetStatus()`. This avoids the footgun of forcing every interceptor to check whether the step "will -actually execute" before calling `next`. +actually execute" before calling `next`, and ensures terminal-by-condition steps do not +serialize behind a low concurrency limit. #### Scenario: Skipped step does not invoke interceptors - **WHEN** a Step's Condition returns `Skipped` @@ -61,6 +64,13 @@ actually execute" before calling `next`. - **WHEN** a Step's Condition returns `Canceled` - **THEN** no `StepInterceptor` or `AttemptInterceptor` is invoked for that step +#### Scenario: Skipped step does not consume a concurrency lease +- **GIVEN** a Workflow with `MaxConcurrency = 1` and a chain `a → b → c` where `b`'s + Condition returns `Skipped` +- **WHEN** the Workflow runs +- **THEN** `b` is settled inline; no worker goroutine is spawned for `b`; `b` does not + occupy the single available lease while `a` or `c` are running + --- ### Requirement: Workflow registration of interceptors diff --git a/workflow.go b/workflow.go index fb070c0..8c462eb 100644 --- a/workflow.go +++ b/workflow.go @@ -370,10 +370,6 @@ func (w *Workflow) Do(ctx context.Context) error { const scanned StepStatus = "scanned" // a private status for preflight -// scheduled is a private StepStatus sentinel used by tick() to atomically -// claim a step and prevent double-spawning. Never exposed to users. -const scheduled StepStatus = "scheduled" - type stepExecution struct { w *Workflow step Steper @@ -440,30 +436,69 @@ func (w *Workflow) preflight() error { // tick will not block, it starts a goroutine for each runnable Step. // tick returns true if all steps in all phases are terminated. 
+// +// The Step's Condition is evaluated here (in the tick goroutine, holding +// statusChange.L) so that: +// - Steps whose Condition resolves to a terminal status (Skipped/Canceled) +// are settled inline without spawning a goroutine or consuming a +// concurrency lease. +// - Steps that will execute have their status set to Running before the +// worker goroutine is spawned, so a subsequent tick cannot double-spawn +// them. +// +// Inline-settled steps may unblock downstream steps in the same tick. Because +// no goroutine is spawned for them, no signalStatusChange is fired — so we +// loop until a single pass produces no inline progress, otherwise the main +// loop in Do() would Wait() forever for a signal that never comes. func (w *Workflow) tick(ctx context.Context) bool { - if w.IsTerminated() { - return true - } - for step := range w.steps { - state := w.StateOf(step) - // we only process pending Steps - if state.GetStatus() != Pending { - continue + for { + if w.IsTerminated() { + return true } - // we only process Steps whose all upstreams are terminated - ups := w.UpstreamOf(step) - if isAnyUpstreamNotTerminated(ups) { - continue + progressed := false + for step := range w.steps { + state := w.StateOf(step) + // we only process pending Steps + if state.GetStatus() != Pending { + continue + } + // we only process Steps whose all upstreams are terminated + ups := w.UpstreamOf(step) + if isAnyUpstreamNotTerminated(ups) { + continue + } + + // Evaluate Condition inline. If terminal (Skipped/Canceled), settle + // the step here — no goroutine, no lease, no interceptor chain. + cond := DefaultCondition + if option := state.Option(); option != nil && option.Condition != nil { + cond = option.Condition + } + if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + state.SetStepResult(StepResult{ + Status: nextStatus, + FinishedAt: w.Clock.Now(), + }) + progressed = true + continue + } + + // Step will execute: take a lease and spawn a worker goroutine. 
+ // SetStatus(Running) happens here (under statusChange.L) so a + // subsequent tick won't see it as Pending and double-spawn. + if w.lease() { + state.SetStatus(Running) + w.waitGroup.Add(1) + ex := &stepExecution{w: w, step: step, state: state} + go ex.run(ctx) + } } - // kick off the Step - if w.lease() { - state.SetStatus(scheduled) - w.waitGroup.Add(1) - ex := &stepExecution{w: w, step: step, state: state} - go ex.run(ctx) + // If we settled any step inline this pass, re-iterate to give downstream + // steps a chance to be picked up without waiting for a signal. + if !progressed { + return false } } - return false } func (w *Workflow) signalStatusChange() { @@ -475,42 +510,27 @@ func (w *Workflow) signalStatusChange() { func (ex *stepExecution) run(ctx context.Context) { defer ex.w.waitGroup.Done() - ups := ex.w.UpstreamOf(ex.step) - option := ex.state.Option() - cond := DefaultCondition - if option != nil && option.Condition != nil { - cond = option.Condition - } - - var status StepStatus - var err error - - if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - // Skipped/Canceled steps do not enter the interceptor chain. - status = nextStatus - } else { - ex.state.SetStatus(Running) - - // Build StepInterceptor chain; innermost next is executeWithRetry. - stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } - stepICs := ex.w.effectiveStepInterceptors() - for i := len(stepICs) - 1; i >= 0; i-- { - ic := stepICs[i] - next := stepNext - stepNext = func(ctx context.Context) error { - return ic.InterceptStep(ctx, ex.step, next) - } + // By the time we get here, tick() has already evaluated the Condition + // (terminal results are settled inline) and set the status to Running. + // Build the StepInterceptor chain; innermost next is executeWithRetry. 
+ stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } + stepICs := ex.w.effectiveStepInterceptors() + for i := len(stepICs) - 1; i >= 0; i-- { + ic := stepICs[i] + next := stepNext + stepNext = func(ctx context.Context) error { + return ic.InterceptStep(ctx, ex.step, next) } + } - err = stepNext(ctx) - status = StatusFromError(err) - if status == Failed { - switch { - case DefaultIsCanceled(err), - errors.Is(err, context.Canceled), - errors.Is(err, context.DeadlineExceeded): - status = Canceled - } + err := stepNext(ctx) + status := StatusFromError(err) + if status == Failed { + switch { + case DefaultIsCanceled(err), + errors.Is(err, context.Canceled), + errors.Is(err, context.DeadlineExceeded): + status = Canceled } } diff --git a/workflow_test.go b/workflow_test.go index d278ea9..4d5c670 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -420,3 +420,55 @@ func TestWorkflow_NoInterceptors_NoRegression(t *testing.T) { assert.NoError(t, w.Do(context.Background())) assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) } + +// TestSkippedStep_DoesNotConsumeLease verifies that a Skipped step neither +// occupies a concurrency lease nor spawns a worker goroutine. With +// MaxConcurrency=1 and a chain a → b(skip) → c, tick() must settle b inline +// without ever holding the only lease, so c can take that lease as soon as +// a finishes. +// +// We assert this by attaching an AttemptInterceptor that records every step +// that actually runs through the worker path. b must not appear there.
+func TestSkippedStep_DoesNotConsumeLease(t *testing.T) { + t.Parallel() + + var ranSteps []string + mu := sync.Mutex{} + ic := AttemptInterceptorFunc(func(ctx context.Context, s Steper, attempt uint64, next func(context.Context) error) error { + mu.Lock() + ranSteps = append(ranSteps, String(s)) + mu.Unlock() + return next(ctx) + }) + + a, b, c := NoOp("a"), NoOp("b"), NoOp("c") + w := &Workflow{ + MaxConcurrency: 1, + AttemptInterceptors: []AttemptInterceptor{ic}, + } + w.Add( + Step(a), + Step(b).DependsOn(a).When(func(_ context.Context, _ map[Steper]StepResult) StepStatus { + return Skipped + }), + Step(c).DependsOn(b).When(AllSucceededOrSkipped), + ) + + done := make(chan error, 1) + go func() { done <- w.Do(context.Background()) }() + select { + case err := <-done: + assert.NoError(t, err) + case <-time.After(5 * time.Second): + t.Fatal("workflow did not complete within 5s") + } + + // Skipped step b must not have entered the worker path. + assert.Equal(t, Skipped, w.StateOf(b).GetStatus()) + mu.Lock() + defer mu.Unlock() + for _, name := range ranSteps { + assert.NotEqual(t, "b", name, "skipped step must not consume a worker lease / fire AttemptInterceptor") + } + assert.ElementsMatch(t, []string{"a", "c"}, ranSteps) +} From 6bd8e4683a082d09a3010fde764ca2db515e9906 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 02:17:30 +0000 Subject: [PATCH 25/29] address Copilot review: panic protection, lifecycle, doc accuracy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Five Copilot review items addressed: 1. inheritedStep / inheritedAttempt only cleared on the success path → Move clearing to a defer at the start of Do() so all exit paths (preflight error, panic, early Empty return) reset per-run state. 2. Comment on inheritedStep field said "cleared by reset()" — wrong (reset cannot clear them without breaking inheritance, since the parent writes them just before child.Do() runs reset). 
Rewrite the comment to document the actual lifecycle: written by parent before child.Do(), read during child.Do(), cleared by Do()'s defer. 3. DontPanic does not catch panics from user StepInterceptors → wrap each StepInterceptor invocation in catchPanicAsError when DontPanic is true, preventing process crashes / lease leaks / status-signal loss from a faulty interceptor. 4. Same as 3 for AttemptInterceptor. 5. Plan doc still describes the old EventSink/EventType design. Add a prominent OUTDATED banner at the top pointing to the current design doc and synced openspec, kept only as a record of the original direction. Bonus: public Reset() now also clears inheritedStep / inheritedAttempt (internal reset() must NOT, because it runs at the start of Do() — clearing there would wipe the prefix the parent just wrote and break inheritance). Documented this asymmetry on the field and on Reset(). New tests: - TestInterceptorPanic_DontPanic — guards #3 - TestAttemptInterceptorPanic_DontPanic — guards #4 Co-Authored-By: Claude Sonnet 4.6 --- .../plans/2026-05-06-step-interceptor.md | 17 ++++++ workflow.go | 52 +++++++++++++--- workflow_test.go | 59 +++++++++++++++++++ 3 files changed, 119 insertions(+), 9 deletions(-) diff --git a/docs/superpowers/plans/2026-05-06-step-interceptor.md b/docs/superpowers/plans/2026-05-06-step-interceptor.md index 4571d4e..f6087b1 100644 --- a/docs/superpowers/plans/2026-05-06-step-interceptor.md +++ b/docs/superpowers/plans/2026-05-06-step-interceptor.md @@ -1,5 +1,22 @@ # Step Interceptor Implementation Plan +> ⚠️ **OUTDATED — DO NOT USE AS A REFERENCE FOR THE SHIPPED IMPLEMENTATION.** +> +> This plan was written before the design was simplified. 
The shipped PR drops the +> `EventSink` / `WorkflowEvent` / `EventType` vocabulary entirely (users plug their +> own event types into the interceptors), removes the `StepInfo` / `AttemptInfo` +> wrappers (interceptors receive `Steper` and `uint64` directly), removes +> `TerminalReason` (Skipped/Canceled steps bypass the interceptor chain entirely), +> and removes the `scheduled` `StepStatus` sentinel (`tick()` evaluates Condition +> inline and sets `Running` directly). Files referenced here that do not exist in +> the final tree (`event.go`, `event_test.go`) were never created. +> +> The current design and rationale live in +> [`docs/superpowers/specs/2026-05-06-step-interceptor-design.md`](../specs/2026-05-06-step-interceptor-design.md). +> The synced main spec lives in +> [`openspec/specs/step-interceptor/spec.md`](../../../openspec/specs/step-interceptor/spec.md). +> This plan is kept only as a record of the original direction. + > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. **Goal:** Add a two-layer interceptor system (`StepInterceptor` + `AttemptInterceptor`) to go-workflow, enabling structured global observability with built-in `EventSink` adapters. diff --git a/workflow.go b/workflow.go index 8c462eb..bcaa5ab 100644 --- a/workflow.go +++ b/workflow.go @@ -52,10 +52,20 @@ type Workflow struct { steps map[Steper]*State // the internal states of Steps - // inheritedStep / inheritedAttempt are populated by PrependInterceptors at the - // start of each Do() (parent → child) and cleared by reset(). They are never - // merged into StepInterceptors / AttemptInterceptors so the user-supplied base - // stays untouched and repeated runs do not accumulate. + // inheritedStep / inheritedAttempt are populated by PrependInterceptors when + // this workflow runs as a child step under a parent. 
The lifecycle is: + // 1. Parent writes them BEFORE calling child.Do() (in executeWithRetry). + // 2. child.Do() reads them while building the effective interceptor chain. + // 3. child.Do()'s defer clears them after waitGroup.Wait() (covers all + // exit paths: success, preflight error, panic). + // + // They are NOT cleared by the internal reset() — reset() runs at the start + // of Do(), which would wipe out what the parent just wrote and break + // inheritance. The public Reset() method does clear them, since users call + // Reset() between runs and expect a fully-fresh state. + // + // They are never merged into StepInterceptors / AttemptInterceptors so the + // user-supplied base stays untouched and repeated runs do not accumulate. inheritedStep []StepInterceptor inheritedAttempt []AttemptInterceptor @@ -244,12 +254,20 @@ func (w *Workflow) IsTerminated() bool { } // Reset resets the Workflow to ready for a new run. +// +// Unlike the internal reset() (which Do() calls at its own start), Reset() also +// clears interceptors inherited from a parent during a previous run. The internal +// reset() must not clear them, because the parent writes them just before calling +// child.Do(), and child.Do() then calls reset() — clearing there would wipe the +// just-written prefix and break inheritance. func (w *Workflow) Reset() error { if !w.isRunning.TryLock() { return ErrWorkflowIsRunning } defer w.isRunning.Unlock() w.reset() + w.inheritedStep = nil + w.inheritedAttempt = nil return nil } @@ -329,6 +347,14 @@ func (w *Workflow) Do(ctx context.Context) error { return ErrWorkflowIsRunning } defer w.isRunning.Unlock() + // Clear inherited interceptors set by a parent during this run on every exit + // path, so the next time this workflow runs (under any parent, or standalone) + // it starts fresh and PrependInterceptors does not accumulate. Using defer + // ensures even early exits (Empty, preflight failure, panic) reset state. 
+ defer func() { + w.inheritedStep = nil + w.inheritedAttempt = nil + }() // if no steps to run if w.Empty() { return nil @@ -349,11 +375,6 @@ func (w *Workflow) Do(ctx context.Context) error { w.statusChange.L.Unlock() // ensure all goroutines are exited w.waitGroup.Wait() - // Clear inherited interceptors set by a parent during this run so that the - // next time this workflow runs (under any parent, or standalone) it starts - // fresh and PrependInterceptors does not accumulate across runs. - w.inheritedStep = nil - w.inheritedAttempt = nil // return the error err := make(ErrWorkflow) for step, state := range w.steps { @@ -513,12 +534,20 @@ func (ex *stepExecution) run(ctx context.Context) { // By the time we get here, tick() has already evaluated the Condition // (terminal results are settled inline) and set the status to Running. // Build the StepInterceptor chain; innermost next is executeWithRetry. + // When DontPanic is true, each interceptor invocation is wrapped in + // catchPanicAsError so a panicking user interceptor cannot crash the + // process or leave the lease unreleased / status unsignalled. 
stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } stepICs := ex.w.effectiveStepInterceptors() for i := len(stepICs) - 1; i >= 0; i-- { ic := stepICs[i] next := stepNext stepNext = func(ctx context.Context) error { + if ex.w.DontPanic { + return catchPanicAsError(func() error { + return ic.InterceptStep(ctx, ex.step, next) + }) + } return ic.InterceptStep(ctx, ex.step, next) } } @@ -578,6 +607,11 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { next := chain icLocal := ic chain = func(ctx context.Context) error { + if ex.w.DontPanic { + return catchPanicAsError(func() error { + return icLocal.InterceptAttempt(ctx, ex.step, ex.attempt, next) + }) + } return icLocal.InterceptAttempt(ctx, ex.step, ex.attempt, next) } } diff --git a/workflow_test.go b/workflow_test.go index 4d5c670..8bb3a17 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -472,3 +472,62 @@ func TestSkippedStep_DoesNotConsumeLease(t *testing.T) { } assert.ElementsMatch(t, []string{"a", "c"}, ranSteps) } + +// TestInterceptorPanic_DontPanic ensures that when DontPanic is true, a panic +// inside a user StepInterceptor is converted to an error rather than crashing +// the process or leaving the lease unreleased / status unsignalled. +func TestInterceptorPanic_DontPanic(t *testing.T) { + t.Parallel() + step := NoOp("a") + w := &Workflow{ + DontPanic: true, + StepInterceptors: []StepInterceptor{ + StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + panic("boom from StepInterceptor") + }), + }, + } + w.Add(Step(step)) + + done := make(chan error, 1) + go func() { done <- w.Do(context.Background()) }() + select { + case err := <-done: + // Workflow returns ErrWorkflow because the step ended in Failed. 
+ assert.Error(t, err) + stepErr := w.StateOf(step).GetStepResult().Err + assert.Error(t, stepErr) + assert.Contains(t, stepErr.Error(), "boom from StepInterceptor") + case <-time.After(5 * time.Second): + t.Fatal("workflow hung after panicking interceptor — lease leak suspected") + } +} + +// TestAttemptInterceptorPanic_DontPanic mirrors the StepInterceptor panic test +// but for AttemptInterceptor. It ensures a panic raised inside an +// AttemptInterceptor is likewise converted into a step error. +func TestAttemptInterceptorPanic_DontPanic(t *testing.T) { + t.Parallel() + step := NoOp("a") + w := &Workflow{ + DontPanic: true, + AttemptInterceptors: []AttemptInterceptor{ + AttemptInterceptorFunc(func(ctx context.Context, s Steper, attempt uint64, next func(context.Context) error) error { + panic("boom from AttemptInterceptor") + }), + }, + } + w.Add(Step(step)) + + done := make(chan error, 1) + go func() { done <- w.Do(context.Background()) }() + select { + case err := <-done: + assert.Error(t, err) + stepErr := w.StateOf(step).GetStepResult().Err + assert.Error(t, stepErr) + assert.Contains(t, stepErr.Error(), "boom from AttemptInterceptor") + case <-time.After(5 * time.Second): + t.Fatal("workflow hung after panicking AttemptInterceptor — lease leak suspected") + } +} From 9d7557338760807607957f0ef2f2bf1cee2cec13 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 02:26:06 +0000 Subject: [PATCH 26/29] remove PR-introduced superpowers docs and archived OpenSpec changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drop the planning artifacts that this PR added under docs/superpowers and openspec/changes/archive — they reflect intermediate / outdated designs and now live only as a historical distraction. The single authoritative spec lives in openspec/specs/step-interceptor/spec.md.
Update openspec/specs/step-interceptor/spec.md to fully match the shipped behavior: - Document the per-run scoped inheritance using inheritedStep / inheritedAttempt fields (not "make+copy" on the user-supplied base). - Clarify the Reset() vs reset() asymmetry and why internal reset() must NOT clear inherited interceptors. - New scenario: PrependInterceptors does not accumulate across repeated Do() runs. - New requirement: DontPanic protects interceptor panics (with scenarios for both StepInterceptor and AttemptInterceptor). Pre-existing files under docs/superpowers/plans (e.g. 2026-05-04-test-spec-alignment.md) and openspec/changes/archive (e.g. 2026-05-04-document-existing-behaviors) are preserved. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-05-errworkflow-execution-order.md | 567 --------- .../plans/2026-05-06-step-interceptor.md | 1076 ----------------- .../2026-05-06-step-interceptor-design.md | 285 ----- .../.openspec.yaml | 2 - .../design.md | 54 - .../proposal.md | 27 - .../specs/execution-model/spec.md | 73 -- .../tasks.md | 31 - .../.openspec.yaml | 2 - .../design.md | 109 -- .../proposal.md | 75 -- .../specs/step-interceptor/spec.md | 137 --- .../2026-05-06-structured-event-sink/tasks.md | 33 - 13 files changed, 2471 deletions(-) delete mode 100644 docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md delete mode 100644 docs/superpowers/plans/2026-05-06-step-interceptor.md delete mode 100644 docs/superpowers/specs/2026-05-06-step-interceptor-design.md delete mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml delete mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md delete mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md delete mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md delete mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md 
delete mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml delete mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/design.md delete mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md delete mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md delete mode 100644 openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md diff --git a/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md b/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md deleted file mode 100644 index ac38abe..0000000 --- a/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md +++ /dev/null @@ -1,567 +0,0 @@ -# ErrWorkflow Execution Order Implementation Plan - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Make `ErrWorkflow.Error()` output steps in execution-finish order by adding a `FinishedAt time.Time` field to `StepResult` and sorting on output. - -**Architecture:** Add `FinishedAt` to the public `StepResult` struct; record it via a new `State.SetFinishedAt` setter in the single step-goroutine defer that already calls `SetStatus`/`SetError`; for condition-skipped steps record it at the `state.SetStatus(nextStatus)` site in `tick()`; sort in `ErrWorkflow.Error()` and `Unwrap()` using a shared helper. 
- -**Tech Stack:** Go stdlib (`sort`, `time`), `github.com/benbjohnson/clock` (already a project dependency, used for testable time) - ---- - -## Files - -| File | Change | -|------|--------| -| `error.go` | Add `FinishedAt time.Time` to `StepResult`; add `sort` import; add `sortedSteps` helper; rewrite `Error()` and `Unwrap()` | -| `state.go` | Add `SetFinishedAt(time.Time)` and `GetFinishedAt() time.Time` methods to `State` | -| `workflow.go` | Call `state.SetFinishedAt(w.Clock.Now())` in the step-goroutine defer and in the condition-skip branch of `tick()` | -| `error_test.go` | Add tests for `ErrWorkflow` ordering | -| `execution_model_test.go` | Add test that `StepResult.FinishedAt` is populated after execution | -| `condition_test.go` | Update `StepResult{...}` literals to use field names | - ---- - -## Task 1: Add `FinishedAt` to `StepResult` - -**Files:** -- Modify: `error.go` (around line 106 — the `StepResult` struct) - -- [ ] **Step 1: Add the field** - - In `error.go`, update `StepResult`: - - ```go - // StepResult contains the status and error of a Step. - type StepResult struct { - Status StepStatus - Err error - FinishedAt time.Time - } - ``` - - Also add `"time"` to the import block at the top of `error.go`: - - ```go - import ( - "fmt" - "runtime" - "sort" - "strings" - "time" - ) - ``` - -- [ ] **Step 2: Verify it compiles** - - ```bash - go build ./... - ``` - - Expected: compile error in `condition_test.go` about `StepResult` composite literal — **this is expected**, we'll fix it in Task 4. 
- ---- - -## Task 2: Add `SetFinishedAt` / `GetFinishedAt` to `State` - -**Files:** -- Modify: `state.go` (after `SetError`/`GetError` around line 33) - -- [ ] **Step 1: Add setter and getter** - - In `state.go`, after the `SetError` method, add: - - ```go - func (s *State) GetFinishedAt() time.Time { - s.RLock() - defer s.RUnlock() - return s.FinishedAt - } - func (s *State) SetFinishedAt(t time.Time) { - s.Lock() - defer s.Unlock() - s.FinishedAt = t - } - ``` - - Add `"time"` to the import block in `state.go`: - - ```go - import ( - "context" - "sync" - "time" - ) - ``` - -- [ ] **Step 2: Verify it compiles (ignoring test errors)** - - ```bash - go build ./... - ``` - - Expected: same compile errors as before in tests only. Core package builds. - ---- - -## Task 3: Record `FinishedAt` at Step Termination - -**Files:** -- Modify: `workflow.go` - -There are **two termination sites** to update: - -### Site A — step goroutine defer (running steps) - -Around line 402–406 in `workflow.go`: - -```go -defer func() { - state.SetStatus(status) - state.SetError(err) - w.unlease() - w.signalStatusChange() -}() -``` - -### Site B — condition-skip in `tick()` (steps skipped before running) - -Around line 387–393: - -```go -if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - state.SetStatus(nextStatus) - w.waitGroup.Add(1) - go func() { - defer w.waitGroup.Done() - w.signalStatusChange() - }() - continue -} -``` - -- [ ] **Step 1: Update the step-goroutine defer (Site A)** - - Change the defer to record `FinishedAt` before setting status: - - ```go - defer func() { - state.SetFinishedAt(w.Clock.Now()) - state.SetStatus(status) - state.SetError(err) - w.unlease() - w.signalStatusChange() - }() - ``` - -- [ ] **Step 2: Update the condition-skip branch (Site B)** - - Change the condition-skip block to also record `FinishedAt`: - - ```go - if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - state.SetFinishedAt(w.Clock.Now()) - state.SetStatus(nextStatus) - 
w.waitGroup.Add(1) - go func() { - defer w.waitGroup.Done() - w.signalStatusChange() - }() - continue - } - ``` - -- [ ] **Step 3: Verify it compiles** - - ```bash - go build ./... - ``` - - Expected: same test-only errors, core package builds. - ---- - -## Task 4: Fix `condition_test.go` Struct Literal - -**Files:** -- Modify: `condition_test.go` (around line 28) - -The composite literal `flow.StepResult{Status: ..., Err: ...}` already uses field names, so it will compile without changes. Verify this: - -- [ ] **Step 1: Check the literal** - - ```bash - grep -n "StepResult{" condition_test.go - ``` - - Expected output: - ``` - 28: ups[s] = flow.StepResult{ - ``` - - The block at line 28–31 uses named fields (`Status:`, `Err:`), so it is fine. No edit needed. - -- [ ] **Step 2: Verify tests compile** - - ```bash - go test -run NOMATCH ./... 2>&1 | grep -v "^ok" - ``` - - Expected: no compile errors. - ---- - -## Task 5: Implement `sortedSteps` and Rewrite `Error()` / `Unwrap()` - -**Files:** -- Modify: `error.go` - -- [ ] **Step 1: Write a failing test first** - - In `error_test.go`, add: - - ```go - func TestErrWorkflowErrorOrdering(t *testing.T) { - t.Run("sorted by FinishedAt ascending", func(t *testing.T) { - now := time.Now() - type namedStep struct{ name string } - // We need actual Steper values; use the existing test helpers. - // Build ErrWorkflow manually with known FinishedAt values. - a := &namedStep{"A-step"} - b := &namedStep{"B-step"} - c := &namedStep{"C-step"} - - // Use a real workflow so steps are proper Steper instances. - // Instead, we test via a real workflow run with mock clock. - mockClock := clock.NewMock() - w := &flow.Workflow{Clock: mockClock} - - errA := fmt.Errorf("A failed") - errB := fmt.Errorf("B failed") - errC := fmt.Errorf("C failed") - _ = a; _ = b; _ = c; _ = errA; _ = errB; _ = errC; _ = now - // Real ordering test is in TestErrWorkflowOrderingIntegration below. 
- _ = w - }) - } - ``` - - > Actually, skip the manual struct test — the integration test below is more meaningful. Remove the stub above and add only the integration test. - - Replace the above with this in `error_test.go`: - - ```go - import ( - "context" - "fmt" - "strings" - "testing" - "time" - - flow "github.com/Azure/go-workflow" - "github.com/benbjohnson/clock" - "github.com/stretchr/testify/assert" - "github.com/stretchr/testify/require" - ) - - // serialStep is a step that signals when it starts and waits to be released. - type serialStep struct { - name string - started chan struct{} - release chan struct{} - err error - } - - func newSerialStep(name string, err error) *serialStep { - return &serialStep{name: name, started: make(chan struct{}, 1), release: make(chan struct{}), err: err} - } - func (s *serialStep) Do(_ context.Context) error { - s.started <- struct{}{} - <-s.release - return s.err - } - func (s *serialStep) String() string { return s.name } - - func TestErrWorkflowOutputOrdering(t *testing.T) { - // Build a 3-step serial chain: A -> B -> C - // Step names are chosen so alphabetical != execution order: C, A, B - mockClock := clock.NewMock() - stepC := newSerialStep("C-first", fmt.Errorf("C failed")) - stepA := newSerialStep("A-second", fmt.Errorf("A failed")) - stepB := newSerialStep("B-third", fmt.Errorf("B failed")) - - w := &flow.Workflow{Clock: mockClock} - w.Add( - flow.Step(stepC), - flow.Step(stepA).DependsOn(stepC), - flow.Step(stepB).DependsOn(stepA), - ) - - done := make(chan error, 1) - go func() { done <- w.Do(context.Background()) }() - - // C runs first — let it finish - <-stepC.started - mockClock.Add(time.Second) - close(stepC.release) - - // A runs second - <-stepA.started - mockClock.Add(time.Second) - close(stepA.release) - - // B runs third - <-stepB.started - mockClock.Add(time.Second) - close(stepB.release) - - err := <-done - require.Error(t, err) - - var errW flow.ErrWorkflow - require.ErrorAs(t, err, &errW) - - 
output := errW.Error() - posC := strings.Index(output, "C-first") - posA := strings.Index(output, "A-second") - posB := strings.Index(output, "B-third") - - assert.Greater(t, posA, posC, "A-second should appear after C-first in output") - assert.Greater(t, posB, posA, "B-third should appear after A-second in output") - } - - func TestErrWorkflowTieBreakByName(t *testing.T) { - // Two steps with identical FinishedAt → sort by name - mockClock := clock.NewMock() - now := mockClock.Now() - - e := flow.ErrWorkflow{ - // We can't easily construct Steper keys without running a workflow. - // Test via integration: two parallel steps finishing at same clock tick. - } - _ = e; _ = now - // See TestErrWorkflowTieBreakIntegration below. - } - - func TestErrWorkflowTieBreakIntegration(t *testing.T) { - // Two parallel steps, both fail at the same clock tick → output is alphabetical. - mockClock := clock.NewMock() - stepZ := newSerialStep("Z-step", fmt.Errorf("Z failed")) - stepA := newSerialStep("A-step", fmt.Errorf("A failed")) - - w := &flow.Workflow{Clock: mockClock} - w.Add(flow.Step(stepZ), flow.Step(stepA)) - - done := make(chan error, 1) - go func() { done <- w.Do(context.Background()) }() - - // Both steps start in parallel; release them before advancing clock - <-stepZ.started - <-stepA.started - // Advance clock THEN release — both get same timestamp - mockClock.Add(time.Second) - close(stepZ.release) - close(stepA.release) - - err := <-done - require.Error(t, err) - - var errW flow.ErrWorkflow - require.ErrorAs(t, err, &errW) - - output := errW.Error() - posA := strings.Index(output, "A-step") - posZ := strings.Index(output, "Z-step") - assert.Less(t, posA, posZ, "A-step should appear before Z-step (tie-break by name)") - } - ``` - -- [ ] **Step 2: Run test to verify it fails** - - ```bash - go test -run "TestErrWorkflow" ./... -v 2>&1 | tail -20 - ``` - - Expected: FAIL — output ordering assertions fail (map iteration is random). 
- -- [ ] **Step 3: Implement `sortedSteps` and update `Error()` / `Unwrap()`** - - In `error.go`, add the helper and update both methods: - - ```go - // sortedSteps returns the steps in ErrWorkflow sorted by FinishedAt ascending. - // Steps with zero FinishedAt (never ran) sort last. - // Tie-break: lexicographic order of String(step). - func sortedSteps(e ErrWorkflow) []Steper { - steps := make([]Steper, 0, len(e)) - for step := range e { - steps = append(steps, step) - } - sort.Slice(steps, func(i, j int) bool { - ti := e[steps[i]].FinishedAt - tj := e[steps[j]].FinishedAt - zeroI := ti.IsZero() - zeroJ := tj.IsZero() - if zeroI != zeroJ { - return !zeroI // non-zero before zero - } - if !ti.Equal(tj) { - return ti.Before(tj) - } - return String(steps[i]) < String(steps[j]) - }) - return steps - } - - func (e ErrWorkflow) Unwrap() []error { - steps := sortedSteps(e) - rv := make([]error, 0, len(e)) - for _, step := range steps { - rv = append(rv, e[step].Err) - } - return rv - } - - // ErrWorkflow will be printed as: - // - // Step: [Status] - // error message - func (e ErrWorkflow) Error() string { - var builder strings.Builder - for _, step := range sortedSteps(e) { - builder.WriteString(fmt.Sprintf("%s: ", String(step))) - builder.WriteString(fmt.Sprintln(e[step].Error())) - } - return builder.String() - } - ``` - -- [ ] **Step 4: Run the ordering tests** - - ```bash - go test -run "TestErrWorkflow" ./... -v 2>&1 | tail -30 - ``` - - Expected: all pass. 
- -- [ ] **Step 5: Commit** - - ```bash - git add error.go state.go workflow.go error_test.go condition_test.go - git commit -m "feat: add FinishedAt to StepResult, sort ErrWorkflow output by execution order" - ``` - ---- - -## Task 6: Test `FinishedAt` Population in `execution_model_test.go` - -**Files:** -- Modify: `execution_model_test.go` - -- [ ] **Step 1: Write failing test** - - Add to `execution_model_test.go`: - - ```go - func TestStepResultFinishedAtPopulated(t *testing.T) { - mockClock := clock.NewMock() - step := &succeededStep{} // uses the existing test helper in testutil_test.go - w := &Workflow{Clock: mockClock} - w.Add(Step(step)) - - mockClock.Add(time.Second) // advance so FinishedAt is non-zero - err := w.Do(context.Background()) - assert.NoError(t, err) - - state := w.StateOf(step) - result := state.GetStepResult() - assert.False(t, result.FinishedAt.IsZero(), "FinishedAt should be populated after step execution") - assert.Equal(t, mockClock.Now(), result.FinishedAt) - } - ``` - - Check what helpers exist in `testutil_test.go`: - - ```bash - grep -n "succeededStep\|failedStep\|type.*Step" testutil_test.go | head -20 - ``` - - Adjust the step type name based on what you see. - -- [ ] **Step 2: Run to verify it fails** - - ```bash - go test -run "TestStepResultFinishedAtPopulated" ./... -v - ``` - - Expected: FAIL — `FinishedAt` is zero because we haven't wired it up yet. - - > **Note:** If Task 3 is already done, this test may already pass. In that case, skip to Step 4. - -- [ ] **Step 3: Verify wiring from Task 3 makes it pass** - - The `SetFinishedAt` calls added in Task 3 should make this pass. Run: - - ```bash - go test -run "TestStepResultFinishedAtPopulated" ./... -v - ``` - - Expected: PASS. 
- -- [ ] **Step 4: Commit** - - ```bash - git add execution_model_test.go - git commit -m "test: verify StepResult.FinishedAt is populated after step execution" - ``` - ---- - -## Task 7: Full Test Suite and Final Verification - -- [ ] **Step 1: Run all tests** - - ```bash - go test ./... -count=1 - ``` - - Expected: all pass, no failures. - -- [ ] **Step 2: Run vet** - - ```bash - go vet ./... - ``` - - Expected: no output (no issues). - -- [ ] **Step 3: Run build** - - ```bash - go build ./... - ``` - - Expected: no output (clean build). - -- [ ] **Step 4: Final commit if anything was adjusted** - - ```bash - git status - ``` - - If there are uncommitted changes: - - ```bash - git add -p - git commit -m "fix: address review feedback" - ``` - ---- - -## Self-Review Notes - -- **`condition_test.go`**: The literal at line 28 already uses named fields (`Status:`, `Err:`), so it compiles without edits. The plan reflects this (Task 4 is a verify-only step). -- **Clock timing in tests**: `clock.NewMock()` starts at a fixed non-zero time, so `mockClock.Now()` after `Add(time.Second)` gives a consistent value. The test in Task 6 advances before the run — but `FinishedAt` is recorded *during* the run at whatever `Clock.Now()` returns then. Adjust the assertion to `assert.False(t, result.FinishedAt.IsZero())` if the exact value is hard to pin. -- **Parallel steps tie-break test**: both goroutines call `w.Clock.Now()` in their defers. With `clock.Mock`, concurrent calls return the same value, which is exactly what we need for the tie-break test. -- **`StateOf` visibility**: `w.StateOf(step)` is used in existing tests — it's an exported method so it's accessible from `_test` package. 
diff --git a/docs/superpowers/plans/2026-05-06-step-interceptor.md b/docs/superpowers/plans/2026-05-06-step-interceptor.md deleted file mode 100644 index f6087b1..0000000 --- a/docs/superpowers/plans/2026-05-06-step-interceptor.md +++ /dev/null @@ -1,1076 +0,0 @@ -# Step Interceptor Implementation Plan - -> ⚠️ **OUTDATED — DO NOT USE AS A REFERENCE FOR THE SHIPPED IMPLEMENTATION.** -> -> This plan was written before the design was simplified. The shipped PR drops the -> `EventSink` / `WorkflowEvent` / `EventType` vocabulary entirely (users plug their -> own event types into the interceptors), removes the `StepInfo` / `AttemptInfo` -> wrappers (interceptors receive `Steper` and `uint64` directly), removes -> `TerminalReason` (Skipped/Canceled steps bypass the interceptor chain entirely), -> and removes the `scheduled` `StepStatus` sentinel (`tick()` evaluates Condition -> inline and sets `Running` directly). Files referenced here that do not exist in -> the final tree (`event.go`, `event_test.go`) were never created. -> -> The current design and rationale live in -> [`docs/superpowers/specs/2026-05-06-step-interceptor-design.md`](../specs/2026-05-06-step-interceptor-design.md). -> The synced main spec lives in -> [`openspec/specs/step-interceptor/spec.md`](../../../openspec/specs/step-interceptor/spec.md). -> This plan is kept only as a record of the original direction. - -> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. - -**Goal:** Add a two-layer interceptor system (`StepInterceptor` + `AttemptInterceptor`) to go-workflow, enabling structured global observability with built-in `EventSink` adapters. 
- -**Architecture:** Introduce `event.go` for public types, refactor `workflow.go` to extract `stepExecution` (replacing the anonymous goroutine in `tick()`), and add `InterceptorReceiver` to `SubWorkflow` for nested propagation. `BeforeStep`/`AfterStep` remain unchanged as step-level configuration; interceptors are workflow-level and orthogonal. - -**Tech Stack:** Go 1.23, `github.com/stretchr/testify`, `github.com/benbjohnson/clock` - ---- - -## File Map - -| File | Action | Responsibility | -|------|--------|----------------| -| `event.go` | **Create** | `EventType`, `WorkflowEvent`, `StepInterceptor`, `AttemptInterceptor`, `StepInterceptorFunc`, `AttemptInterceptorFunc`, `StepInfo`, `AttemptInfo`, `InterceptorReceiver`, `NewStepEventSink`, `NewAttemptEventSink`, private `retryNotifier` | -| `event_test.go` | **Create** | Tests for `NewStepEventSink` and `NewAttemptEventSink` | -| `workflow.go` | **Modify** | Add `StepInterceptors`/`AttemptInterceptors` fields; introduce `stepExecution`; simplify `tick()`; add `wireNotify` | -| `workflow_test.go` | **Modify** | Integration tests for interceptor ordering, SubWorkflow propagation, Retrying events | -| `wrap.go` | **Modify** | `SubWorkflow` implements `InterceptorReceiver` | -| `wrap_test.go` | **Modify** | Tests for interceptor propagation through SubWorkflow | - ---- - -## Task 1: Define public types in `event.go` - -**Files:** -- Create: `event.go` -- Create: `event_test.go` - -- [ ] **Step 1: Write the failing test** - -```go -// event_test.go -package flow - -import ( - "testing" - "github.com/stretchr/testify/assert" -) - -func TestEventTypeConstants(t *testing.T) { - // Verify all constants exist and are distinct - types := []EventType{Scheduled, Started, Retrying, Succeeded, Failed, Canceled, Skipped} - seen := map[EventType]bool{} - for _, et := range types { - assert.False(t, seen[et], "duplicate EventType: %q", et) - seen[et] = true - } -} - -func TestStepInterceptorFunc(t *testing.T) { - called := false 
- var ic StepInterceptor = StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { - called = true - return next(ctx) - }) - _ = ic.InterceptStep(context.Background(), StepInfo{}, func(ctx context.Context) error { return nil }) - assert.True(t, called) -} - -func TestAttemptInterceptorFunc(t *testing.T) { - called := false - var ic AttemptInterceptor = AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - called = true - return next(ctx) - }) - _ = ic.InterceptAttempt(context.Background(), AttemptInfo{}, func(ctx context.Context) error { return nil }) - assert.True(t, called) -} -``` - -- [ ] **Step 2: Run test to verify it fails** - -```bash -go test ./... -run "TestEventType|TestStepInterceptorFunc|TestAttemptInterceptorFunc" -v -``` - -Expected: FAIL — types not defined. - -- [ ] **Step 3: Write `event.go`** - -```go -package flow - -import ( - "context" - "time" -) - -// EventType identifies a step lifecycle event. -type EventType string - -const ( - Scheduled EventType = "Scheduled" - Started EventType = "Started" - Retrying EventType = "Retrying" - Succeeded EventType = "Succeeded" - Failed EventType = "Failed" - Canceled EventType = "Canceled" - Skipped EventType = "Skipped" -) - -// WorkflowEvent carries information about a step lifecycle event. -type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration - BackoffDuration time.Duration // non-zero only for Retrying -} - -// StepInfo is passed to StepInterceptor. -// Step is the canonical identifier — same pointer used as map key in Workflow. -// Callers that need a human-readable name can call flow.String(info.Step). -type StepInfo struct { - Step Steper - TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not execute -} - -// AttemptInfo is passed to AttemptInterceptor. 
-// Interceptors that need timing should record time.Now() at the top of InterceptAttempt. -type AttemptInfo struct { - StepInfo - Attempt uint64 -} - -// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// If info.TerminalReason != Pending, next must not be called — the step will not execute. -// Return nil in that case after observing the event. -type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error -} - -// AttemptInterceptor intercepts each individual attempt (Before → Do → After). -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error -} - -// StepInterceptorFunc is a function adapter for StepInterceptor. -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error - -func (f StepInterceptorFunc) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - return f(ctx, info, next) -} - -// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error - -func (f AttemptInterceptorFunc) InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - return f(ctx, info, next) -} - -// InterceptorReceiver is implemented by steps that contain a sub-workflow. -// stepExecution calls PrependInterceptors before each attempt so that -// parent interceptors wrap child interceptors. -type InterceptorReceiver interface { - PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) -} - -// retryNotifier is a package-private interface implemented by the concrete -// type returned by NewStepEventSink. stepExecution uses it to deliver -// Retrying events (which bypass the interceptor chain) to the sink. 
-type retryNotifier interface { - onRetry(WorkflowEvent) -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -go test ./... -run "TestEventType|TestStepInterceptorFunc|TestAttemptInterceptorFunc" -v -``` - -Expected: PASS. - -- [ ] **Step 5: Commit** - -```bash -git add event.go event_test.go -git commit -m "feat: add interceptor public types and EventType constants" -``` - ---- - -## Task 2: Built-in `NewStepEventSink` and `NewAttemptEventSink` - -**Files:** -- Modify: `event.go` -- Modify: `event_test.go` - -- [ ] **Step 1: Write failing tests** - -```go -// event_test.go — add these tests - -func TestNewStepEventSink_SucceededStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - info := StepInfo{Step: step, TerminalReason: Pending} - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - // Sleep briefly so Duration is measurably non-zero even on coarse clocks. - time.Sleep(time.Millisecond) - return nil - }) - - assert.NoError(t, err) - assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) - assert.Equal(t, step, events[0].Step) - assert.Equal(t, Succeeded, events[1].Type) - assert.GreaterOrEqual(t, events[1].Duration, time.Millisecond) -} - -func TestNewStepEventSink_FailedStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - boom := errors.New("boom") - info := StepInfo{Step: step, TerminalReason: Pending} - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - return boom - }) - - assert.Equal(t, boom, err) - assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) - assert.Equal(t, Failed, events[1].Type) - assert.Equal(t, boom, events[1].Err) -} - -func TestNewStepEventSink_SkippedStep(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - 
step := NoOp("a") - info := StepInfo{Step: step, TerminalReason: Skipped} - nextCalled := false - err := sink.InterceptStep(context.Background(), info, func(ctx context.Context) error { - nextCalled = true - return nil - }) - - assert.NoError(t, err) - assert.False(t, nextCalled, "next must not be called for Skipped") - assert.Len(t, events, 2) - assert.Equal(t, Scheduled, events[0].Type) - assert.Equal(t, Skipped, events[1].Type) -} - -func TestNewStepEventSink_OnRetry(t *testing.T) { - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - rn, ok := sink.(retryNotifier) - assert.True(t, ok, "NewStepEventSink should implement retryNotifier") - - boom := errors.New("boom") - rn.onRetry(WorkflowEvent{Type: Retrying, Attempt: 0, Err: boom, BackoffDuration: time.Second}) - - assert.Len(t, events, 1) - assert.Equal(t, Retrying, events[0].Type) - assert.Equal(t, boom, events[0].Err) -} - -func TestNewAttemptEventSink_EmitsStarted(t *testing.T) { - var events []WorkflowEvent - sink := NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }) - - step := NoOp("a") - info := AttemptInfo{StepInfo: StepInfo{Step: step}, Attempt: 2} - err := sink.InterceptAttempt(context.Background(), info, func(ctx context.Context) error { - return nil - }) - - assert.NoError(t, err) - assert.Len(t, events, 1) - assert.Equal(t, Started, events[0].Type) - assert.Equal(t, uint64(2), events[0].Attempt) - assert.Equal(t, step, events[0].Step) -} -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -go test ./... -run "TestNewStepEventSink|TestNewAttemptEventSink" -v -``` - -Expected: FAIL — functions not defined. - -- [ ] **Step 3: Implement `NewStepEventSink` and `NewAttemptEventSink` in `event.go`** - -```go -// Add to event.go: - -// terminalEventType maps an error to the corresponding terminal EventType. 
-func terminalEventType(err error) EventType { - if err == nil { - return Succeeded - } - switch StatusFromError(err) { - case Canceled: - return Canceled - case Skipped: - return Skipped - default: - return Failed - } -} - -// stepEventSink is the concrete type returned by NewStepEventSink. -// It implements both StepInterceptor and the package-private retryNotifier. -type stepEventSink struct { - sink func(WorkflowEvent) -} - -// NewStepEventSink returns a StepInterceptor that emits Scheduled, then a terminal -// event (Succeeded/Failed/Canceled/Skipped) for every step. It also receives -// Retrying events via the package-private retryNotifier interface. -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor { - return &stepEventSink{sink: sink} -} - -func (s *stepEventSink) InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error { - s.sink(WorkflowEvent{Step: info.Step, Type: Scheduled}) - - if info.TerminalReason != Pending { - s.sink(WorkflowEvent{Step: info.Step, Type: EventType(info.TerminalReason)}) - return nil - } - - start := time.Now() - err := next(ctx) - s.sink(WorkflowEvent{ - Step: info.Step, - Type: terminalEventType(err), - Err: err, - Duration: time.Since(start), - }) - return err -} - -func (s *stepEventSink) onRetry(e WorkflowEvent) { s.sink(e) } - -// NewAttemptEventSink returns an AttemptInterceptor that emits a Started event -// for each attempt. -func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor { - return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - sink(WorkflowEvent{ - Step: info.Step, - Type: Started, - Attempt: info.Attempt, - }) - return next(ctx) - }) -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -go test ./... -run "TestNewStepEventSink|TestNewAttemptEventSink" -v -``` - -Expected: PASS. 
- -- [ ] **Step 5: Commit** - -```bash -git add event.go event_test.go -git commit -m "feat: add NewStepEventSink and NewAttemptEventSink" -``` - ---- - -## Task 3: Introduce `stepExecution` and refactor `tick()` - -This is the largest refactor. We replace the anonymous goroutine in `tick()` with a -`stepExecution` struct. `makeDoForStep` is deleted; its logic moves into -`stepExecution.runAttempt`. - -**Files:** -- Modify: `workflow.go` -- Modify: `workflow_test.go` - -- [ ] **Step 1: Write failing tests for `stepExecution` behavior** - -```go -// workflow_test.go — add these tests - -func TestStepExecution_BasicSuccess(t *testing.T) { - t.Parallel() - var events []WorkflowEvent - step := NoOp("a") - w := &Workflow{ - StepInterceptors: []StepInterceptor{ - NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), - }, - } - w.Add(Step(step)) - err := w.Do(context.Background()) - assert.NoError(t, err) - assert.Equal(t, []EventType{Scheduled, Succeeded}, eventTypes(events)) -} - -func TestStepExecution_StepInterceptorOrder(t *testing.T) { - t.Parallel() - var order []string - makeIC := func(name string) StepInterceptor { - return StepInterceptorFunc(func(ctx context.Context, info StepInfo, next func(context.Context) error) error { - order = append(order, name+":before") - err := next(ctx) - order = append(order, name+":after") - return err - }) - } - w := &Workflow{ - StepInterceptors: []StepInterceptor{makeIC("A"), makeIC("B")}, - } - w.Add(Step(NoOp("s"))) - assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []string{"A:before", "B:before", "B:after", "A:after"}, order) -} - -func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { - t.Parallel() - var order []string - makeIC := func(name string) AttemptInterceptor { - return AttemptInterceptorFunc(func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error { - order = append(order, name+":before") - err := next(ctx) - order = append(order, name+":after") - 
            return err - }) - } - w := &Workflow{ - AttemptInterceptors: []AttemptInterceptor{makeIC("X"), makeIC("Y")}, - } - w.Add(Step(NoOp("s"))) - assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []string{"X:before", "Y:before", "Y:after", "X:after"}, order) -} - -func TestStepExecution_SkippedStep(t *testing.T) { - t.Parallel() - var events []WorkflowEvent - step := NoOp("a") - w := &Workflow{ - StepInterceptors: []StepInterceptor{ - NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), - }, - } - w.Add(Step(step).When(func(_ context.Context, _ map[Steper]StepResult) StepStatus { - return Skipped - })) - assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, []EventType{Scheduled, Skipped}, eventTypes(events)) -} - -func TestStepExecution_RetryingEvent(t *testing.T) { - t.Parallel() - var events []WorkflowEvent - boom := errors.New("boom") - attempts := 0 - step := Func("s", func(ctx context.Context) error { - attempts++ - if attempts < 3 { - return boom - } - return nil - }) - w := &Workflow{ - StepInterceptors: []StepInterceptor{ - NewStepEventSink(func(e WorkflowEvent) { events = append(events, e) }), - }, - AttemptInterceptors: []AttemptInterceptor{ - NewAttemptEventSink(func(e WorkflowEvent) { events = append(events, e) }), - }, - } - w.Add(Step(step).Retry(func(o *RetryOption) { - o.Attempts = 3 - o.Backoff = &backoff.ZeroBackOff{} - })) - assert.NoError(t, w.Do(context.Background())) - types := eventTypes(events) - // Scheduled, Started(0), Retrying(0), Started(1), Retrying(1), Started(2), Succeeded - assert.Equal(t, []EventType{ - Scheduled, - Started, Retrying, - Started, Retrying, - Started, Succeeded, - }, types) -} - -// helper -func eventTypes(events []WorkflowEvent) []EventType { - types := make([]EventType, len(events)) - for i, e := range events { - types[i] = e.Type - } - return types -} -``` -
-- [ ] **Step 2: Run tests to verify they fail** - -```bash -go test ./... -run "TestStepExecution" -v -``` - -Expected: FAIL — `StepInterceptors` field not defined. - -- [ ] **Step 3: Add fields to `Workflow` struct** - -In `workflow.go`, add two fields to the `Workflow` struct: - -```go -type Workflow struct { - MaxConcurrency int - DontPanic bool - SkipAsError bool - Clock clock.Clock - DefaultOption *StepOption - StepInterceptors []StepInterceptor // per-step global interceptors - AttemptInterceptors []AttemptInterceptor // per-attempt global interceptors - - StepBuilder - steps map[Steper]*State - statusChange *sync.Cond - leaseBucket chan struct{} - waitGroup sync.WaitGroup - isRunning sync.Mutex -} -``` - -- [ ] **Step 4: Add `stepExecution` struct and `scheduled` sentinel** - -Add after the `Workflow` struct in `workflow.go`: - -```go -// scheduled is a private StepStatus sentinel used by tick() to atomically -// claim a step and prevent double-spawning. It is never exposed to users. -const scheduled StepStatus = "scheduled" - -// stepExecution owns the full lifecycle of a single step run. -type stepExecution struct { - w *Workflow - step Steper - state *State - attempt uint64 // single source of truth shared by AttemptInfo and wireNotify - onRetry func(WorkflowEvent) // assembled from StepInterceptors that implement retryNotifier -} -``` - -- [ ] **Step 5: Implement `stepExecution.run()`** - -Add to `workflow.go`: - -```go -func (ex *stepExecution) run(ctx context.Context) { - defer ex.w.waitGroup.Done() - - // Evaluate condition now (safe: all upstreams are terminated at this point). 
    - ups := ex.w.UpstreamOf(ex.step) - option := ex.state.Option() - cond := DefaultCondition - if option != nil && option.Condition != nil { - cond = option.Condition - } - - terminalReason := Pending - if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { - terminalReason = nextStatus - } - - info := StepInfo{Step: ex.step, TerminalReason: terminalReason} - - // Build StepInterceptor chain; also collect retryNotifiers for wireNotify. - // The bottom of the chain executes the step only when it is Pending, so a - // Skipped/Canceled step never runs, even with no StepInterceptors configured. - var retrySinks []func(WorkflowEvent) - stepNext := func(ctx context.Context) error { - if terminalReason != Pending { - return nil - } - return ex.executeWithRetry(ctx) - } - for i := len(ex.w.StepInterceptors) - 1; i >= 0; i-- { - ic := ex.w.StepInterceptors[i] - if rn, ok := ic.(retryNotifier); ok { - retrySinks = append(retrySinks, rn.onRetry) - } - next := stepNext - icLocal := ic - stepNext = func(ctx context.Context) error { - return icLocal.InterceptStep(ctx, info, next) - } - } - ex.onRetry = func(e WorkflowEvent) { - for _, s := range retrySinks { - s(e) - } - } - - var status StepStatus - var err error - - if terminalReason != Pending { - // Skipped or Canceled: run the chain so interceptors observe the step; - // the guarded bottom of the chain returns nil without executing it. - err = stepNext(ctx) - status = terminalReason - } else { - ex.state.SetStatus(Running) - err = stepNext(ctx) - status = statusFromError(err) - if status == Failed { - switch { - case DefaultIsCanceled(err), - errors.Is(err, context.Canceled), - errors.Is(err, context.DeadlineExceeded): - status = Canceled - } - } - } - - ex.state.SetStatus(status) - ex.state.SetError(err) - ex.w.unlease() - ex.w.signalStatusChange() -} -``` - -- [ ] **Step 6: Implement `stepExecution.executeWithRetry` and `stepExecution.runAttempt`** - -```go -// executeWithRetry is the bottom of the StepInterceptor chain. -// It wires Retrying events and drives the retry loop.
 -func (ex *stepExecution) executeWithRetry(ctx context.Context) error { - option := ex.state.Option() - ex.wireNotify(option) - - // Build AttemptInterceptor chain; innermost is runAttempt (Before→Do→After). - attemptChain := ex.buildAttemptChain() - - var notAfter time.Time - if option != nil && option.Timeout != nil { - notAfter = ex.w.Clock.Now().Add(*option.Timeout) - var cancel func() - ctx, cancel = ex.w.Clock.WithDeadline(ctx, notAfter) - defer cancel() - } - - // Guard against a nil option, consistent with the Timeout check above. - var retryOption *RetryOption - if option != nil { - retryOption = option.RetryOption - } - return ex.w.retry(retryOption)(ctx, attemptChain, notAfter) -} - -func (ex *stepExecution) buildAttemptChain() func(context.Context) error { - // Innermost: per-step Before callbacks → Do → After callbacks. - chain := func(ctx context.Context) error { - return ex.runAttempt(ctx) - } - for i := len(ex.w.AttemptInterceptors) - 1; i >= 0; i-- { - ic := ex.w.AttemptInterceptors[i] - next := chain - icLocal := ic - chain = func(ctx context.Context) error { - info := AttemptInfo{ - StepInfo: StepInfo{Step: ex.step}, - Attempt: ex.attempt, - } - return icLocal.InterceptAttempt(ctx, info, next) - } - } - return chain -} - -func (ex *stepExecution) runAttempt(ctx context.Context) error { - defer func() { ex.attempt++ }() - - // Propagate interceptors to SubWorkflow if applicable.
    - // Only on the first attempt, so retries do not prepend the parent's - // interceptors to the child a second time. - if recv, ok := ex.step.(InterceptorReceiver); ok && ex.attempt == 0 { - recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) - } - - do := func(fn func() error) error { return fn() } - if ex.w.DontPanic { - do = catchPanicAsError - } - - var ctxStep context.Context - err := do(func() error { - ctxBefore, errBefore := ex.state.Before(ctx, ex.step, do) - ctxStep = ctxBefore - return errBefore - }) - if err != nil { - return ErrBeforeStep{err} - } - err = do(func() error { return ex.step.Do(ctxStep) }) - return do(func() error { return ex.state.After(ctxStep, ex.step, err) }) -} - -func (ex *stepExecution) wireNotify(option *StepOption) { - if option == nil || option.RetryOption == nil { - return - } - userNotify := option.RetryOption.Notify - option.RetryOption.Notify = func(err error, d time.Duration) { - // runAttempt already advanced ex.attempt when the failed attempt - // returned, so that attempt is ex.attempt-1. runAttempt is the only - // place the counter is incremented. - e := WorkflowEvent{ - Step: ex.step, - Type: Retrying, - Attempt: ex.attempt - 1, - Err: err, - BackoffDuration: d, - } - if ex.onRetry != nil { - ex.onRetry(e) - } - if userNotify != nil { - userNotify(err, d) - } - } -} -``` - -Note: add a `statusFromError` helper in `workflow.go` (replaces the inline `StatusFromError` call): - -```go -func statusFromError(err error) StepStatus { - if err == nil { - return Succeeded - } - return StatusFromError(err) -} -``` - -- [ ] **Step 7: Simplify `tick()`** - -Replace the entire goroutine block in `tick()`: - -```go -// Before (remove this block): -state.SetStatus(Running) -w.waitGroup.Add(1) -go func(ctx context.Context, step Steper, state *State) { - // ... entire anonymous goroutine body ... -}(ctx, step, state) - -// After: -state.SetStatus(scheduled) -w.waitGroup.Add(1) -ex := &stepExecution{w: w, step: step, state: state} -go ex.run(ctx) -``` - -Also remove the `makeDoForStep` method from `workflow.go` entirely (its logic is now in `stepExecution.runAttempt`). - -`runStep` is now unused as well; remove it.
Its timeout and retry logic moved into `executeWithRetry`. - -- [ ] **Step 8: Run all tests** - -```bash -go test ./... -v 2>&1 | tail -30 -``` - -Expected: All existing tests PASS, new `TestStepExecution_*` tests PASS. - -- [ ] **Step 9: Commit** - -```bash -git add workflow.go workflow_test.go event.go event_test.go -git commit -m "feat: introduce stepExecution, add StepInterceptors/AttemptInterceptors to Workflow" -``` - ---- - -## Task 4: `SubWorkflow` implements `InterceptorReceiver` - -**Files:** -- Modify: `wrap.go` -- Modify: `wrap_test.go` - -- [ ] **Step 1: Write failing test** - -```go -// wrap_test.go — add this test - -func TestSubWorkflow_InterceptorPropagation(t *testing.T) { - t.Parallel() - - var events []WorkflowEvent - sink := NewStepEventSink(func(e WorkflowEvent) { - events = append(events, e) - }) - - innerStep := NoOp("inner") - type mySubStep struct{ SubWorkflow } - sub := &mySubStep{} - sub.Add(Step(innerStep)) - - w := &Workflow{ - StepInterceptors: []StepInterceptor{sink}, - } - w.Add(Step(sub)) - - assert.NoError(t, w.Do(context.Background())) - - // Expect events for both outer step (sub) and inner step (innerStep) - types := eventTypes(events) - assert.Contains(t, types, Scheduled) - assert.Contains(t, types, Succeeded) - // There should be at least 4 events: Scheduled+Succeeded for sub, Scheduled+Succeeded for innerStep - assert.GreaterOrEqual(t, len(events), 4) - // All events should have a non-nil Step - for _, e := range events { - assert.NotNil(t, e.Step) - } -} - -func TestSubWorkflow_ChildInterceptorPreserved(t *testing.T) { - t.Parallel() - - var parentEvents []WorkflowEvent - var childEvents []WorkflowEvent - - parentSink := NewStepEventSink(func(e WorkflowEvent) { parentEvents = append(parentEvents, e) }) - childSink := NewStepEventSink(func(e WorkflowEvent) { childEvents = append(childEvents, e) }) - - innerStep := NoOp("inner") - type mySubStep struct{ SubWorkflow } - sub := &mySubStep{} - sub.Add(Step(innerStep)) - // 
child-only interceptor - sub.w.StepInterceptors = []StepInterceptor{childSink} - - w := &Workflow{ - StepInterceptors: []StepInterceptor{parentSink}, - } - w.Add(Step(sub)) - - assert.NoError(t, w.Do(context.Background())) - - // Parent sees outer step + inner step (propagated) - assert.GreaterOrEqual(t, len(parentEvents), 4) - // Child sees inner step only - assert.GreaterOrEqual(t, len(childEvents), 2) -} -``` - -- [ ] **Step 2: Run tests to verify they fail** - -```bash -go test ./... -run "TestSubWorkflow_Interceptor" -v -``` - -Expected: FAIL — `SubWorkflow` does not implement `InterceptorReceiver`. - -- [ ] **Step 3: Implement `PrependInterceptors` on `SubWorkflow`** - -Add to `wrap.go`: - -```go -// PrependInterceptors implements InterceptorReceiver. -// Parent interceptors are prepended so they execute outside child interceptors. -func (s *SubWorkflow) PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) { - s.w.StepInterceptors = append(step, s.w.StepInterceptors...) - s.w.AttemptInterceptors = append(attempt, s.w.AttemptInterceptors...) -} -``` - -- [ ] **Step 4: Run tests to verify they pass** - -```bash -go test ./... -run "TestSubWorkflow_Interceptor" -v -``` - -Expected: PASS. - -- [ ] **Step 5: Run full test suite** - -```bash -go test ./... -``` - -Expected: All PASS. - -- [ ] **Step 6: Commit** - -```bash -git add wrap.go wrap_test.go -git commit -m "feat: SubWorkflow implements InterceptorReceiver for interceptor propagation" -``` - ---- - -## Task 5: Verify `retry()` integration with `wireNotify` - -This task tests the full retry + Retrying event pipeline end-to-end with real backoff. 
 - -**Files:** -- Modify: `workflow_test.go` - -- [ ] **Step 1: Write the test** - -```go -// workflow_test.go — add this test - -func TestStepExecution_RetryingEventAttemptNumbers(t *testing.T) { - t.Parallel() - - var events []WorkflowEvent - mu := sync.Mutex{} - record := func(e WorkflowEvent) { - mu.Lock() - events = append(events, e) - mu.Unlock() - } - - callCount := 0 - step := Func("flaky", func(ctx context.Context) error { - callCount++ - if callCount < 3 { - return errors.New("not yet") - } - return nil - }) - - w := &Workflow{ - StepInterceptors: []StepInterceptor{NewStepEventSink(record)}, - AttemptInterceptors: []AttemptInterceptor{NewAttemptEventSink(record)}, - } - w.Add(Step(step).Retry(func(o *RetryOption) { - o.Attempts = 5 - o.Backoff = &backoff.ZeroBackOff{} - })) - - assert.NoError(t, w.Do(context.Background())) - - types := eventTypes(events) - assert.Equal(t, []EventType{ - Scheduled, // StepInterceptor entry - Started, // attempt 0 - Retrying, // attempt 0 failed - Started, // attempt 1 - Retrying, // attempt 1 failed - Started, // attempt 2 succeeds - Succeeded, // StepInterceptor exit - }, types) - - // Verify attempt numbers in Retrying events - retryingEvents := filterEvents(events, Retrying) - assert.Equal(t, uint64(0), retryingEvents[0].Attempt) - assert.Equal(t, uint64(1), retryingEvents[1].Attempt) - - // Verify attempt numbers in Started events - startedEvents := filterEvents(events, Started) - assert.Equal(t, uint64(0), startedEvents[0].Attempt) - assert.Equal(t, uint64(1), startedEvents[1].Attempt) - assert.Equal(t, uint64(2), startedEvents[2].Attempt) -} - -// helpers (add once, reuse across tests) -func filterEvents(events []WorkflowEvent, typ EventType) []WorkflowEvent { - var rv []WorkflowEvent - for _, e := range events { - if e.Type == typ { - rv = append(rv, e) - } - } - return rv -} -``` - -- [ ] **Step 2: Run the test** - -```bash -go test ./... -run "TestStepExecution_RetryingEventAttemptNumbers" -v -``` - -Expected: PASS. This task adds no production code, so the test should already pass once Task 3 is complete. A failure means the attempt counter is advanced inconsistently between `runAttempt` and `wireNotify`; debug `stepExecution.wireNotify`. - -- [ ] **Step 3: Commit** - -```bash -git add workflow_test.go -git commit -m "test: verify Retrying event attempt numbers are correctly sequenced" -``` - ---- - -## Task 6: Final integration and cleanup - -**Files:** -- Modify: `workflow_test.go` - -- [ ] **Step 1: Run full test suite including example package** - -```bash -go test ./... -``` - -Expected: All PASS. - -- [ ] **Step 2: Run with race detector** - -```bash -go test -race ./... -``` - -Expected: All PASS, no data race detected. - -- [ ] **Step 3: Verify behavior is unchanged when no interceptors are set** - -```go -// workflow_test.go — add this test -func TestWorkflow_NoInterceptors_Unchanged(t *testing.T) { - t.Parallel() - // Workflows without interceptors must not regress existing behavior. - step := NoOp("a") - w := &Workflow{} - w.Add(Step(step)) - assert.NoError(t, w.Do(context.Background())) - assert.Equal(t, Succeeded, w.StateOf(step).GetStatus()) -} -``` - -```bash -go test ./... -run "TestWorkflow_NoInterceptors_Unchanged" -v -``` - -Expected: PASS.
- -- [ ] **Step 4: Final commit** - -```bash -git add -u -git commit -m "test: final integration tests and race detector clean" -``` diff --git a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md b/docs/superpowers/specs/2026-05-06-step-interceptor-design.md deleted file mode 100644 index 878f409..0000000 --- a/docs/superpowers/specs/2026-05-06-step-interceptor-design.md +++ /dev/null @@ -1,285 +0,0 @@ -# Step Interceptor Design - -**Date:** 2026-05-06 -**Status:** Draft -**Scope:** go-workflow structured observability via two-layer interceptor system - ---- - -## Why - -Currently, observability in go-workflow requires users to wire `BeforeStep`/`AfterStep` callbacks -manually on individual steps. There is no structured way to observe all steps globally — no -lifecycle hooks, no attempt count, no timing. - -In production, you need to answer: which step is running right now? How many retries has step X -done? How long did step Y take? None of these are answerable today without bespoke instrumentation. - -This design introduces a two-layer interceptor system that: -- Provides global, structured observability across all steps -- Is orthogonal to `BeforeStep`/`AfterStep` — they serve different scopes and both are preserved -- Propagates automatically into nested `SubWorkflow`s - ---- - -## Concepts - -### StepStatus vs the interceptor layers - -**`StepStatus`** is the state machine used by the orchestration engine. It is persistent and -queryable. The `Condition` system reads it to decide whether to run downstream steps. - -The interceptors are a separate, orthogonal observability mechanism. They do not replace or alter -`StepStatus` — they wrap execution to give users structured hooks. - -The key difference: `Running` is a single `StepStatus` that spans the entire retry loop. Within -it, `AttemptInterceptor` fires multiple times (once per attempt). These cannot be merged without -breaking the `Condition` system. 
- -``` -StepStatus: Pending ──────────────────────────────► Running ──────────────► Succeeded - └──► Failed - └──► Canceled - └─────────────────────────────────────────────────────────────► Skipped - └─────────────────────────────────────────────────────────────► Canceled - -Interceptors: StepInterceptor.entry - AttemptInterceptor[attempt=0] - AttemptInterceptor[attempt=1] - AttemptInterceptor[attempt=2] - StepInterceptor.exit (err=nil → Succeeded) -``` - ---- - -## Architecture - -### Two-Layer Interceptor Stack - -``` -StepInterceptor[0] - └── StepInterceptor[1] - └── [retry loop] - └── AttemptInterceptor[0] - └── AttemptInterceptor[1] - └── [per-step BeforeStep callbacks] ← from StepConfig - └── step.Do(ctx) - └── [per-step AfterStep callbacks] -``` - -**StepInterceptor** wraps the entire lifecycle of a step including all retry attempts. It is -called exactly once per step: on entry `info.TerminalReason` tells you whether the step will -execute (`Pending`) or has already been determined terminal (`Skipped`/`Canceled`). On exit the -returned error reflects the final outcome. Right place for OTel spans (one span per step) and -step-level metrics. - -**AttemptInterceptor** wraps each individual attempt (`Before → Do → After`). It fires once per -attempt, including retried ones. The error returned by `next` is the attempt's failure (if any) -— the interceptor can inspect it before returning. Right place for per-attempt logging and -attempt-level tracing. - -**BeforeStep/AfterStep** (existing) are step-level callbacks configured per-step via `StepConfig`. -Interceptors are workflow-level and apply globally. They are orthogonal — interceptors execute on -the outside, BeforeStep/AfterStep execute on the inside. 
- -### stepExecution (internal) - -The current anonymous goroutine in `tick()` is replaced by a `stepExecution` struct: - -```go -type stepExecution struct { - w *Workflow - step Steper - state *State - attempt uint64 // single source of truth; incremented in buildAttemptChain wrapper -} -``` - -### tick() simplification - -`tick()` evaluates each runnable Step's `Condition` inline: - -- If the Condition returns a terminal status (`Skipped` / `Canceled`), the Step's - `StepResult` is set directly and execution moves on. No goroutine is spawned, no - `MaxConcurrency` lease is consumed, no interceptor runs. -- Otherwise, `tick()` takes a lease, sets the status to `Running`, and spawns a worker - that runs the interceptor chain. - -Because the worker's status flip to `Running` happens under `statusChange.L` *before* the -goroutine is spawned, a subsequent `tick()` cannot see the Step as `Pending` and double- -spawn it. No `scheduled` sentinel is needed, and `StateOf(step).GetStatus()` only ever -returns documented public `StepStatus` values. - -When a Step is settled inline, `tick()` re-iterates within the same call so newly- -unblocked downstream Steps are picked up immediately (no signal would otherwise wake the -main loop). - ---- - -## API - -### New Types - -```go -// StepInterceptor intercepts the full lifecycle of a step (all retry attempts). -// Skipped and Canceled steps do not enter the interceptor chain. -type StepInterceptor interface { - InterceptStep(ctx context.Context, step Steper, next func(context.Context) error) error -} - -// AttemptInterceptor intercepts each individual attempt (Before → Do → After). -// The error returned by next (if any) is the attempt's failure. -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error -} - -// StepInterceptorFunc is a function adapter for StepInterceptor. 
-type StepInterceptorFunc func(ctx context.Context, step Steper, next func(context.Context) error) error - -// AttemptInterceptorFunc is a function adapter for AttemptInterceptor. -type AttemptInterceptorFunc func(ctx context.Context, step Steper, attempt uint64, next func(context.Context) error) error - -// InterceptorReceiver is implemented by steps that contain a sub-workflow. -type InterceptorReceiver interface { - PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) -} -``` - -### Workflow struct additions - -```go -type Workflow struct { - // ... existing fields unchanged ... - - // StepInterceptors are called once per step, wrapping the full retry lifecycle. - // Executed in order: [0] is outermost, [len-1] is innermost. - StepInterceptors []StepInterceptor - - // AttemptInterceptors are called once per attempt, inside the retry loop. - // Executed in order: [0] is outermost, [len-1] is innermost. - AttemptInterceptors []AttemptInterceptor - - // IsolateInterceptors disables inheriting interceptors from a parent workflow. - // When true, PrependInterceptors is a no-op for this workflow. 
- IsolateInterceptors bool -} -``` - -### Usage examples - -```go -// OTel: one span per step -w := &flow.Workflow{ - StepInterceptors: []flow.StepInterceptor{ - flow.StepInterceptorFunc(func(ctx context.Context, step flow.Steper, next func(context.Context) error) error { - ctx, span := tracer.Start(ctx, flow.String(step)) - defer span.End() - err := next(ctx) - if err != nil { - span.RecordError(err) - } - return err - }), - }, -} - -// Per-attempt logging with attempt number and error -w := &flow.Workflow{ - AttemptInterceptors: []flow.AttemptInterceptor{ - flow.AttemptInterceptorFunc(func(ctx context.Context, step flow.Steper, attempt uint64, next func(context.Context) error) error { - err := next(ctx) - slog.Info("attempt", "step", flow.String(step), "attempt", attempt, "err", err) - return err - }), - }, -} -``` - ---- - -## SubWorkflow Propagation - -`Workflow` itself implements `InterceptorReceiver` via `PrependInterceptors`, so any nested -workflow — whether embedded via `SubWorkflow` or used directly as a step — inherits its parent's -interceptors. `SubWorkflow.PrependInterceptors` simply delegates to the inner `Workflow`. - -Once in `executeWithRetry` (before the retry loop), `stepExecution` injects the parent's -interceptors into the child: - -```go -if recv, ok := ex.step.(InterceptorReceiver); ok { - recv.PrependInterceptors(ex.w.StepInterceptors, ex.w.AttemptInterceptors) -} -``` - -`PrependInterceptors` uses `make`+`copy` to build fresh slices, so parent interceptors are -prepended without aliasing the parent's backing array and without accumulating across `Reset()` -cycles. - -### Opting out: `IsolateInterceptors` - -Set `Workflow.IsolateInterceptors = true` on a child to disable inheritance. `PrependInterceptors` -becomes a no-op and the child runs with only its own interceptor stack. Useful when the child -defines a self-contained observability pipeline (e.g., its own tracer / event sink) that must -not be wrapped by the parent. 
- -Execution stack for inner steps (default, inheritance enabled): - -``` -[parent StepInterceptors] → [child StepInterceptors] → retry → [parent AttemptInterceptors] → [child AttemptInterceptors] → Before → Do → After -``` - -With `IsolateInterceptors = true`: - -``` -[child StepInterceptors] → retry → [child AttemptInterceptors] → Before → Do → After -``` - ---- - -## Skipped / Canceled steps - -Steps that are Skipped or Canceled by their `Condition` do **not** enter the interceptor chain. -Their final status is set directly and the interceptors are never invoked. Post-run status is -queryable via `workflow.StateOf(step).GetStatus()`. - ---- - -## What Does Not Change - -- `BeforeStep` / `AfterStep` / `Input` / `Output` — API and behavior unchanged -- `StepConfig`, `StepOption`, `RetryOption` — unchanged -- `StepStatus` — no new values; only documented public statuses are ever observable -- `Condition` system — unchanged -- `SubWorkflow` embedding pattern — unchanged, just gains `PrependInterceptors` -- No breaking changes to existing workflow definitions - ---- - -## Files Affected - -| File | Change | -|------|--------| -| `workflow.go` | Add `StepInterceptors`, `AttemptInterceptors` fields; simplify `tick()`; add `stepExecution` | -| `interceptor.go` | New file: interceptor interfaces, info types, func adapters, `InterceptorReceiver` | -| `wrap.go` | `SubWorkflow.PrependInterceptors` delegates to embedded `Workflow.PrependInterceptors` | - ---- - -## Open Questions - -None. 
All questions from the brainstorm have been resolved: - -| Question | Resolution | -|----------|------------| -| EventSink vs Interceptor | Pure interceptor; no built-in EventSink adapter — users bring their own event types | -| Per-step vs per-attempt | Both layers; different use cases | -| Skipped/Canceled visibility | Skipped/Canceled steps bypass interceptor chain entirely; query post-run via StateOf | -| StepInfo / AttemptInfo wrappers | Removed; step passed as Steper directly; attempt as uint64 directly | -| SubWorkflow propagation | PrependInterceptors on InterceptorReceiver; once per step, make+copy | -| Retrying / BackoffDuration event | Removed; not worth the side-channel complexity; failure error available from InterceptAttempt | -| attempt counter ownership | stepExecution owns it; incremented in buildAttemptChain wrapper | -| BeforeStep/AfterStep fate | Unchanged; orthogonal to Interceptors | -| Step identifier / name | No precomputed name; Step pointer is the identifier; callers call flow.String() | -| EventType / WorkflowEvent | Removed; users define their own event types | -| Breaking changes | None | diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml deleted file mode 100644 index eebe4d8..0000000 --- a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml +++ /dev/null @@ -1,2 +0,0 @@ -schema: spec-driven -created: 2026-05-05 diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md deleted file mode 100644 index bcdc89b..0000000 --- a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md +++ /dev/null @@ -1,54 +0,0 @@ -## Context - -`ErrWorkflow` is defined as `map[Steper]StepResult`. The `Error()` method iterates this map directly, producing non-deterministic output. 
`StepResult` currently holds only `Status StepStatus` and `Err error` — no timing information. - -The workflow already has an injected `clock.Clock` field (from `github.com/benbjohnson/clock`) used for retry/timeout, so clock-based timestamping is available without new dependencies. - -Step goroutines terminate via a shared `defer` block in `workflow.go` that calls `state.SetStatus(status)` and `state.SetError(err)` before signalling. This is the single canonical termination point — the right place to record `FinishedAt`. - -## Goals / Non-Goals - -**Goals:** -- `ErrWorkflow.Error()` produces deterministic, execution-finish-time-ordered output. -- `ErrWorkflow.Unwrap()` returns errors in the same order. -- `StepResult.FinishedAt` is populated for all terminated steps and available to `Condition` functions and external observers. -- Tests remain deterministic via the existing `clock.Clock` injection. - -**Non-Goals:** -- Recording `StartedAt` — out of scope for this change. -- Changing the `ErrWorkflow` underlying type from `map` to an ordered structure — the map stays; sorting happens only at output time. -- Displaying the timestamp in `Error()` output — ordering is the goal, not showing timestamps to users. - -## Decisions - -### D1: Add `FinishedAt time.Time` to `StepResult` (not to `State` separately) - -`StepResult` is the public snapshot type returned from `GetStepResult()`, passed into `Condition` functions, and embedded in `ErrWorkflow`. Adding the field here makes it available to all consumers with no additional API surface. - -Alternative considered: add a separate `finishedAt` field to `State` only and use it just for sorting in `ErrWorkflow.Error()`. Rejected — this hides useful information from `Condition` authors and duplicates the timestamp concept. - -### D2: Record timestamp at the step goroutine's defer, using `w.Clock.Now()` - -The defer in the step goroutine is the single termination point for all outcomes (success, failure, cancel, panic). 
Recording `FinishedAt` there — alongside `SetStatus` and `SetError` — is atomic from the workflow's perspective and covers all code paths. - -`w.Clock.Now()` keeps tests deterministic; tests using `clock.NewMock()` already control time for retry/timeout assertions. - -### D3: Sort in `Error()` and `Unwrap()` by `FinishedAt` ascending; zero-time steps last, then by `String(step)` for stability - -Steps that never executed (Skipped by condition before running, Pending) will have zero `FinishedAt`. They sort to the end. Among steps with the same timestamp (possible with mocked clocks or extremely fast steps), `String(step)` provides a stable secondary sort. This matches the mental model: "what ran first, then what failed." - -Alternative: sort only in `Error()`, leave `Unwrap()` unordered. Rejected — consistency between the two methods avoids surprising behavior when callers use `errors.As`/`errors.Is` traversal order for logic. - -### D4: No new exported helper to set `FinishedAt` — set it directly in the workflow goroutine - -`State` already has unexported setters (`SetStatus`, `SetError`). We add `SetFinishedAt(t time.Time)` following the same pattern, called only from the workflow's internal goroutine. This keeps the mutation path narrow. - -## Risks / Trade-offs - -- **Struct literal breakage**: Any code constructing `StepResult{val1, val2}` positionally (without field names) will fail to compile after the new field is added. This is caught at compile time and is trivially fixable. It is unlikely in practice since `StepResult` is a library type. -- **Mock clock in condition tests**: `Condition` unit tests that construct `StepResult` manually (e.g., `condition_test.go`) will need to populate `FinishedAt` explicitly if they care about ordering. Tests that don't check `ErrWorkflow.Error()` output need no changes. -- **Tied to wall clock resolution**: On systems with low-resolution clocks, two steps terminating in the same tick will fall back to name-based ordering. 
This is acceptable — the output is still deterministic. - -## Migration Plan - -No migration needed. `FinishedAt` is an additive field. Existing `StepResult` values constructed by field name (`StepResult{Status: ..., Err: ...}`) get a zero `FinishedAt` and continue to compile. The sort in `Error()` is purely cosmetic — no behavioral changes to workflow execution. diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md deleted file mode 100644 index cc0a23a..0000000 --- a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md +++ /dev/null @@ -1,27 +0,0 @@ -## Why - -When a workflow fails, `ErrWorkflow.Error()` outputs steps in random order because `ErrWorkflow` is a `map[Steper]StepResult` and Go map iteration is non-deterministic. This makes failure traces hard to read and impossible to compare across runs, hindering debugging. - -## What Changes - -- Add `FinishedAt time.Time` field to `StepResult` to record when each step terminated. -- Record the finish timestamp (using the workflow's injected `clock.Clock`) in the step goroutine, just before signalling status change. -- `ErrWorkflow.Error()` sorts steps by `FinishedAt` ascending (steps that never ran sort last, then by name for stability). -- `ErrWorkflow.Unwrap()` returns errors in the same sorted order for consistency. - -## Capabilities - -### New Capabilities - -_(none)_ - -### Modified Capabilities - -- `execution-model`: `StepResult` gains a `FinishedAt time.Time` field populated at step termination; `ErrWorkflow.Error()` and `Unwrap()` now produce output in execution-finish order instead of random map iteration order. - -## Impact - -- `StepResult` gains a new exported field — additive, not breaking for existing code that constructs or reads `StepResult` by field name. 
Code using `StepResult{Status: ..., Err: ...}` struct literals (without field names) would break at compile time, but that pattern is unlikely and easily fixed. -- `Condition` functions receive `map[Steper]StepResult` — the new field is available to condition authors at no extra cost. -- The workflow's `clock.Clock` field (already present) is used for timestamping, keeping tests deterministic. -- No new dependencies. No API removals. diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md deleted file mode 100644 index 3974ee3..0000000 --- a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md +++ /dev/null @@ -1,73 +0,0 @@ -## ADDED Requirements - -### Requirement: StepResult carries finish timestamp - -`StepResult` SHALL include a `FinishedAt time.Time` field that records the moment -the step goroutine transitioned to a terminal status (`Succeeded`, `Failed`, `Canceled`, -or `Skipped`). - -The timestamp SHALL be recorded using the Workflow's injected `clock.Clock`, so that -tests using a mock clock produce deterministic values. - -Steps that are never executed (e.g., never transitioned to `Running`) SHALL have a -zero `FinishedAt` value. 
- -#### Scenario: Succeeded step has FinishedAt set -- **WHEN** `step.Do(ctx)` returns `nil` and the step transitions to `Succeeded` -- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition - -#### Scenario: Failed step has FinishedAt set -- **WHEN** `step.Do(ctx)` returns a non-nil error and the step transitions to `Failed` -- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition - -#### Scenario: Canceled step has FinishedAt set -- **WHEN** a step is canceled and transitions to `Canceled` -- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition - -#### Scenario: Skipped step has FinishedAt set -- **WHEN** a step's Condition evaluates to `Skipped` and the step never runs -- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of the skip transition - -#### Scenario: Never-executed step has zero FinishedAt -- **WHEN** a step remains `Pending` at the end of workflow execution -- **THEN** `StepResult.FinishedAt` is the zero value of `time.Time` - -#### Scenario: FinishedAt available in Condition functions -- **WHEN** a Condition function receives `map[Steper]StepResult` for upstream steps -- **THEN** `FinishedAt` is populated for all terminated upstream steps and available to the condition logic - -## MODIFIED Requirements - -### Requirement: ErrWorkflow error output ordering - -`ErrWorkflow.Error()` SHALL output steps sorted by `StepResult.FinishedAt` in ascending -order (earliest-finishing step first), so that the error message reflects the execution -timeline. - -Steps with a zero `FinishedAt` (i.e., steps that never executed) SHALL appear last. - -When two or more steps share an identical `FinishedAt` value, they SHALL be sorted -by their string name (`flow.String(step)`) in ascending lexicographic order to produce -a stable, deterministic output. - -`ErrWorkflow.Unwrap()` SHALL return errors in the same sorted order. 
- -#### Scenario: Single-step workflow failure output -- **WHEN** a workflow with one failed step produces `ErrWorkflow` -- **THEN** `ErrWorkflow.Error()` contains exactly that step's output - -#### Scenario: Multi-step output is sorted by finish time -- **WHEN** steps A, B, C finish in that order (A earliest, C latest) -- **THEN** `ErrWorkflow.Error()` lists them A, B, C regardless of map iteration order - -#### Scenario: Never-executed steps appear last -- **WHEN** some steps have zero `FinishedAt` (never ran) and others have non-zero timestamps -- **THEN** `ErrWorkflow.Error()` lists all non-zero-timestamp steps first, zero-timestamp steps last - -#### Scenario: Tie-breaking by name -- **WHEN** two steps have identical `FinishedAt` values -- **THEN** `ErrWorkflow.Error()` lists them in ascending lexicographic order by step name - -#### Scenario: Unwrap order matches Error order -- **WHEN** `ErrWorkflow.Unwrap()` is called -- **THEN** the returned error slice is in the same order as `ErrWorkflow.Error()` output diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md deleted file mode 100644 index ba87c73..0000000 --- a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md +++ /dev/null @@ -1,31 +0,0 @@ -## 1. Extend StepResult with FinishedAt - -- [ ] 1.1 Add `FinishedAt time.Time` field to `StepResult` struct in `error.go` -- [ ] 1.2 Add `SetFinishedAt(t time.Time)` method to `State` in `state.go`, following the same pattern as `SetStatus`/`SetError` -- [ ] 1.3 Add `GetFinishedAt() time.Time` method to `State` if needed for symmetry - -## 2. 
Record Timestamp at Step Termination - -- [ ] 2.1 In the step goroutine defer block in `workflow.go`, call `state.SetFinishedAt(w.Clock.Now())` just before `state.SetStatus(status)` and `state.SetError(err)` -- [ ] 2.2 For condition-evaluated skips (where a step is skipped without running), ensure `SetFinishedAt` is also called at the point the skip status is assigned in `tick()` - -## 3. Sort ErrWorkflow Output - -- [ ] 3.1 Add `sort` import to `error.go` -- [ ] 3.2 Implement a helper `sortedSteps(e ErrWorkflow) []Steper` that returns steps sorted by `FinishedAt` ascending, zero-time last, tie-broken by `String(step)` lexicographically -- [ ] 3.3 Rewrite `ErrWorkflow.Error()` to use `sortedSteps` for iteration -- [ ] 3.4 Rewrite `ErrWorkflow.Unwrap()` to use `sortedSteps` for iteration - -## 4. Tests - -- [ ] 4.1 Add a test asserting `StepResult.FinishedAt` is populated after workflow execution (use `clock.NewMock()` and advance time between steps) -- [ ] 4.2 Add a test for `ErrWorkflow.Error()` output ordering: run a 3-step serial workflow, verify output is in execution order regardless of step name sort order -- [ ] 4.3 Add a test for tie-breaking: construct an `ErrWorkflow` with two steps sharing identical `FinishedAt`, verify alphabetical order in output -- [ ] 4.4 Add a test for zero-`FinishedAt` steps appearing last in output -- [ ] 4.5 Verify existing tests in `condition_test.go` that construct `StepResult` manually still compile and pass (update literals to use field names if needed) - -## 5. 
Verify - -- [ ] 5.1 Run `go build ./...` — no compile errors -- [ ] 5.2 Run `go test ./...` — all tests pass -- [ ] 5.3 Run `go vet ./...` — no issues diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml b/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml deleted file mode 100644 index 905325f..0000000 --- a/openspec/changes/archive/2026-05-06-structured-event-sink/.openspec.yaml +++ /dev/null @@ -1,2 +0,0 @@ -schema: spec-driven -created: 2026-05-04 diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/design.md b/openspec/changes/archive/2026-05-06-structured-event-sink/design.md deleted file mode 100644 index e7885a8..0000000 --- a/openspec/changes/archive/2026-05-06-structured-event-sink/design.md +++ /dev/null @@ -1,109 +0,0 @@ -# Step Interceptor Design - -## Summary - -The original proposal called for a simple `EventSink func(WorkflowEvent)` field on `Workflow`. -During design exploration, this evolved into a two-layer interceptor system that is strictly more -powerful: EventSink becomes a built-in adapter on top of the interceptor API. - -The key insight from studying Temporal's Go SDK: a global observability hook is most useful when -it wraps the full execution lifecycle (like a `WorkerInterceptor`), not just fires events. This -allows users to implement OTel spans, Prometheus histograms, and structured logging with a single -consistent API. - -## Design Decisions - -### Interceptor vs EventSink - -`StepInterceptor` and `AttemptInterceptor` replace the proposed `EventSink func(WorkflowEvent)`. -`NewStepEventSink` and `NewAttemptEventSink` are built-in adapters that implement these interfaces -and emit `WorkflowEvent`s — users who only want structured events use these adapters and never -interact with the interceptor interfaces directly. - -### Two Layers - -- **`StepInterceptor`**: wraps the full step lifecycle (all retry attempts). One invocation per step. 
- Right place for OTel spans, step-level metrics. -- **`AttemptInterceptor`**: wraps each individual attempt (`Before → Do → After`). Right place - for per-attempt logging, attempt-level tracing. - -### BeforeStep/AfterStep are orthogonal - -Interceptors are workflow-level; `BeforeStep`/`AfterStep` are step-level (per-step `StepConfig`). -They execute in different layers of the stack and are configured independently. No changes to the -existing `BeforeStep`/`AfterStep` API. - -### StepStatus vs EventType - -These are deliberately separate types: -- `StepStatus` is the orchestration engine's state machine, used by `Condition` evaluation. -- `EventType` is an observation stream for external consumers. `Running` has no `EventType` - equivalent — within it, multiple `Started` and `Retrying` events fire. - -### Retrying event delivery - -`Retrying` fires inside `backoff.RetryNotifyWithTimer`'s Notify callback — between two consecutive -`next()` calls, outside the interceptor chain. It is delivered via `wireNotify`, a side-channel -that assembles `ex.onRetry` from interceptors implementing the package-private `retryNotifier` -interface. The concrete type returned by `NewStepEventSink` implements this interface. - -### SubWorkflow propagation - -`SubWorkflow` implements `InterceptorReceiver`. `stepExecution` calls `PrependInterceptors` in -`executeWithRetry` (once per step, not per attempt) so parent interceptors wrap child interceptors. - -### stepExecution refactor - -The anonymous goroutine in `tick()` is extracted into a `stepExecution` struct. `tick()` becomes -a single-responsibility function: atomically claim a step with a private `scheduled` sentinel. -All lifecycle logic (condition evaluation, interceptor chain assembly, retry, event delivery) -moves into `stepExecution.run()`. 
- -## API Surface - -```go -// New on Workflow struct -StepInterceptors []StepInterceptor -AttemptInterceptors []AttemptInterceptor - -// New interfaces -type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error -} -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error -} - -// Function adapters -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error - -// Info types -type StepInfo struct { - Step Steper - TerminalReason StepStatus // Pending = will execute normally -} -type AttemptInfo struct { - StepInfo - Attempt uint64 -} - -// Event types -type EventType string // Scheduled / Started / Retrying / EventSucceeded / EventFailed / EventCanceled / EventSkipped -type WorkflowEvent struct { Step Steper; Type EventType; Attempt uint64; Err error; Duration, BackoffDuration time.Duration } - -// Built-in adapters -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor -func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor - -// SubWorkflow propagation -type InterceptorReceiver interface { - PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) -} -``` - -## No Breaking Changes - -All existing APIs (`BeforeStep`, `AfterStep`, `Condition`, `RetryOption`, `SubWorkflow` embedding) -are unchanged. The new fields on `Workflow` are zero-value safe — workflows without interceptors -behave identically to before. 
diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md b/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md deleted file mode 100644 index 89e3412..0000000 --- a/openspec/changes/archive/2026-05-06-structured-event-sink/proposal.md +++ /dev/null @@ -1,75 +0,0 @@ -## Why - -Currently, observability in go-workflow relies on users wiring up `BeforeStep`/`AfterStep` -callbacks manually on every step they care about. There is no structured way to observe -all steps globally — no lifecycle events, no attempt count, no timing, no retry visibility. - -In production, you need to answer: which step is running right now? How many retries has step X -done? How long did step Y take? None of these are answerable today without bespoke instrumentation. - -Temporal exposes a full Event History for every workflow execution. go-workflow should offer a -lightweight equivalent: a global `EventSink` that receives structured events for every step -lifecycle transition. - -## What Changes - -- A `WorkflowEvent` struct capturing step identity, event type, attempt, error, and timestamp. -- An `EventSink` interface (or a simple `func`) that the `Workflow` calls on every transition. -- A field on `Workflow` to register the sink. - -## Capabilities - -### New Capabilities - -- **Structured events**: every meaningful transition emits a `WorkflowEvent`: - - `Scheduled` — step is ready to run (all upstreams terminated, condition evaluated to Running) - - `Started` — goroutine launched, `Do()` about to be called - - `Retrying` — `Do()` returned an error, backoff is sleeping before next attempt - - `Succeeded` / `Failed` / `Canceled` / `Skipped` — terminal transitions - - `HeartbeatReceived` — if heartbeat feature lands (see heartbeat-and-liveness change) - -- **EventSink integration**: `Workflow.EventSink` is a function `func(WorkflowEvent)` (or an - interface). 
Simple function type avoids an extra abstraction layer and is trivially composable - (fan-out = call multiple funcs). - -- **Zero-cost when unset**: if `EventSink` is nil, no allocations occur on the hot path. - -- **Out-of-box adapters (separate package or examples)**: - - `slog` adapter — logs each event as a structured log line - - OpenTelemetry span adapter — wraps each step attempt in a trace span - - Prometheus metrics adapter — increments counters and records histograms - -### Example sketch - -```go -w := &flow.Workflow{ - EventSink: func(e flow.WorkflowEvent) { - slog.Info("step event", - "step", flow.String(e.Step), - "event", e.Type, - "attempt", e.Attempt, - "err", e.Err, - "duration", e.Duration, - ) - }, -} -``` - -### Open Questions - -- `WorkflowEvent.Step` is a `Steper` (interface/pointer). For logging we need a stable string - name. Should `WorkflowEvent` also carry a pre-computed `StepName string` (from `flow.String()`)? - Probably yes, to avoid callers doing it themselves. - -- Should `Retrying` carry the backoff duration so callers can log "retrying in 2s"? - -- Should the sink be called synchronously on the step goroutine, or dispatched async? - Synchronous is simpler and predictable; async risks hiding slow sinks. - -## Impact - -- New `WorkflowEvent` struct and `EventType` constants. -- `Workflow` struct — add `EventSink func(WorkflowEvent)` field. -- `workflow.go` — call sink at each status transition (in `tick` and `runStep`). -- New spec: `openspec/specs/event-sink/spec.md`. -- No breaking changes. 
diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md b/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md deleted file mode 100644 index 0eb3e9d..0000000 --- a/openspec/changes/archive/2026-05-06-structured-event-sink/specs/step-interceptor/spec.md +++ /dev/null @@ -1,137 +0,0 @@ -# Step Interceptor Spec - -## Overview - -Two-layer interceptor system for global structured observability in go-workflow. -Registered on `Workflow`; applies to all steps automatically. - -## Types - -### StepInterceptor - -Wraps the full lifecycle of one step execution (all retry attempts). - -```go -type StepInterceptor interface { - InterceptStep(ctx context.Context, info StepInfo, next func(context.Context) error) error -} -type StepInterceptorFunc func(ctx context.Context, info StepInfo, next func(context.Context) error) error -``` - -- Called once per step regardless of retry count -- `info.TerminalReason != Pending` means step is Skipped/Canceled; **must not** call `next` -- `next` calls into the retry loop → AttemptInterceptors → BeforeStep → Do → AfterStep - -### AttemptInterceptor - -Wraps each individual attempt (`BeforeStep → Do → AfterStep`). - -```go -type AttemptInterceptor interface { - InterceptAttempt(ctx context.Context, info AttemptInfo, next func(context.Context) error) error -} -type AttemptInterceptorFunc func(ctx context.Context, info AttemptInfo, next func(context.Context) error) error -``` - -- Called once per attempt (including retried attempts) -- `info.Attempt` is 0-indexed; increments after each attempt - -### StepInfo / AttemptInfo - -```go -type StepInfo struct { - Step Steper // canonical identifier (same pointer as Workflow map key) - TerminalReason StepStatus // Pending = will execute; Skipped/Canceled = will not -} -type AttemptInfo struct { - StepInfo - Attempt uint64 -} -``` - -Callers wanting a human-readable name call `flow.String(info.Step)`. 
- -### EventType / WorkflowEvent - -```go -type EventType string -const ( - Scheduled EventType = "Scheduled" - Started EventType = "Started" - Retrying EventType = "Retrying" - EventSucceeded EventType = "Succeeded" - EventFailed EventType = "Failed" - EventCanceled EventType = "Canceled" - EventSkipped EventType = "Skipped" -) - -type WorkflowEvent struct { - Step Steper - Type EventType - Attempt uint64 - Err error - Duration time.Duration - BackoffDuration time.Duration // non-zero only for Retrying -} -``` - -`EventType` is a distinct named type from `StepStatus`. Terminal `EventType` constants are -prefixed with `Event` to avoid redeclaration conflicts with `StepStatus` constants. - -### InterceptorReceiver - -```go -type InterceptorReceiver interface { - PrependInterceptors(step []StepInterceptor, attempt []AttemptInterceptor) -} -``` - -Steps embedding `SubWorkflow` implement this interface. `stepExecution` calls it in -`executeWithRetry` (once per step, before the retry loop) to propagate parent interceptors. - -## Workflow Integration - -```go -type Workflow struct { - // ... existing fields ... - StepInterceptors []StepInterceptor // [0] outermost, [len-1] innermost - AttemptInterceptors []AttemptInterceptor // [0] outermost; BeforeStep/AfterStep always innermost -} -``` - -Zero-value safe: nil slices mean no interceptors; existing behaviour is unchanged. - -## Built-in Adapters - -```go -func NewStepEventSink(sink func(WorkflowEvent)) StepInterceptor -func NewAttemptEventSink(sink func(WorkflowEvent)) AttemptInterceptor -``` - -`NewStepEventSink` emits: `Scheduled` → (Retrying events via side-channel) → terminal event. -`NewAttemptEventSink` emits: `Started` per attempt. - -`NewStepEventSink` also implements the package-private `retryNotifier` interface so `wireNotify` -can deliver `Retrying` events (which bypass the chain) to the sink. 
- -## Execution Stack - -``` -StepInterceptor[0] - └── StepInterceptor[1] - └── [retry loop — Notify wired to onRetry] - └── AttemptInterceptor[0] - └── AttemptInterceptor[1] - └── BeforeStep callbacks (from StepConfig) - └── step.Do(ctx) - └── AfterStep callbacks -``` - -## Invariants - -- `StepInterceptor` fires exactly once per step execution -- `AttemptInterceptor` fires exactly once per attempt -- `Retrying` event `Attempt` field matches the attempt that just failed (0-indexed) -- `SubWorkflow` parent interceptors execute outside child interceptors -- `PrependInterceptors` called once per step (in `executeWithRetry`), not per attempt -- `State.Option()` allocates a fresh `*StepOption` + `*RetryOption` each call — `wireNotify` mutations are safe and do not persist across `Reset()`+`Do()` runs diff --git a/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md b/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md deleted file mode 100644 index b5b3bed..0000000 --- a/openspec/changes/archive/2026-05-06-structured-event-sink/tasks.md +++ /dev/null @@ -1,33 +0,0 @@ -# Tasks: structured-event-sink - -## Implementation - -- [x] Define public types in `event.go` (`EventType`, `WorkflowEvent`, `StepInterceptor`, `AttemptInterceptor`, `StepInterceptorFunc`, `AttemptInterceptorFunc`, `StepInfo`, `AttemptInfo`, `InterceptorReceiver`, `retryNotifier`) -- [x] Implement `NewStepEventSink` and `NewAttemptEventSink` in `event.go` -- [x] Add `StepInterceptors`/`AttemptInterceptors` fields to `Workflow` struct -- [x] Introduce `stepExecution` struct; simplify `tick()` to only claim step via `scheduled` sentinel -- [x] Implement `stepExecution.run()`, `executeWithRetry()`, `buildAttemptChain()`, `runAttempt()`, `wireNotify()` -- [x] Delete `makeDoForStep()` and `runStep()` from `workflow.go` -- [x] Implement `SubWorkflow.PrependInterceptors` in `wrap.go` - -## Tests - -- [x] Unit tests for `EventType` constants and 
`StepInterceptorFunc`/`AttemptInterceptorFunc` adapters -- [x] Unit tests for `NewStepEventSink` (Succeeded, Failed, Skipped, OnRetry) -- [x] Unit tests for `NewAttemptEventSink` (Started event) -- [x] Integration test: basic step success with StepInterceptor -- [x] Integration test: StepInterceptor chain ordering (A→B→B→A) -- [x] Integration test: AttemptInterceptor chain ordering (X→Y→Y→X) -- [x] Integration test: Skipped step enters interceptor chain with TerminalReason -- [x] Integration test: Retrying events with correct attempt numbers -- [x] Integration test: SubWorkflow interceptor propagation -- [x] Integration test: child interceptor preserved alongside parent -- [x] Integration test: `PrependInterceptors` not duplicated on retry (`TestSubWorkflow_InterceptorNotDuplicatedOnRetry`) -- [x] Regression test: zero-interceptor workflow unchanged -- [x] Race detector clean (`go test -race ./...`) - -## Bug Fixes (found during review) - -- [x] Fix C1: `PrependInterceptors` moved from `runAttempt` (per-attempt) to `executeWithRetry` (once per step) -- [x] Fix wireNotify timing: `Retrying.Attempt` uses `ex.attempt - 1` (defer in `runAttempt` fires before Notify) -- [x] Fix `EventType` to be a distinct named type (`type EventType string`), not a type alias From c86e8627691c49b0566fdf2c796f8d91608612ed Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 02:30:31 +0000 Subject: [PATCH 27/29] clarify per-iteration locals in interceptor chain builders + UT guards MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Copilot reported a suspected loop-variable-capture bug in the chain builders: it worried that `next := stepNext` (and `next := chain`) might be reused across iterations and cause the closure to recurse on itself or call the wrong interceptor. It is NOT a bug. 
`ic` and `next` are declared inside the loop body with `:=`, so each iteration produces fresh variables and the closure captures each iteration's instance independently. This was already implicitly covered by TestStepExecution_StepInterceptorOrder and TestStepExecution_AttemptInterceptorOrder (both use 2 ICs and verify exact ordering — a capture bug would have failed them). Even so: - Add TestStepExecution_StepInterceptorChain_NoVariableCapture (4 ICs, asserts exact A→B→C→D→D→C→B→A nesting and that the inner step runs exactly once — would catch self-recursion or reordering). - Add TestStepExecution_AttemptInterceptorChain_NoVariableCapture (3 ICs × 3 retried attempts, asserts the full 18-event sequence). - Rename the loop-local `next` to `nextLocal` and add a comment explicitly noting the per-iteration scoping, so future reviewers don't have to re-derive the safety of the closure. - Drop the redundant `icLocal := ic` in buildAttemptChain (ic is already loop-body-local), unifying style with the StepInterceptor chain builder. - Use assert.Equalf in TestSubWorkflow_PrependInterceptorsIdempotentAcrossDo so the printf-style message actually formats. Co-Authored-By: Claude Sonnet 4.6 --- openspec/specs/step-interceptor/spec.md | 50 ++++++++++++++--- workflow.go | 20 ++++--- workflow_test.go | 72 +++++++++++++++++++++++++ wrap_test.go | 2 +- 4 files changed, 129 insertions(+), 15 deletions(-) diff --git a/openspec/specs/step-interceptor/spec.md b/openspec/specs/step-interceptor/spec.md index 669ce45..e061ef2 100644 --- a/openspec/specs/step-interceptor/spec.md +++ b/openspec/specs/step-interceptor/spec.md @@ -136,11 +136,16 @@ type InterceptorReceiver interface { ``` `stepExecution` calls `PrependInterceptors` exactly once per step, in `executeWithRetry` -before the retry loop begins. The implementation SHALL use `make`+`copy` to construct fresh -slices, so: +before the retry loop begins. Inheritance is **per-run scoped**: -- Parent backing arrays are never aliased. 
-- Repeated `Do()` runs (across `Reset()` cycles) do not accumulate prepended interceptors. +- The user-supplied `StepInterceptors` / `AttemptInterceptors` slices SHALL NOT be mutated. +- The inherited prefix SHALL be stored on private `inheritedStep` / `inheritedAttempt` + fields and combined with the base only when constructing the run-time chain. +- The inherited fields SHALL be cleared via `defer` at the start of every `Do()` so all + exit paths (success, preflight error, panic) reset the per-run state. +- The public `Reset()` method SHALL also clear the inherited fields. The internal + `reset()` (called by `Do()` itself) SHALL NOT, since clearing there would wipe the + prefix the parent just wrote and break inheritance. `SubWorkflow.PrependInterceptors` SHALL delegate to the embedded `Workflow.PrependInterceptors`. @@ -160,9 +165,11 @@ slices, so: - **WHEN** a sub-workflow step is retried N times - **THEN** parent interceptors are prepended exactly once, not N times -#### Scenario: PrependInterceptors does not accumulate across Reset -- **WHEN** a workflow that received prepended interceptors is reset and run again as a child -- **THEN** the parent's interceptors are present exactly once, not duplicated +#### Scenario: PrependInterceptors does not accumulate across repeated Do() runs +- **GIVEN** a parent containing a child sub-workflow +- **WHEN** the parent's `Do()` is invoked N times in succession +- **THEN** each invocation results in the parent's interceptors firing exactly once per + step (no compounding across runs) --- @@ -203,3 +210,32 @@ regardless of interceptor behaviour. 
#### Scenario: Attempt counter increments even when interceptor short-circuits - **WHEN** an `AttemptInterceptor` returns without calling `next` - **THEN** the next attempt (if retried) still receives `attempt = previous + 1` + +--- + +### Requirement: DontPanic protects interceptor panics + +When `Workflow.DontPanic` is `true`, panics raised inside user-provided `StepInterceptor` +or `AttemptInterceptor` implementations SHALL be caught and converted to errors using the +same `catchPanicAsError` mechanism already applied to `Before` / `Do` / `After`. This +prevents: + +- Process crashes from a faulty user interceptor. +- `MaxConcurrency` lease leaks (an unrecovered panic skips the deferred `unlease`). +- Loss of `signalStatusChange`, which would otherwise hang the main `Do()` loop. + +When `DontPanic` is `false` (the default), interceptor panics propagate as in normal Go +semantics. + +#### Scenario: Panicking StepInterceptor under DontPanic +- **GIVEN** a Workflow with `DontPanic = true` and a `StepInterceptor` that panics +- **WHEN** the Workflow runs +- **THEN** `Do()` returns an error within a bounded time +- **AND** the step's `StepResult.Err` carries the panic value +- **AND** the workflow does not hang waiting for a status signal + +#### Scenario: Panicking AttemptInterceptor under DontPanic +- **GIVEN** a Workflow with `DontPanic = true` and an `AttemptInterceptor` that panics +- **WHEN** the Workflow runs +- **THEN** `Do()` returns an error within a bounded time +- **AND** the step's `StepResult.Err` carries the panic value diff --git a/workflow.go b/workflow.go index bcaa5ab..0ccf6a4 100644 --- a/workflow.go +++ b/workflow.go @@ -540,15 +540,19 @@ func (ex *stepExecution) run(ctx context.Context) { stepNext := func(ctx context.Context) error { return ex.executeWithRetry(ctx) } stepICs := ex.w.effectiveStepInterceptors() for i := len(stepICs) - 1; i >= 0; i-- { + // ic and nextLocal are declared inside the loop body with :=, so they + // are fresh 
variables on every iteration and the closure below captures + // each iteration's instance independently. The explicit naming is to + // make the per-iteration scoping obvious to readers. ic := stepICs[i] - next := stepNext + nextLocal := stepNext stepNext = func(ctx context.Context) error { if ex.w.DontPanic { return catchPanicAsError(func() error { - return ic.InterceptStep(ctx, ex.step, next) + return ic.InterceptStep(ctx, ex.step, nextLocal) }) } - return ic.InterceptStep(ctx, ex.step, next) + return ic.InterceptStep(ctx, ex.step, nextLocal) } } @@ -603,16 +607,18 @@ func (ex *stepExecution) buildAttemptChain() func(context.Context) error { } attemptICs := ex.w.effectiveAttemptInterceptors() for i := len(attemptICs) - 1; i >= 0; i-- { + // ic and nextLocal are declared inside the loop body with :=, so they + // are fresh variables on every iteration and the closure below captures + // each iteration's instance independently. ic := attemptICs[i] - next := chain - icLocal := ic + nextLocal := chain chain = func(ctx context.Context) error { if ex.w.DontPanic { return catchPanicAsError(func() error { - return icLocal.InterceptAttempt(ctx, ex.step, ex.attempt, next) + return ic.InterceptAttempt(ctx, ex.step, ex.attempt, nextLocal) }) } - return icLocal.InterceptAttempt(ctx, ex.step, ex.attempt, next) + return ic.InterceptAttempt(ctx, ex.step, ex.attempt, nextLocal) } } // Wrap the full attempt chain (including interceptors) so ex.attempt is always diff --git a/workflow_test.go b/workflow_test.go index 8bb3a17..48e4958 100644 --- a/workflow_test.go +++ b/workflow_test.go @@ -342,6 +342,78 @@ func TestStepExecution_StepInterceptorOrder(t *testing.T) { assert.Equal(t, []string{"A:before", "B:before", "B:after", "A:after"}, order) } +// TestStepExecution_StepInterceptorChain_NoVariableCapture guards against the +// classic Go closure-over-loop-variable bug in the chain builder. 
With 3+ +// interceptors, a buggy closure would either reorder calls, call the same +// interceptor multiple times, or self-recurse via `next`. We verify (a) the +// exact order of before/after across 4 interceptors, (b) the inner step runs +// exactly once, and (c) no stack explosion. +func TestStepExecution_StepInterceptorChain_NoVariableCapture(t *testing.T) { + t.Parallel() + var order []string + var stepRan int + makeIC := func(name string) StepInterceptor { + return StepInterceptorFunc(func(ctx context.Context, s Steper, next func(context.Context) error) error { + order = append(order, name+":before") + err := next(ctx) + order = append(order, name+":after") + return err + }) + } + step := Func("s", func(ctx context.Context) error { + stepRan++ + return nil + }) + w := &Workflow{ + StepInterceptors: []StepInterceptor{makeIC("A"), makeIC("B"), makeIC("C"), makeIC("D")}, + } + w.Add(Step(step)) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, 1, stepRan, "inner step must run exactly once") + assert.Equal(t, []string{ + "A:before", "B:before", "C:before", "D:before", + "D:after", "C:after", "B:after", "A:after", + }, order) +} + +// TestStepExecution_AttemptInterceptorChain_NoVariableCapture mirrors the above +// for AttemptInterceptors. With retries, the chain is built once but invoked +// per attempt — any closure capture bug would surface as wrong order, missing +// before/after pairs, or recursion. 
+func TestStepExecution_AttemptInterceptorChain_NoVariableCapture(t *testing.T) { + t.Parallel() + var order []string + makeIC := func(name string) AttemptInterceptor { + return AttemptInterceptorFunc(func(ctx context.Context, s Steper, attempt uint64, next func(context.Context) error) error { + order = append(order, fmt.Sprintf("%s:before:%d", name, attempt)) + err := next(ctx) + order = append(order, fmt.Sprintf("%s:after:%d", name, attempt)) + return err + }) + } + calls := 0 + step := Func("s", func(ctx context.Context) error { + calls++ + if calls < 3 { + return errors.New("boom") + } + return nil + }) + w := &Workflow{ + AttemptInterceptors: []AttemptInterceptor{makeIC("X"), makeIC("Y"), makeIC("Z")}, + } + w.Add(Step(step).Retry(func(o *RetryOption) { + o.Attempts = 3 + o.Backoff = &backoff.ZeroBackOff{} + })) + assert.NoError(t, w.Do(context.Background())) + assert.Equal(t, []string{ + "X:before:0", "Y:before:0", "Z:before:0", "Z:after:0", "Y:after:0", "X:after:0", + "X:before:1", "Y:before:1", "Z:before:1", "Z:after:1", "Y:after:1", "X:after:1", + "X:before:2", "Y:before:2", "Z:before:2", "Z:after:2", "Y:after:2", "X:after:2", + }, order) +} + func TestStepExecution_AttemptInterceptorOrder(t *testing.T) { t.Parallel() var order []string diff --git a/wrap_test.go b/wrap_test.go index 4f1d45f..6d8bd0d 100644 --- a/wrap_test.go +++ b/wrap_test.go @@ -315,7 +315,7 @@ func TestSubWorkflow_PrependInterceptorsIdempotentAcrossDo(t *testing.T) { // reset both parent and child step states so the workflow is re-runnable assert.NoError(t, parent.Reset()) assert.NoError(t, parent.Do(context.Background())) - assert.Equal(t, int32(2), count.Load(), + assert.Equalf(t, int32(2), count.Load(), "run %d: parent interceptor must fire exactly 2 times (outer sub + inner), accumulation detected", i) } } From 561ebf23f43981f3ed2ffe87052188e0c63e5c82 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 02:32:19 +0000 Subject: [PATCH 28/29] restore #73 archived change 
accidentally deleted MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md and openspec/changes/archive/2026-05-05-errworkflow-execution-order/ were introduced by PR #73 (sort ErrWorkflow by FinishedAt) and merged into main. The previous cleanup commit removed them by mistake — my diff-against-main check listed them as additions on this branch, but they were genuinely on main as well. Restore from 2c88ec2. Co-Authored-By: Claude Sonnet 4.6 --- .../2026-05-05-errworkflow-execution-order.md | 567 ++++++++++++++++++ .../.openspec.yaml | 2 + .../design.md | 54 ++ .../proposal.md | 27 + .../specs/execution-model/spec.md | 73 +++ .../tasks.md | 31 + 6 files changed, 754 insertions(+) create mode 100644 docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md create mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml create mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md create mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md create mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md create mode 100644 openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md diff --git a/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md b/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md new file mode 100644 index 0000000..ac38abe --- /dev/null +++ b/docs/superpowers/plans/2026-05-05-errworkflow-execution-order.md @@ -0,0 +1,567 @@ +# ErrWorkflow Execution Order Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. 
+ +**Goal:** Make `ErrWorkflow.Error()` output steps in execution-finish order by adding a `FinishedAt time.Time` field to `StepResult` and sorting on output. + +**Architecture:** Add `FinishedAt` to the public `StepResult` struct; record it via a new `State.SetFinishedAt` setter in the single step-goroutine defer that already calls `SetStatus`/`SetError`; for condition-skipped steps record it at the `state.SetStatus(nextStatus)` site in `tick()`; sort in `ErrWorkflow.Error()` and `Unwrap()` using a shared helper. + +**Tech Stack:** Go stdlib (`sort`, `time`), `github.com/benbjohnson/clock` (already a project dependency, used for testable time) + +--- + +## Files + +| File | Change | +|------|--------| +| `error.go` | Add `FinishedAt time.Time` to `StepResult`; add `sort` import; add `sortedSteps` helper; rewrite `Error()` and `Unwrap()` | +| `state.go` | Add `SetFinishedAt(time.Time)` and `GetFinishedAt() time.Time` methods to `State` | +| `workflow.go` | Call `state.SetFinishedAt(w.Clock.Now())` in the step-goroutine defer and in the condition-skip branch of `tick()` | +| `error_test.go` | Add tests for `ErrWorkflow` ordering | +| `execution_model_test.go` | Add test that `StepResult.FinishedAt` is populated after execution | +| `condition_test.go` | Update `StepResult{...}` literals to use field names | + +--- + +## Task 1: Add `FinishedAt` to `StepResult` + +**Files:** +- Modify: `error.go` (around line 106 — the `StepResult` struct) + +- [ ] **Step 1: Add the field** + + In `error.go`, update `StepResult`: + + ```go + // StepResult contains the status and error of a Step. + type StepResult struct { + Status StepStatus + Err error + FinishedAt time.Time + } + ``` + + Also add `"time"` to the import block at the top of `error.go`: + + ```go + import ( + "fmt" + "runtime" + "sort" + "strings" + "time" + ) + ``` + +- [ ] **Step 2: Verify it compiles** + + ```bash + go build ./... 
+ ``` + + Expected: compile error in `condition_test.go` about `StepResult` composite literal — **this is expected**, we'll fix it in Task 4. + +--- + +## Task 2: Add `SetFinishedAt` / `GetFinishedAt` to `State` + +**Files:** +- Modify: `state.go` (after `SetError`/`GetError` around line 33) + +- [ ] **Step 1: Add setter and getter** + + In `state.go`, after the `SetError` method, add: + + ```go + func (s *State) GetFinishedAt() time.Time { + s.RLock() + defer s.RUnlock() + return s.FinishedAt + } + func (s *State) SetFinishedAt(t time.Time) { + s.Lock() + defer s.Unlock() + s.FinishedAt = t + } + ``` + + Add `"time"` to the import block in `state.go`: + + ```go + import ( + "context" + "sync" + "time" + ) + ``` + +- [ ] **Step 2: Verify it compiles (ignoring test errors)** + + ```bash + go build ./... + ``` + + Expected: same compile errors as before in tests only. Core package builds. + +--- + +## Task 3: Record `FinishedAt` at Step Termination + +**Files:** +- Modify: `workflow.go` + +There are **two termination sites** to update: + +### Site A — step goroutine defer (running steps) + +Around line 402–406 in `workflow.go`: + +```go +defer func() { + state.SetStatus(status) + state.SetError(err) + w.unlease() + w.signalStatusChange() +}() +``` + +### Site B — condition-skip in `tick()` (steps skipped before running) + +Around line 387–393: + +```go +if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + state.SetStatus(nextStatus) + w.waitGroup.Add(1) + go func() { + defer w.waitGroup.Done() + w.signalStatusChange() + }() + continue +} +``` + +- [ ] **Step 1: Update the step-goroutine defer (Site A)** + + Change the defer to record `FinishedAt` before setting status: + + ```go + defer func() { + state.SetFinishedAt(w.Clock.Now()) + state.SetStatus(status) + state.SetError(err) + w.unlease() + w.signalStatusChange() + }() + ``` + +- [ ] **Step 2: Update the condition-skip branch (Site B)** + + Change the condition-skip block to also record `FinishedAt`: + 
+ ```go + if nextStatus := cond(ctx, ups); nextStatus.IsTerminated() { + state.SetFinishedAt(w.Clock.Now()) + state.SetStatus(nextStatus) + w.waitGroup.Add(1) + go func() { + defer w.waitGroup.Done() + w.signalStatusChange() + }() + continue + } + ``` + +- [ ] **Step 3: Verify it compiles** + + ```bash + go build ./... + ``` + + Expected: same test-only errors, core package builds. + +--- + +## Task 4: Fix `condition_test.go` Struct Literal + +**Files:** +- Modify: `condition_test.go` (around line 28) + +The composite literal `flow.StepResult{Status: ..., Err: ...}` already uses field names, so it will compile without changes. Verify this: + +- [ ] **Step 1: Check the literal** + + ```bash + grep -n "StepResult{" condition_test.go + ``` + + Expected output: + ``` + 28: ups[s] = flow.StepResult{ + ``` + + The block at line 28–31 uses named fields (`Status:`, `Err:`), so it is fine. No edit needed. + +- [ ] **Step 2: Verify tests compile** + + ```bash + go test -run NOMATCH ./... 2>&1 | grep -v "^ok" + ``` + + Expected: no compile errors. + +--- + +## Task 5: Implement `sortedSteps` and Rewrite `Error()` / `Unwrap()` + +**Files:** +- Modify: `error.go` + +- [ ] **Step 1: Write a failing test first** + + In `error_test.go`, add: + + ```go + func TestErrWorkflowErrorOrdering(t *testing.T) { + t.Run("sorted by FinishedAt ascending", func(t *testing.T) { + now := time.Now() + type namedStep struct{ name string } + // We need actual Steper values; use the existing test helpers. + // Build ErrWorkflow manually with known FinishedAt values. + a := &namedStep{"A-step"} + b := &namedStep{"B-step"} + c := &namedStep{"C-step"} + + // Use a real workflow so steps are proper Steper instances. + // Instead, we test via a real workflow run with mock clock. 
+ mockClock := clock.NewMock() + w := &flow.Workflow{Clock: mockClock} + + errA := fmt.Errorf("A failed") + errB := fmt.Errorf("B failed") + errC := fmt.Errorf("C failed") + _ = a; _ = b; _ = c; _ = errA; _ = errB; _ = errC; _ = now + // Real ordering test is in TestErrWorkflowOrderingIntegration below. + _ = w + }) + } + ``` + + > Actually, skip the manual struct test — the integration test below is more meaningful. Remove the stub above and add only the integration test. + + Replace the above with this in `error_test.go`: + + ```go + import ( + "context" + "fmt" + "strings" + "testing" + "time" + + flow "github.com/Azure/go-workflow" + "github.com/benbjohnson/clock" + "github.com/stretchr/testify/assert" + "github.com/stretchr/testify/require" + ) + + // serialStep is a step that signals when it starts and waits to be released. + type serialStep struct { + name string + started chan struct{} + release chan struct{} + err error + } + + func newSerialStep(name string, err error) *serialStep { + return &serialStep{name: name, started: make(chan struct{}, 1), release: make(chan struct{}), err: err} + } + func (s *serialStep) Do(_ context.Context) error { + s.started <- struct{}{} + <-s.release + return s.err + } + func (s *serialStep) String() string { return s.name } + + func TestErrWorkflowOutputOrdering(t *testing.T) { + // Build a 3-step serial chain: A -> B -> C + // Step names are chosen so alphabetical != execution order: C, A, B + mockClock := clock.NewMock() + stepC := newSerialStep("C-first", fmt.Errorf("C failed")) + stepA := newSerialStep("A-second", fmt.Errorf("A failed")) + stepB := newSerialStep("B-third", fmt.Errorf("B failed")) + + w := &flow.Workflow{Clock: mockClock} + w.Add( + flow.Step(stepC), + flow.Step(stepA).DependsOn(stepC), + flow.Step(stepB).DependsOn(stepA), + ) + + done := make(chan error, 1) + go func() { done <- w.Do(context.Background()) }() + + // C runs first — let it finish + <-stepC.started + mockClock.Add(time.Second) + 
close(stepC.release) + + // A runs second + <-stepA.started + mockClock.Add(time.Second) + close(stepA.release) + + // B runs third + <-stepB.started + mockClock.Add(time.Second) + close(stepB.release) + + err := <-done + require.Error(t, err) + + var errW flow.ErrWorkflow + require.ErrorAs(t, err, &errW) + + output := errW.Error() + posC := strings.Index(output, "C-first") + posA := strings.Index(output, "A-second") + posB := strings.Index(output, "B-third") + + assert.Greater(t, posA, posC, "A-second should appear after C-first in output") + assert.Greater(t, posB, posA, "B-third should appear after A-second in output") + } + + func TestErrWorkflowTieBreakByName(t *testing.T) { + // Two steps with identical FinishedAt → sort by name + mockClock := clock.NewMock() + now := mockClock.Now() + + e := flow.ErrWorkflow{ + // We can't easily construct Steper keys without running a workflow. + // Test via integration: two parallel steps finishing at same clock tick. + } + _ = e; _ = now + // See TestErrWorkflowTieBreakIntegration below. + } + + func TestErrWorkflowTieBreakIntegration(t *testing.T) { + // Two parallel steps, both fail at the same clock tick → output is alphabetical. 
+ mockClock := clock.NewMock() + stepZ := newSerialStep("Z-step", fmt.Errorf("Z failed")) + stepA := newSerialStep("A-step", fmt.Errorf("A failed")) + + w := &flow.Workflow{Clock: mockClock} + w.Add(flow.Step(stepZ), flow.Step(stepA)) + + done := make(chan error, 1) + go func() { done <- w.Do(context.Background()) }() + + // Both steps start in parallel; release them before advancing clock + <-stepZ.started + <-stepA.started + // Advance clock THEN release — both get same timestamp + mockClock.Add(time.Second) + close(stepZ.release) + close(stepA.release) + + err := <-done + require.Error(t, err) + + var errW flow.ErrWorkflow + require.ErrorAs(t, err, &errW) + + output := errW.Error() + posA := strings.Index(output, "A-step") + posZ := strings.Index(output, "Z-step") + assert.Less(t, posA, posZ, "A-step should appear before Z-step (tie-break by name)") + } + ``` + +- [ ] **Step 2: Run test to verify it fails** + + ```bash + go test -run "TestErrWorkflow" ./... -v 2>&1 | tail -20 + ``` + + Expected: FAIL — output ordering assertions fail (map iteration is random). + +- [ ] **Step 3: Implement `sortedSteps` and update `Error()` / `Unwrap()`** + + In `error.go`, add the helper and update both methods: + + ```go + // sortedSteps returns the steps in ErrWorkflow sorted by FinishedAt ascending. + // Steps with zero FinishedAt (never ran) sort last. + // Tie-break: lexicographic order of String(step). 
+ func sortedSteps(e ErrWorkflow) []Steper { + steps := make([]Steper, 0, len(e)) + for step := range e { + steps = append(steps, step) + } + sort.Slice(steps, func(i, j int) bool { + ti := e[steps[i]].FinishedAt + tj := e[steps[j]].FinishedAt + zeroI := ti.IsZero() + zeroJ := tj.IsZero() + if zeroI != zeroJ { + return !zeroI // non-zero before zero + } + if !ti.Equal(tj) { + return ti.Before(tj) + } + return String(steps[i]) < String(steps[j]) + }) + return steps + } + + func (e ErrWorkflow) Unwrap() []error { + steps := sortedSteps(e) + rv := make([]error, 0, len(e)) + for _, step := range steps { + rv = append(rv, e[step].Err) + } + return rv + } + + // ErrWorkflow will be printed as: + // + // Step: [Status] + // error message + func (e ErrWorkflow) Error() string { + var builder strings.Builder + for _, step := range sortedSteps(e) { + builder.WriteString(fmt.Sprintf("%s: ", String(step))) + builder.WriteString(fmt.Sprintln(e[step].Error())) + } + return builder.String() + } + ``` + +- [ ] **Step 4: Run the ordering tests** + + ```bash + go test -run "TestErrWorkflow" ./... -v 2>&1 | tail -30 + ``` + + Expected: all pass. 
+ +- [ ] **Step 5: Commit** + + ```bash + git add error.go state.go workflow.go error_test.go condition_test.go + git commit -m "feat: add FinishedAt to StepResult, sort ErrWorkflow output by execution order" + ``` + +--- + +## Task 6: Test `FinishedAt` Population in `execution_model_test.go` + +**Files:** +- Modify: `execution_model_test.go` + +- [ ] **Step 1: Write failing test** + + Add to `execution_model_test.go`: + + ```go + func TestStepResultFinishedAtPopulated(t *testing.T) { + mockClock := clock.NewMock() + step := &succeededStep{} // uses the existing test helper in testutil_test.go + w := &Workflow{Clock: mockClock} + w.Add(Step(step)) + + mockClock.Add(time.Second) // advance so FinishedAt is non-zero + err := w.Do(context.Background()) + assert.NoError(t, err) + + state := w.StateOf(step) + result := state.GetStepResult() + assert.False(t, result.FinishedAt.IsZero(), "FinishedAt should be populated after step execution") + assert.Equal(t, mockClock.Now(), result.FinishedAt) + } + ``` + + Check what helpers exist in `testutil_test.go`: + + ```bash + grep -n "succeededStep\|failedStep\|type.*Step" testutil_test.go | head -20 + ``` + + Adjust the step type name based on what you see. + +- [ ] **Step 2: Run to verify it fails** + + ```bash + go test -run "TestStepResultFinishedAtPopulated" ./... -v + ``` + + Expected: FAIL — `FinishedAt` is zero because we haven't wired it up yet. + + > **Note:** If Task 3 is already done, this test may already pass. In that case, skip to Step 4. + +- [ ] **Step 3: Verify wiring from Task 3 makes it pass** + + The `SetFinishedAt` calls added in Task 3 should make this pass. Run: + + ```bash + go test -run "TestStepResultFinishedAtPopulated" ./... -v + ``` + + Expected: PASS. 
+ +- [ ] **Step 4: Commit** + + ```bash + git add execution_model_test.go + git commit -m "test: verify StepResult.FinishedAt is populated after step execution" + ``` + +--- + +## Task 7: Full Test Suite and Final Verification + +- [ ] **Step 1: Run all tests** + + ```bash + go test ./... -count=1 + ``` + + Expected: all pass, no failures. + +- [ ] **Step 2: Run vet** + + ```bash + go vet ./... + ``` + + Expected: no output (no issues). + +- [ ] **Step 3: Run build** + + ```bash + go build ./... + ``` + + Expected: no output (clean build). + +- [ ] **Step 4: Final commit if anything was adjusted** + + ```bash + git status + ``` + + If there are uncommitted changes: + + ```bash + git add -p + git commit -m "fix: address review feedback" + ``` + +--- + +## Self-Review Notes + +- **`condition_test.go`**: The literal at line 28 already uses named fields (`Status:`, `Err:`), so it compiles without edits. The plan reflects this (Task 4 is a verify-only step). +- **Clock timing in tests**: `clock.NewMock()` starts at a fixed non-zero time, so `mockClock.Now()` after `Add(time.Second)` gives a consistent value. The test in Task 6 advances before the run — but `FinishedAt` is recorded *during* the run at whatever `Clock.Now()` returns then. Adjust the assertion to `assert.False(t, result.FinishedAt.IsZero())` if the exact value is hard to pin. +- **Parallel steps tie-break test**: both goroutines call `w.Clock.Now()` in their defers. With `clock.Mock`, concurrent calls return the same value, which is exactly what we need for the tie-break test. +- **`StateOf` visibility**: `w.StateOf(step)` is used in existing tests — it's an exported method so it's accessible from `_test` package. 
diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml new file mode 100644 index 0000000..eebe4d8 --- /dev/null +++ b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-05-05 diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md new file mode 100644 index 0000000..bcdc89b --- /dev/null +++ b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/design.md @@ -0,0 +1,54 @@ +## Context + +`ErrWorkflow` is defined as `map[Steper]StepResult`. The `Error()` method iterates this map directly, producing non-deterministic output. `StepResult` currently holds only `Status StepStatus` and `Err error` — no timing information. + +The workflow already has an injected `clock.Clock` field (from `github.com/benbjohnson/clock`) used for retry/timeout, so clock-based timestamping is available without new dependencies. + +Step goroutines terminate via a shared `defer` block in `workflow.go` that calls `state.SetStatus(status)` and `state.SetError(err)` before signalling. This is the single canonical termination point — the right place to record `FinishedAt`. + +## Goals / Non-Goals + +**Goals:** +- `ErrWorkflow.Error()` produces deterministic, execution-finish-time-ordered output. +- `ErrWorkflow.Unwrap()` returns errors in the same order. +- `StepResult.FinishedAt` is populated for all terminated steps and available to `Condition` functions and external observers. +- Tests remain deterministic via the existing `clock.Clock` injection. + +**Non-Goals:** +- Recording `StartedAt` — out of scope for this change. +- Changing the `ErrWorkflow` underlying type from `map` to an ordered structure — the map stays; sorting happens only at output time. 
+- Displaying the timestamp in `Error()` output — ordering is the goal, not showing timestamps to users. + +## Decisions + +### D1: Add `FinishedAt time.Time` to `StepResult` (not to `State` separately) + +`StepResult` is the public snapshot type returned from `GetStepResult()`, passed into `Condition` functions, and embedded in `ErrWorkflow`. Adding the field here makes it available to all consumers with no additional API surface. + +Alternative considered: add a separate `finishedAt` field to `State` only and use it just for sorting in `ErrWorkflow.Error()`. Rejected — this hides useful information from `Condition` authors and duplicates the timestamp concept. + +### D2: Record timestamp at the step goroutine's defer, using `w.Clock.Now()` + +The defer in the step goroutine is the single termination point for every step that actually runs (success, failure, cancel, panic). Recording `FinishedAt` there — alongside `SetStatus` and `SetError` — is atomic from the workflow's perspective and covers all of those code paths. Steps skipped by their `Condition` never start a goroutine, so `tick()` records `FinishedAt` where it sets their terminal status. + +`w.Clock.Now()` keeps tests deterministic; tests using `clock.NewMock()` already control time for retry/timeout assertions. + +### D3: Sort in `Error()` and `Unwrap()` by `FinishedAt` ascending; zero-time steps last, then by `String(step)` for stability + +Steps that never reach a terminal status (still `Pending`) will have zero `FinishedAt`; condition-skipped steps are timestamped at the moment they are skipped. Zero-time steps sort to the end. Among steps with the same timestamp (possible with mocked clocks or extremely fast steps), `String(step)` provides a stable secondary sort. This matches the mental model: "what ran first, then what failed." + +Alternative: sort only in `Error()`, leave `Unwrap()` unordered. Rejected — consistency between the two methods avoids surprising behavior when callers use `errors.As`/`errors.Is` traversal order for logic. + +### D4: Add a `State.SetFinishedAt` setter, called only by the workflow itself + +`State` already has exported setters (`SetStatus`, `SetError`).
We add `SetFinishedAt(t time.Time)` following the same pattern, called only from the workflow's internal goroutine. This keeps the mutation path narrow. + +## Risks / Trade-offs + +- **Struct literal breakage**: Any code constructing `StepResult{val1, val2}` positionally (without field names) will fail to compile after the new field is added. This is caught at compile time and is trivially fixable. It is unlikely in practice since `StepResult` is a library type. +- **Mock clock in condition tests**: `Condition` unit tests that construct `StepResult` manually (e.g., `condition_test.go`) will need to populate `FinishedAt` explicitly if they care about ordering. Tests that don't check `ErrWorkflow.Error()` output need no changes. +- **Tied to wall clock resolution**: On systems with low-resolution clocks, two steps terminating in the same tick will fall back to name-based ordering. This is acceptable — the output is still deterministic. + +## Migration Plan + +No migration needed. `FinishedAt` is an additive field. Existing `StepResult` values constructed by field name (`StepResult{Status: ..., Err: ...}`) get a zero `FinishedAt` and continue to compile. The sort in `Error()` is purely cosmetic — no behavioral changes to workflow execution. diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md new file mode 100644 index 0000000..cc0a23a --- /dev/null +++ b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/proposal.md @@ -0,0 +1,27 @@ +## Why + +When a workflow fails, `ErrWorkflow.Error()` outputs steps in random order because `ErrWorkflow` is a `map[Steper]StepResult` and Go map iteration is non-deterministic. This makes failure traces hard to read and impossible to compare across runs, hindering debugging. + +## What Changes + +- Add `FinishedAt time.Time` field to `StepResult` to record when each step terminated. 
+- Record the finish timestamp (using the workflow's injected `clock.Clock`) in the step goroutine, just before signalling the status change; condition-driven skips, which never start a goroutine, are timestamped in `tick()` when `Skipped` is assigned. +- `ErrWorkflow.Error()` sorts steps by `FinishedAt` ascending (steps with a zero `FinishedAt` sort last; ties are broken by step name for stability). +- `ErrWorkflow.Unwrap()` returns errors in the same sorted order for consistency. + +## Capabilities + +### New Capabilities + +_(none)_ + +### Modified Capabilities + +- `execution-model`: `StepResult` gains a `FinishedAt time.Time` field populated at step termination; `ErrWorkflow.Error()` and `Unwrap()` now produce output in execution-finish order instead of random map iteration order. + +## Impact + +- `StepResult` gains a new exported field — additive, not breaking for existing code that constructs or reads `StepResult` by field name. Code using positional struct literals without field names (e.g., `StepResult{val1, val2}`) would break at compile time, but that pattern is unlikely and easily fixed. +- `Condition` functions receive `map[Steper]StepResult` — the new field is available to condition authors at no extra cost. +- The workflow's `clock.Clock` field (already present) is used for timestamping, keeping tests deterministic. +- No new dependencies. No API removals. diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md new file mode 100644 index 0000000..3974ee3 --- /dev/null +++ b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/specs/execution-model/spec.md @@ -0,0 +1,73 @@ +## ADDED Requirements + +### Requirement: StepResult carries finish timestamp + +`StepResult` SHALL include a `FinishedAt time.Time` field that records the moment +the step transitioned to a terminal status (`Succeeded`, `Failed`, `Canceled`, +or `Skipped`).
+ +The timestamp SHALL be recorded using the Workflow's injected `clock.Clock`, so that +tests using a mock clock produce deterministic values. + +Steps that never reach a terminal status (e.g., steps that remain `Pending` when the workflow ends) SHALL have a +zero `FinishedAt` value. + +#### Scenario: Succeeded step has FinishedAt set +- **WHEN** `step.Do(ctx)` returns `nil` and the step transitions to `Succeeded` +- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition + +#### Scenario: Failed step has FinishedAt set +- **WHEN** `step.Do(ctx)` returns a non-nil error and the step transitions to `Failed` +- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition + +#### Scenario: Canceled step has FinishedAt set +- **WHEN** a step is canceled and transitions to `Canceled` +- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of transition + +#### Scenario: Skipped step has FinishedAt set +- **WHEN** a step's Condition evaluates to `Skipped` and the step never runs +- **THEN** `StepResult.FinishedAt` is set to `clock.Now()` at the moment of the skip transition + +#### Scenario: Never-executed step has zero FinishedAt +- **WHEN** a step remains `Pending` at the end of workflow execution +- **THEN** `StepResult.FinishedAt` is the zero value of `time.Time` + +#### Scenario: FinishedAt available in Condition functions +- **WHEN** a Condition function receives `map[Steper]StepResult` for upstream steps +- **THEN** `FinishedAt` is populated for all terminated upstream steps and available to the condition logic + +## MODIFIED Requirements + +### Requirement: ErrWorkflow error output ordering + +`ErrWorkflow.Error()` SHALL output steps sorted by `StepResult.FinishedAt` in ascending +order (earliest-finishing step first), so that the error message reflects the execution +timeline. + +Steps with a zero `FinishedAt` (i.e., steps that never reached a terminal status) SHALL appear last.
+ +When two or more steps share an identical `FinishedAt` value, they SHALL be sorted +by their string name (`flow.String(step)`) in ascending lexicographic order to produce +a stable, deterministic output. + +`ErrWorkflow.Unwrap()` SHALL return errors in the same sorted order. + +#### Scenario: Single-step workflow failure output +- **WHEN** a workflow with one failed step produces `ErrWorkflow` +- **THEN** `ErrWorkflow.Error()` contains exactly that step's output + +#### Scenario: Multi-step output is sorted by finish time +- **WHEN** steps A, B, C finish in that order (A earliest, C latest) +- **THEN** `ErrWorkflow.Error()` lists them A, B, C regardless of map iteration order + +#### Scenario: Never-executed steps appear last +- **WHEN** some steps have zero `FinishedAt` (never ran) and others have non-zero timestamps +- **THEN** `ErrWorkflow.Error()` lists all non-zero-timestamp steps first, zero-timestamp steps last + +#### Scenario: Tie-breaking by name +- **WHEN** two steps have identical `FinishedAt` values +- **THEN** `ErrWorkflow.Error()` lists them in ascending lexicographic order by step name + +#### Scenario: Unwrap order matches Error order +- **WHEN** `ErrWorkflow.Unwrap()` is called +- **THEN** the returned error slice is in the same order as `ErrWorkflow.Error()` output diff --git a/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md new file mode 100644 index 0000000..ba87c73 --- /dev/null +++ b/openspec/changes/archive/2026-05-05-errworkflow-execution-order/tasks.md @@ -0,0 +1,31 @@ +## 1. Extend StepResult with FinishedAt + +- [ ] 1.1 Add `FinishedAt time.Time` field to `StepResult` struct in `error.go` +- [ ] 1.2 Add `SetFinishedAt(t time.Time)` method to `State` in `state.go`, following the same pattern as `SetStatus`/`SetError` +- [ ] 1.3 Add `GetFinishedAt() time.Time` method to `State` if needed for symmetry + +## 2. 
Record Timestamp at Step Termination + +- [ ] 2.1 In the step goroutine defer block in `workflow.go`, call `state.SetFinishedAt(w.Clock.Now())` just before `state.SetStatus(status)` and `state.SetError(err)` +- [ ] 2.2 For condition-evaluated skips (where a step is skipped without running), ensure `SetFinishedAt` is also called at the point the skip status is assigned in `tick()` + +## 3. Sort ErrWorkflow Output + +- [ ] 3.1 Add `sort` import to `error.go` +- [ ] 3.2 Implement a helper `sortedSteps(e ErrWorkflow) []Steper` that returns steps sorted by `FinishedAt` ascending, zero-time last, tie-broken by `String(step)` lexicographically +- [ ] 3.3 Rewrite `ErrWorkflow.Error()` to use `sortedSteps` for iteration +- [ ] 3.4 Rewrite `ErrWorkflow.Unwrap()` to use `sortedSteps` for iteration + +## 4. Tests + +- [ ] 4.1 Add a test asserting `StepResult.FinishedAt` is populated after workflow execution (use `clock.NewMock()` and advance time between steps) +- [ ] 4.2 Add a test for `ErrWorkflow.Error()` output ordering: run a 3-step serial workflow, verify output is in execution order regardless of step name sort order +- [ ] 4.3 Add a test for tie-breaking: construct an `ErrWorkflow` with two steps sharing identical `FinishedAt`, verify alphabetical order in output +- [ ] 4.4 Add a test for zero-`FinishedAt` steps appearing last in output +- [ ] 4.5 Verify existing tests in `condition_test.go` that construct `StepResult` manually still compile and pass (update literals to use field names if needed) + +## 5. 
Verify + +- [ ] 5.1 Run `go build ./...` — no compile errors +- [ ] 5.2 Run `go test ./...` — all tests pass +- [ ] 5.3 Run `go vet ./...` — no issues From faba9db2802bc4b90750d8697d7f0d3a71ecb741 Mon Sep 17 00:00:00 2001 From: Xingfei Xu Date: Thu, 7 May 2026 02:38:04 +0000 Subject: [PATCH 29/29] move comment --- workflow.go | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/workflow.go b/workflow.go index 0ccf6a4..9f9f487 100644 --- a/workflow.go +++ b/workflow.go @@ -254,18 +254,17 @@ func (w *Workflow) IsTerminated() bool { } // Reset resets the Workflow to ready for a new run. -// -// Unlike the internal reset() (which Do() calls at its own start), Reset() also -// clears interceptors inherited from a parent during a previous run. The internal -// reset() must not clear them, because the parent writes them just before calling -// child.Do(), and child.Do() then calls reset() — clearing there would wipe the -// just-written prefix and break inheritance. func (w *Workflow) Reset() error { if !w.isRunning.TryLock() { return ErrWorkflowIsRunning } defer w.isRunning.Unlock() w.reset() + // Unlike the internal reset() (which Do() calls at its own start), Reset() also + // clears interceptors inherited from a parent during a previous run. The internal + // reset() must not clear them, because the parent writes them just before calling + // child.Do(), and child.Do() then calls reset() — clearing there would wipe the + // just-written prefix and break inheritance. w.inheritedStep = nil w.inheritedAttempt = nil return nil
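The comment moved in this last patch encodes an ordering constraint. A parent writes its interceptors into the child and then calls `child.Do()`, which begins by calling the internal `reset()`. The deliberately simplified Go model below makes that constraint concrete. It is hypothetical: strings stand in for interceptors, only `inheritedStep` is modeled, and the `isRunning` lock is omitted. Only the field and method names mirror the diff.

```go
package main

import "fmt"

// Hypothetical model of the inheritance rule; not go-workflow's real Workflow.
type Workflow struct {
	own           []string // interceptors registered on this workflow
	inheritedStep []string // prefix written by a parent just before child.Do()
	lastChain     []string // per-run state rebuilt by every Do()
}

// reset clears per-run state only. It must not touch inheritedStep, because
// Do() calls it right after the parent has written that prefix.
func (w *Workflow) reset() {
	w.lastChain = nil
}

// Do starts a run: reset first, then build the effective chain with the
// parent's interceptors outermost.
func (w *Workflow) Do() []string {
	w.reset()
	w.lastChain = append(append([]string{}, w.inheritedStep...), w.own...)
	return w.lastChain
}

// Reset is the public reset. It also drops the inherited prefix, so a later
// standalone run does not reuse a stale parent's interceptors.
func (w *Workflow) Reset() {
	w.reset()
	w.inheritedStep = nil
}

func main() {
	child := &Workflow{own: []string{"child-log"}}

	// The parent writes its interceptors, then runs the child.
	child.inheritedStep = []string{"parent-trace"}
	fmt.Println(child.Do()) // [parent-trace child-log]

	// A standalone rerun after the public Reset no longer inherits.
	child.Reset()
	fmt.Println(child.Do()) // [child-log]
}
```

If `reset()` also cleared `inheritedStep`, the first `Do()` would print `[child-log]`, and the parent's interceptor would be lost. This is the failure the comment warns about.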