fix: render claude-code streams in chat + correct session view#408
fix: render claude-code streams in chat + correct session view#408ljagiello wants to merge 5 commits into
Conversation
The atlas/claude-code agent consumed @anthropic-ai/claude-agent-sdk's
query() stream directly and only emitted `data-tool-progress` chips —
so the chat UI saw no text, no tool-input cards, no tool-output cards
for anything claude-code did. The original user report ("I don't see
any streamed output from my claude-code agent") matched a log where 24
data-tool-progress events fired over a 152s run and nothing else.
Translate each SDK block into the AI SDK v6 chunk vocabulary the chat
already renders:
- assistant text blocks → text-start / text-delta / text-end
- assistant tool_use blocks → tool-input-start + tool-input-available
- user tool_result blocks → tool-output-available
(matched to the call via tool_use_id)
The supplemental data-tool-progress chip is kept after the tool_use
emit so users still see the friendly "what's happening" summary.
Two new vitest cases mock the SDK stream and assert the exact chunk
sequence with matched ids; 29/29 tests pass.
Three correlated bugs in the session view reducer surfaced when running a job whose FSM action invokes an atlas/claude-code agent: 1. Duplicate "Unknown" block alongside the planned one. When the FSM action's actionId drifted, `mapActionToStepStart` fell back to "unknown" while the planned block carried the real agent name. reduceStepStart's findIndex on agentName failed, so the reducer appended a fresh "Unknown" running block — leaving the original planned block stuck in pending (then "skipped" at session:complete). Fix: fall back to matching by `stateId` (a stable join key both events carry) in reduceStepStart, and keep the planned agentName when matched via the fallback. 2. step:complete couldn't bind to its planned block. The schema only carried `stepNumber` — no `stateId`. When step:start hadn't updated the planned block's stepNumber (cf. bug #1), step:complete synthesized a second "unknown" placeholder instead of completing the existing pending one. Fix: add optional `stateId` to StepCompleteEventSchema, populate it in mapActionToStepComplete from `event.data.state`, and use it as a fallback identifier in reduceStepComplete. Backfill stepNumber onto the matched block so subsequent ephemeral chunks attach. 3. block.input was always missing on the persisted view. step:start could arrive BEFORE session:start because both events publish to the same JetStream subject in fast succession and ordering isn't guaranteed for tightly-spaced writes. When step:start landed first, reduceStepStart correctly created a running block with the signal input — but the subsequent session:start handler unconditionally overwrote `agentBlocks` with the planned (empty) ones, destroying the input. Fix: session:start now merges. If a planned step's stateId matches an existing running block, keep the existing block (with its input, startedAt, status) and only overlay the planned `task` label when the existing one is empty. 3 new vitest cases pin each fallback path; 63/63 tests pass. Verified end-to-end against a live PR-review run — the session view now shows one "Pr Reviewer" block with input { pr_url: "..." } populated.
Addresses code review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view.md):
Critical:
- session:start now UNIONS planned + existing blocks instead of overwriting.
Off-plan running blocks (out-of-order step:start for a state not in
plannedSteps, or session:start with no plannedSteps at all) survive
intact. Pre-fix the merge silently dropped them — same shape of bug the
PR claimed to fix, just moved to a corner case.
- claude-code agent now normalizes `tool_result.content` before emitting
`tool-output-available`. The SDK delivers `content` as
`string | Array<TextBlock|ImageBlock|...> | undefined`; passing the
array through leaked raw JS objects into the chat UI for any tool that
returns non-text blocks (images, search results, multi-block returns).
Text blocks concatenate; non-text blocks collapse to a `[type]` marker.
Important:
- mapActionToStepStart's agentName fallback chain is now
`actionId ?? state ?? "unknown"`. `state` is always set for agent
actions and matches what `extractPlannedSteps` uses on the planned
block, eliminating the "unknown" leak at the source. The reducer's
stateId fallback stays as defense-in-depth.
- session:start merge comment now points at the actual race source
(`NatsSessionStream.emit` fire-and-forget publish) rather than
misattributing it to JetStream subject ordering.
- Dropped the dead `existing.actionType ?? plannedBlock.actionType`
branch — actionType is required on a running block.
Tests:
- claude-code agent.test.ts:
* Stub `smallLLM` so `generateProgress` returns deterministically and
`data-tool-progress` always fires — lets us assert the FULL chunk
order (7 events) including the progress chip, instead of filtering
it out and masking ordering regressions.
* New text-only assistant message case — exercises the text-emit
branch in isolation.
* New table-driven content-normalization cases — string, undefined,
structured array — pin the new normalizer's contract.
* Replaced `as { type: string; id: string }` cast with
`toMatchObject` + `expect.assert` for narrowing.
- session-reducer.test.ts:
* Off-plan running block preserved through session:start merge.
* session:start with `plannedSteps: undefined` preserves all blocks.
* Inverse agentName drift — when planned is "unknown",
event.agentName wins on step:start.
* step:complete match priority — stepNumber beats stateId fallback.
- event-emission-mapper.test.ts: updated the "actionId missing" case
to assert the new state-based fallback.
100/100 tests pass across the three affected files.
|
Addressed the review in a3a3e39. Highlights: Critical
Important
Tests (100/100 pass)
Review doc: |
The deterministic-order version of the chunk-translation test relied on mocking both `ai.generateObject` and `@atlas/llm.smallLLM` so the `data-tool-progress` chip would always land at a predictable index. Those partial mocks bring real risk in parallel test pools — `vi.mock` with `importOriginal` re-evaluates the module under transform, and any top-level side effect in `@atlas/llm` (which loads provider clients) or `ai` (which the workspace touches everywhere) is double-evaluated. The 8-minute CI run that flaked on this PR pointed in that direction even though it turned out to be unrelated; this change removes the surface area regardless. Replace the strict positional assertion with a conditional position guard: when `data-tool-progress` IS emitted, assert it sits strictly between `tool-input-available` and `tool-output-available`. When the stub LLM rejects and the chip is skipped, the test still passes via the filtered `core` array (text/tool-input/tool-output positions locked). Same ordering invariant, no module mocks. Tests: 33/33 in this file, 6668/6668 in the full suite.
Critical:
- Widen the `reduceStepStart` dup guard from `(stepNumber, agentName)` to
`(stepNumber, status !== "pending")`. The agentName-drift fallback +
stateId fallback introduced two reachable corner cases that re-created
the duplicate-block class of bug this PR set out to fix:
* step:start replay after stateId fallback: the block's agentName is
preserved (e.g. "pr-reviewer") while the replay still carries
"unknown" — the original `(stepNumber, agentName)` guard misses,
both pending searches miss (block is now "running"), and a fresh
"unknown" block gets appended.
* step:complete-before-step:start with drift: step:complete completes
the planned block via stateId fallback, then a late step:start
falls through all pending searches (block is "completed") and
appends a duplicate.
Keying on stepNumber alone subsumes the original J2/H1 protection and
catches both new cases. FSM stepNumber is monotonic per execution, so
this doesn't regress legitimate FSM loops (they emit different
stepNumbers).
Important:
- Drop the dead `?? "unknown"` tail in `mapActionToStepStart`'s agentName
fallback chain. `FSMActionExecutionEvent.state` is `string` (required),
so `event.data.actionId ?? event.data.state` is total. The reducer's
own placeholder branch in `reduceStepComplete` remains as the genuine
ultimate fallback.
- Drop the speculative `pending.agentName &&` falsy check in the
preservation expression. `AgentBlockSchema.agentName` is `z.string()`,
so the empty-string case is unreachable without a contract violation.
- Delete the conditional `data-tool-progress` position guard in
agent.test.ts. `createStubPlatformModels` returns a `LanguageModelV3`
whose `doGenerate`/`doStream` always reject, so `generateProgress`
throws and the production try/catch swallows it — the chip is never
emitted under stub models, and `progressIdx === -1` makes the guarded
block dead code. Replaced with an updated comment explaining the
absence; the integration smoke test covers the chip's runtime position.
- Add a 4th `it.each` case for `normalizeToolResultContent`: a
`{ type: "text", text: 123 }` block must collapse to `[text]` (the
two-stage guard's `typeof c.text === "string"` check). Pins the
defensive branch against future flattening.
Tests:
- Two new regression tests in session-reducer.test.ts pin the widened
dup guard:
* "replay step:start after stateId-fallback is a no-op"
* "step:start after step:complete (already transitioned) is a no-op"
103/103 in the affected files. Full repo typecheck clean.
## Progress
- Task: PR #408 round-2 review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view-r2.md)
- Decisions:
* Widen on stepNumber alone (not stepNumber+stateId) — stateId-only
dedup would break legitimate FSM loops that revisit the same state
with a different stepNumber. The asymmetry between "transitioned"
and "pending" status is the right discriminator.
* Delete the position guard rather than stubbing `smallLLM` — the
integration smoke test covers the chip's runtime position; adding
a per-role LanguageModelV3 mock just to make a conditional
unconditional adds surface area without real coverage gain.
* Left the "duplicate-stateId planned steps" (multi-entry FSMs) and
"per-text-block UUID fragmentation" findings as open decisions —
both require product judgement, not just code.
- Key Learnings:
* `createStubPlatformModels.get(role)` returns a `LanguageModelV3`
whose `doGenerate`/`doStream` always reject. Tests that route
through `smallLLM`/`generateProgress` will silently fail the
inner call and trip the production try/catch. Conditional
positional assertions around the swallowed-error event never
fire in CI — write a dedicated test with a per-role override
model, or assert on the absence directly.
* For the session reducer's dup guard, `stepNumber` alone is a
stronger key than `(stepNumber, agentName)` once any agentName
fallback exists. The drift creates an asymmetry the original
composite key can't see — the block's preserved name no longer
matches the replay's drifted name.
- Files:
* packages/core/src/session/session-reducer.ts
* packages/core/src/session/session-reducer.test.ts
* packages/core/src/session/event-emission-mapper.ts
* packages/bundled-agents/src/claude-code/agent.test.ts
|
Hi! Just hit the same "Unknown / unknown" symptom on Workspace jobs:
summarize-kan:
triggers: [{ signal: summarize }]
fsm:
initial: idle
states:
idle: { on: { summarize: { target: run } } }
run:
type: final
entry:
- type: agent
agentId: jira-summarizer # llm agent, anthropic:claude-sonnet-4-6
outputTo: summarySession view renders two rows: |
Summary
Two correlated fixes for the live agent-streaming pipeline that the user surfaced by asking "why don't I see any output from my claude-code agent?":
data-tool-progresschips, so the chat UI had no text bubbles or tool-call cards to render. It now emits the full AI SDK v6 chunk vocabulary (text-*,tool-input-*,tool-output-*) translated from the Claude Code SDK's block stream. The supplemental progress chip is kept.inputfield on the block, and a "task: ''" rendering. Root cause was the reducer not handling agent-name drift, stepNumber mismatch on step:complete, and out-of-order session:start (which arrived AFTER step:start because both events publish to the same JetStream subject in fast succession).See individual commit messages for full diagnostics.
Test plan
deno task test packages/bundled-agents/src/claude-code/agent.test.ts— 29/29 (2 new for the chunk translation)deno task test packages/core/src/session/session-reducer.test.ts packages/core/src/session/event-emission-mapper.test.ts— 63/63 (3 new covering each fallback path)qa-claude-code-streamworkspace — claude-code emit verified end-to-end viaMCPStreamEmitterlog + chat UI rendering nested Bash tool cards from a live PR review runvanilla_apple/ PR Code Review workspace — session view now shows a singlepr-reviewerblock (Pending → Running → Completed) withinput: { pr_url: "…" }populated, instead of the prior duplicatePr Reviewer · Skipped + Unknown · Succeededpair