fix: render claude-code streams in chat + correct session view by ljagiello · Pull Request #408 · friday-platform/friday-studio

ljagiello · 2026-05-20T08:13:29Z

Summary

Two correlated fixes for the live agent-streaming pipeline that the user surfaced by asking "why don't I see any output from my claude-code agent?":

claude-code agent emit gap — the agent emitted only data-tool-progress chips, so the chat UI had no text bubbles or tool-call cards to render. It now emits the full AI SDK v6 chunk vocabulary (text-*, tool-input-*, tool-output-*) translated from the Claude Code SDK's block stream. The supplemental progress chip is kept.
Session view reducer bugs — running a job whose FSM action invoked claude-code surfaced three correlated issues in the session view: a duplicate "Unknown · Succeeded" block alongside the planned "Pr Reviewer · Skipped", a missing input field on the block, and a "task: ''" rendering. Root cause was the reducer not handling agent-name drift, stepNumber mismatch on step:complete, and out-of-order session:start (which arrived AFTER step:start because both events publish to the same JetStream subject in fast succession).

See individual commit messages for full diagnostics.

Test plan

deno task test packages/bundled-agents/src/claude-code/agent.test.ts — 29/29 (2 new for the chunk translation)
deno task test packages/core/src/session/session-reducer.test.ts packages/core/src/session/event-emission-mapper.test.ts — 63/63 (3 new covering each fallback path)
Live UI smoke against a minimal qa-claude-code-stream workspace — claude-code emit verified end-to-end via MCPStreamEmitter log + chat UI rendering nested Bash tool cards from a live PR review run
Live UI smoke against vanilla_apple / PR Code Review workspace — session view now shows a single pr-reviewer block (Pending → Running → Completed) with input: { pr_url: "…" } populated, instead of the prior duplicate Pr Reviewer · Skipped + Unknown · Succeeded pair

The atlas/claude-code agent consumed @anthropic-ai/claude-agent-sdk's query() stream directly and only emitted `data-tool-progress` chips — so the chat UI saw no text, no tool-input cards, no tool-output cards for anything claude-code did. The original user report ("I don't see any streamed output from my claude-code agent") matched a log where 24 data-tool-progress events fired over a 152s run and nothing else. Translate each SDK block into the AI SDK v6 chunk vocabulary the chat already renders: - assistant text blocks → text-start / text-delta / text-end - assistant tool_use blocks → tool-input-start + tool-input-available - user tool_result blocks → tool-output-available (matched to the call via tool_use_id) The supplemental data-tool-progress chip is kept after the tool_use emit so users still see the friendly "what's happening" summary. Two new vitest cases mock the SDK stream and assert the exact chunk sequence with matched ids; 29/29 tests pass.

Three correlated bugs in the session view reducer surfaced when running a job whose FSM action invokes an atlas/claude-code agent: 1. Duplicate "Unknown" block alongside the planned one. When the FSM action's actionId drifted, `mapActionToStepStart` fell back to "unknown" while the planned block carried the real agent name. reduceStepStart's findIndex on agentName failed, so the reducer appended a fresh "Unknown" running block — leaving the original planned block stuck in pending (then "skipped" at session:complete). Fix: fall back to matching by `stateId` (a stable join key both events carry) in reduceStepStart, and keep the planned agentName when matched via the fallback. 2. step:complete couldn't bind to its planned block. The schema only carried `stepNumber` — no `stateId`. When step:start hadn't updated the planned block's stepNumber (cf. bug #1), step:complete synthesized a second "unknown" placeholder instead of completing the existing pending one. Fix: add optional `stateId` to StepCompleteEventSchema, populate it in mapActionToStepComplete from `event.data.state`, and use it as a fallback identifier in reduceStepComplete. Backfill stepNumber onto the matched block so subsequent ephemeral chunks attach. 3. block.input was always missing on the persisted view. step:start could arrive BEFORE session:start because both events publish to the same JetStream subject in fast succession and ordering isn't guaranteed for tightly-spaced writes. When step:start landed first, reduceStepStart correctly created a running block with the signal input — but the subsequent session:start handler unconditionally overwrote `agentBlocks` with the planned (empty) ones, destroying the input. Fix: session:start now merges. If a planned step's stateId matches an existing running block, keep the existing block (with its input, startedAt, status) and only overlay the planned `task` label when the existing one is empty. 3 new vitest cases pin each fallback path; 63/63 tests pass. Verified end-to-end against a live PR-review run — the session view now shows one "Pr Reviewer" block with input { pr_url: "..." } populated.

Addresses code review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view.md): Critical: - session:start now UNIONS planned + existing blocks instead of overwriting. Off-plan running blocks (out-of-order step:start for a state not in plannedSteps, or session:start with no plannedSteps at all) survive intact. Pre-fix the merge silently dropped them — same shape of bug the PR claimed to fix, just moved to a corner case. - claude-code agent now normalizes `tool_result.content` before emitting `tool-output-available`. The SDK delivers `content` as `string | Array<TextBlock|ImageBlock|...> | undefined`; passing the array through leaked raw JS objects into the chat UI for any tool that returns non-text blocks (images, search results, multi-block returns). Text blocks concatenate; non-text blocks collapse to a `[type]` marker. Important: - mapActionToStepStart's agentName fallback chain is now `actionId ?? state ?? "unknown"`. `state` is always set for agent actions and matches what `extractPlannedSteps` uses on the planned block, eliminating the "unknown" leak at the source. The reducer's stateId fallback stays as defense-in-depth. - session:start merge comment now points at the actual race source (`NatsSessionStream.emit` fire-and-forget publish) rather than misattributing it to JetStream subject ordering. - Dropped the dead `existing.actionType ?? plannedBlock.actionType` branch — actionType is required on a running block. Tests: - claude-code agent.test.ts: * Stub `smallLLM` so `generateProgress` returns deterministically and `data-tool-progress` always fires — lets us assert the FULL chunk order (7 events) including the progress chip, instead of filtering it out and masking ordering regressions. * New text-only assistant message case — exercises the text-emit branch in isolation. * New table-driven content-normalization cases — string, undefined, structured array — pin the new normalizer's contract. * Replaced `as { type: string; id: string }` cast with `toMatchObject` + `expect.assert` for narrowing. - session-reducer.test.ts: * Off-plan running block preserved through session:start merge. * session:start with `plannedSteps: undefined` preserves all blocks. * Inverse agentName drift — when planned is "unknown", event.agentName wins on step:start. * step:complete match priority — stepNumber beats stateId fallback. - event-emission-mapper.test.ts: updated the "actionId missing" case to assert the new state-based fallback. 100/100 tests pass across the three affected files.

ljagiello · 2026-05-20T08:33:33Z

Addressed the review in a3a3e39. Highlights:

Critical

session:start now unions planned + existing blocks instead of overwriting. Off-plan running blocks (out-of-order step:start for a state not in plannedSteps, or plannedSteps: undefined) survive intact.
tool_result.content is normalized via a new normalizeToolResultContent helper — string passes through, undefined → "", structured arrays concatenate text blocks and collapse non-text ones to [type] markers. No more raw JS arrays leaking into tool-output-available.

Important

mapActionToStepStart now falls back through actionId ?? state ?? "unknown", eliminating the "unknown" leak at its source. The reducer's stateId fallback stays as defense-in-depth.
Misleading JetStream comment rewritten to point at NatsSessionStream.emit's fire-and-forget publish as the actual race source.
Dropped the dead existing.actionType ?? plannedBlock.actionType branch.

Tests (100/100 pass)

Stubbed smallLLM so the main ordering test asserts the full 7-event sequence including data-tool-progress — no more filtering that hid ordering regressions.
New text-only assistant message case.
Table-driven content normalization cases (string / undefined / structured array).
as { type: …; id: … } cast replaced with toMatchObject + expect.assert.
New session-reducer cases: off-plan block preservation, plannedSteps: undefined, agentName-drift inverse, step:complete priority (stepNumber over stateId).

Review doc: docs/reviews/2026-05-20-claude-code-stream-and-session-view.md.

The deterministic-order version of the chunk-translation test relied on mocking both `ai.generateObject` and `@atlas/llm.smallLLM` so the `data-tool-progress` chip would always land at a predictable index. Those partial mocks bring real risk in parallel test pools — `vi.mock` with `importOriginal` re-evaluates the module under transform, and any top-level side effect in `@atlas/llm` (which loads provider clients) or `ai` (which the workspace touches everywhere) is double-evaluated. The 8-minute CI run that flaked on this PR pointed in that direction even though it turned out to be unrelated; this change removes the surface area regardless. Replace the strict positional assertion with a conditional position guard: when `data-tool-progress` IS emitted, assert it sits strictly between `tool-input-available` and `tool-output-available`. When the stub LLM rejects and the chip is skipped, the test still passes via the filtered `core` array (text/tool-input/tool-output positions locked). Same ordering invariant, no module mocks. Tests: 33/33 in this file, 6668/6668 in the full suite.

Critical: - Widen the `reduceStepStart` dup guard from `(stepNumber, agentName)` to `(stepNumber, status !== "pending")`. The agentName-drift fallback + stateId fallback introduced two reachable corner cases that re-created the duplicate-block class of bug this PR set out to fix: * step:start replay after stateId fallback: the block's agentName is preserved (e.g. "pr-reviewer") while the replay still carries "unknown" — the original `(stepNumber, agentName)` guard misses, both pending searches miss (block is now "running"), and a fresh "unknown" block gets appended. * step:complete-before-step:start with drift: step:complete completes the planned block via stateId fallback, then a late step:start falls through all pending searches (block is "completed") and appends a duplicate. Keying on stepNumber alone subsumes the original J2/H1 protection and catches both new cases. FSM stepNumber is monotonic per execution, so this doesn't regress legitimate FSM loops (they emit different stepNumbers). Important: - Drop the dead `?? "unknown"` tail in `mapActionToStepStart`'s agentName fallback chain. `FSMActionExecutionEvent.state` is `string` (required), so `event.data.actionId ?? event.data.state` is total. The reducer's own placeholder branch in `reduceStepComplete` remains as the genuine ultimate fallback. - Drop the speculative `pending.agentName &&` falsy check in the preservation expression. `AgentBlockSchema.agentName` is `z.string()`, so the empty-string case is unreachable without a contract violation. - Delete the conditional `data-tool-progress` position guard in agent.test.ts. `createStubPlatformModels` returns a `LanguageModelV3` whose `doGenerate`/`doStream` always reject, so `generateProgress` throws and the production try/catch swallows it — the chip is never emitted under stub models, and `progressIdx === -1` makes the guarded block dead code. Replaced with an updated comment explaining the absence; the integration smoke test covers the chip's runtime position. - Add a 4th `it.each` case for `normalizeToolResultContent`: a `{ type: "text", text: 123 }` block must collapse to `[text]` (the two-stage guard's `typeof c.text === "string"` check). Pins the defensive branch against future flattening. Tests: - Two new regression tests in session-reducer.test.ts pin the widened dup guard: * "replay step:start after stateId-fallback is a no-op" * "step:start after step:complete (already transitioned) is a no-op" 103/103 in the affected files. Full repo typecheck clean. ## Progress - Task: PR #408 round-2 review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view-r2.md) - Decisions: * Widen on stepNumber alone (not stepNumber+stateId) — stateId-only dedup would break legitimate FSM loops that revisit the same state with a different stepNumber. The asymmetry between "transitioned" and "pending" status is the right discriminator. * Delete the position guard rather than stubbing `smallLLM` — the integration smoke test covers the chip's runtime position; adding a per-role LanguageModelV3 mock just to make a conditional unconditional adds surface area without real coverage gain. * Left the "duplicate-stateId planned steps" (multi-entry FSMs) and "per-text-block UUID fragmentation" findings as open decisions — both require product judgement, not just code. - Key Learnings: * `createStubPlatformModels.get(role)` returns a `LanguageModelV3` whose `doGenerate`/`doStream` always reject. Tests that route through `smallLLM`/`generateProgress` will silently fail the inner call and trip the production try/catch. Conditional positional assertions around the swallowed-error event never fire in CI — write a dedicated test with a per-role override model, or assert on the absence directly. * For the session reducer's dup guard, `stepNumber` alone is a stronger key than `(stepNumber, agentName)` once any agentName fallback exists. The drift creates an asymmetry the original composite key can't see — the block's preserved name no longer matches the replay's drifted name. - Files: * packages/core/src/session/session-reducer.ts * packages/core/src/session/session-reducer.test.ts * packages/core/src/session/event-emission-mapper.ts * packages/bundled-agents/src/claude-code/agent.test.ts

LissaGreense · 2026-05-21T17:12:14Z

Hi! Just hit the same "Unknown / unknown" symptom on main from a non-claude-code path — sharing a repro in case it's useful confirmation that the reducer change here generalizes to plain type: agent jobs too.

Workspace Jira Dashboard (id buttery_yogurt), one FSM action, no claude-code anywhere:

jobs:
  summarize-kan:
    triggers: [{ signal: summarize }]
    fsm:
      initial: idle
      states:
        idle: { on: { summarize: { target: run } } }
        run:
          type: final
          entry:
            - type: agent
              agentId: jira-summarizer   # llm agent, anthropic:claude-sonnet-4-6
              outputTo: summary

Session view renders two rows: Run / summary [Skipped] and unknown / unknown [Succeeded 8.4s] — same planned-Skipped + Unknown-Succeeded duplication this PR body describes. The action has no explicit id, so mapActionToStepStart falls back to agentName: "unknown" and the old (stepNumber, agentName) join misses the planned block. With the stateId fallback in this PR, stateId === "run" should match in-place.

ljagiello added 2 commits May 20, 2026 01:12

ljagiello requested a review from Vpr99 as a code owner May 20, 2026 08:13

ljagiello and others added 2 commits May 20, 2026 01:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: render claude-code streams in chat + correct session view#408

fix: render claude-code streams in chat + correct session view#408
ljagiello wants to merge 5 commits into
mainfrom
fix/claude-code-stream-and-session-view

ljagiello commented May 20, 2026

Uh oh!

ljagiello commented May 20, 2026

Uh oh!

LissaGreense commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ljagiello commented May 20, 2026

Summary

Test plan

Uh oh!

ljagiello commented May 20, 2026

Uh oh!

LissaGreense commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants