Skip to content

fix: render claude-code streams in chat + correct session view#408

Open
ljagiello wants to merge 5 commits into
mainfrom
fix/claude-code-stream-and-session-view
Open

fix: render claude-code streams in chat + correct session view#408
ljagiello wants to merge 5 commits into
mainfrom
fix/claude-code-stream-and-session-view

Conversation

@ljagiello

Copy link
Copy Markdown
Contributor

Summary

Two correlated fixes for the live agent-streaming pipeline that the user surfaced by asking "why don't I see any output from my claude-code agent?":

  • claude-code agent emit gap — the agent emitted only data-tool-progress chips, so the chat UI had no text bubbles or tool-call cards to render. It now emits the full AI SDK v6 chunk vocabulary (text-*, tool-input-*, tool-output-*) translated from the Claude Code SDK's block stream. The supplemental progress chip is kept.
  • Session view reducer bugs — running a job whose FSM action invoked claude-code surfaced three correlated issues in the session view: a duplicate "Unknown · Succeeded" block alongside the planned "Pr Reviewer · Skipped", a missing input field on the block, and a "task: ''" rendering. Root cause was the reducer not handling agent-name drift, stepNumber mismatch on step:complete, and out-of-order session:start (which arrived AFTER step:start because both events publish to the same JetStream subject in fast succession).

See individual commit messages for full diagnostics.

Test plan

  • deno task test packages/bundled-agents/src/claude-code/agent.test.ts — 29/29 (2 new for the chunk translation)
  • deno task test packages/core/src/session/session-reducer.test.ts packages/core/src/session/event-emission-mapper.test.ts — 63/63 (3 new covering each fallback path)
  • Live UI smoke against a minimal qa-claude-code-stream workspace — claude-code emit verified end-to-end via MCPStreamEmitter log + chat UI rendering nested Bash tool cards from a live PR review run
  • Live UI smoke against vanilla_apple / PR Code Review workspace — session view now shows a single pr-reviewer block (Pending → Running → Completed) with input: { pr_url: "…" } populated, instead of the prior duplicate Pr Reviewer · Skipped + Unknown · Succeeded pair

ljagiello added 2 commits May 20, 2026 01:12
The atlas/claude-code agent consumed @anthropic-ai/claude-agent-sdk's
query() stream directly and only emitted `data-tool-progress` chips —
so the chat UI saw no text, no tool-input cards, no tool-output cards
for anything claude-code did. The original user report ("I don't see
any streamed output from my claude-code agent") matched a log where 24
data-tool-progress events fired over a 152s run and nothing else.

Translate each SDK block into the AI SDK v6 chunk vocabulary the chat
already renders:

- assistant text blocks → text-start / text-delta / text-end
- assistant tool_use blocks → tool-input-start + tool-input-available
- user tool_result blocks → tool-output-available
  (matched to the call via tool_use_id)

The supplemental data-tool-progress chip is kept after the tool_use
emit so users still see the friendly "what's happening" summary.

Two new vitest cases mock the SDK stream and assert the exact chunk
sequence with matched ids; 29/29 tests pass.
Three correlated bugs in the session view reducer surfaced when running
a job whose FSM action invokes an atlas/claude-code agent:

1. Duplicate "Unknown" block alongside the planned one. When the FSM
   action's actionId drifted, `mapActionToStepStart` fell back to
   "unknown" while the planned block carried the real agent name.
   reduceStepStart's findIndex on agentName failed, so the reducer
   appended a fresh "Unknown" running block — leaving the original
   planned block stuck in pending (then "skipped" at session:complete).

   Fix: fall back to matching by `stateId` (a stable join key both
   events carry) in reduceStepStart, and keep the planned agentName
   when matched via the fallback.

2. step:complete couldn't bind to its planned block. The schema only
   carried `stepNumber` — no `stateId`. When step:start hadn't
   updated the planned block's stepNumber (cf. bug #1), step:complete
   synthesized a second "unknown" placeholder instead of completing
   the existing pending one.

   Fix: add optional `stateId` to StepCompleteEventSchema, populate it
   in mapActionToStepComplete from `event.data.state`, and use it as
   a fallback identifier in reduceStepComplete. Backfill stepNumber
   onto the matched block so subsequent ephemeral chunks attach.

3. block.input was always missing on the persisted view. step:start
   could arrive BEFORE session:start because both events publish to
   the same JetStream subject in fast succession and ordering isn't
   guaranteed for tightly-spaced writes. When step:start landed first,
   reduceStepStart correctly created a running block with the signal
   input — but the subsequent session:start handler unconditionally
   overwrote `agentBlocks` with the planned (empty) ones, destroying
   the input.

   Fix: session:start now merges. If a planned step's stateId matches
   an existing running block, keep the existing block (with its input,
   startedAt, status) and only overlay the planned `task` label when
   the existing one is empty.

3 new vitest cases pin each fallback path; 63/63 tests pass. Verified
end-to-end against a live PR-review run — the session view now shows
one "Pr Reviewer" block with input { pr_url: "..." } populated.
@ljagiello ljagiello requested a review from Vpr99 as a code owner May 20, 2026 08:13
Addresses code review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view.md):

Critical:
- session:start now UNIONS planned + existing blocks instead of overwriting.
  Off-plan running blocks (out-of-order step:start for a state not in
  plannedSteps, or session:start with no plannedSteps at all) survive
  intact. Pre-fix the merge silently dropped them — same shape of bug the
  PR claimed to fix, just moved to a corner case.
- claude-code agent now normalizes `tool_result.content` before emitting
  `tool-output-available`. The SDK delivers `content` as
  `string | Array<TextBlock|ImageBlock|...> | undefined`; passing the
  array through leaked raw JS objects into the chat UI for any tool that
  returns non-text blocks (images, search results, multi-block returns).
  Text blocks concatenate; non-text blocks collapse to a `[type]` marker.

Important:
- mapActionToStepStart's agentName fallback chain is now
  `actionId ?? state ?? "unknown"`. `state` is always set for agent
  actions and matches what `extractPlannedSteps` uses on the planned
  block, eliminating the "unknown" leak at the source. The reducer's
  stateId fallback stays as defense-in-depth.
- session:start merge comment now points at the actual race source
  (`NatsSessionStream.emit` fire-and-forget publish) rather than
  misattributing it to JetStream subject ordering.
- Dropped the dead `existing.actionType ?? plannedBlock.actionType`
  branch — actionType is required on a running block.

Tests:
- claude-code agent.test.ts:
  * Stub `smallLLM` so `generateProgress` returns deterministically and
    `data-tool-progress` always fires — lets us assert the FULL chunk
    order (7 events) including the progress chip, instead of filtering
    it out and masking ordering regressions.
  * New text-only assistant message case — exercises the text-emit
    branch in isolation.
  * New table-driven content-normalization cases — string, undefined,
    structured array — pin the new normalizer's contract.
  * Replaced `as { type: string; id: string }` cast with
    `toMatchObject` + `expect.assert` for narrowing.
- session-reducer.test.ts:
  * Off-plan running block preserved through session:start merge.
  * session:start with `plannedSteps: undefined` preserves all blocks.
  * Inverse agentName drift — when planned is "unknown",
    event.agentName wins on step:start.
  * step:complete match priority — stepNumber beats stateId fallback.
- event-emission-mapper.test.ts: updated the "actionId missing" case
  to assert the new state-based fallback.

100/100 tests pass across the three affected files.
@ljagiello

Copy link
Copy Markdown
Contributor Author

Addressed the review in a3a3e39. Highlights:

Critical

  • session:start now unions planned + existing blocks instead of overwriting. Off-plan running blocks (out-of-order step:start for a state not in plannedSteps, or plannedSteps: undefined) survive intact.
  • tool_result.content is normalized via a new normalizeToolResultContent helper — string passes through, undefined → "", structured arrays concatenate text blocks and collapse non-text ones to [type] markers. No more raw JS arrays leaking into tool-output-available.

Important

  • mapActionToStepStart now falls back through actionId ?? state ?? "unknown", eliminating the "unknown" leak at its source. The reducer's stateId fallback stays as defense-in-depth.
  • Misleading JetStream comment rewritten to point at NatsSessionStream.emit's fire-and-forget publish as the actual race source.
  • Dropped the dead existing.actionType ?? plannedBlock.actionType branch.

Tests (100/100 pass)

  • Stubbed smallLLM so the main ordering test asserts the full 7-event sequence including data-tool-progress — no more filtering that hid ordering regressions.
  • New text-only assistant message case.
  • Table-driven content normalization cases (string / undefined / structured array).
  • as { type: …; id: … } cast replaced with toMatchObject + expect.assert.
  • New session-reducer cases: off-plan block preservation, plannedSteps: undefined, agentName-drift inverse, step:complete priority (stepNumber over stateId).

Review doc: docs/reviews/2026-05-20-claude-code-stream-and-session-view.md.

ljagiello and others added 2 commits May 20, 2026 01:52
The deterministic-order version of the chunk-translation test relied on
mocking both `ai.generateObject` and `@atlas/llm.smallLLM` so the
`data-tool-progress` chip would always land at a predictable index.
Those partial mocks bring real risk in parallel test pools — `vi.mock`
with `importOriginal` re-evaluates the module under transform, and any
top-level side effect in `@atlas/llm` (which loads provider clients) or
`ai` (which the workspace touches everywhere) is double-evaluated. The
8-minute CI run that flaked on this PR pointed in that direction even
though it turned out to be unrelated; this change removes the surface
area regardless.

Replace the strict positional assertion with a conditional position
guard: when `data-tool-progress` IS emitted, assert it sits strictly
between `tool-input-available` and `tool-output-available`. When the
stub LLM rejects and the chip is skipped, the test still passes via the
filtered `core` array (text/tool-input/tool-output positions locked).
Same ordering invariant, no module mocks.

Tests: 33/33 in this file, 6668/6668 in the full suite.
Critical:
- Widen the `reduceStepStart` dup guard from `(stepNumber, agentName)` to
  `(stepNumber, status !== "pending")`. The agentName-drift fallback +
  stateId fallback introduced two reachable corner cases that re-created
  the duplicate-block class of bug this PR set out to fix:
  * step:start replay after stateId fallback: the block's agentName is
    preserved (e.g. "pr-reviewer") while the replay still carries
    "unknown" — the original `(stepNumber, agentName)` guard misses,
    both pending searches miss (block is now "running"), and a fresh
    "unknown" block gets appended.
  * step:complete-before-step:start with drift: step:complete completes
    the planned block via stateId fallback, then a late step:start
    falls through all pending searches (block is "completed") and
    appends a duplicate.
  Keying on stepNumber alone subsumes the original J2/H1 protection and
  catches both new cases. FSM stepNumber is monotonic per execution, so
  this doesn't regress legitimate FSM loops (they emit different
  stepNumbers).

Important:
- Drop the dead `?? "unknown"` tail in `mapActionToStepStart`'s agentName
  fallback chain. `FSMActionExecutionEvent.state` is `string` (required),
  so `event.data.actionId ?? event.data.state` is total. The reducer's
  own placeholder branch in `reduceStepComplete` remains as the genuine
  ultimate fallback.
- Drop the speculative `pending.agentName &&` falsy check in the
  preservation expression. `AgentBlockSchema.agentName` is `z.string()`,
  so the empty-string case is unreachable without a contract violation.
- Delete the conditional `data-tool-progress` position guard in
  agent.test.ts. `createStubPlatformModels` returns a `LanguageModelV3`
  whose `doGenerate`/`doStream` always reject, so `generateProgress`
  throws and the production try/catch swallows it — the chip is never
  emitted under stub models, and `progressIdx === -1` makes the guarded
  block dead code. Replaced with an updated comment explaining the
  absence; the integration smoke test covers the chip's runtime position.
- Add a 4th `it.each` case for `normalizeToolResultContent`: a
  `{ type: "text", text: 123 }` block must collapse to `[text]` (the
  two-stage guard's `typeof c.text === "string"` check). Pins the
  defensive branch against future flattening.

Tests:
- Two new regression tests in session-reducer.test.ts pin the widened
  dup guard:
  * "replay step:start after stateId-fallback is a no-op"
  * "step:start after step:complete (already transitioned) is a no-op"

103/103 in the affected files. Full repo typecheck clean.

## Progress
- Task: PR #408 round-2 review feedback (docs/reviews/2026-05-20-claude-code-stream-and-session-view-r2.md)
- Decisions:
  * Widen on stepNumber alone (not stepNumber+stateId) — stateId-only
    dedup would break legitimate FSM loops that revisit the same state
    with a different stepNumber. The asymmetry between "transitioned"
    and "pending" status is the right discriminator.
  * Delete the position guard rather than stubbing `smallLLM` — the
    integration smoke test covers the chip's runtime position; adding
    a per-role LanguageModelV3 mock just to make a conditional
    unconditional adds surface area without real coverage gain.
  * Left the "duplicate-stateId planned steps" (multi-entry FSMs) and
    "per-text-block UUID fragmentation" findings as open decisions —
    both require product judgement, not just code.
- Key Learnings:
  * `createStubPlatformModels.get(role)` returns a `LanguageModelV3`
    whose `doGenerate`/`doStream` always reject. Tests that route
    through `smallLLM`/`generateProgress` will silently fail the
    inner call and trip the production try/catch. Conditional
    positional assertions around the swallowed-error event never
    fire in CI — write a dedicated test with a per-role override
    model, or assert on the absence directly.
  * For the session reducer's dup guard, `stepNumber` alone is a
    stronger key than `(stepNumber, agentName)` once any agentName
    fallback exists. The drift creates an asymmetry the original
    composite key can't see — the block's preserved name no longer
    matches the replay's drifted name.
- Files:
  * packages/core/src/session/session-reducer.ts
  * packages/core/src/session/session-reducer.test.ts
  * packages/core/src/session/event-emission-mapper.ts
  * packages/bundled-agents/src/claude-code/agent.test.ts
@LissaGreense

Copy link
Copy Markdown
Contributor

Hi! Just hit the same "Unknown / unknown" symptom on main from a non-claude-code path — sharing a repro in case it's useful confirmation that the reducer change here generalizes to plain type: agent jobs too.

Workspace Jira Dashboard (id buttery_yogurt), one FSM action, no claude-code anywhere:

jobs:
  summarize-kan:
    triggers: [{ signal: summarize }]
    fsm:
      initial: idle
      states:
        idle: { on: { summarize: { target: run } } }
        run:
          type: final
          entry:
            - type: agent
              agentId: jira-summarizer   # llm agent, anthropic:claude-sonnet-4-6
              outputTo: summary

Session view renders two rows: Run / summary [Skipped] and unknown / unknown [Succeeded 8.4s] — same planned-Skipped + Unknown-Succeeded duplication this PR body describes. The action has no explicit id, so mapActionToStepStart falls back to agentName: "unknown" and the old (stepNumber, agentName) join misses the planned block. With the stateId fallback in this PR, stateId === "run" should match in-place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants