test(e2e): replicate the real user experience across subsystems by agjs · Pull Request #53 · agjs/tsforge

agjs · 2026-06-27T10:25:47Z

Generalizes the editor's VirtualScreen win so the real user experience is replayable in tests — asserting observable behavior (rendered screen, tool runs, file changes, gate verdicts), not just logical state. Goal: catch breakage in CI before a human hits it by hand.

What's covered

Reusable harnesses

ScriptedModel (tests/helpers/scripted-model.ts) — a deterministic IProvider driving the REAL agent loop from scripted turns; a turn can be a function of the conversation so far, so it reacts to gate feedback / tool results.
runScriptedSession (tests/helpers/session-harness.ts) — wires it to a real Session (real tools, real gate, temp cwd), captures the ILoopEvent stream + file changes.
VirtualScreen (shipped in v0.24.0) — renders emitted bytes to a screen grid; reused here for every TTY-render assertion.
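
As an illustration of the scripted-provider idea described above, here is a minimal sketch. It is not the code in tests/helpers/scripted-model.ts: `Message`, `ScriptedTurn`, `ScriptedModelSketch` and `complete` are hypothetical names, the real IProvider interface is not shown in this PR text, and the method is synchronous only to keep the example short. It shows the two properties the description relies on: a turn may be a function of the conversation so far, and calling past the end of the script throws instead of hanging.

```typescript
// Hypothetical shapes for illustration; the real IProvider in tsforge may differ.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// A turn is either a fixed reply or a function of the conversation so far.
type ScriptedTurn = string | ((history: readonly Message[]) => string);

class ScriptedModelSketch {
  private readonly turns: readonly ScriptedTurn[];
  private calls = 0;

  constructor(turns: readonly ScriptedTurn[]) {
    this.turns = turns;
  }

  // Synchronous for brevity; a real provider would most likely be async.
  complete(history: readonly Message[]): string {
    const turn = this.turns[this.calls];
    if (turn === undefined) {
      // Fail fast: the loop asked for more turns than the test scripted.
      throw new Error(
        `scripted model called ${this.calls + 1} times, but only ${this.turns.length} turns were scripted`,
      );
    }
    this.calls += 1;
    return typeof turn === "string" ? turn : turn(history);
  }
}

// Usage: the second turn reacts to whatever the previous tool result contained.
const model = new ScriptedModelSketch([
  "read notes.txt",
  (history) => {
    const last = history[history.length - 1];
    return `create copy.txt with: ${last ? last.content : ""}`;
  },
]);
```

Keeping the reactive turn a plain function of the history keeps the script deterministic while still letting it respond to gate feedback or tool results.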

Tier 1 — agent session loop (12 tests)

Conversational turn, create→disk, passing gate→done, failing gate→repair→green→done, out-of-scope rejection, run-tool shell exec, edit-tool snippet replace, read round-trip (tool result reaches the model), multi-file build, plan-mode write rejection (the read-only guarantee), plan-mode read allowed, auto-fix runs before re-gate.

Tier 3 — interactive TTY overlays (11 tests)

Wizard rendered at each step (title, cursor gutter, checkbox toggle, overview), command palette (filtered list + selection), @-file picker dropdown, and overlay-shrink → no ghost rows (the same bug class as the editor, now guarded for the picker). These close the render blind spot: the existing wizard tests assert reducer STATE only.

Findings (no duplication added)

Scaffolder (Tier 2) — already comprehensively covered by scaffold-run.test.ts (full runScaffold: clone+configure+boot+gate, skipBoot, invalid-config refusal, manifest-source-of-truth, astro). No gap.
Review (Tier 4) — already covered: review-change.test.ts drives reviewChange with stub providers (find/verify passes, non-JSON). No gap.

The two genuine gaps were the session-loop e2e and the TTY-render e2e — both filled here.

Stacked off main after v0.24.0.

Foundation for replicating the real user experience in tests, reusable by every later flow. ScriptedModel is a deterministic IProvider that drives the REAL agent loop from a sequence of turns; runScriptedSession wires it to a real Session with real tools + real gate over a temp cwd and captures the observable event stream (ILoopEvent) + file changes. 6 Tier-1 tests prove the loop end-to-end: conversational turn (no changes), create-tool → file on disk, passing gate → done, failing gate → repair → green → done, out-of-scope create rejected, run tool executes a shell command. No live LLM, fully deterministic.

3 more session e2e tests: the edit tool replaces a snippet in a seeded file; a read tool result flows back into the conversation (the reactive turn copies the read value into a new file, proving the tool-result→model loop); a multi-file build creates every file in one session. Reuses the scripted-model harness.

gemini-code-assist

Code Review

This pull request introduces a deterministic end-to-end testing harness for the agent loop, featuring a scripted model provider (scripted-model.ts), a session runner helper (session-harness.ts), and an integration test suite (session-e2e.test.ts). The feedback recommends guarding the scripted model against infinite loops by throwing an error when calls exceed the defined turns, and replacing Unix-specific shell commands such as `true` and `test -f` with cross-platform Node.js commands to ensure compatibility with Windows environments.

Drives the wizard reducer through real key-actions and renders each frame to a VirtualScreen (exactly as the driver does: CLEAR_HOME + renderFrame), asserting what the USER SEES — step title, cursor gutter on the right option, checkbox glyphs flipping on toggle, the overview. Same for the command palette's renderMenu (filtered list + selection). The existing wizard tests assert reducer STATE only; these close the render blind spot that hid the editor ghost bugs. 7 tests.

Harness gains policyMode. 3 tests guard safety-critical loop behavior: plan mode rejects a create (no write escapes — the read-only guarantee), plan mode still allows reads, and the auto-fix command runs before the gate re-validates (proven via a marker file the gate checks for).
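
The read-only guarantee can be pictured as a check that runs before any tool executes. The sketch below is an assumption about the shape of such a check, not the harness's actual policy code: the mode name "build", the tool names, and `checkPolicy` are hypothetical, and only the two behaviors stated above (plan mode rejects a create, plan mode allows reads) are taken from this PR.

```typescript
// Hypothetical names; only "plan rejects create" and "plan allows read" come from the PR text.
type PolicyMode = "plan" | "build";
type ToolName = "read" | "create" | "edit";

type PolicyVerdict = { allowed: true } | { allowed: false; reason: string };

// Tools assumed to write to disk in this sketch.
const WRITE_TOOLS: ReadonlySet<ToolName> = new Set<ToolName>(["create", "edit"]);

function checkPolicy(mode: PolicyMode, tool: ToolName): PolicyVerdict {
  if (mode === "plan" && WRITE_TOOLS.has(tool)) {
    // No write escapes plan mode: reject before the tool touches the file system.
    return { allowed: false, reason: `${tool} is not permitted in plan mode` };
  }
  return { allowed: true };
}
```

A test in this style would then assert both the verdict and that no file appeared on disk, since the guarantee is about observable side effects and not only about the returned value.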

4 tests: query narrows the dropdown + selection gutter renders; empty match shows the 'no matching file' hint; truncatePath fits the width; and StatusBar.setOverlay shrinking from 5→2 items leaves NO ghost rows — the same bug class as the editor, now guarded for the @-picker overlay too.
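
To make the ghost-row bug class concrete, here is a small sketch of a screen grid and an overlay redraw. `ScreenSketch` and `drawOverlay` are hypothetical stand-ins, not the VirtualScreen or StatusBar.setOverlay implementations from this repository, and the grid understands only three standard ANSI sequences (cursor position, erase to end of line, clear screen). The point it illustrates: if a redraw touches only the rows of the new, shorter list, rows from the previous, longer list stay on screen.

```typescript
// Minimal screen-grid sketch; hypothetical names, not the tsforge APIs.
class ScreenSketch {
  private readonly grid: string[][];
  private readonly width: number;
  private row = 0;
  private col = 0;

  constructor(width: number, height: number) {
    this.width = width;
    this.grid = Array.from({ length: height }, () => new Array<string>(width).fill(" "));
  }

  write(bytes: string): void {
    let i = 0;
    while (i < bytes.length) {
      const ch = bytes.charAt(i);
      if (ch === "\x1b" && bytes.charAt(i + 1) === "[") {
        // CSI sequence: ESC [ params final-byte
        let j = i + 2;
        while (j < bytes.length && /[0-9;]/.test(bytes.charAt(j))) j += 1;
        this.csi(bytes.slice(i + 2, j), bytes.charAt(j));
        i = j + 1;
        continue;
      }
      if (ch === "\n") {
        this.row += 1;
        this.col = 0;
      } else if (ch === "\r") {
        this.col = 0;
      } else {
        const line = this.grid[this.row];
        if (line && this.col < this.width) line[this.col] = ch;
        this.col += 1;
      }
      i += 1;
    }
  }

  private csi(params: string, final: string): void {
    const parts = params.split(";");
    if (final === "H") {
      // Cursor position is 1-based; missing parameters default to 1.
      this.row = (parseInt(parts[0] ?? "", 10) || 1) - 1;
      this.col = (parseInt(parts[1] ?? "", 10) || 1) - 1;
    } else if (final === "K") {
      // Erase from the cursor to the end of the current line.
      const line = this.grid[this.row];
      if (line) line.fill(" ", this.col);
    } else if (final === "J" && params === "2") {
      for (const line of this.grid) line.fill(" ");
    }
  }

  lines(): string[] {
    return this.grid.map((line) => line.join("").trimEnd());
  }
}

// Draw items from the top row. With eraseStale, rows left over from a longer
// previous overlay are erased too; without it they linger as ghost rows.
function drawOverlay(
  screen: ScreenSketch,
  items: string[],
  previousCount: number,
  eraseStale: boolean,
): void {
  const rowsToTouch = eraseStale ? Math.max(items.length, previousCount) : items.length;
  for (let r = 0; r < rowsToTouch; r += 1) {
    screen.write(`\x1b[${r + 1};1H\x1b[K${items[r] ?? ""}`);
  }
}
```

Asserting on `lines()` after a 5→2 shrink is what distinguishes a render test from a state test: the reducer state is correct in both cases, but only the erasing redraw leaves a clean screen.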

Exports actionFor so the keypress→action mapping is unit-testable (the raw-mode TTY plumbing around it stays PTY-only — a documented Bun limit). 5 tests guard the decode: arrows→nav, space→toggle, enter/return→confirm, escape/ctrl-c→cancel, b→back, q→cancel, unknown→ignored, and that a decoded action drives the reducer. Guards the key-mapping regression class (cf. the past wizard arrow-key bug).
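
A sketch of what a pure keypress-to-action decoder can look like, and why exporting it makes the mapping testable without a PTY. This is not the exported actionFor itself: `actionForSketch`, the `WizardAction` names, the exact byte sequences, and the tiny cursor reducer are assumptions for illustration (the arrow sequences used are the common ANSI ones; the PR text only says "arrows→nav").

```typescript
// Hypothetical action names and byte sequences; the real actionFor may differ.
type WizardAction = "up" | "down" | "toggle" | "confirm" | "back" | "cancel";

function actionForSketch(key: string): WizardAction | null {
  switch (key) {
    case "\x1b[A":
      return "up";
    case "\x1b[B":
      return "down";
    case " ":
      return "toggle";
    case "\r":
    case "\n":
      return "confirm";
    case "b":
      return "back";
    case "\x1b": // bare escape
    case "\x03": // ctrl-c
    case "q":
      return "cancel";
    default:
      return null; // unknown keys are ignored
  }
}

// A tiny reducer so a decoded action can be shown driving state.
interface CursorState {
  cursor: number;
  optionCount: number;
}

function reduceCursor(state: CursorState, action: WizardAction | null): CursorState {
  if (action === "up") return { ...state, cursor: Math.max(0, state.cursor - 1) };
  if (action === "down") {
    return { ...state, cursor: Math.min(state.optionCount - 1, state.cursor + 1) };
  }
  return state;
}
```

Because the decoder is a pure function from a string to an action, the raw-mode plumbing around it does not need to exist in the test at all.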

…mmands + adversarial hunt

- scripted-model: throw when called past the script (loop didn't terminate) so a misconfigured test fails fast instead of hanging (Gemini review).
- session-e2e: replace Unix-only gate/fix commands (true, test -f, touch, exit) with portable `node -e` equivalents so the suite runs on Windows too (Gemini review).
- add session-e2e-hunt.test.ts: 7 adversarial probes (non-matching/ambiguous edit, create-over-existing, failing gate command, maxTurns exhaustion, malformed tool args, non-zero exit); all pass, so the loop is robust on these edges (now guarded).
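
For reference, one way the Unix-only commands could map onto `node -e` one-liners is sketched below. The exact strings used in session-e2e.test.ts are not shown in this PR text, so `portableCommands` and its members are hypothetical. The sketch assumes `node` is on the PATH and that file names are simple relative paths; anything else is rejected so it never has to be quoted for two different shells.

```typescript
// Hypothetical helper; the commands actually used in session-e2e.test.ts may differ.
function assertSimplePath(path: string): void {
  // Assumption: only plain relative names, so no shell or JS-string escaping is needed.
  if (!/^[A-Za-z0-9._\/-]+$/.test(path)) {
    throw new Error(`unsupported path for a portable command: ${path}`);
  }
}

const portableCommands = {
  // Replaces `true`: always exits 0.
  pass: (): string => `node -e "process.exit(0)"`,

  // Replaces `exit N`: exits with the given code.
  exitWith: (code: number): string => {
    if (!Number.isInteger(code) || code < 0 || code > 255) {
      throw new Error(`invalid exit code: ${code}`);
    }
    return `node -e "process.exit(${code})"`;
  },

  // Replaces `test -f FILE`: exits 0 when the path exists, 1 otherwise.
  // (existsSync is also true for directories, unlike test -f.)
  fileExists: (path: string): string => {
    assertSimplePath(path);
    return `node -e "process.exit(require('fs').existsSync('${path}') ? 0 : 1)"`;
  },

  // Approximates `touch FILE`: creates the file if missing without truncating it.
  touch: (path: string): string => {
    assertSimplePath(path);
    return `node -e "const fs=require('fs');fs.closeSync(fs.openSync('${path}','a'))"`;
  },
};
```

The scripts use only single quotes internally so the outer double quotes are accepted by both POSIX shells and cmd.exe.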

agjs added 2 commits June 27, 2026 12:25

gemini-code-assist Bot reviewed Jun 27, 2026

Comment thread packages/core/tests/helpers/scripted-model.ts

Comment thread packages/core/tests/session-e2e.test.ts Outdated

Comment thread packages/core/tests/session-e2e.test.ts Outdated

agjs added 3 commits June 27, 2026 12:33

agjs marked this pull request as ready for review June 27, 2026 10:40

agjs added 2 commits June 27, 2026 12:42

agjs merged commit e9c1b5f into main Jun 27, 2026
8 checks passed

agjs deleted the feat/e2e-harness branch June 27, 2026 10:56

agjs merged 7 commits into main from feat/e2e-harness