feat: loop evolution — accept-rate metrics, post-green review-repair, greenfield multi-role harness by agjs · Pull Request #44 · agjs/tsforge

agjs · 2026-06-22T11:54:14Z

Why

Studied Anthropic's long-running-agent workshop + the Kopadze "loops" model and reviewed a Cursor-drafted roadmap. A codebase audit showed tsforge already implements most of that plan (deterministic gate + exit-code oracle, JSONL ledger + /trace, adversarial review find→verify, quality snapshot/revert, TTSR memory, browser oracle, two-phase buildStaged). So this PR builds only the genuinely missing, high-leverage pieces — and encodes the workshop's design rules as ratchets, not comments.

Five design rules adopted: verifier-is-the-heart, evaluator-blind-to-generator-traces (now test-enforced), filesystem-state > context, planner-stays-high-level, harness-co-evolves (every new layer flag/knob-gated).

What

Stage 1 — accept-rate metrics + post-green review-repair (6eca962)

New reverted loop event → edit_reverted ledger type; quality.ts emits it on both rollback paths. analyzeEvents derives editsReverted / acceptRate / costPerAcceptedChange, surfaced in /trace. (The data was already there — reverts just weren't reported.)
withReview recipe knob / --with-review: after a one-shot run goes green, run the adversarial review and feed verified findings into one repair cycle, reverting it if it breaks the gate. New loop/review-repair.ts.
Extracted shared loop/file-snapshot.ts (snapshot/restore) used by both quality- and review-repair.
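
A minimal sketch of how the accept-rate figures could be derived from ledger events. Only `edit_reverted`, `editsReverted`, `acceptRate`, and `costPerAcceptedChange` are names from this PR; the event shape, the `edit_applied` and `model_call` type names, the `costUsd` field, and the formulas themselves are assumptions for illustration.

```typescript
// Hypothetical ledger event shape; the real JSONL schema may differ.
interface LedgerEvent {
  type: "edit_applied" | "edit_reverted" | "model_call";
  costUsd?: number;
}

interface AcceptMetrics {
  editsApplied: number;
  editsReverted: number;
  acceptRate: number | null; // null when no edits were applied
  costPerAcceptedChange: number | null; // null when nothing was accepted
}

// Assumed definitions: accepted = applied - reverted,
// acceptRate = accepted / applied, cost is summed over every event.
function deriveAcceptMetrics(events: LedgerEvent[]): AcceptMetrics {
  let editsApplied = 0;
  let editsReverted = 0;
  let totalCost = 0;
  for (const e of events) {
    if (e.type === "edit_applied") editsApplied++;
    if (e.type === "edit_reverted") editsReverted++;
    totalCost += e.costUsd ?? 0;
  }
  const accepted = Math.max(editsApplied - editsReverted, 0);
  return {
    editsApplied,
    editsReverted,
    acceptRate: editsApplied > 0 ? accepted / editsApplied : null,
    costPerAcceptedChange: accepted > 0 ? totalCost / accepted : null,
  };
}
```

Returning `null` rather than `0` or `NaN` for the empty cases keeps a run with no edits distinguishable from a run where every edit was reverted.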

Stage 2 — greenfield filesystem-state outer loop (ba08dca)

loop/greenfield/: runGreenfield drives a .tsforge/greenfield/features.json checklist to all-green one feature at a time, persisting after every step (resumable; per-feature stuck guard so one feature can't wedge the build).
evaluateFeature = layered gate → browser(renderCheck) → judge, short-circuiting. Gate stays authority; browser is skip-tolerant; judge sees only the built artifact.
Recipe mode: "greenfield" knob — orchestration lives in a new loop entry, not in recipes (they stay declarative data).
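
The layered, short-circuiting evaluation described above can be sketched roughly as follows. Every type and function name here is hypothetical, and the real layers are presumably asynchronous; the sketch is synchronous only to keep the control flow visible.

```typescript
// Illustrative types only; not tsforge's actual interfaces.
type LayerResult =
  | { status: "pass" }
  | { status: "fail"; reason: string }
  | { status: "skipped"; reason: string };

interface Layers {
  gate: () => LayerResult; // deterministic authority
  browser: () => LayerResult; // skip-tolerant (e.g. no browser available)
  judge: (artifact: string) => LayerResult; // sees only the built artifact
}

interface Verdict {
  ok: boolean;
  failedAt?: "gate" | "browser" | "judge";
  reason?: string;
}

function evaluateFeatureSketch(layers: Layers, artifact: string): Verdict {
  // Layer 1: the gate must pass outright; a skip counts as a failure.
  const gate = layers.gate();
  if (gate.status !== "pass") {
    return { ok: false, failedAt: "gate", reason: gate.reason };
  }
  // Layer 2: a browser failure stops evaluation, a skip does not.
  const browser = layers.browser();
  if (browser.status === "fail") {
    return { ok: false, failedAt: "browser", reason: browser.reason };
  }
  // Layer 3: the judge is fail-closed, so anything but "pass" rejects.
  const judge = layers.judge(artifact);
  if (judge.status !== "pass") {
    return { ok: false, failedAt: "judge", reason: judge.reason };
  }
  return { ok: true };
}
```

Because the layers are injected, each one can be replaced by a stub in tests, and the more expensive judge call is never reached when a cheaper layer has already failed.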

Stage 3 — planner + reject-by-default judge + model routing + trace-blindness ratchet (c2e0383)

Planner turns a one-line goal into spec + checklist; reject-by-default feature judge (mirrors review's VERIFY_SYSTEM, fail-closed).
resolveModelByName + recipe plannerModel/workModel/evaluatorModel → per-role models, falling back to the active model (single-endpoint setups still work).
JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput, true> + test: adding any field to the judge input is a compile error until it's listed, and the test rejects trace-ish names, so design rule #2 can't silently regress.
prepareState (resume-first, else plan) + greenfieldMode CLI entry + --greenfield flag / recipe dispatch.
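
To illustrate the ratchet idea: a `Record<keyof IJudgeInput, true>` constant must list every field of the interface, so a new field cannot be added without touching the constant, and a test can then inspect the listed names. The fields of `IJudgeInput` below and the deny-list pattern are invented for this sketch; the PR text does not show the real ones.

```typescript
// Hypothetical judge input; the actual IJudgeInput fields are not shown here.
interface IJudgeInput {
  featureTitle: string;
  acceptanceCriteria: string;
  builtArtifact: string;
}

// Record<keyof IJudgeInput, true> needs exactly one key per interface field.
// Adding a field to IJudgeInput without listing it here fails to compile.
const JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput, true> = {
  featureTitle: true,
  acceptanceCriteria: true,
  builtArtifact: true,
};

// Assumed deny-list of "trace-ish" names; the real test's rule may differ.
const TRACE_ISH = /trace|transcript|history|messages|reasoning/i;

// Returns the field names that look like they expose the generator's trace.
function traceIshFields(shape: object): string[] {
  return Object.keys(shape).filter((name) => TRACE_ISH.test(name));
}
```

A test in this style would assert that `traceIshFields(JUDGE_INPUT_SHAPE)` is empty. The compile-time half forces the author to list the new field, and the runtime half checks what was listed.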

Stage 4 — experimental contract negotiation + scheduling (5287648)

loop/greenfield/contract.ts: TSFORGE_CONTRACT-gated (off by default) propose↔object negotiation; the evaluator sees only the proposal + feature (rule #2). Persisted to .tsforge/greenfield/contracts/<feature>.md. The workshop itself flagged this as unproven, so it's opt-in.
--notify <cmd>: best-effort shell ping on greenfield completion, outcome in $TSFORGE_STATUS (cron/unattended runs).
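
A rough sketch of a bounded propose↔object negotiation, assuming a generator callback and an evaluator callback. All names are hypothetical and nothing here is taken from contract.ts; the real calls are model calls and presumably asynchronous. The generator is handed its own previous proposal on revision rounds, since a stateless call has nothing else to revise.

```typescript
// Illustrative types; not the actual contract.ts API.
type Review = { verdict: "agreed" } | { verdict: "object"; objection: string };

interface Negotiators {
  // previous and objection are undefined on the first round.
  propose: (feature: string, previous?: string, objection?: string) => string;
  // The evaluator sees only the feature and the proposal, never the trace.
  review: (feature: string, proposal: string) => Review;
}

interface ContractOutcome {
  proposal: string;
  agreed: boolean;
  rounds: number;
}

// Assumes maxRounds >= 1.
function negotiateSketch(
  feature: string,
  n: Negotiators,
  maxRounds: number,
): ContractOutcome {
  let previous: string | undefined;
  let objection: string | undefined;
  let proposal = "";
  for (let round = 1; round <= maxRounds; round++) {
    proposal = n.propose(feature, previous, objection);
    const r = n.review(feature, proposal);
    if (r.verdict === "agreed") {
      return { proposal, agreed: true, rounds: round };
    }
    // Carry the rejected proposal and the objection into the next round.
    previous = proposal;
    objection = r.objection;
  }
  // Ran out of rounds: report the last proposal, but not as agreed.
  return { proposal, agreed: false, rounds: maxRounds };
}
```

Keeping `agreed` as an explicit flag lets the caller avoid labelling a contract as agreed when the negotiation merely hit `maxRounds`.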

Docs (5287648, 1d89cb1) — loop/greenfield.mdx + updates to reference/commands, cli/recipes, observability/trace, reference/flags.

Notable scope decisions

Did not rebuild what exists (no "breadcrumbs" file — the ledger already records the failure gradient; evaluator reuses gate+oracle+judge; contract seeds off buildStaged).
The greenfield CLI implement step uses runTask per feature against the global gate; a Session.send-based implement step would be more precise but needs live-model testing. Noted as a follow-up.

Verification

bun run validate green — 1420 tests, 0 fail.
Every new regression test was flip-and-confirm-failed (incl. the trace-blindness ratchet, the revert logic, and the per-feature stuck guard).
astro build clean — 44 pages.
The full greenfield end-to-end path needs a live model + Playwright; its orchestration core is fully unit-tested with mocks.

…ecipe

Stage 1 of loop-evolution:
- 1A: emit a reverted accounting event from quality-repair rollbacks, map it to an edit_reverted ledger type, derive editsReverted / acceptRate / costPerAcceptedChange in analyzeEvents + /trace.
- 1B: withReview recipe knob + --with-review flag: after a one-shot run goes green, run reviewChange and feed verified findings into ONE repair cycle (reverts if it breaks the gate).
- Extract shared snapshot/restore to loop/file-snapshot.ts.

…ator

Stage 2 of loop-evolution:
- loop/greenfield/: runGreenfield drives a features.json checklist to all-green one feature at a time, persisting state after every step (resumable, per-feature stuck guard so one feature can't wedge the run).
- evaluateFeature: layered gate -> browser(renderCheck) -> judge, short-circuiting; gate stays authority, browser skip-tolerant, judge sees only the built artifact (design-rule #2). All layers injected (testable).
- state.ts: features.json/spec.md/progress.md read/write + pure renderProgress.
- recipe knob mode:'greenfield' selects the outer loop (orchestration stays out of recipes - they remain declarative data).
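
The outer loop with its per-feature stuck guard might look roughly like this. The `Feature` shape, the `Steps` callbacks, and `maxAttempts` are assumptions made for the sketch; the real features.json schema and the runGreenfield signature are not shown in this PR text.

```typescript
// Hypothetical checklist entry.
interface Feature {
  id: string;
  status: "pending" | "green" | "stuck";
  attempts: number;
}

interface Steps {
  implement: (f: Feature) => void;
  evaluate: (f: Feature) => boolean;
  persist: (features: Feature[]) => void; // e.g. rewrite features.json
}

function runGreenfieldSketch(
  features: Feature[],
  steps: Steps,
  maxAttempts: number,
): Feature[] {
  for (const f of features) {
    // Resumable: features settled by an earlier run are skipped.
    while (f.status === "pending") {
      f.attempts++;
      steps.implement(f);
      if (steps.evaluate(f)) {
        f.status = "green";
      } else if (f.attempts >= maxAttempts) {
        // Stuck guard: give up on this feature and move on to the next.
        f.status = "stuck";
      }
      // Persist after every step so a crash loses at most one attempt.
      steps.persist(features);
    }
  }
  return features;
}
```

Because progress lives in the persisted checklist rather than in the model's context, restarting the process with the same file continues from the first feature still marked pending.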

… trace-blindness ratchet

Stage 3 of loop-evolution:
- greenfield/plan.ts: planner role turns a one-line goal into a high-level spec.md + feature checklist (parsePlan pure/tested).
- greenfield/judge.ts: harsh reject-by-default feature judge (mirrors review-change VERIFY_SYSTEM; fail-closed on unparseable).
- resolveModelByName + recipe plannerModel/workModel/evaluatorModel: route each greenfield role to its own model, falling back to active (single endpoint still works). Evaluator stays trace-blind regardless.
- JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput,true> ratchet + test enforcing design-rule #2 (evaluator never sees the generator's trace).
- prepareState (resume-first, else plan) + greenfieldMode CLI entry + --greenfield flag / recipe mode dispatch.
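
Per-role routing with a fallback to the active model can be sketched as below. The recipe field names `plannerModel`, `workModel`, and `evaluatorModel` come from this PR; the `Role` type, the `modelForRole` helper, and the use of plain strings as model identifiers are assumptions for illustration and do not reflect how resolveModelByName actually works.

```typescript
type Role = "planner" | "work" | "evaluator";

// Field names taken from the PR; all are optional in this sketch.
interface RecipeModels {
  plannerModel?: string;
  workModel?: string;
  evaluatorModel?: string;
}

// Hypothetical helper: choose the role's model, else the active model.
function modelForRole(
  role: Role,
  recipe: RecipeModels,
  activeModel: string,
): string {
  const byRole: Record<Role, string | undefined> = {
    planner: recipe.plannerModel,
    work: recipe.workModel,
    evaluator: recipe.evaluatorModel,
  };
  return byRole[role] ?? activeModel;
}
```

With an empty recipe every role resolves to the active model, which is what keeps single-endpoint setups working.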

…ify + docs

Stage 4 of loop-evolution:
- greenfield/contract.ts: TSFORGE_CONTRACT-gated (off by default) propose<->object negotiation; generator proposes a build contract, evaluator pushes back to 'agreed' or maxRounds. Evaluator sees only the proposal+feature (rule #2). Persisted to .tsforge/greenfield/contracts/<feature>.md. Wired into the greenfield implement step behind the flag.
- --notify <cmd>: best-effort shell ping on greenfield completion with the outcome in $TSFORGE_STATUS (for cron/unattended runs).
- docs: loop/greenfield.mdx (greenfield mode, model routing, contract flag, scheduling/cron + --notify) + sidebar entry.

Full `bun run validate` green (1420 tests, 0 fail).

- commands.mdx: greenfield build section, --with-review one-shot note, accept-rate/cost in the trace summary line.
- cli/recipes.mdx: withReview, mode, planner/work/evaluatorModel fields.
- observability/trace.mdx: accept rate + cost/accepted in the example + table.
- reference/flags.mdx: TSFORGE_CONTRACT.

Docs site builds clean (44 pages).

gemini-code-assist

Code Review

This pull request introduces 'Greenfield builds' to tsforge, enabling feature-by-feature application development driven by a filesystem-tracked checklist, role-based model routing, and a layered evaluator (gate, browser, and judge). It also adds post-green review-and-repair capabilities, CLI flags like --greenfield and --with-review, and metrics for tracking edit accept rates. The review feedback highlights several critical improvements: expanding the default **/* file scope in snapshotFiles and scopeCode to prevent literal path errors, passing the previous contract during negotiation rounds so the generator can make informed revisions, adding defensive checks on browser.errors to avoid runtime TypeErrors, and wrapping the review-repair loop in a try...catch block to guarantee workspace rollback on failure.

…contract revision context

Address PR #44 review:
- snapshotFiles: expand globs via resolveScopeFiles before snapshotting. The whole-repo default scope ['**/*'] was snapshotted literally (never exists), so review-repair's revert was a silent no-op by default. (file-snapshot.test.ts)
- reviewRepair: wrap implement+fix+gate in try/catch: a throw mid-repair now rolls the workspace back and rethrows, instead of leaving a half-applied edit.
- negotiateContract: pass the generator its OWN previous proposal on a revision round (the stateless call couldn't revise without it).

Not changed (false positives): scopeCode '**/*' (readFiles already glob-expands); browser.errors?. (errors is a required string[] on IRenderResult).
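
The rollback-on-throw behaviour described for reviewRepair could be sketched like this, using an in-memory `Map` as a stand-in for the workspace. The function name, its parameters, and the return values are hypothetical; the real implementation works on files and is presumably asynchronous.

```typescript
// In-memory stand-in for the workspace: path -> contents (assumption).
type Workspace = Map<string, string>;

function reviewRepairSketch(
  ws: Workspace,
  repair: (ws: Workspace) => void,
  gate: (ws: Workspace) => boolean,
): "kept" | "reverted" {
  const snapshot = new Map(ws);
  const restore = (): void => {
    ws.clear();
    snapshot.forEach((text, path) => ws.set(path, text));
  };
  try {
    repair(ws);
    if (gate(ws)) return "kept";
    // The repair broke the gate: undo it.
    restore();
    return "reverted";
  } catch (err) {
    // A throw mid-repair must not leave a half-applied edit behind.
    restore();
    throw err;
  }
}
```

The `catch` branch restores first and then rethrows, so the caller still sees the original error but always against a clean workspace.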

[P1] greenfield implement no-op on green gate: runTask is RED-first and returns before the model when the gate already passes. Add IRunOptions.requireRed (default true); greenfield passes false. Extracted redPrecheck() helper.
[P1] rollback misses created files + 400-file cap: file-snapshot records the pre-existing path SET (uncapped) + contents; restore tombstones files that appeared after the snapshot. resolveScopeFiles/expandGlob take a limit.
[P2] planner/evaluator fall back to args.model (not just workModel).
[P2] greenfield per-feature runTask honours maxTurns + thinkingBudget.
[P3] contract prefix no longer claims 'Agreed' when negotiation didn't converge.
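
The "tombstone" idea in the rollback fix can be illustrated with a small in-memory model: the snapshot records which paths existed as well as their contents, so restore can delete anything created afterwards. The types and function names below are made up for the sketch and are not the file-snapshot.ts API.

```typescript
// Stand-in for the disk: path -> contents (assumption).
type Files = Map<string, string>;

interface Snapshot {
  existing: Set<string>; // every path present at snapshot time
  contents: Map<string, string>; // saved contents for those paths
}

function snapshotFilesSketch(files: Files): Snapshot {
  return { existing: new Set(files.keys()), contents: new Map(files) };
}

// Restores saved contents and returns the paths that were tombstoned.
function restoreFilesSketch(files: Files, snap: Snapshot): string[] {
  const tombstoned: string[] = [];
  // Copy the keys first so the map is not mutated while being iterated.
  for (const path of Array.from(files.keys())) {
    if (!snap.existing.has(path)) {
      files.delete(path); // created after the snapshot: remove it
      tombstoned.push(path);
    }
  }
  snap.contents.forEach((text, path) => files.set(path, text));
  return tombstoned;
}
```

A snapshot that stored only contents would put `a.ts` back but leave a newly created `b.ts` in place, which is the gap the path set closes.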

… setup`

runWizard used emitKeypressEvents + a keypress listener but never enabled raw mode; it assumed an already-raw caller (the REPL's readline, for /setup). Standalone `tsforge setup` runs with cooked stdin, so arrow keys echoed as raw ^[[A/^[[B and never moved the selection. runWizard now enables raw mode (and resumes stdin) when there were no prior keypress listeners (i.e. nobody else owns stdin), restoring it on exit; the REPL /setup path (raw already on, with saved listeners) is left untouched.

agjs added 5 commits June 22, 2026 12:25

gemini-code-assist Bot reviewed Jun 22, 2026

agjs added 3 commits June 22, 2026 14:07

agjs merged commit c377869 into main Jun 22, 2026
8 checks passed

agjs deleted the feat/loop-evolution branch June 22, 2026 13:01

agjs mentioned this pull request Jun 22, 2026

fix: harden greenfield + repair paths (2nd-pass review — 2×P1, 2×P2) #45

Merged

agjs merged 8 commits into main from feat/loop-evolution