feat: loop evolution — accept-rate metrics, post-green review-repair, greenfield multi-role harness #44
…ecipe

Stage 1 of loop-evolution:
- 1A: emit a reverted accounting event from quality-repair rollbacks, map it to an edit_reverted ledger type, derive editsReverted / acceptRate / costPerAcceptedChange in analyzeEvents + /trace.
- 1B: withReview recipe knob + --with-review flag: after a one-shot run goes green, run reviewChange and feed verified findings into ONE repair cycle (reverts if it breaks the gate).
- Extract shared snapshot/restore to loop/file-snapshot.ts.
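The accept-rate derivation in 1A can be sketched roughly as follows. This is an illustration, not the code in analyzeEvents: the `ILedgerEvent` shape, the `edit_applied` and `model_call` type names, the `costUsd` field, and the formula (accepted = applied minus reverted) are all assumptions. Only `edit_reverted`, `editsReverted`, `acceptRate`, and `costPerAcceptedChange` come from the commit message.

```typescript
// Hypothetical ledger event; only the "edit_reverted" type name is from the PR.
interface ILedgerEvent {
  type: "edit_applied" | "edit_reverted" | "model_call";
  costUsd?: number; // assumed: cost is attached to individual events
}

interface IAcceptMetrics {
  editsApplied: number;
  editsReverted: number;
  acceptRate: number | null; // null when no edits were applied
  costPerAcceptedChange: number | null; // null when nothing was accepted
}

function deriveAcceptMetrics(events: readonly ILedgerEvent[]): IAcceptMetrics {
  let editsApplied = 0;
  let editsReverted = 0;
  let totalCost = 0;
  for (const e of events) {
    if (e.type === "edit_applied") editsApplied += 1;
    if (e.type === "edit_reverted") editsReverted += 1;
    totalCost += e.costUsd ?? 0;
  }
  // Assumed definition: an edit is "accepted" if it was applied and not reverted.
  const accepted = Math.max(0, editsApplied - editsReverted);
  return {
    editsApplied,
    editsReverted,
    acceptRate: editsApplied > 0 ? accepted / editsApplied : null,
    costPerAcceptedChange: accepted > 0 ? totalCost / accepted : null,
  };
}
```

Returning `null` instead of dividing by zero keeps a run with no edits distinguishable from a run whose edits were all reverted.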
…ator

Stage 2 of loop-evolution:
- loop/greenfield/: runGreenfield drives a features.json checklist to all-green one feature at a time, persisting state after every step (resumable, per-feature stuck guard so one feature can't wedge the run).
- evaluateFeature: layered gate -> browser(renderCheck) -> judge, short-circuiting; gate stays authority, browser skip-tolerant, judge sees only the built artifact (design-rule #2). All layers injected (testable).
- state.ts: features.json/spec.md/progress.md read/write + pure renderProgress.
- recipe knob mode:'greenfield' selects the outer loop (orchestration stays out of recipes - they remain declarative data).
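The layered, short-circuiting evaluator described above might look something like this. It is a sketch under stated assumptions: the type names, the three-valued verdict, and the decision to treat a judge `skip` as a failure are hypothetical, and the layers are synchronous here for brevity (the real ones are presumably async). Only the gate -> browser -> judge order, the skip-tolerant browser, and injected layers come from the commit message.

```typescript
// All names are illustrative; the PR states only the layer order and the rules.
type LayerVerdict =
  | { status: "pass" }
  | { status: "fail"; reason: string }
  | { status: "skip"; reason: string };

interface IEvaluatorLayers {
  gate: () => LayerVerdict;
  browser: () => LayerVerdict;
  judge: () => LayerVerdict;
}

interface IFeatureVerdict {
  passed: boolean;
  failedLayer?: "gate" | "browser" | "judge";
  reason?: string;
}

function evaluateFeatureSketch(layers: IEvaluatorLayers): IFeatureVerdict {
  // The gate is the authority: anything other than a pass stops here.
  const gate = layers.gate();
  if (gate.status !== "pass") {
    return { passed: false, failedLayer: "gate", reason: gate.reason };
  }
  // The browser layer is skip-tolerant: only an explicit fail stops evaluation.
  const browser = layers.browser();
  if (browser.status === "fail") {
    return { passed: false, failedLayer: "browser", reason: browser.reason };
  }
  // Assumption: the judge is reject-by-default, so a skip counts as a failure.
  const judge = layers.judge();
  if (judge.status !== "pass") {
    return { passed: false, failedLayer: "judge", reason: judge.reason };
  }
  return { passed: true };
}
```

Because every layer is passed in as a function, a test can substitute stubs and count calls to confirm that later layers never run after an earlier failure.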
… trace-blindness ratchet

Stage 3 of loop-evolution:
- greenfield/plan.ts: planner role turns a one-line goal into a high-level spec.md + feature checklist (parsePlan pure/tested).
- greenfield/judge.ts: harsh reject-by-default feature judge (mirrors review-change VERIFY_SYSTEM; fail-closed on unparseable).
- resolveModelByName + recipe plannerModel/workModel/evaluatorModel: route each greenfield role to its own model, falling back to active (single endpoint still works). Evaluator stays trace-blind regardless.
- JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput,true> ratchet + test enforcing design-rule #2 (evaluator never sees the generator's trace).
- prepareState (resume-first, else plan) + greenfieldMode CLI entry + --greenfield flag / recipe mode dispatch.
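The `Record<keyof IJudgeInput, true>` ratchet relies on a standard TypeScript property: a `Record` over a key union must list every key, so adding a field to the interface fails to compile until the shape constant is updated. A minimal sketch follows. The fields of `IJudgeInput`, the `TRACE_ISH` pattern, and the `traceIshKeys` helper are hypothetical; the PR does not show the real field list or the test's name check.

```typescript
// Hypothetical judge input; the real IJudgeInput fields are not shown in the PR.
interface IJudgeInput {
  featureTitle: string;
  acceptanceCriteria: string[];
  artifactFiles: Record<string, string>;
}

// Must list every key of IJudgeInput: a new field on the interface is a
// compile error here until it is added, which forces a deliberate decision.
const JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput, true> = {
  featureTitle: true,
  acceptanceCriteria: true,
  artifactFiles: true,
};

// Assumed deny-list of "trace-ish" name fragments (not taken from the PR).
const TRACE_ISH = /trace|transcript|history|messages|reasoning/i;

// Returns the keys a test would reject; an empty array means the shape is clean.
function traceIshKeys(shape: object): string[] {
  return Object.keys(shape).filter((key) => TRACE_ISH.test(key));
}
```

The compile-time half catches any new field, and the runtime half gives the test a list of offending names to report.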
…ify + docs

Stage 4 of loop-evolution:
- greenfield/contract.ts: TSFORGE_CONTRACT-gated (off by default) propose<->object negotiation; generator proposes a build contract, evaluator pushes back to 'agreed' or maxRounds. Evaluator sees only the proposal+feature (rule #2). Persisted to .tsforge/greenfield/contracts/<feature>.md. Wired into the greenfield implement step behind the flag.
- --notify <cmd>: best-effort shell ping on greenfield completion with the outcome in $TSFORGE_STATUS (for cron/unattended runs).
- docs: loop/greenfield.mdx (greenfield mode, model routing, contract flag, scheduling/cron + --notify) + sidebar entry.

Full `bun run validate` green (1420 tests, 0 fail).
- commands.mdx: greenfield build section, --with-review one-shot note, accept-rate/cost in the trace summary line.
- cli/recipes.mdx: withReview, mode, planner/work/evaluatorModel fields.
- observability/trace.mdx: accept rate + cost/accepted in the example + table.
- reference/flags.mdx: TSFORGE_CONTRACT.

Docs site builds clean (44 pages).
Code Review
This pull request introduces 'Greenfield builds' to tsforge, enabling feature-by-feature application development driven by a filesystem-tracked checklist, role-based model routing, and a layered evaluator (gate, browser, and judge). It also adds post-green review-and-repair capabilities, CLI flags like --greenfield and --with-review, and metrics for tracking edit accept rates. The review feedback highlights several critical improvements: expanding the default **/* file scope in snapshotFiles and scopeCode to prevent literal path errors, passing the previous contract during negotiation rounds so the generator can make informed revisions, adding defensive checks on browser.errors to avoid runtime TypeErrors, and wrapping the review-repair loop in a try...catch block to guarantee workspace rollback on failure.
…contract revision context

Address PR #44 review:
- snapshotFiles: expand globs via resolveScopeFiles before snapshotting. The whole-repo default scope ['**/*'] was snapshotted literally (never exists), so review-repair's revert was a silent no-op by default. (file-snapshot.test.ts)
- reviewRepair: wrap implement+fix+gate in try/catch, so a throw mid-repair now rolls the workspace back and rethrows, instead of leaving a half-applied edit.
- negotiateContract: pass the generator its OWN previous proposal on a revision round (the stateless call couldn't revise without it).

Not changed (false positives): scopeCode '**/*' (readFiles already glob-expands); browser.errors?. (errors is a required string[] on IRenderResult).
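The rollback-on-throw behaviour described for reviewRepair can be sketched as below. The `ISnapshot` and `IRepairSteps` interfaces and the function name are hypothetical stand-ins, kept synchronous for brevity. The sketch shows only the control flow: restore and rethrow on an exception, restore on a red gate, keep the edit on a green gate.

```typescript
// Hypothetical snapshot handle; the real loop/file-snapshot.ts API is not shown in the PR.
interface ISnapshot {
  restore(): void;
}

// Injected steps so the control flow can be exercised without a real workspace.
interface IRepairSteps {
  snapshot(): ISnapshot;
  applyRepair(): void; // implement + fix; may throw mid-edit
  gatePasses(): boolean; // re-run the gate after the repair
}

type RepairOutcome = "kept" | "reverted";

function reviewRepairSketch(steps: IRepairSteps): RepairOutcome {
  const snap = steps.snapshot();
  try {
    steps.applyRepair();
    if (steps.gatePasses()) return "kept";
  } catch (err) {
    // A throw mid-repair must not leave a half-applied edit behind.
    snap.restore();
    throw err;
  }
  // The repair completed but broke the gate, so it is rolled back.
  snap.restore();
  return "reverted";
}
```

Taking the snapshot before entering the `try` block means a failure while snapshotting propagates without attempting a restore from a snapshot that was never taken.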
- [P1] greenfield implement no-op on green gate: runTask is RED-first and returns before the model when the gate already passes. Add IRunOptions.requireRed (default true); greenfield passes false. Extracted redPrecheck() helper.
- [P1] rollback misses created files + 400-file cap: file-snapshot records the pre-existing path SET (uncapped) + contents; restore tombstones files that appeared after the snapshot. resolveScopeFiles/expandGlob take a limit.
- [P2] planner/evaluator fall back to args.model (not just workModel).
- [P2] greenfield per-feature runTask honours maxTurns + thinkingBudget.
- [P3] contract prefix no longer claims 'Agreed' when negotiation didn't converge.
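The second [P1] fix, recording the set of pre-existing paths so that restore can remove files created after the snapshot, can be illustrated with an in-memory stand-in for the workspace. Everything here is an assumption made for illustration: the `Workspace` map, the `IFileSnapshotSketch` shape, and the function names. The real code operates on disk and its API is not shown in the PR.

```typescript
// In-memory stand-in for the workspace: path -> contents.
type Workspace = Map<string, string>;

interface IFileSnapshotSketch {
  existingPaths: Set<string>; // every path present at snapshot time
  contents: Map<string, string>; // saved contents of those paths
}

function snapshotSketch(ws: Workspace): IFileSnapshotSketch {
  return {
    existingPaths: new Set(Array.from(ws.keys())),
    contents: new Map(Array.from(ws.entries())),
  };
}

// Restores the workspace in place and returns the paths that were tombstoned.
function restoreSketch(ws: Workspace, snap: IFileSnapshotSketch): string[] {
  const tombstoned: string[] = [];
  // Remove files that did not exist when the snapshot was taken.
  for (const path of Array.from(ws.keys())) {
    if (!snap.existingPaths.has(path)) {
      ws.delete(path);
      tombstoned.push(path);
    }
  }
  // Put back every snapshotted file, covering both modified and deleted ones.
  snap.contents.forEach((text, path) => {
    ws.set(path, text);
  });
  return tombstoned;
}
```

Without the path set, a restore that only rewrites saved contents would leave newly created files in place, which is the gap the commit describes.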
… setup`

runWizard used emitKeypressEvents + a keypress listener but never enabled raw mode; it assumed an already-raw caller (the REPL's readline, for /setup). Standalone `tsforge setup` runs with cooked stdin, so arrow keys echoed as raw ^[[A/^[[B and never moved the selection. runWizard now enables raw mode (and resumes stdin) when there were no prior keypress listeners (i.e. nobody else owns stdin), restoring it on exit; the REPL /setup path (raw already on, with saved listeners) is left untouched.
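The ownership check described above (enable raw mode only when nobody else has keypress listeners, and undo it on exit) could be sketched like this. The `IStdinLike` interface and the `acquireRawMode` name are hypothetical. On a TTY, Node's `process.stdin` exposes methods with these names, but the sketch takes the stream as a parameter so it can be exercised with a fake. Pausing stdin on release is an assumption about what "restoring it on exit" involves.

```typescript
// Minimal stdin surface used by the sketch.
interface IStdinLike {
  isTTY?: boolean;
  listenerCount(event: string): number;
  setRawMode?(enabled: boolean): unknown;
  resume(): unknown;
  pause(): unknown;
}

// Returns a release function that undoes only what this call changed.
function acquireRawMode(stdin: IStdinLike): () => void {
  // Prior keypress listeners mean another owner (such as a REPL) manages stdin.
  const ownsStdin = stdin.listenerCount("keypress") === 0;
  if (!ownsStdin || !stdin.isTTY || !stdin.setRawMode) {
    return () => {};
  }
  stdin.setRawMode(true);
  stdin.resume();
  return () => {
    stdin.setRawMode?.(false);
    stdin.pause(); // assumption: release also pauses the stream it resumed
  };
}
```

Returning a no-op release in the "someone else owns stdin" case lets the caller invoke the cleanup unconditionally.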
Why
Studied Anthropic's long-running-agent workshop + the Kopadze "loops" model and reviewed a Cursor-drafted roadmap. A codebase audit showed tsforge already implements most of that plan (deterministic gate + exit-code oracle, JSONL ledger + /trace, adversarial review find→verify, quality snapshot/revert, TTSR memory, browser oracle, two-phase buildStaged). So this PR builds only the genuinely missing, high-leverage pieces, and encodes the workshop's design rules as ratchets, not comments.

Five design rules adopted: verifier-is-the-heart, evaluator-blind-to-generator-traces (now test-enforced), filesystem-state > context, planner-stays-high-level, harness-co-evolves (every new layer flag/knob-gated).
What
Stage 1: accept-rate metrics + post-green review-repair (6eca962)
- reverted loop event → edit_reverted ledger type; quality.ts emits it on both rollback paths. analyzeEvents derives editsReverted / acceptRate / costPerAcceptedChange, surfaced in /trace. (The data was already there; reverts just weren't reported.)
- withReview recipe knob / --with-review: after a one-shot run goes green, run the adversarial review and feed verified findings into one repair cycle, reverting it if it breaks the gate. New loop/review-repair.ts.
- loop/file-snapshot.ts (snapshot/restore), used by both quality- and review-repair.

Stage 2: greenfield filesystem-state outer loop (ba08dca)
- loop/greenfield/: runGreenfield drives a .tsforge/greenfield/features.json checklist to all-green one feature at a time, persisting after every step (resumable; per-feature stuck guard so one feature can't wedge the build).
- evaluateFeature = layered gate → browser(renderCheck) → judge, short-circuiting. Gate stays authority; browser is skip-tolerant; judge sees only the built artifact.
- mode: "greenfield" knob: orchestration lives in a new loop entry, not in recipes (they stay declarative data).

Stage 3: planner + reject-by-default judge + model routing + trace-blindness ratchet (c2e0383)
- …VERIFY_SYSTEM, fail-closed).
- resolveModelByName + recipe plannerModel/workModel/evaluatorModel → per-role models, falling back to the active model (single-endpoint setups still work).
- JUDGE_INPUT_SHAPE: Record<keyof IJudgeInput, true> + test: adding any field to the judge input is a compile error until it's listed, where the test rejects trace-ish names. Design rule #2 can't silently regress.
- prepareState (resume-first, else plan) + greenfieldMode CLI entry + --greenfield flag / recipe dispatch.

Stage 4: experimental contract negotiation + scheduling (5287648)
- loop/greenfield/contract.ts: TSFORGE_CONTRACT-gated (off by default) propose↔object negotiation; the evaluator sees only the proposal + feature (rule #2). Persisted to .tsforge/greenfield/contracts/<feature>.md. The workshop itself flagged this as unproven, so it's opt-in.
- --notify <cmd>: best-effort shell ping on greenfield completion, outcome in $TSFORGE_STATUS (cron/unattended runs).

Docs (5287648, 1d89cb1): loop/greenfield.mdx + updates to reference/commands, cli/recipes, observability/trace, reference/flags.

Notable scope decisions
- …buildStaged).
- implement uses runTask per feature against the global gate; a Session.send-based implement would be more precise but needs live-model testing. Noted as a follow-up.

Verification
- bun run validate green: 1420 tests, 0 fail.
- astro build clean: 44 pages.