Conversation

@TechNickAI
Owner

Summary

Major overhaul of /autotask - Nick's most-used workflow (dozens of times daily). Transforms it from a linear PR-creation workflow into an orchestrator-first, completion-verified, complexity-scaled autonomous development system.

Core problems solved:

  • 95% of runs stopped after PR creation (never completed the bot feedback loop)
  • Context window filled with exploratory work (now delegates to sub-agents)
  • No complexity scaling (same process for trivial and complex tasks)
  • No plan-phase review for complex work
  • TODOs lost on compaction (now file-based state persistence)

Changes

autotask.md v2.0.2

  • Complexity levels: auto (default), quick, balanced, deep
  • State persistence: autotask-state.md in project root survives compaction
  • Mandatory completion: Task not done until /address-pr-comments executed
  • Plan-phase review: Deep mode runs /multi-review on the PLAN before implementation
  • Error recovery: Expanded with specific failure modes (git, gh, sub-agents)
  • State validation: On resume, validates branch exists, PR open, files modified

multi-review.md v2.2.0

  • Depth-aware scaling (accepts quick/balanced/deep or explicit count)
  • Respects complexity level from autotask when called as sub-command

brainstorm-synthesis (NEW skill)

  • F-thread pattern: Launch N agents with different perspectives
  • Each returns approach + trade-offs + complexity estimate
  • Synthesizes best unified solution
  • For hard architectural decisions in deep mode

Design Decisions

Complexity as organizing principle: Rather than flags or options, complexity level (auto/quick/balanced/deep) determines planning depth, review intensity, validation thoroughness, and bot wait times.

State file in project root: Appears as an uncommitted file, which naturally prompts the LLM to notice and read it on resume. Simpler than hidden directories.
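
For illustration, a rough sketch of what might end up in that file (the field names are illustrative assumptions, not the exact template in autotask.md):

```bash
# Hypothetical state file written to the project root at task start and updated as
# work progresses; field names here are illustrative, not the command's exact template.
cat > autotask-state.md <<'EOF'
# Autotask State

- Task: add rate limiting to the API
- Complexity: balanced
- Branch: feature/rate-limiting
- PR: (none yet)
- Last commit: (none yet)

## Decision Log
- Chose token-bucket over sliding window (fits existing Redis setup)

## TODOs
- [ ] Implement limiter
- [ ] Open PR
- [ ] Run /address-pr-comments on bot feedback
EOF
```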

Mandatory bot feedback: The most common failure mode was stopping after PR creation. Made /address-pr-comments non-optional.

Validation Performed

  • 6-agent deep multi-review (architecture, logic, style, prompt-engineering, error-handling, robustness)
  • Fixed all high/medium issues found
  • Follows prompt-engineering.mdc standards throughout

🤖 Generated with Claude Code

Major rewrite of the core autonomous workflow:

- Add complexity levels (auto/quick/balanced/deep) with auto-detection
- Add state persistence via autotask-state.md for compaction recovery
- Make bot feedback loop mandatory with /address-pr-comments
- Add plan-phase review for deep mode
- Expand error recovery with specific failure modes
- Add state validation on resume

Also includes:
- multi-review.md: Add depth-aware scaling
- brainstorm-synthesis skill: New F-thread pattern for hard decisions
- docs/plans/autotask-v2-plan.md: Design document

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@greptile-apps greptile-apps bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@claude

claude bot commented Jan 14, 2026

Code Review

This is an excellent overhaul of the workflow with solid architecture and clear thinking. The complexity scaling approach, state persistence, and mandatory completion verification address real pain points. Here's my detailed review:

Strengths

Architectural Design:

  • Complexity levels (quick/balanced/deep) as the organizing principle is elegant - it scales effort appropriately without flag proliferation
  • State persistence via autotask-state.md in project root is clever - uncommitted file naturally prompts LLM attention
  • Mandatory bot feedback loop addresses the critical 95% abandonment rate
  • Context preservation through delegation is essential for long-running tasks

Implementation Quality:

  • Follows prompt-engineering.mdc principles throughout (goal-focused, positive framing, clear structure)
  • XML tags use semantic names (not numbered phases)
  • Excellent use of examples showing correct patterns without anti-patterns
  • Integration between autotask, multi-review, and brainstorm-synthesis is well-designed

Documentation:

  • Clear progression from planning document to implementation
  • Complexity level triggers are well-documented and discoverable
  • Error recovery section covers failure modes comprehensively
  • PR description format template is helpful

Suggestions for Enhancement

1. State File Cleanup Timing

autotask.md:302: The instruction to delete autotask-state.md after completion could be problematic if the user wants to reference the decision log post-merge. Consider:

After reporting, optionally suggest deleting autotask-state.md, or adding it to .gitignore if the user wants to preserve the decision log for future reference.

2. Bot Polling Implementation Details

autotask.md:254-258: The polling timeouts are specified but the polling interval isn't. Consider adding:

- quick: Poll for up to 2 minutes (check every 15s)
- balanced: Poll for up to 5 minutes (check every 30s)  
- deep: Poll for up to 15 minutes (check every 60s)
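
For illustration, a bash sketch of such a poll loop (assuming `$COMPLEXITY` and `$PR_NUMBER` are set earlier in the workflow; matching on `pending` in the `gh pr checks` output is a simplification):

```bash
# Per-complexity polling sketch; timeouts and intervals mirror the suggestion above.
case "$COMPLEXITY" in
  quick)    timeout=120; interval=15 ;;
  balanced) timeout=300; interval=30 ;;
  deep)     timeout=900; interval=60 ;;
esac

elapsed=0
# Keep polling while any configured check is still reported as pending.
while gh pr checks "$PR_NUMBER" | grep -q pending && (( elapsed < timeout )); do
  sleep "$interval"
  (( elapsed += interval ))
done
```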

3. Brainstorm-Synthesis Integration Clarity

autotask.md:74: References /brainstorm-synthesis in deep mode, but the skill is marked as new. Consider making this conditional or marking it more clearly as optional until the skill is battle-tested:

- Use /brainstorm-synthesis for hard architectural decisions during exploration (when multiple valid approaches exist)

4. Multi-Review Depth Auto-Detection

multi-review.md:41: "When called from /autotask, respect the complexity level already determined" is good, but consider how this is communicated. Does autotask pass this as context, or does multi-review read the state file? Explicit mechanism would help.

5. State Validation on Resume

autotask.md:124-131: State validation is well-specified, but consider adding a validation checklist format to the state file itself:

## State Validation (for resume)
- [ ] Branch exists: [branch-name]
- [ ] PR open: [PR-number] 
- [ ] Modified files match: [list]

Minor Issues

Consistency:

  • autotask.md:238: References .cursor/rules/git-commit-message.mdc but CLAUDE.md mentions git-commit-message.mdc is in .cursor/rules/. Verify path.
  • brainstorm-synthesis/SKILL.md:122: "Signal when the decision is too close to call" - excellent principle but could use an example of what this looks like in practice.

Typos/Grammar:

  • None found - documentation is clean and professional

Security & Risk Considerations

Low Risk Overall:

  • State file in project root could theoretically contain sensitive info if task description includes secrets - consider a warning in the state file template
  • All git operations respect hooks and avoid --no-verify appropriately
  • Bot feedback loop correctly declines suggestions rather than blindly applying them

Test Coverage

The plan mentions that a 6-agent deep multi-review was performed on this PR. Consider documenting which issues were found and fixed, to validate that the multi-review process itself works as designed.

Performance Considerations

No significant performance concerns. Polling for bot feedback is appropriately scaled to complexity. Sub-agent delegation should prevent context window exhaustion.

Recommendations

Must Fix:

  • None - this is ready to merge as-is

Should Consider:

  • Add polling interval specs to bot feedback loop
  • Clarify state file cleanup behavior (delete vs preserve)
  • Document how complexity level flows from autotask to multi-review

Nice to Have:

  • Example of "too close to call" in brainstorm-synthesis
  • State validation checklist in state file template
  • Mention of what multi-review found during this PR's own review

Conclusion

This is production-ready work that substantially improves the autotask workflow. The architectural decisions are sound, the implementation follows project standards, and the documentation is comprehensive. The mandatory completion verification should dramatically improve the user experience by preventing abandoned PRs.

Recommendation: Approve and merge

The suggestions above are enhancements for future iterations, not blockers. This PR delivers significant value and is well-executed.


Great work on this overhaul! The complexity scaling approach is particularly elegant.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df71ab6401

Comment on lines 126 to 129
- Does the branch still exist?
- If PR listed, is it still open?
- Are referenced files still modified?

P2 Badge Don't treat clean working tree as stale state

The resume validation requires Are referenced files still modified?, but after committing and pushing (a normal step before PR checks/bot feedback), the working tree is expected to be clean even though the branch and PR are still active. This will incorrectly label the state as stale and trigger a “start fresh,” which risks abandoning an in‑flight PR or redoing work after compaction. Consider validating with branch/PR existence plus a stored commit SHA in the state file instead of relying on modified files.
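
A minimal bash sketch of that SHA-based validation, with placeholder values standing in for what would be read from the state file:

```bash
# Resume validation based on branch existence and a stored commit SHA, rather than
# on whether the working tree is dirty. Values are placeholders for illustration.
BRANCH="feature/autotask-v2"   # recorded in the state file
STATE_SHA="df71ab6401"         # last commit recorded in the state file

if ! git rev-parse --verify --quiet "refs/heads/$BRANCH" >/dev/null; then
  echo "Stale state: branch $BRANCH no longer exists"
elif git merge-base --is-ancestor "$STATE_SHA" "$BRANCH"; then
  # A clean working tree is fine; the recorded commit is still on the branch.
  echo "State matches reality: resuming on $BRANCH"
else
  echo "Recorded commit not reachable from $BRANCH: treat state as stale"
fi
```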

Codex correctly caught that validating 'Are referenced files still modified?' would incorrectly flag state as stale after normal git operations. A clean working tree after commits is expected - validate branch existence and commit history instead.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@TechNickAI
Owner Author

Excellent catch on the state validation logic! Fixed in 0bea974 - now validates branch existence and commit history instead of modified files. Clean working tree after commits is expected behavior. ❤️

@claude

claude bot commented Jan 14, 2026

PR Review: Autotask v2.0

Summary

This is a thoughtfully designed major upgrade to /autotask that addresses real pain points from heavy usage. The complexity-scaling architecture, state persistence, and mandatory completion verification are all solid design decisions. The code quality is high, and the PR description accurately reflects the changes.

Strengths

Architecture & Design

  • Complexity levels (quick/balanced/deep) provide excellent scaling mechanism - simple tasks stay simple, complex tasks get appropriate scrutiny
  • State persistence (autotask-state.md in project root) is clever - uncommitted file naturally prompts LLM attention, survives compaction
  • Mandatory bot feedback addresses the "95% stop after PR creation" problem directly
  • Context preservation philosophy (delegate exploration, orchestrate at top level) prevents context bloat
  • Plan-phase review for deep mode catches architectural issues before implementation starts

Prompt Engineering Excellence

The prompts follow prompt-engineering.mdc standards very well:

  • Goal-focused rather than over-prescriptive (autotask.md:169-178 planning section)
  • Clear XML boundaries with semantic names (not numbered)
  • Examples would be helpful for the state file format, but the template is clear enough
  • Positive framing throughout ("Scale validation to complexity" not "Don't over-validate")
  • Explains motivation ("Why:" in plan doc Phase 2.1)

Integration & Composability

  • brainstorm-synthesis skill integrates cleanly into deep mode workflow
  • multi-review depth awareness respects autotask's complexity level
  • Error recovery covers specific failure modes (git, gh, sub-agents, state file)

Areas for Improvement

1. State Validation Edge Cases (Medium Priority)

Location: autotask.md:124-132

The state validation logic is good but has a subtle issue:

If state is stale (branch merged, PR closed, or no work done), report and start fresh
rather than continuing with invalid context. A clean working tree is normal after
commits - don't treat it as stale.

Issue: "no work done" is ambiguous. What if the branch exists, has commits, but those commits were from a different session/LLM? The state file might not reflect those changes.

Suggestion: Add explicit validation:

  • Compare commit timestamps to state file timestamp
  • Check if HEAD commit message matches any decisions in state file
  • If mismatch detected, offer to "adopt" existing work (read commits, update state) vs. start fresh

2. Bot Polling Timeout Behavior (Low Priority)

Location: autotask.md:255-262

If timeout reached with checks still pending, proceed with available feedback and note incomplete checks.

Issue: This could lead to missed critical feedback (e.g., security scans that take 10+ minutes). The "note incomplete checks" isn't actionable.

Suggestion: For deep mode specifically:

  • Warn user that critical checks are incomplete
  • Offer to wait longer or proceed with caveat
  • Log which checks were incomplete in PR description under "Validation Performed"

3. Brainstorm-Synthesis Prompt Clarity (Low Priority)

Location: brainstorm-synthesis/SKILL.md:58-76

The execution section is goal-focused (good) but could use one concrete example of problem framing:

Suggestion: Add example after line 59:

Example problem statement:
"We need to add rate limiting to our API. Constraints: must handle 100k req/s, existing
Redis infrastructure, can't break current clients. Success: API stays responsive under
attack while legitimate traffic flows."

4. Multi-Review Depth Auto-Detection (Medium Priority)

Location: multi-review.md:37-42

Auto-detect depth from context: single-file change with clear purpose → quick;
multi-file implementation → balanced; architectural changes, new patterns, security-
sensitive code → deep.

Issue: This duplicates complexity detection logic from autotask. If autotask's detection evolves, these will diverge.

Suggestion:

  • Extract complexity detection to shared context document
  • Reference it from both files
  • Or: multi-review always defers to autotask's complexity when called as sub-command (which line 41 does, but could be more explicit)

5. State File Cleanup Timing (Low Priority)

Location: autotask.md:303

After reporting, delete `autotask-state.md` to prevent stale state in future sessions.

Issue: If the user wants to reference the state file (e.g., to understand what decisions were made), it's immediately deleted. Also, if the LLM crashes before deleting it, the stale state remains.

Suggestion:

  • Rename to autotask-state-[timestamp].md on completion
  • Keep last 3 completed state files
  • Clean up files older than 7 days
  • This provides history while preventing accumulation
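
A minimal sketch of that rotate-and-prune behavior (filenames and retention numbers follow the suggestion above, not current behavior):

```bash
# Archive the completed state file instead of deleting it outright.
ts=$(date +%Y%m%d-%H%M%S)
mv autotask-state.md "autotask-state-$ts.md"

# Keep only the three most recent completed state files...
ls -1t autotask-state-*.md 2>/dev/null | tail -n +4 | while read -r old; do
  rm -- "$old"
done

# ...and drop anything older than 7 days regardless.
find . -maxdepth 1 -name 'autotask-state-*.md' -mtime +7 -delete
```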

Security Considerations

No security concerns identified. The state file contains task context but no secrets. Error recovery correctly refuses --no-verify (line 308).

Performance Considerations

Parallel agent execution is correctly specified throughout (brainstorm-synthesis line 61, multi-review lines 80-81). This is critical for deep mode with 5+ agents.

Bot polling could waste time. The timeout strategy is reasonable, but consider exponential backoff:

  • First check: immediate
  • Checks 2-5: every 30 seconds
  • Checks 6+: every 60 seconds
  • Max timeout as specified
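
Sketched in bash, reusing the `timeout` and `$PR_NUMBER` placeholders from the earlier polling sketch (an assumption, not the command's actual mechanism):

```bash
# Backoff sketch: first check immediately, checks 2-5 every 30s, later checks every
# 60s, still bounded by the complexity-level timeout.
check_pending() { gh pr checks "$PR_NUMBER" | grep -q pending; }

attempt=1
elapsed=0
while check_pending && (( elapsed < timeout )); do
  (( attempt++ ))
  delay=$(( attempt <= 5 ? 30 : 60 ))
  sleep "$delay"
  (( elapsed += delay ))
done
```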

Test Coverage

Missing: No test scenario examples for how autotask should behave in edge cases:

  • What happens if /brainstorm-synthesis returns "too close to call"?
  • What if state file exists but current branch doesn't match?
  • What if PR already exists for the branch?

Recommendation: Add a "Test Scenarios" section to the plan doc or create a separate test plan.

Naming & Conventions

All naming follows project conventions:

  • Semantic XML tags (not numbered)
  • Complexity levels are consistent (quick/balanced/deep)
  • Version bump to 2.0.2 is appropriate for major rewrite
  • # prettier-ignore used correctly for long descriptions

Documentation Quality

PR description is excellent - clear summary, design decisions explained, validation noted. The plan doc (427 lines) is thorough and well-structured.

Minor: The plan doc asks 4 questions at the end (lines 422-427), but only one is resolved. Consider updating or removing the questions section before merge.

Recommendations

Must Fix Before Merge

None - this is production-ready.

Should Fix Soon

  1. State validation edge cases (#1 above) - prevents confusing behavior when resuming
  2. Multi-review depth auto-detection duplication (#4 above) - prevents divergence

Nice to Have

  1. Bot polling timeout improvements (#2 above)
  2. Brainstorm-synthesis example (#3 above)
  3. State file cleanup timing (#5 above)
  4. Update/remove questions section in plan doc

Final Assessment

Approve with minor suggestions. This is high-quality work that solves real problems. The architecture is sound, the implementation follows best practices, and the integration points are clean. The suggestions above are refinements, not blockers.

The mandatory bot feedback loop alone makes this a significant improvement. Combined with complexity scaling and state persistence, this transforms /autotask from a PR creator into a true autonomous development system.

Complexity Level: deep (appropriate - this is an architectural change introducing new patterns)

Estimated Value: High - solves the "95% don't complete" problem and enables context-preserving workflows

@claude

claude bot commented Jan 14, 2026

Code Review: Autotask v2.0

Summary

This is a well-designed major overhaul that addresses real pain points with /autotask. The complexity scaling, state persistence, and mandatory completion verification are valuable improvements. The code follows the repository's prompt engineering standards effectively.

Strengths

1. Clear problem identification and solution

  • The 95% incomplete rate is a concrete metric justifying the mandatory bot feedback loop
  • State persistence addresses a real issue with context compaction
  • Complexity scaling prevents over-engineering simple tasks

2. Excellent adherence to prompt-engineering.mdc

  • Goal-focused over process-driven (lines 169-178 in autotask.md)
  • Consistent XML tag structure with semantic names (<planning>, <implementation> not <phase-1>)
  • Positive framing throughout ("Trust git hooks" vs "Don't bypass hooks")
  • Assumes executing model is smart (delegates to sub-agents rather than micro-managing)

3. Well-structured state file design

  • Placing autotask-state.md in project root is clever - uncommitted files naturally prompt LLM attention
  • State validation on resume (lines 124-132) prevents stale context bugs
  • Proper cleanup after completion (line 303)

4. Strong error recovery

  • Specific failure modes for git, GitHub CLI, sub-agents, state files (lines 306-320)
  • "Never swallow errors silently" is the right principle

Issues Found

High Priority

1. State file deletion race condition (autotask.md:303)

After reporting, delete `autotask-state.md` to prevent stale state in future sessions.

Issue: If the LLM crashes or session ends before deletion, the stale state file persists and will be read on next invocation, causing confusion.

Fix: Add timestamp validation. On resume, if state file is >24 hours old OR branch mentioned doesn't exist, treat as stale automatically.

**On resume, validate state matches reality:**
- Does the branch still exist?
- If PR listed, is it still open?
- Are there commits on the branch since it was created?
- Is the state file less than 24 hours old?

If state is stale (old timestamp, branch merged, PR closed), archive to `autotask-state-[timestamp].bak` and start fresh.
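
As a bash sketch of that check (the 24-hour cutoff and archive name follow the fix above; `$BRANCH` is assumed to come from the state file):

```bash
# Age + branch staleness check; thresholds follow the suggested fix above.
state_age=$(( $(date +%s) - $(stat -c %Y autotask-state.md) ))  # GNU stat; use `stat -f %m` on macOS

if (( state_age > 86400 )) || ! git rev-parse --verify --quiet "refs/heads/$BRANCH" >/dev/null; then
  mv autotask-state.md "autotask-state-$(date +%Y%m%d-%H%M%S).bak"
  echo "Stale state archived; starting fresh"
fi
```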

2. Bot polling timeout guidance unclear (autotask.md:256-262)

The polling timeouts are arbitrary without context:

  • Why 2/5/15 minutes specifically?
  • What if bots are configured but take 20 minutes?
  • No guidance on what "proceed with available feedback" means

Fix: Add guidance on checking if bots are configured and clarify what to do on timeout:

After PR creation, check configured checks with `gh pr checks --json name`:

- If no checks configured → skip directly to completion
- If checks exist, poll with `gh pr checks --watch`:
  - quick: Poll for up to 2 minutes (fast linters only)
  - balanced: Poll for up to 5 minutes (linters + unit tests)
  - deep: Poll for up to 15 minutes (full CI suite)

If timeout reached with checks still pending:
- In quick/balanced: Proceed with available results
- In deep: Report pending checks to user, ask whether to wait or proceed

Always note which checks completed and which didn't in the final report.

Medium Priority

3. Missing fallback if /address-pr-comments doesn't exist (autotask.md:264)

The command states "/address-pr-comments... is not optional" but doesn't handle if that command isn't installed.

Fix: Add fallback behavior:

Execute /address-pr-comments on the PR. If that command is not available, manually review bot feedback:
- Read PR comments with `gh pr view [number] --comments`
- Fix valid issues
- Reply to declined items with rationale

4. Complexity auto-detection criteria too vague (autotask.md:28-42)

"Single file → quick. Multi-file → balanced" is too simplistic. A single file touching auth is higher risk than 10 files updating docs.

Fix: Weight risk higher than file count:

Auto-detect complexity in this order (first match wins):

1. **Risk factors** (→ deep): auth, payments, data migrations, crypto, permissions
2. **Scope** (→ deep): >5 files OR cross-cutting changes
3. **Novelty** (→ balanced): New patterns or uncertainty in approach  
4. **Simplicity** (→ quick): Single file, clear fix, established pattern
5. **Default** (→ balanced): When in doubt
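
A purely illustrative bash sketch of the first-match-wins ordering; the keyword lists and thresholds are assumptions, not the command's actual criteria:

```bash
# Returns a complexity level for a task description and changed-file count.
detect_complexity() {
  local description="$1" files_changed="$2"

  if grep -qiE 'auth|payment|migration|crypto|permission' <<<"$description"; then
    echo deep        # risk factors dominate regardless of size
  elif (( files_changed > 5 )); then
    echo deep        # broad scope / cross-cutting
  elif grep -qiE 'new pattern|unfamiliar|prototype' <<<"$description"; then
    echo balanced    # novelty or uncertainty
  elif (( files_changed == 1 )); then
    echo quick       # single file, established pattern
  else
    echo balanced    # default when in doubt
  fi
}

detect_complexity "fix typo in README" 1   # -> quick
```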

5. brainstorm-synthesis.md lacks concrete example (SKILL.md:92-112)

The output format is described but there's no concrete example. Following prompt-engineering.mdc: "Examples teach patterns" - this skill needs one.

Fix: Add an example synthesis result in the output format section showing what a real synthesis looks like.

Low Priority

6. Minor: Inconsistent terminology for "depth" vs "complexity"

multi-review.md uses "depth" (lines 21-42) while autotask.md uses "complexity". Pick one term for both.

Suggestion: Use "complexity" everywhere since it's more intuitive for users.

7. Version bumps look correct

All three files bumped to appropriate versions:

  • autotask.md: 1.2.1 → 2.0.2 (major version, appropriate)
  • multi-review.md: appears to be → 2.2.0 (minor feature add, appropriate)
  • brainstorm-synthesis: 1.0.0 (new skill, appropriate)

Testing Recommendations

  1. Test state file recovery: Manually kill a session mid-task, resume, verify state file is read and validated correctly
  2. Test complexity detection: Run with tasks explicitly labeled "quick"/"deep" and verify appropriate flows
  3. Test bot timeout: Create PR in repo with slow checks, verify timeout behavior
  4. Test missing /address-pr-comments: Run in environment without that command installed

Architecture Assessment

The design is sound. Key architectural decisions are well-reasoned:

  • Complexity as organizing principle: Better than boolean flags or separate modes
  • Orchestrator-first delegation: Preserves context, addresses the #2 problem (context-window bloat) directly
  • Mandatory completion: Directly fixes the 95% incomplete rate
  • File-based state: Simpler than database or hidden directories

The only concern is the growing complexity of autotask.md (357 lines). Consider whether some sections (like error-recovery) could be extracted to referenced files to keep the main flow clearer. Not required for this PR, but worth considering for maintainability.

Verdict

Approve with suggested fixes for high-priority issues.

The high-priority issues are edge cases that will cause frustration when hit:

  • Stale state files will cause confusing failures
  • Timeout ambiguity will leave users unsure what to do
  • Missing /address-pr-comments fallback breaks the mandatory completion flow

Fix these three, and this is a solid improvement to autotask.

The design is excellent, the implementation follows project standards well, and the problem-solution fit is clear. Nice work on the comprehensive plan document - it made the review much easier.


Generated with Claude Code

@TechNickAI TechNickAI merged commit e0ad668 into main Jan 14, 2026
3 checks passed
@TechNickAI TechNickAI deleted the feature/autotask-v2 branch January 14, 2026 22:28

```
Iterate on bot feedback until critical issues are resolved. </bot-feedback-loop>
- PR created with proper description
- Review bots have completed (or confirmed none configured)
```

Completion criteria inconsistent with timeout handling

Medium Severity

The <bot-feedback-loop> section explicitly allows proceeding when timeout is reached with checks still pending ("proceed with available feedback and note incomplete checks"), but the <completion-verification> criterion requires "Review bots have completed (or confirmed none configured)." When the timeout path is taken, bots haven't completed, so the completion check would fail even though this scenario is explicitly permitted by the workflow. The completion criteria need to account for the timeout case.

```
## Blockers

[populated if encountered]
```


State file template missing branch and PR fields

Medium Severity

The state file template lacks fields for branch name, PR number, and worktree path, yet the validation logic on resume requires checking "Does the branch still exist?", "If PR listed, is it still open?", and "Are there commits on the branch since it was created?" After context compaction, the LLM reading this state file wouldn't have the branch name needed to perform these validations, potentially causing incorrect resume behavior or validation against the wrong branch.
