-
Notifications
You must be signed in to change notification settings - Fork 1
✨ Autotask v2.0: Complexity scaling, state persistence, mandatory completion #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Major rewrite of the core autonomous workflow: - Add complexity levels (auto/quick/balanced/deep) with auto-detection - Add state persistence via autotask-state.md for compaction recovery - Make bot feedback loop mandatory with /address-pr-comments - Add plan-phase review for deep mode - Expand error recovery with specific failure modes - Add state validation on resume Also includes: - multi-review.md: Add depth-aware scaling - brainstorm-synthesis skill: New F-thread pattern for hard decisions - docs/plans/autotask-v2-plan.md: Design document Co-Authored-By: Claude Opus 4.5 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Code ReviewThis is an excellent overhaul of the workflow with solid architecture and clear thinking. The complexity scaling approach, state persistence, and mandatory completion verification address real pain points. Here's my detailed review: StrengthsArchitectural Design:
Implementation Quality:
Documentation:
Suggestions for Enhancement1. State File Cleanup Timing
After reporting, optionally suggest deleting autotask-state.md or committing it to .gitignore if the user wants to preserve the decision log for future reference.2. Bot Polling Implementation Details
- quick: Poll for up to 2 minutes (check every 15s)
- balanced: Poll for up to 5 minutes (check every 30s)
- deep: Poll for up to 15 minutes (check every 60s)3. Brainstorm-Synthesis Integration Clarity
- Use /brainstorm-synthesis for hard architectural decisions during exploration (when multiple valid approaches exist)4. Multi-Review Depth Auto-Detection
5. State Validation on Resume
## State Validation (for resume)
- [ ] Branch exists: [branch-name]
- [ ] PR open: [PR-number]
- [ ] Modified files match: [list]Minor IssuesConsistency:
Typos/Grammar:
Security & Risk ConsiderationsLow Risk Overall:
Test CoverageThe plan mentions 6-agent deep multi-review was performed on this PR. Consider documenting what issues were found and fixed to validate the multi-review process itself works as designed. Performance ConsiderationsNo significant performance concerns. Polling for bot feedback is appropriately scaled to complexity. Sub-agent delegation should prevent context window exhaustion. RecommendationsMust Fix:
Should Consider:
Nice to Have:
ConclusionThis is production-ready work that substantially improves the autotask workflow. The architectural decisions are sound, the implementation follows project standards, and the documentation is comprehensive. The mandatory completion verification should dramatically improve the user experience by preventing abandoned PRs. Recommendation: Approve and merge The suggestions above are enhancements for future iterations, not blockers. This PR delivers significant value and is well-executed. Great work on this overhaul! The complexity scaling approach is particularly elegant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df71ab6401
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - Does the branch still exist? | ||
| - If PR listed, is it still open? | ||
| - Are referenced files still modified? | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't treat clean working tree as stale state
The resume validation requires Are referenced files still modified?, but after committing and pushing (a normal step before PR checks/bot feedback), the working tree is expected to be clean even though the branch and PR are still active. This will incorrectly label the state as stale and trigger a “start fresh,” which risks abandoning an in‑flight PR or redoing work after compaction. Consider validating with branch/PR existence plus a stored commit SHA in the state file instead of relying on modified files.
Useful? React with 👍 / 👎.
Codex correctly caught that validating 'Are referenced files still modified?' would incorrectly flag state as stale after normal git operations. A clean working tree after commits is expected - validate branch existence and commit history instead. Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Excellent catch on the state validation logic! Fixed in 0bea974 - now validates branch existence and commit history instead of modified files. Clean working tree after commits is expected behavior. ❤️ |
PR Review: Autotask v2.0SummaryThis is a thoughtfully designed major upgrade to StrengthsArchitecture & Design
Prompt Engineering ExcellenceThe prompts follow
Integration & Composability
Areas for Improvement1. State Validation Edge Cases (Medium Priority)Location: The state validation logic is good but has a subtle issue: If state is stale (branch merged, PR closed, or no work done), report and start fresh
rather than continuing with invalid context. A clean working tree is normal after
commits - don't treat it as stale.Issue: "no work done" is ambiguous. What if the branch exists, has commits, but those commits were from a different session/LLM? The state file might not reflect those changes. Suggestion: Add explicit validation:
2. Bot Polling Timeout Behavior (Low Priority)Location: If timeout reached with checks still pending, proceed with available feedback and note incomplete checks.Issue: This could lead to missed critical feedback (e.g., security scans that take 10+ minutes). The "note incomplete checks" isn't actionable. Suggestion: For deep mode specifically:
3. Brainstorm-Synthesis Prompt Clarity (Low Priority)Location: The execution section is goal-focused (good) but could use one concrete example of problem framing: Suggestion: Add example after line 59: Example problem statement:
"We need to add rate limiting to our API. Constraints: must handle 100k req/s, existing
Redis infrastructure, can't break current clients. Success: API stays responsive under
attack while legitimate traffic flows."4. Multi-Review Depth Auto-Detection (Medium Priority)Location: Auto-detect depth from context: single-file change with clear purpose → quick;
multi-file implementation → balanced; architectural changes, new patterns, security-
sensitive code → deep.Issue: This duplicates complexity detection logic from autotask. If autotask's detection evolves, these will diverge. Suggestion:
5. State File Cleanup Timing (Low Priority)Location: After reporting, delete `autotask-state.md` to prevent stale state in future sessions.Issue: If user wants to reference the state file (e.g., to understand what decisions were made), it's immediately deleted. Also, if the LLM crashes during deletion, stale state remains. Suggestion:
Security ConsiderationsNo security concerns identified. The state file contains task context but no secrets. Error recovery correctly refuses Performance ConsiderationsParallel agent execution is correctly specified throughout (brainstorm-synthesis line 61, multi-review line 80-81). This is critical for deep mode with 5+ agents. Bot polling could waste time. The timeout strategy is reasonable, but consider exponential backoff:
Test CoverageMissing: No test scenario examples for how autotask should behave in edge cases:
Recommendation: Add a "Test Scenarios" section to the plan doc or create a separate test plan. Naming & ConventionsAll naming follows project conventions:
Documentation QualityPR description is excellent - clear summary, design decisions explained, validation noted. The plan doc (427 lines) is thorough and well-structured. Minor: The plan doc asks 4 questions at the end (line 422-427), but only one is resolved. Consider updating or removing the questions section before merge. RecommendationsMust Fix Before MergeNone - this is production-ready. Should Fix Soon
Nice to Have
Final AssessmentApprove with minor suggestions. This is high-quality work that solves real problems. The architecture is sound, the implementation follows best practices, and the integration points are clean. The suggestions above are refinements, not blockers. The mandatory bot feedback loop alone makes this a significant improvement. Combined with complexity scaling and state persistence, this transforms Complexity Level: deep (appropriate - this is architectural change introducing new patterns) Estimated Value: High - solves the "95% don't complete" problem and enables context-preserving workflows |
Co-Authored-By: Claude Opus 4.5 <[email protected]>
Code Review: Autotask v2.0SummaryThis is a well-designed major overhaul that addresses real pain points with Strengths1. Clear problem identification and solution
2. Excellent adherence to prompt-engineering.mdc
3. Well-structured state file design
4. Strong error recovery
Issues FoundHigh Priority1. State file deletion race condition (autotask.md:303) After reporting, delete `autotask-state.md` to prevent stale state in future sessions.Issue: If the LLM crashes or session ends before deletion, the stale state file persists and will be read on next invocation, causing confusion. Fix: Add timestamp validation. On resume, if state file is >24 hours old OR branch mentioned doesn't exist, treat as stale automatically. **On resume, validate state matches reality:**
- Does the branch still exist?
- If PR listed, is it still open?
- Are there commits on the branch since it was created?
- Is the state file less than 24 hours old?
If state is stale (old timestamp, branch merged, PR closed), archive to `autotask-state-[timestamp].bak` and start fresh.2. Bot polling timeout guidance unclear (autotask.md:256-262) The polling timeouts are arbitrary without context:
Fix: Add guidance on checking if bots are configured and clarify what to do on timeout: After PR creation, check configured checks with `gh pr checks --json name`:
- If no checks configured → skip directly to completion
- If checks exist, poll with `gh pr checks --watch`:
- quick: Poll for up to 2 minutes (fast linters only)
- balanced: Poll for up to 5 minutes (linters + unit tests)
- deep: Poll for up to 15 minutes (full CI suite)
If timeout reached with checks still pending:
- In quick/balanced: Proceed with available results
- In deep: Report pending checks to user, ask whether to wait or proceed
Always note which checks completed and which didn't in the final report.Medium Priority3. Missing fallback if /address-pr-comments doesn't exist (autotask.md:264) The command states "/address-pr-comments... is not optional" but doesn't handle if that command isn't installed. Fix: Add fallback behavior: Execute /address-pr-comments on the PR. If that command is not available, manually review bot feedback:
- Read PR comments with `gh pr view [number] --comments`
- Fix valid issues
- Reply to declined items with rationale4. Complexity auto-detection criteria too vague (autotask.md:28-42) "Single file → quick. Multi-file → balanced" is too simplistic. A single file touching auth is higher risk than 10 files updating docs. Fix: Weight risk higher than file count: Auto-detect complexity in this order (first match wins):
1. **Risk factors** (→ deep): auth, payments, data migrations, crypto, permissions
2. **Scope** (→ deep): >5 files OR cross-cutting changes
3. **Novelty** (→ balanced): New patterns or uncertainty in approach
4. **Simplicity** (→ quick): Single file, clear fix, established pattern
5. **Default** (→ balanced): When in doubt5. brainstorm-synthesis.md lacks concrete example (SKILL.md:92-112) The output format is described but there's no concrete example. Following prompt-engineering.mdc: "Examples teach patterns" - this skill needs one. Fix: Add an example synthesis result in the output format section showing what a real synthesis looks like. Low Priority6. Minor: Inconsistent terminology for "depth" vs "complexity" multi-review.md uses "depth" (lines 21-42) while autotask.md uses "complexity". Pick one term for both. Suggestion: Use "complexity" everywhere since it's more intuitive for users. 7. Version bumps look correct All three files bumped to appropriate versions:
Testing Recommendations
Architecture AssessmentThe design is sound. Key architectural decisions are well-reasoned:
The only concern is the growing complexity of autotask.md (357 lines). Consider whether some sections (like error-recovery) could be extracted to referenced files to keep the main flow clearer. Not required for this PR, but worth considering for maintainability. VerdictApprove with suggested fixes for high-priority issues. The high-priority issues are edge cases that will cause frustration when hit:
Fix these three, and this is a solid improvement to autotask. The design is excellent, the implementation follows project standards well, and the problem-solution fit is clear. Nice work on the comprehensive plan document - it made the review much easier. Generated with Claude Code |
|
|
||
| Iterate on bot feedback until critical issues are resolved. </bot-feedback-loop> | ||
| - PR created with proper description | ||
| - Review bots have completed (or confirmed none configured) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Completion criteria inconsistent with timeout handling
Medium Severity
The <bot-feedback-loop> section explicitly allows proceeding when timeout is reached with checks still pending ("proceed with available feedback and note incomplete checks"), but the <completion-verification> criterion requires "Review bots have completed (or confirmed none configured)." When the timeout path is taken, bots haven't completed, so the completion check would fail even though this scenario is explicitly permitted by the workflow. The completion criteria need to account for the timeout case.
Additional Locations (1)
| ## Blockers | ||
|
|
||
| [populated if encountered] | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
State file template missing branch and PR fields
Medium Severity
The state file template lacks fields for branch name, PR number, and worktree path, yet the validation logic on resume requires checking "Does the branch still exist?", "If PR listed, is it still open?", and "Are there commits on the branch since it was created?" After context compaction, the LLM reading this state file wouldn't have the branch name needed to perform these validations, potentially causing incorrect resume behavior or validation against the wrong branch.
Summary
Major overhaul of
/autotask- Nick's most-used workflow (dozens of times daily). Transforms it from a linear PR-creation workflow into an orchestrator-first, completion-verified, complexity-scaled autonomous development system.Core problems solved:
Changes
autotask.md v2.0.2
autotask-state.mdin project root survives compactionmulti-review.md v2.2.0
brainstorm-synthesis (NEW skill)
Design Decisions
Complexity as organizing principle: Rather than flags or options, complexity level (auto/quick/balanced/deep) determines planning depth, review intensity, validation thoroughness, and bot wait times.
State file in project root: Appears as uncommitted file, naturally prompts LLM to notice and read it on resume. Simpler than hidden directories.
Mandatory bot feedback: The most common failure mode was stopping after PR creation. Made /address-pr-comments non-optional.
Validation Performed
🤖 Generated with Claude Code