BOOP

An opinionated, idea-to-software pipeline. Give Boop an idea. He plans it, architects it, builds it, tests it, reviews it, and hardens it. You steer the vision; he does the work.

Built on OpenClaw (agent runtime), BMAD (planning knowledge), and Ralph (autonomous execution loop).

How It Works

Boop chains three systems into a single automated pipeline:

Planning — Viability assessment, PRD, architecture decisions, and epic/story breakdown. Uses tested prompt templates derived from BMAD methodology, augmented with lessons learned from past projects.
Building — Autonomous agent loop picks up stories one at a time, implements, runs quality gates (typecheck + lint + test), and commits. One story per iteration, one epic at a time.
Reviewing — After each epic, a team of review agents runs in parallel: code reviewer, security scanner, and test hardener. Findings are verified against real code, then a fixer agent patches issues. The loop runs up to 3 iterations.
Learning — After every project, Boop runs a retrospective, captures architecture decisions, consolidates heuristics, and optionally evolves its own prompts. The next project benefits from what this one taught it.

The whole thing is wrapped in nested loops:

graph TD
    %% Global Inputs
    Idea[User Idea] --> PlanPhase
    Profile[(Developer Profile\nStack, CI/CD, DB)] -.-> PlanPhase
    Profile -.-> ScaffoldPhase
    Memory[(Global Memory\nHeuristics, Decisions)] -.-> PlanPhase

    %% Phase 1: Planning
    subgraph PlanPhase [1. Planning Phase]
        V[Viability Assessment] -- Proceed --> PRD[PRD Generation]
        PRD --> Arch[Architecture Decisions]
        Arch --> Epics[Epic & Story Breakdown]
    end

    PlanPhase --> ScaffoldPhase[2. Scaffolding\nGenerate Repo & Configs]
    ScaffoldPhase --> ProjectLoop

    %% Phase 2 & 3: The Nested Execution Loops
    subgraph ProjectLoop [3. The Project Loop]
        direction TB
        NextEpic{More Epics?}

        subgraph EpicLoop [The Epic Loop]
            direction TB
            NextStory{More Stories?}

            subgraph StoryLoop [The Story Loop - Ralph]
                Impl[Implement Story] --> Gates[Quality Gates:\nTypecheck, Lint, Test]
                Gates -- Fail --> Impl
                Gates -- Pass --> Commit[Git Commit]
            end

            NextStory -- Yes --> StoryLoop
            Commit --> NextStory

            NextStory -- No --> ReviewAgents

            subgraph ReviewLoop [Adversarial Review]
                ReviewAgents[Parallel Review Agents:\nCode, Security, Test Hardener]
                ReviewAgents --> Verifier[Verify Findings vs Code]
                Verifier --> Fixer[Refactor / Fix Code]
                Fixer -- Verify Fixes\nMax 3 Iterations --> ReviewAgents
            end
        end

        Epics --> NextEpic
        NextEpic -- Yes --> EpicLoop
        ReviewLoop --> EpicSignOff[Epic Sign-off]
        EpicSignOff --> NextEpic
    end

    %% Phase 4: Wrap Up
    ProjectLoop -- Project Complete --> Deploy[4. Deployment\nVercel, Docker, etc.]
    Deploy --> Retro[5. Retrospective\nAnalyze Pipeline Efficiency]
    Retro --> Evolve[6. Self-Improvement\nEvolve Prompts & Heuristics]
    Evolve --> Memory

Text version (for terminals)

┌─ PROJECT LOOP ──────────────────────────────────────────────┐
│  idea → viability → plan → build → deploy → retrospective   │
│                                    ↓                         │
│                            self-improvement                  │
│                     (evolve prompts & heuristics)            │
│                                                              │
│  ┌─ EPIC LOOP ───────────────────────────────────────────┐  │
│  │  stories complete →                                    │  │
│  │  code review + security scan + test hardener →         │  │
│  │  verify findings + fix code (up to 3 iterations) →    │  │
│  │  sign-off                                              │  │
│  │                                                        │  │
│  │  ┌─ STORY LOOP ────────────────────────────────────┐  │  │
│  │  │  pick story → implement → typecheck → test →     │  │  │
│  │  │  commit → next story                             │  │  │
│  │  └─────────────────────────────────────────────────┘  │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

Pipeline state is persisted to .boop/state.yaml after every transition. If the process dies, npx boop --resume picks up exactly where it left off.

Prerequisites

Node.js 22+
pnpm
Docker (optional, for agent sandboxing via --sandbox)
Git
Claude API key (ANTHROPIC_API_KEY environment variable)

Quick Start

# Clone and install
git clone https://github.com/gtdrag/boop.git
cd boop
pnpm install

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Run it
npx boop "I want an app that tracks vinyl records"

First run, Boop detects no developer profile and walks you through onboarding:

Hey, I'm Boop. Let's get to know each other.

Name?                  > George
Languages?             > TypeScript, Python
Frontend framework?    > Next.js
Backend framework?     > Express
Database?              > PostgreSQL
Deployment?            > Vercel
Styling?               > Tailwind
Test runner?           > Vitest
CI/CD?                 > GitHub Actions
...

This generates ~/.boop/profile.yaml — your developer profile. Done once, used forever, editable with npx boop --profile.

Every project Boop builds will use your profile: the right framework, your preferred linter, your CI provider, your cloud target. No more boilerplate decisions.

CLI

npx boop "your idea"        # Full pipeline from idea to software
npx boop                    # Interactive mode (prompts for idea)
npx boop --profile          # Edit your developer profile
npx boop --status           # Check current pipeline state
npx boop --review           # Review and sign off on latest epic
npx boop --resume           # Resume an interrupted pipeline
npx boop --autonomous       # Run without sign-off gates
npx boop --sandbox          # Run build agents in Docker containers

# Benchmarking
npx boop benchmark run [suite]          # Run a benchmark suite (default: smoke)
npx boop benchmark run smoke --dry-run  # Dry-run with mock responses (free, fast)
npx boop benchmark run smoke --live     # Live run with real Claude API (costs money)
npx boop benchmark run smoke --json     # Output raw JSON to stdout
npx boop benchmark list                 # List available suites
npx boop benchmark list --runs          # List past benchmark runs
npx boop benchmark compare <base> [cur] # Compare two runs, detect regressions
npx boop benchmark baseline [suite]     # Run and mark result as blessed baseline

# Gauntlet (graduated complexity testing)
npx boop gauntlet run [definition]      # Run the gauntlet (default: gauntlet-v1)
npx boop gauntlet run --tier 3          # Only run up to tier 3
npx boop gauntlet run --start 4         # Resume from tier 4
npx boop gauntlet list                  # List available gauntlet definitions
npx boop gauntlet list --runs           # List past gauntlet runs
npx boop gauntlet report <runId>        # Display a past run's report
npx boop gauntlet diff <runId>          # Show cumulative drift from baseline

Pipeline Stages

Planning

Four sequential phases, each feeding into the next:

Phase	What it does	Output
Viability	Honest assessment — should this be built?	PROCEED / RECONSIDER / PIVOT
PRD	Requirements from your idea + profile context	`.boop/planning/prd.md`
Architecture	Tech decisions (auto-resolved from profile)	`.boop/planning/architecture.md`
Stories	Epics and stories with acceptance criteria	`.boop/planning/epics.md`

In interactive mode, you get a chance to review the viability assessment and decide whether to proceed. In autonomous mode (--autonomous), it runs straight through unless viability says RECONSIDER.

Planning is augmented by the self-improvement loop: heuristics from past projects, architecture decisions that worked (or didn't), and recurring review findings are injected into prompts so the same mistakes don't repeat.

Scaffolding

Runs once, on the first epic. Generates a project skeleton from your profile:

Directory structure (src, test, components, routes, etc.)
package.json with framework-specific dependencies
tsconfig.json with strict mode
Linter/formatter config (ESLint, Biome, or oxlint)
Test runner config (Vitest or Jest)
CI config (GitHub Actions, GitLab CI, or CircleCI)
Deployment config (Vercel, Railway, Fly.io, or Docker)
Risk-tiered review policy (.boop/risk-policy.json)
.gitignore
Git repo initialized with initial commit

Plus quality defaults that ship with every project:

SEO — Meta tags, Open Graph, structured data, sitemap, robots.txt
Analytics — Wired to your preferred provider (PostHog, Plausible, GA, Mixpanel)
Accessibility — Skip navigation, ARIA landmarks, focus management, color contrast
Security headers — CSP (strict by default), HSTS, X-Frame-Options, X-Content-Type-Options
Error tracking — Sentry or Bugsnag, based on your profile
Core Web Vitals — Monitoring wired up automatically

Building

The story loop runs autonomously. For each story:

Pick the highest-priority story that hasn't passed yet
Implement it (Claude writes the code)
Run quality gates: typecheck, lint, test
If green, commit and move to the next story
If red, retry with error context

Progress is tracked in .boop/progress.txt — an append-only log of what was built, what was learned, and what gotchas were encountered. Each iteration reads it so the agent learns from previous mistakes within the same project.

Reviewing

After all stories in an epic are done, an adversarial review loop runs:

3 review agents run in parallel — code reviewer, security scanner, and test hardener each independently scan the codebase and report findings
Verifier — checks each finding against the actual code to discard false positives
Fixer — patches all verified findings via Claude CLI
Repeat — the loop runs up to 3 iterations, catching regressions introduced by fixes

Each iteration typically finds fewer issues than the last. The loop converges when all findings are resolved or the max iteration count is reached.

Risk-tiered review — Review intensity is configurable per file path via .boop/risk-policy.json:

Tier	Files	Iterations	Auto-fix Severity	Agents
High	Auth, API routes, DB queries	3	Medium+	All 3
Medium	Components, routes	2	High+	2
Low	Everything else	1	Critical only	1

Approval gates — In interactive mode, you pick which findings to fix. Via WhatsApp/Telegram, you can reply with "approve", "skip", "abort", or filter specific findings.

Deployment

After sign-off, Boop deploys the project based on your developer profile's cloudProvider setting:

Provider	Method
Vercel	`npx vercel --yes --prod`
Railway	`railway up`
Fly.io	`fly deploy`
Docker	`docker build` + `docker run`
AWS/GCP/Azure	Claude agent with deploy instructions
`none`	Skipped

Deployment is non-blocking: the pipeline continues to the retrospective regardless of deploy outcome.

Retrospective

After the final epic, a retrospective analyzes the full build history:

Story metrics — Which stories needed multiple iterations, and why
Review patterns — Most common finding categories across all review iterations
Complexity analysis — Files changed per story, detecting overly large stories
Concrete suggestions — If avg files/story > 10, suggests smaller stories; if security findings > 5, suggests earlier scanning

Learnings are saved to ~/.boop/memory/ as structured YAML:

Coding patterns discovered during the project
Review findings and their resolutions
Process metrics (durations, retry counts, common blockers)
Architecture decisions — what was chosen and how it turned out

The next project Boop builds benefits from what this one taught it.

How It Learns

Boop has a self-improvement loop that runs after every project. It operates at four levels:

Outcome Injection

Recurring review findings (seen 3+ times across projects) are automatically injected into planning prompts as "Lessons from Past Reviews." If the security scanner keeps flagging missing rate limiting, future architecture documents will include rate limiting by default.

Architecture Decision Capture

Every retrospective extracts architecture decisions — what database was chosen, what auth pattern, what caching strategy — along with whether it worked out. Relevant decisions (matched by tech stack) are surfaced during planning for future projects.

Heuristic Consolidation

Memory entries, review rules, and architecture decisions are synthesized into condensed, confidence-scored heuristics via Claude. Heuristics decay over time (5% per month) so stale advice fades out. Before each planning phase, relevant heuristics are queried by phase and stack and injected into the prompt.

Prompt Evolution

When autoEvolvePrompts is enabled in your profile, Boop generates prompt improvement proposals after each project, validates them against a benchmark baseline to ensure no regressions, and promotes winning variants. Full version history is tracked in ~/.boop/memory/prompt-versions/ with rollback support.

Benchmarking

The benchmark harness validates the pipeline against a suite of test ideas:

npx boop benchmark run smoke --dry-run

Dry-run mode uses canned mock responses — free, fast (~seconds), validates wiring
Live mode calls real Claude API — accurate metrics, costs money
Captures per-phase metrics: timing, tokens, retries, success/fail
Generates a scorecard (JSON + markdown) and persists to ~/.boop/benchmarks/
Supports run comparison with regression detection (duration >50%, tokens >30%, status changes)

Built-in suites: smoke (1 trivial case, validates harness), planning-only (3 cases at different complexity levels).

Gauntlet

The gauntlet is a graduated complexity test. It runs Boop against a series of increasingly complex projects, from a simple to-do app up to a full SaaS starter kit. After each project, it pauses, shows you what it learned and what it wants to change, and waits for your approval before evolving.

Tier	Project	Stack
T1	To-do app	React + local storage, no backend
T2	Notes app	React + Express + SQLite, CRUD
T3	Blog with auth	Next.js + Express + PostgreSQL
T4	E-commerce	Next.js + PostgreSQL + Stripe
T5	Real-time dashboard	Next.js + WebSockets + charts
T6	SaaS starter	Next.js + PostgreSQL + multi-tenant

Safety model: The system never modifies itself without your explicit approval. Every checkpoint is tagged in git (gauntlet/v1-t1-post, gauntlet/v1-t1-evolved, etc.) so you can always diff, rollback, or compare. An evolution-log.yaml tracks cumulative drift.

# Run the full gauntlet
npx boop gauntlet run

# Run only the first 3 tiers
npx boop gauntlet run --tier 3

# See what changed across a run
npx boop gauntlet diff <runId>

At the approval gate between tiers, you choose:

Approve — apply prompt evolution, tag, continue
Skip — tag post-run state, move to next tier without evolving
Stop — end the gauntlet (everything is tagged, nothing lost)

Notifications

Boop can send status updates and wait for your input via WhatsApp or Telegram. Configure in your developer profile:

WhatsApp — Uses Baileys (QR code pairing on first connect). Set notificationChannel: whatsapp and phoneNumber in your profile.
Telegram — Uses grammy. Set notificationChannel: telegram, telegramBotToken, and telegramChatId in your profile.

Pipeline events that trigger notifications: planning complete, build started/complete, review complete, sign-off ready, deployment started/complete/failed, retrospective complete, errors.

Bidirectional: Boop can ask questions and wait for your reply (with a configurable timeout, default 5 minutes). Messages containing credentials are automatically blocked.

Security

Sandboxed agents (--sandbox) — Build and review agents run in Docker containers with read-only root filesystem, project-dir-only volume mount, memory limit (2GB), CPU limit (2 cores), PID limit (256). Falls back to policy-enforced local execution if Docker is unavailable.
Network restricted — Containers can only reach the Claude API via DNS + iptables rules
Policy engine — Blocks destructive commands (rm -rf, force push, reset --hard, shutdown, mkfs) and subshell injection patterns ($(), backticks) at the runtime level before they reach the shell
Path sandboxing — File access restricted to the project directory and ~/.boop/
Credential isolation — API keys stored with 0600 permissions, never written to project files or logs, redacted in all output, blocked from notification messages
No plugins — Closed system. No marketplace, no external extensions, no auto-downloading from public repos

Project Structure

~/.boop/                        # Global (one per machine)
├── profile.yaml                # Developer profile
├── credentials/                # API keys (mode 0600)
├── memory/                     # Cross-project learnings
│   ├── retrospective.yaml      # Coding patterns, review findings, metrics
│   ├── arch-decisions.yaml     # Architecture decision history
│   ├── heuristics.yaml         # Consolidated heuristics
│   └── prompt-versions/        # Prompt evolution history (per phase)
├── benchmarks/                 # Benchmark run history
│   ├── index.json
│   └── runs/
├── gauntlet/                   # Gauntlet run history
│   ├── index.json
│   └── runs/
└── logs/                       # JSON log files

<your-project>/.boop/           # Per-project (created by Boop)
├── state.yaml                  # Pipeline state (survives crashes)
├── prd.json                    # Stories in Ralph format
├── progress.txt                # Build iteration log
├── risk-policy.json            # Risk-tiered review config
├── planning/                   # Generated planning docs
├── reviews/                    # Review outputs per epic
└── retrospective/              # Retrospective report

Roadmap

Improve Mode (`boop --improve`) — Brownfield Support

Point boop at an existing codebase and iteratively improve it. Instead of building from an idea, it analyzes what's already there, generates improvement stories, builds the fixes, and reviews them.

boop --improve                           # improve current directory
boop --improve /path/to/project          # improve specific project
boop --improve --depth 5                 # up to 5 improvement cycles
boop --improve --focus security          # only security-related issues
boop --improve --focus tests             # only test coverage gaps

Each cycle scans the codebase, runs adversarial agents, fixes verified findings, then re-scans. Tracks a "findings memory" so resolved issues don't resurface. Reports a quality score trend across cycles (e.g., "Cycle 1: 47 issues -> Cycle 2: 12 -> Cycle 3: 3").

Status Dashboard (`boop --dashboard`)

A local web page showing real-time pipeline progress. No framework, no build step — a single self-contained HTML page served on localhost:3141 with SSE for live updates.

boop "my idea" --autonomous --dashboard  # run with dashboard open
boop --dashboard                          # attach to running pipeline

Shows current phase, epic/story progress bars, review findings (found/fixed/remaining per iteration), timeline with timestamps and durations, live log tail, token usage, and cost estimate.

See docs/roadmap.md for full details on planned features.

Development

pnpm install                    # Install dependencies
pnpm run dev                    # Run in development mode
pnpm run check                  # Format + typecheck + lint
pnpm run test                   # Run tests (1,558 tests)
pnpm run build                  # Build with tsdown

Configuration

Environment Variable	Description	Required
`ANTHROPIC_API_KEY`	Claude API key	Yes
`BOOP_HOME`	Override `~/.boop/` directory	No
`BOOP_STATE_DIR`	Override project state directory	No

Profile Options

Field	Description	Default
`name`	Your display name	—
`languages`	Programming languages (ordered by preference)	`["typescript"]`
`frontendFramework`	next, remix, astro, sveltekit, vite-react, etc.	`next`
`backendFramework`	express, fastify, hono, nest, koa	`express`
`database`	postgresql, mysql, sqlite, mongodb, supabase	`postgresql`
`cloudProvider`	vercel, aws, gcp, fly, railway, docker, none	`vercel`
`styling`	tailwind, css-modules, styled-components	`tailwind`
`stateManagement`	zustand, redux, jotai, pinia	`zustand`
`analytics`	posthog, plausible, google-analytics, mixpanel	`posthog`
`ciCd`	github-actions, gitlab-ci, circleci	`github-actions`
`packageManager`	pnpm, npm, yarn, bun	`pnpm`
`testRunner`	vitest, jest, mocha, playwright	`vitest`
`linter`	oxlint, eslint, biome	`oxlint`
`projectStructure`	monorepo, single-repo	`single-repo`
`errorTracker`	sentry, bugsnag, none	`sentry`
`aiModel`	Claude model to use	`claude-opus-4-6`
`autonomousByDefault`	Skip interactive gates	`false`
`autoEvolvePrompts`	Run prompt evolution after retrospective	`false`
`notificationChannel`	whatsapp, telegram, none	`none`
`notificationTimeout`	Seconds to wait for user reply (0 = no timeout)	`300`

Design Philosophy

Opinionated — Fixed voice, fixed personality, fixed workflow. Your profile customizes the tech stack, not the process.
Best practices are defaults — SEO, analytics, accessibility, security headers, error tracking ship with every project. The profile defines which providers, but the fact that they exist is non-negotiable.
Nested loops — Story loop (fast, silent), epic loop (review + hardening + sign-off), project loop (major gates, retrospective).
Learns from experience — Heuristics, architecture decisions, and review patterns accumulate across projects. Prompts evolve. Stale advice decays.
Controlled self-improvement — The gauntlet lets you watch the system evolve under increasing complexity, with approval gates and git tags at every checkpoint.
Communicative, not needy — Status updates at the right level. Knows when to ask and when to just handle it.
Security-first — Closed system. Every agent sandboxed. Credentials isolated. Destructive actions blocked at runtime.

Built On

OpenClaw — Agent runtime and gateway (forked, stripped to core)
BMAD — Planning knowledge (prompt templates, personas, checklists)
Ralph — Autonomous execution loop pattern
Claude — AI model powering all phases

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
benchmarks		benchmarks
docs		docs
prompts		prompts
scripts/ralph		scripts/ralph
src		src
test		test
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
prd.json		prd.json
tsconfig.json		tsconfig.json
tsconfig.test.json		tsconfig.test.json
tsdown.config.ts		tsdown.config.ts
vitest.config.ts		vitest.config.ts
vitest.unit.config.ts		vitest.unit.config.ts

Folders and files

Latest commit

History

Repository files navigation

BOOP

How It Works

Prerequisites

Quick Start

CLI

Pipeline Stages

Planning

Scaffolding

Building

Reviewing

Deployment

Retrospective

How It Learns

Outcome Injection

Architecture Decision Capture

Heuristic Consolidation

Prompt Evolution

Benchmarking

Gauntlet

Notifications

Security

Project Structure

Roadmap

Improve Mode (boop --improve) — Brownfield Support

Status Dashboard (boop --dashboard)

Development

Configuration

Profile Options

Design Philosophy

Built On

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Improve Mode (`boop --improve`) — Brownfield Support

Status Dashboard (`boop --dashboard`)

Packages