feat(run-prompt): devcontainer-based verification environment for --loop subagents #11

@cruzanstx

Problem

When running /run-prompt --worktree --loop, subagents can build code but cannot run the full stack to verify changes end-to-end. The --loop verification is limited to make build && make test && make lint, none of which starts the application. A subagent has no way to:

  1. Spin up the app (backend + frontend + database + dependent services)
  2. Run Playwright or integration tests against the running stack
  3. Tear it down when done

This means verification in the loop catches compile errors and unit test failures but misses UI regressions, API integration issues, and anything that requires a running app.


Architecture: Three Layers

The clean shape is "projects define verification, executors define isolation, runners define agent entrypoints." This gives real end-to-end loop verification without locking the feature to one agent or one container layout.

Layer 1: Verification Config (Phase 1 — implement first)

A per-project .claude/verify.json that maps changed paths to build/test/lint/integration steps. This immediately improves --loop even on the host executor.

{
  "version": 1,
  "components": {
    "backend": {
      "paths": ["backend/"],
      "steps": {
        "build": { "cmd": "cd backend && make build", "required": true },
        "test": { "cmd": "cd backend && make test", "required": true },
        "lint": { "cmd": "cd backend && make lint", "required": true }
      }
    },
    "frontend": {
      "paths": ["frontend/"],
      "steps": {
        "build": { "cmd": "cd frontend && npm run build", "required": true },
        "check": { "cmd": "cd frontend && npm run check", "required": true },
        "lint": { "cmd": "cd frontend && npm run lint", "required": true },
        "test": { "cmd": "cd frontend && npm run test:unit", "required": false }
      }
    }
  },
  "integration": {
    "compose_file": "docker-compose.dev.yml",
    "steps": {
      "up": { "cmd": "docker compose -f docker-compose.dev.yml up -d --wait", "required": true },
      "test": { "cmd": "cd frontend && npx playwright test", "required": true },
      "down": { "cmd": "docker compose -f docker-compose.dev.yml down", "always_run": true }
    }
  }
}

Key behaviors:

  • Path-based auto-detection: git diff --name-only determines which components changed, and only those components are verified
  • required vs optional: flaky or service-dependent tests can be marked optional ("required": false) so that they do not fail the loop
  • Integration section: spins up the full stack via compose, runs Playwright, and tears the stack down even on failure (always_run)
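
The path-based detection described above could be sketched roughly as follows. This is a minimal illustration, not the daplug implementation: the function names, the default base ref of HEAD, and the choice to treat a missing "required" key as true are all assumptions.

```python
import subprocess
from typing import Iterable


def git_changed_paths(base: str = "HEAD") -> list[str]:
    """List files that differ from `base` (the base ref is an assumption)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def changed_components(config: dict, changed: Iterable[str]) -> list[str]:
    """Return the components whose "paths" prefixes match a changed file."""
    changed = list(changed)
    selected = []
    for name, component in config.get("components", {}).items():
        prefixes = tuple(component.get("paths", []))
        if any(path.startswith(prefixes) for path in changed):
            selected.append(name)
    return selected


def plan_steps(config: dict, components: Iterable[str]) -> list[tuple]:
    """Flatten steps into (component, step, cmd, required) tuples."""
    plan = []
    for name in components:
        for step, spec in config["components"][name]["steps"].items():
            # Assumption: a step without "required" is treated as required.
            plan.append((name, step, spec["cmd"], spec.get("required", True)))
    return plan
```

With the example config above, a diff touching only backend/main.go and README.md would select just the backend component, so the frontend steps are skipped for that iteration.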

Layer 2: Executor Abstraction (Phase 2)

Three executor backends, selectable via --executor:

Executor | Isolation | Use Case
host | None (current behavior) | Trusted repos, simple projects
sandbox | Bubblewrap (see #9) | Lightweight Linux sandboxing
devcontainer | Full container + optional firewall | Full-stack verification, untrusted repos (subject to the caveat under Security Model)

The devcontainer executor uses the Dev Container CLI, which is scriptable and does not require VS Code. It supports both single-container setups and Docker Compose configurations.

The --loop flow with devcontainer becomes:

  1. Create worktree
  2. devcontainer up --workspace-folder <worktree>
  3. Execute subagent inside the container (via devcontainer exec)
  4. Subagent has the full stack running and can build, serve, and run Playwright tests
  5. Tear down container on completion
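
Steps 2, 3, and 5 of this flow might look like the sketch below. The devcontainer up and devcontainer exec invocations follow the flow above. The teardown is an assumption: this sketch removes containers by the devcontainer.local_folder label through the Docker CLI rather than relying on a CLI teardown command, and a Compose-based setup would need its own cleanup. All function names are hypothetical.

```python
import subprocess


def up_command(worktree: str) -> list[str]:
    """Step 2: start the dev container for a worktree."""
    return ["devcontainer", "up", "--workspace-folder", worktree]


def exec_command(worktree: str, agent_cmd: list[str]) -> list[str]:
    """Step 3: run the agent command inside the running container."""
    return ["devcontainer", "exec", "--workspace-folder", worktree, *agent_cmd]


def list_containers_command(worktree: str) -> list[str]:
    """Find containers for this worktree (label name is an assumption)."""
    label = f"label=devcontainer.local_folder={worktree}"
    return ["docker", "ps", "-aq", "--filter", label]


def run_in_devcontainer(worktree: str, agent_cmd: list[str]) -> int:
    """Bring the container up, run the agent, and always clean up."""
    try:
        subprocess.run(up_command(worktree), check=True)
        return subprocess.run(exec_command(worktree, agent_cmd)).returncode
    finally:
        # Step 5: teardown runs even if `up` or the agent fails.
        listed = subprocess.run(
            list_containers_command(worktree), capture_output=True, text=True
        )
        ids = listed.stdout.split()
        if ids:
            subprocess.run(["docker", "rm", "-f", *ids])
```

Keeping the command builders separate from the code that executes them makes the flow easy to unit test without Docker installed.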

Security Model

Modeled after Anthropic's reference devcontainer:

  • Run as non-root user, bind-mount workspace only
  • Persist shell history and agent state in named volumes
  • Add only the NET_ADMIN and NET_RAW capabilities, which the firewall setup needs
  • Init firewall with default-drop policy, allowlist required domains
  • No Docker socket exposure to the agent
  • Rootless Docker where feasible

Important caveat: Anthropic explicitly warns that even their hardened devcontainer does not stop a malicious repo from exfiltrating anything reachable in the container. This approach is recommended only for trusted repositories. "Inside Docker" is not the entire threat model — Docker's daemon runs as root unless using Rootless mode, and bind mounts are writable by default.

Layer 3: Agent Runners (Phase 2)

Claude Code and OpenCode plug into the same executor model as runner adapters, rather than being special-cased in --loop.

Claude Code Runner

Concern | Approach
Required mount | Repo/worktree. Claude reads project instructions, settings, skills, subagents from project tree and ~/.claude
Persistent state | ~/.claude/ for settings/skills/subagents; optionally ~/.claude.json for OAuth, MCP config, trust state
Stateless mode | --bare + ANTHROPIC_API_KEY. Skips OAuth/keychain reads; recommended for scripted/SDK calls
Write restrictions | Even in bypassPermissions, writes to .git, .claude, .vscode, .idea still prompt (except .claude/commands, .claude/agents, .claude/skills). Unattended loops must avoid protected targets

OpenCode Runner

Concern | Approach
Required mount | Repo/worktree + config + credentials
Entrypoints | opencode run (programmatic), opencode serve (headless), opencode run --attach (reuse backend)
Config paths | Global: ~/.config/opencode/opencode.json; Project: opencode.json at repo root; Override: OPENCODE_CONFIG env var
Credentials | ~/.local/share/opencode/auth.json via opencode auth login, or env vars, or project .env
Shared paths | OpenCode discovers AGENTS.md, CLAUDE.md, .opencode/skills, .claude/skills and their global equivalents, which enables shared repo guidance across agents
Permissions | Config-driven via permission block. For unattended runs, set project-local permission policy explicitly
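
One way the two runners could share an adapter shape is sketched below. The Runner protocol, the class names, and the env_passthrough method are hypothetical. The --bare flag and the opencode run entrypoint are taken from the tables above, and claude -p is used here as the non-interactive entrypoint. Exact flag placement should be checked against each CLI before relying on it.

```python
from typing import Protocol


class Runner(Protocol):
    """Hypothetical adapter interface that an executor would call."""

    def command(self, prompt: str) -> list[str]: ...
    def env_passthrough(self) -> list[str]: ...


class ClaudeCodeRunner:
    def __init__(self, bare: bool = True) -> None:
        self.bare = bare

    def command(self, prompt: str) -> list[str]:
        cmd = ["claude"]
        if self.bare:
            # Stateless mode from the table above; pairs with an API key.
            cmd.append("--bare")
        return [*cmd, "-p", prompt]

    def env_passthrough(self) -> list[str]:
        return ["ANTHROPIC_API_KEY"]


class OpenCodeRunner:
    def command(self, prompt: str) -> list[str]:
        return ["opencode", "run", prompt]

    def env_passthrough(self) -> list[str]:
        # Lets the caller override the config location inside the container.
        return ["OPENCODE_CONFIG"]


def build_invocation(runner: Runner, prompt: str) -> list[str]:
    """The loop only needs the protocol, never the concrete runner type."""
    return runner.command(prompt)
```

Because --loop would depend only on the protocol, adding a third agent means adding one adapter class and leaving the loop logic unchanged.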

Benefits

  • Subagents catch real bugs, not just compile errors
  • No more "it built but the page is broken" merges from worktree prompts
  • Projects define their own verification — daplug doesn't need to know app-specific details
  • Devcontainer approach gives full isolation per worktree (no port conflicts between parallel runs)
  • Architecture supports multiple agents without special-casing each one

Implementation Plan

Phase 1: Verification Config + Integration Lifecycle

  • Define .claude/verify.json schema
  • Add config discovery to executor.py (read verify.json, fall back to current static checks)
  • Implement path-based component detection via git diff --name-only
  • Implement integration lifecycle (up → wait → test → down with always_run teardown)
  • Add required vs optional step handling
  • Create verify.json for youtube_summaries as reference implementation
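
The integration lifecycle item in Phase 1 could be sketched roughly as below. It assumes steps run in the order they appear in verify.json, that steps flagged always_run are deferred to the end and run regardless of outcome, and that a missing "required" key means required. The function names and the injectable run callable are hypothetical.

```python
import subprocess
from typing import Callable


def run_shell(cmd: str) -> int:
    """Run one configured command through the shell; return its exit code."""
    return subprocess.run(cmd, shell=True).returncode


def run_integration(integration: dict, run: Callable[[str], int] = run_shell) -> bool:
    """Run up/test steps in order, then always run the teardown steps."""
    steps = integration.get("steps", {})
    normal = [s for s in steps.values() if not s.get("always_run")]
    cleanup = [s for s in steps.values() if s.get("always_run")]
    ok = True
    try:
        for step in normal:
            if run(step["cmd"]) != 0 and step.get("required", True):
                # A required step failed: stop here, but still clean up.
                ok = False
                break
    finally:
        for step in cleanup:
            run(step["cmd"])
    return ok
```

Passing run as a parameter lets tests substitute a fake command runner, so the teardown guarantee can be checked without Docker.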

Phase 2: Executor Abstraction + Agent Runners

  • Define executor interface (host, sandbox, devcontainer)
  • Add --executor flag to run-prompt
  • Implement devcontainer executor using Dev Container CLI
  • Implement Claude Code runner adapter (mount profiles, --bare mode)
  • Implement OpenCode runner adapter (config paths, opencode run)
  • Pre-built image support and warm reuse (startup cost is the main product risk)
  • Port allocation strategy for parallel worktree devcontainers

Open Questions

  • Should the verify config be JSON, YAML, or just a Makefile convention (make verify)?
  • Can devcontainer startup be fast enough for iterative loops? Pre-built images and warm reuse should be part of design from day one.
  • How to handle projects that need external services (APIs, GPUs, etc.) that can't run in a container?
  • Port allocation strategy when running multiple worktree devcontainers in parallel?
  • Does this overlap with or complement --sandbox (#9, "feat(run-prompt): add --sandbox bubblewrap for Linux prompt execution")? Current thinking: sandbox = lightweight isolation, devcontainer = full-stack isolation. Complementary.
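
For the port allocation question above, one possible starting point is to ask the OS for a free port per worktree, as in the sketch below. This is only an illustration of one option, not a decided strategy. The function name is hypothetical, and the approach has a known race: another process may claim the port between this call and the container binding it.

```python
import socket


def free_port() -> int:
    """Ask the OS for a currently unused TCP port on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick an available port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```

An alternative that avoids the race entirely is to publish no host ports and run Playwright inside the container network, at the cost of making the running app harder to inspect from the host.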

Context

This came up while working on youtube_summaries, which has three components (Go backend, Go processor, SvelteKit frontend), each with its own Dockerfile and Makefile but no unified dev compose file or devcontainer.
