@poe-code/experiment-loop

A Karpathy-style autonomous experiment loop: an agent makes a change, a metric script scores it, and the loop keeps or discards the change via git, logs the result to a journal, and repeats.

Quickstart

# 1. Install the experiment skill
poe-code experiment install

# 2. Create an experiment doc using /poe-code-experiment-plan
#    e.g. "create experiment to optimize test duration"

# 3. Run the loop
poe-code experiment run

# 4. Check results
poe-code experiment journal

Example Experiment Doc

A markdown file with YAML frontmatter. The body is the agent's research brief.

---
agent: claude-code
metric:
  name: test_duration
  direction: minimize
baseline: null
status:
  state: open
  experiment: 0
  kept: 0
---
# Make the test suite faster

Reduce test execution time without removing coverage.
Focus on parallelization and removing unnecessary setup/teardown.
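
To make the doc layout concrete, here is a minimal sketch of how such a file could be split into its frontmatter and its research brief. `splitFrontmatter` is a hypothetical helper written for illustration, not part of poe-code; parsing the YAML itself would need a YAML library and is left out.

```javascript
// Hypothetical helper (not a poe-code API): split an experiment doc into
// the raw YAML frontmatter and the markdown body that follows it.
function splitFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(text);
  if (!match) {
    throw new Error("experiment doc must start with a --- frontmatter block");
  }
  return { frontmatter: match[1], body: match[2].trim() };
}

const doc = [
  "---",
  "agent: claude-code",
  "baseline: null",
  "---",
  "# Make the test suite faster",
  "",
  "Reduce test execution time without removing coverage.",
].join("\n");

const { frontmatter, body } = splitFrontmatter(doc);
console.log(frontmatter); // the two YAML lines between the --- markers
console.log(body);        // the research brief handed to the agent
```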

Multiple agents

Agents cycle round-robin across experiments:

agent:
  - claude-code
  - codex

From the CLI: poe-code experiment run --agent claude-code,codex
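
As an illustration of the round-robin behaviour, the sketch below maps experiment numbers to agents. The function names and the zero-based index are assumptions made for this example; the real scheduler may count differently.

```javascript
// Hypothetical helpers (not poe-code APIs).

// Parse a comma-separated --agent value into a list of agent names.
function parseAgentList(value) {
  return value.split(",").map((name) => name.trim()).filter(Boolean);
}

// Pick the agent for a zero-based experiment index, cycling through the list.
function agentForExperiment(agents, experimentIndex) {
  if (agents.length === 0) throw new Error("at least one agent is required");
  return agents[experimentIndex % agents.length];
}

const agents = parseAgentList("claude-code,codex");
for (let i = 0; i < 4; i++) {
  console.log(`experiment ${i}: ${agentForExperiment(agents, i)}`);
}
// experiment 0: claude-code
// experiment 1: codex
// experiment 2: claude-code
// experiment 3: codex
```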

Specifying a model

Use agent:provider/model notation:

agent: claude-code:anthropic/claude-opus-4.7
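
The notation can be read as "agent, then a colon, then provider/model". A rough sketch of a parser follows; `parseAgentSpec` is a hypothetical name, and treating the model part as optional is an assumption based on the plain `agent: claude-code` form shown earlier.

```javascript
// Hypothetical parser (not a poe-code API) for "agent" or "agent:provider/model".
function parseAgentSpec(spec) {
  const colon = spec.indexOf(":");
  if (colon === -1) {
    // No model given, e.g. "claude-code".
    return { agent: spec, provider: null, model: null };
  }
  const agent = spec.slice(0, colon);
  const rest = spec.slice(colon + 1);
  const slash = rest.indexOf("/");
  if (slash === -1) {
    throw new Error(`expected provider/model after ":" in "${spec}"`);
  }
  return { agent, provider: rest.slice(0, slash), model: rest.slice(slash + 1) };
}

const parsed = parseAgentSpec("claude-code:anthropic/claude-opus-4.7");
console.log(parsed.agent);    // claude-code
console.log(parsed.provider); // anthropic
console.log(parsed.model);    // claude-opus-4.7
```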

Metric Scripts

Metric scripts decide what "better" means. Each must exit 0 on success and print a single number to stdout.

Register them as metric:* npm scripts:

{
  "scripts": {
    "metric:tests": "node scripts/metric-test-count.mjs",
    "metric:test_duration": "node scripts/metric-test-duration.mjs"
  }
}
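
The contents of the scripts referenced above are not shown here, so the following is only a sketch of the contract: print exactly one number to stdout and leave the exit code at 0 on success. It counts `it(...)` and `test(...)` calls in an inline sample string; a real script would read the test files or run the test runner instead. `countTests` and `emitMetric` are hypothetical names.

```javascript
// Sketch of a metric script in the spirit of scripts/metric-test-count.mjs.
// Assumption: the score is the number of it(...) / test(...) calls found.

// Count test declarations in a piece of source text.
function countTests(source) {
  const matches = source.match(/\b(?:it|test)\s*\(/g);
  return matches ? matches.length : 0;
}

// Print a single number to stdout; signal failure through a non-zero exit code.
function emitMetric(value) {
  if (!Number.isFinite(value)) {
    console.error(`metric is not a finite number: ${value}`);
    process.exitCode = 1;
    return;
  }
  console.log(String(value));
}

// Inline sample standing in for real test files.
const sample = `
test("adds", () => {});
it("subtracts", () => {});
`;

emitMetric(countTests(sample)); // prints 2
```

Anything other than the number (progress messages, warnings) would go to stderr so that stdout stays a single value.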

Direction

  • maximize — higher is better (test count, coverage)
  • minimize — lower is better (duration, bundle size)
  • stable — must not change (test count during optimization)
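
One way to read the three directions as a comparison against the baseline is sketched below. `passes` is a hypothetical function, and the handling of ties (a score equal to the baseline does not count as an improvement) is an assumption rather than documented behaviour.

```javascript
// Hypothetical comparison (not a poe-code API): does a new score satisfy
// the metric's direction relative to the baseline?
function passes(direction, baseline, score) {
  switch (direction) {
    case "maximize":
      return score > baseline;   // higher is better
    case "minimize":
      return score < baseline;   // lower is better
    case "stable":
      return score === baseline; // must not change
    default:
      throw new Error(`unknown direction: ${direction}`);
  }
}

console.log(passes("maximize", 120, 124));  // true
console.log(passes("minimize", 41.2, 39.8)); // true
console.log(passes("stable", 120, 119));    // false
```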

Metric chains

All metrics must pass; scores are tracked independently:

metric:
  - name: tests
    direction: maximize
  - name: test_duration
    direction: minimize
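
The sketch below shows one possible reading of a chain: every metric script has to exit 0 and print a number, and each score is stored under its own name. `run` is a hypothetical stand-in for executing the matching `metric:*` npm script, and stopping at the first failing metric is an assumption.

```javascript
// Hypothetical helpers (not poe-code APIs).

// Parse a metric script's stdout, which should contain exactly one number.
function parseScore(stdout) {
  const text = stdout.trim();
  const value = Number(text);
  if (text === "" || !Number.isFinite(value)) {
    throw new Error(`expected a single number, got: ${JSON.stringify(stdout)}`);
  }
  return value;
}

// Run every metric in the chain; `run(name)` returns { exitCode, stdout }.
function collectScores(metrics, run) {
  const scores = {};
  for (const { name } of metrics) {
    const { exitCode, stdout } = run(name);
    if (exitCode !== 0) return { ok: false, failed: name, scores };
    scores[name] = parseScore(stdout); // tracked independently, keyed by name
  }
  return { ok: true, failed: null, scores };
}

const metrics = [
  { name: "tests", direction: "maximize" },
  { name: "test_duration", direction: "minimize" },
];
const fakeOutputs = { tests: "124\n", test_duration: "39.8\n" };
const result = collectScores(metrics, (name) => ({
  exitCode: 0,
  stdout: fakeOutputs[name],
}));
console.log(result.ok, result.scores.tests, result.scores.test_duration);
// true 124 39.8
```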

How It Works

measure baseline -> loop:
  agent makes a change -> commit -> run metrics -> keep or discard -> journal -> repeat

The agent learns from past attempts through the journal, which records what worked and what didn't.
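
The cycle can be simulated in memory to show its shape. In this sketch `propose` plays the agent, `measure` plays the metric script, and keeping or discarding a commit is modelled by updating (or not updating) `state`. None of these names come from poe-code, no git commands are run, and only the maximize and minimize directions are handled.

```javascript
// In-memory simulation of: measure baseline -> change -> score -> keep/discard -> journal.
function runLoop({ initialState, propose, measure, direction, maxExperiments }) {
  let state = initialState;
  let best = measure(state); // measure baseline
  const journal = [];

  for (let i = 1; i <= maxExperiments; i++) {
    const candidate = propose(state, journal); // agent makes a change, seeing the journal
    const score = measure(candidate);          // run metrics
    const better = direction === "minimize" ? score < best : score > best;
    if (better) {
      state = candidate; // keep
      best = score;
    }                    // otherwise discard: state stays as it was
    journal.push({ experiment: i, score, kept: better });
  }
  return { state, best, journal };
}

// Toy run: the "code" is just a duration in seconds, and the pretend agent
// alternates between a helpful change (-3) and a harmful one (+2).
const outcome = runLoop({
  initialState: 40,
  propose: (state, journal) => state + (journal.length % 2 === 0 ? -3 : 2),
  measure: (state) => state,
  direction: "minimize",
  maxExperiments: 4,
});

const keptCount = outcome.journal.filter((entry) => entry.kept).length;
console.log(outcome.best, keptCount); // 34 2
```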

Custom Experiment Directory

By default experiment docs are discovered from .poe-code/experiments/. To use a different directory:

# Set plan directory in project config (.poe-code/config.json)
# { "experiment": { "plan_directory": "docs/experiments" } }

# Or via env
POE_EXPERIMENT_PLAN_DIRECTORY=docs/experiments poe-code experiment run

# Or point to a specific doc directly
poe-code experiment run docs/experiments/optimize-tests.md

Dashboard Configuration

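Experiment runs can show a live terminal dashboard (TUI). Turn it on or off per run with a flag, set a default in config, or override it with an environment variable.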

# One-off flags
poe-code experiment run --tui
poe-code experiment run --no-tui

# Config default (.poe-code/config.json)
# { "experiment": { "tui": true } }

# Env override
POE_EXPERIMENT_TUI=true poe-code experiment run
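
How the three sources combine is not spelled out here. The sketch below assumes the order flag, then env, then config, then a fallback, and assumes the env variable accepts the strings "true" and "false". `resolveTui` is a hypothetical function, not part of poe-code.

```javascript
// Hypothetical resolution of the TUI setting (assumed precedence: flag > env > config).

// Interpret an env value; undefined means "not set".
function parseBoolean(value) {
  if (value === undefined) return undefined;
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`expected "true" or "false", got "${value}"`);
}

// flag: true for --tui, false for --no-tui, undefined when neither is passed.
function resolveTui({ flag, env = {}, config = {}, fallback = false } = {}) {
  if (flag !== undefined) return flag;
  const fromEnv = parseBoolean(env.POE_EXPERIMENT_TUI);
  if (fromEnv !== undefined) return fromEnv;
  const fromConfig = config.experiment ? config.experiment.tui : undefined;
  if (typeof fromConfig === "boolean") return fromConfig;
  return fallback; // assumed default when nothing is configured
}

const config = { experiment: { tui: true } };
console.log(resolveTui({ config }));                                              // true
console.log(resolveTui({ config, env: { POE_EXPERIMENT_TUI: "false" } }));        // false
console.log(resolveTui({ flag: true, env: { POE_EXPERIMENT_TUI: "false" } }));    // true
```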

CLI

poe-code experiment run [doc]       [--agent <name>] [--max-experiments <n>] [--tui|--no-tui]
poe-code experiment validate [doc]
poe-code experiment journal [doc]
poe-code experiment install