autorize is a generic iterative-improvement harness. You point it at a project, a
scoring command, and an agent CLI, and it runs the agent in sandboxed git worktrees
against the score — keeping improvements, discarding regressions — until a deadline
fires.
It generalizes the autoresearch pattern
into a small Rust CLI you can point at any repo.
For each iteration, autorize:
- Creates a fresh git worktree off the `autorize/<name>` tracking branch.
- Builds a prompt from your `program.md`, the boundary rules, any operator guidance
  (`autorize tell`), and the last 10 iteration records (with a per-outcome reason and,
  if enabled, a model-written summary of each attempt).
- Spawns your agent (any CLI: Claude Code, a shell script, anything) inside the
  worktree with a hard wall-clock budget. On timeout the whole process group gets
  `SIGTERM`, then `SIGKILL` after 5 s.
- Stages the agent's changes and rejects the iteration if its diff touches a
  `deny_paths` glob.
- Runs your scoring command (raw float, regex capture, or JSONPath) and compares
  against the best score seen so far.
- Better? Commits onto `autorize/<name>` and advances the tracking branch.
  Worse / no-op / denied / invalid? Discards the worktree.
- Appends an `IterationRecord` to `iterations.jsonl` and rewrites `state.json`
  atomically so you can `Ctrl-C` (or crash) at any point and `autorize resume` picks
  up cleanly.
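
The timeout behaviour in the agent step (signal the whole process group with `SIGTERM`, then `SIGKILL` after a grace period) can be sketched in Python. This is an illustration of the technique, not autorize's Rust code. It is POSIX-only, and the names `run_with_budget` and `_signal_group` are made up for the example.

```python
import os
import signal
import subprocess
import sys


def _signal_group(pgid, sig):
    """Send sig to every process in the group; ignore a group that already exited."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_budget(argv, budget_s, grace_s=5.0):
    """Run argv under a wall-clock budget. Returns (exit_code, timed_out)."""
    # start_new_session=True makes the child the leader of a new process group,
    # so its pgid equals its pid and killpg also reaches its descendants.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=budget_s), False
    except subprocess.TimeoutExpired:
        pass
    _signal_group(proc.pid, signal.SIGTERM)      # polite request first
    try:
        return proc.wait(timeout=grace_s), True
    except subprocess.TimeoutExpired:
        _signal_group(proc.pid, signal.SIGKILL)  # grace period expired
        return proc.wait(), True


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_budget(sleeper, budget_s=0.5))
```

Signalling the group rather than the single child matters when the agent CLI spawns helpers of its own; killing only the parent would leave them running inside the worktree.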
The loop exits when the total deadline fires, `max_iterations` is hit, or
`max_consecutive_noops` is reached.
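
The keep-or-discard comparison and the three exit conditions can be sketched as two small functions. This is a hedged illustration only: the function names are hypothetical, and two behaviours are assumptions rather than documented facts (a tie with the best score is not kept, and the first valid score always counts as an improvement).

```python
from datetime import datetime, timedelta, timezone


def is_improvement(score, best, direction):
    """True when score beats best for the configured direction ("min" or "max")."""
    if best is None:          # assumption: no prior best means anything valid wins
        return True
    # assumption: strict comparison, so an equal score is not an improvement
    return score < best if direction == "min" else score > best


def stop_reason(now, deadline, iterations_done, max_iterations,
                noop_streak, max_consecutive_noops):
    """Return why the loop should exit, or None to keep going."""
    if now >= deadline:
        return "deadline"
    if max_iterations > 0 and iterations_done >= max_iterations:  # 0 = unbounded
        return "max_iterations"
    if noop_streak >= max_consecutive_noops:
        return "max_consecutive_noops"
    return None


if __name__ == "__main__":
    start = datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc)
    deadline = start + timedelta(hours=4)  # corresponds to total_budget = "4h"
    print(is_improvement(0.4, 0.5, "min"), stop_reason(start, deadline, 3, 0, 0, 5))
```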
Supported platforms: Linux (x86_64-unknown-linux-gnu) and macOS
(aarch64-apple-darwin, Apple Silicon).
From crates.io:

    cargo install autorize

Prebuilt binary (from the latest GitHub Release):

    # Pick your target:
    TARGET=x86_64-unknown-linux-gnu   # or: aarch64-apple-darwin
    # Resolve the latest tag, then download + extract:
    TAG=$(curl -fsSL -o /dev/null -w '%{url_effective}' \
      https://github.com/wbbradley/autorize/releases/latest | sed 's#.*/tag/##')
    curl -fsSL "https://github.com/wbbradley/autorize/releases/download/${TAG}/autorize-${TAG}-${TARGET}.tar.gz" \
      | tar -xz
    ./autorize --version

Or browse https://github.com/wbbradley/autorize/releases/latest and grab the archive
for your target by hand.

From source:

    cargo install --path .

Quickstart:

    # 1. Scaffold an experiment under .autorize/<name>/
    autorize init myexp
    # 2. Edit .autorize/myexp/config.toml and .autorize/myexp/program.md
    #    - point `objective.command` at your scoring script
    #    - point `agent.command` at your agent CLI
    #    - set a deadline (`total_budget = "4h"` or `deadline = "..."`)
    # 3. Commit your repo (autorize refuses dirty trees by default), then run:
    autorize run myexp
    # 4. Check progress from another shell:
    autorize status myexp
    # 5. If the loop dies, restart it:
    autorize resume myexp

This repo ships a Claude Code skill at `skills/autorize/` that walks you through
scaffolding an experiment: it asks about your objective, scoring command, agent CLI,
and schedule, then drafts `.autorize/<name>/config.toml`, `program.md`, and any
helper scoring script for your review before writing.
Install once (user-global, applies to every repo you open):

    mkdir -p ~/.claude/skills
    cp -r skills/autorize ~/.claude/skills/

Or per-project (only this repo):

    mkdir -p .claude/skills
    cp -r skills/autorize .claude/skills/

Then, from a Claude Code session in any repo with `autorize` on `PATH`, invoke
`/autorize`. The skill prints `autorize llms` for context, interviews you,
and stops at "ready to `autorize run <name>`"; it never starts the loop.
| Command | What it does |
|---|---|
| `autorize init <name>` | Scaffold `.autorize/<name>/{config.toml,program.md}`. |
| `autorize run <name>` | Run the loop until deadline / cap / noop streak. `--fresh` starts another run building on the prior best. |
| `autorize status <name>` | One-shot summary from `state.json` + `iterations.jsonl`. |
| `autorize list <name>` | Dump every iteration as markdown (oldest-first, one section per iteration with its summary); colorized on a TTY, plain markdown when piped. `--color <auto\|always\|never>` overrides detection. |
| `autorize tell <name> <message>` | Append operator guidance; the running loop injects it into the next iteration's prompt (see below). |
| `autorize resume <name>` | Recover after a crash; any in-progress iter is recorded as `killed` and the loop continues. |
| `autorize clean <name>` | Tidy a finished/abandoned experiment: detach any worktree still holding the tracking branch checked out (the branch ref is preserved), drop stale staged indexes, prune dead worktree registrations (`--remove-worktrees` also deletes kept `wt/` checkouts). Leaves the log and records intact. |
| `autorize llms` | Print an exhaustive agent-targeted markdown reference (config schema, on-disk layout, `IterationRecord`, state machine). |

`autorize run` accepts `--allow-dirty` if you need to start with uncommitted
changes outside `.autorize/`, and `--fresh` to start another run on a finished
experiment (see below).

When a run finishes (deadline, `max_iterations`, or the consecutive-noop
streak), re-running `autorize run <name>` is a no-op: it reloads the saved
state and re-hits the same stop condition. To do another batch of work that
builds on what you already have, pass `--fresh`:

    autorize run myexp --fresh

`--fresh` recomputes the deadline from `schedule`, resets the per-run
`max_iterations` budget and the consecutive-noop streak, and refreshes
`started_at`, while preserving the prior `best_score`/`best_iter`, the
`autorize/<name>` branch and its tip, and the full `iterations.jsonl` history.
New iterations keep comparing against the prior best and keep numbering upward.
It is a no-op on a never-run experiment, and is refused (use `autorize resume`)
if an iteration is mid-flight. An already-past absolute `schedule.deadline`
errors instead of looping; switch to `total_budget` or edit the deadline first.
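
What `--fresh` resets and what it preserves can be pictured as a transformation of the saved state. The sketch below is only a mental model: `started_at`, `best_score`, and `best_iter` are named in this README, while `deadline`, `iterations_this_run`, and `consecutive_noops` are hypothetical field names chosen for the example (run `autorize llms` for the real schema).

```python
from datetime import datetime, timedelta, timezone


def fresh_state(prev, now, total_budget):
    """Return a copy of prev with per-run fields reset and the best result kept."""
    state = dict(prev)                       # best_score / best_iter carry over
    state["started_at"] = now.isoformat()    # refreshed for the new run
    state["deadline"] = (now + total_budget).isoformat()  # hypothetical field
    state["iterations_this_run"] = 0         # hypothetical per-run counter
    state["consecutive_noops"] = 0           # hypothetical noop streak
    return state


if __name__ == "__main__":
    previous = {
        "best_score": 0.12,
        "best_iter": 7,
        "started_at": "2026-05-19T09:00:00+00:00",
        "iterations_this_run": 40,
        "consecutive_noops": 5,
    }
    now = datetime(2026, 5, 20, 9, 0, tzinfo=timezone.utc)
    print(fresh_state(previous, now, timedelta(hours=4)))
```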
To redirect a run in flight without stopping it, append operator guidance:

    autorize tell myexp "stop tuning the series; try a spigot algorithm instead"

`tell` appends a structured entry to `.autorize/myexp/guidance.jsonl`. The
running loop re-reads that file at the top of every iteration and injects all
entries into a prominent `## Operator guidance` section of the agent's prompt,
framed as authoritative direction. The message shows up in the next
iteration and persists thereafter. The file is also safe to hand-edit; a
missing/empty file simply renders no section. In v1 all guidance persists and
is shown every iteration.
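
As a rough model of that behaviour, the sketch below re-reads a JSONL file and renders every entry into a `## Operator guidance` section, returning nothing for a missing or empty file. The entry layout is an assumption: the README only says entries are "structured", so the `message` key and the bullet formatting here are hypothetical.

```python
import json
from pathlib import Path


def render_guidance(path):
    """Render all guidance entries as a prompt section; "" if there are none."""
    guidance_file = Path(path)
    if not guidance_file.exists():
        return ""                                   # missing file: no section
    messages = []
    for line in guidance_file.read_text(encoding="utf-8").splitlines():
        if line.strip():                            # tolerate blank lines from hand edits
            messages.append(json.loads(line)["message"])  # "message" key is assumed
    if not messages:
        return ""                                   # empty file: no section
    bullets = "\n".join(f"- {message}" for message in messages)
    return f"## Operator guidance\n\n{bullets}\n"


if __name__ == "__main__":
    print(render_guidance(".autorize/myexp/guidance.jsonl"))
```

Re-reading the file on every iteration, rather than caching it, is what lets a message written by another process take effect without restarting the loop.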

    [experiment]
    name = "myexp"
    description = "..."

    [objective]
    command = "bash score.sh"   # prints the score to stdout
    direction = "min"           # "min" | "max"
    parse = { kind = "float" }  # or { kind = "regex", pattern = "score=([0-9.]+)" }
                                # or { kind = "jq", path = ".metrics.loss" }
    timeout = "60s"
    fail_mode = "invalid"       # "invalid" | "worst" | "abort"

    [boundaries]
    allow_paths = ["src/**/*.py"]   # prompt-only in v1
    deny_paths = [".autorize/**"]   # ENFORCED via diff

    [setup]
    command = ""
    timeout = "5m"

    [teardown]
    command = ""
    timeout = "1m"

    [iteration]
    budget = "5m"
    max_iterations = 0          # 0 = unbounded
    keep_worktrees = false
    max_consecutive_noops = 5

    [schedule]
    total_budget = "4h"         # OR (exactly one):
    # deadline = "2026-05-21T09:00:00-07:00"

    [agent]
    command = "claude --print {prompt_file}"   # {prompt_file}, {workdir}, {iter}
    workdir_var = "AUTORIZE_WORKDIR"
    stdin = "none"              # "none" | "prompt"

    [agent.env]
    ANTHROPIC_API_KEY = "$ANTHROPIC_API_KEY"

    [summarize]                 # on by default; recap each
    enabled = true              # iteration with a cheap model
    command = 'claude --model haiku --print --tools "" --system-prompt "You are a terse summarizer. Output exactly 1-2 sentences naming the change and why it moved the score. No preamble, no markdown, no questions, no offer of further help."'
    timeout = "60s"
    stdin = "prompt"            # prompt piped on stdin (default)

When `[summarize]` is enabled, each iteration's recap is surfaced to the agent
in later prompts under `## Recent attempt summaries` (so it can learn from
discarded attempts). At the top of every `autorize run` / `resume`, autorize
also backfills summaries for any records still missing one (those written
before you enabled `[summarize]`, or whose summarize step failed) by replaying
the persisted `iter-NNNN/` artifacts (`changes.diff`, `agent.stdout`,
`agent.stderr`). It is best-effort and skips noops and records whose artifacts
are gone; the first run after enabling summaries may therefore fire several
one-time model calls.
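
The backfill selection rule described above (missing summary, not a noop, artifacts still on disk) can be sketched as a predicate. The record keys `iter`, `outcome`, and `summary` are hypothetical names for this example, and treating a record as skippable when any one of the three artifacts is missing is an assumption; only the `iter-NNNN/` directory naming and the artifact file names come from this README.

```python
from pathlib import Path

ARTIFACTS = ("changes.diff", "agent.stdout", "agent.stderr")


def needs_backfill(record, experiment_dir):
    """True when a record lacks a summary and can still be replayed from disk."""
    if record.get("summary"):
        return False                      # already summarized
    if record.get("outcome") == "noop":
        return False                      # noops are skipped
    iter_dir = Path(experiment_dir) / f"iter-{record['iter']:04d}"
    # assumption: every artifact must still exist for the replay to be possible
    return all((iter_dir / name).is_file() for name in ARTIFACTS)


if __name__ == "__main__":
    example = {"iter": 3, "outcome": "discarded", "summary": None}
    print(needs_backfill(example, ".autorize/myexp"))
```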
`program.md` lives next to `config.toml` and holds freeform instructions for the
agent, included verbatim at the top of every prompt.

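
To make the three `objective.parse` kinds from the config concrete, here is a minimal sketch of how scorer output might be turned into a float. It is not autorize's parser: `parse_score` is a hypothetical name, and the `jq` branch is a deliberate simplification that handles only plain dotted paths such as `.metrics.loss`, not real jq or JSONPath syntax.

```python
import json
import re


def parse_score(stdout, parse):
    """Extract a float score from scorer stdout according to a parse table."""
    kind = parse["kind"]
    if kind == "float":
        return float(stdout.strip())
    if kind == "regex":
        match = re.search(parse["pattern"], stdout)
        if match is None:
            raise ValueError("pattern did not match scorer output")
        return float(match.group(1))      # first capture group holds the score
    if kind == "jq":
        # Simplification: walk a dotted path through parsed JSON objects.
        value = json.loads(stdout)
        for key in parse["path"].strip(".").split("."):
            value = value[key]
        return float(value)
    raise ValueError(f"unknown parse kind: {kind}")


if __name__ == "__main__":
    print(parse_score("0.25\n", {"kind": "float"}))
    print(parse_score("epoch 3 score=0.125 done",
                      {"kind": "regex", "pattern": r"score=([0-9.]+)"}))
    print(parse_score('{"metrics": {"loss": 1.5}}',
                      {"kind": "jq", "path": ".metrics.loss"}))
```

Whichever kind you choose, the scoring command should print exactly one unambiguous number; a parse failure is presumably what `fail_mode` governs.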
    <repo>/
      logs/autorize.log        # central append-only run log (narrative + teed child stdio)
      .autorize/<name>/
        config.toml
        program.md
        state.json             # atomic checkpoint of loop state
        iterations.jsonl       # durable append-only log
        guidance.jsonl         # operator guidance from `autorize tell` (hand-editable)
        iter-0001/
          prompt.md            # what the agent saw
          changes.diff         # captured diff
          agent.stdout
          agent.stderr
        iter-0002/
          ...

`logs/` is created on startup (gitignore it). `RUST_LOG` tunes verbosity
(default `info`). At `info` the log is a forensic audit trail: every git
call, subprocess spawn, and filesystem mutation is recorded (dozens of lines
per iteration; `agent.env` secrets are never logged). Use `RUST_LOG=warn` to
quiet it (also hides the run narrative).
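
The "atomic checkpoint" property of `state.json` is usually achieved with a write-then-rename pattern. The Python sketch below shows that general technique; whether autorize does exactly this is an assumption, and `write_state_atomically` is a hypothetical name.

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Replace path with the JSON form of state so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())        # push the bytes to disk before renaming
        os.replace(tmp_path, path)        # atomic swap on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)           # do not leave a stray temp file behind
        raise


if __name__ == "__main__":
    write_state_atomically("state.json", {"best_score": 0.25, "best_iter": 7})
```

A crash before the rename leaves the old checkpoint untouched, and a crash after it leaves the new one complete, which is what makes resuming after `Ctrl-C` safe.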
The tracking branch `autorize/<name>` records every merged iteration as a
single commit, so `git log autorize/<name>` is your improvement history and
`git diff main..autorize/<name>` is the cumulative change.
See `examples/pi-digits/` for an end-to-end demo where a
mock agent nudges a number in `value.txt` toward π:

    cp -r examples/pi-digits/. /tmp/pi-demo
    cd /tmp/pi-demo
    git init -b main
    git -c user.email=a@b -c user.name=a add .
    git -c user.email=a@b -c user.name=a commit -m init
    autorize run pi

v1 is feature-complete on Linux and macOS (Apple Silicon). Out of scope for v1:
parallel iterations, Pareto scoring, web/TUI, token accounting, retry/backoff,
remote storage, allow-path enforcement (`allow_paths` is prompt-only).

AGPL-3.0-or-later.