
wbbradley/autorize


autorize

autorize is a generic iterative-improvement harness. You point it at a project, a scoring command, and an agent CLI, and it runs the agent in sandboxed git worktrees against the score — keeping improvements, discarding regressions — until a deadline fires.

It generalizes the autoresearch pattern into a small Rust CLI you can point at any repo.

How it works

For each iteration, autorize:

  1. Creates a fresh git worktree off the autorize/<name> tracking branch.
  2. Builds a prompt from your program.md, the boundary rules, any operator guidance (autorize tell), and the last 10 iteration records (with a per-outcome reason and, if enabled, a model-written summary of each attempt).
  3. Spawns your agent (any CLI — Claude Code, a shell script, anything) inside the worktree with a hard wall-clock budget. On timeout the whole process group gets SIGTERM, then SIGKILL after 5 s.
  4. Stages the agent's changes and rejects the iteration if its diff touches a deny_paths glob.
  5. Runs your scoring command (raw float, regex capture, or JSONPath) and compares against the best score seen so far.
  6. Better? Commits onto autorize/<name> and advances the tracking branch. Worse / no-op / denied / invalid? Discards the worktree.
  7. Appends an IterationRecord to iterations.jsonl and rewrites state.json atomically so you can Ctrl-C (or crash) at any point and autorize resume picks up cleanly.
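The keep-or-discard decision in steps 5 and 6 comes down to parsing the scoring command's output and comparing it with the best score so far in the configured direction. The sketch below covers only the `{ kind = "float" }` parse mode (the regex and jq modes are omitted). The names `Direction`, `parse_float_score`, and `is_improvement` are hypothetical, not autorize's real internals. Two more choices are assumptions: ties count as "not better", and the first valid score is accepted when no best exists yet.

```rust
/// Hypothetical mirror of `objective.direction = "min" | "max"`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Direction {
    Min,
    Max,
}

/// Parse stdout for `parse = { kind = "float" }`: the whole output must be one number.
/// Non-finite values (NaN, inf) are treated as unparseable in this sketch.
fn parse_float_score(stdout: &str) -> Option<f64> {
    stdout.trim().parse::<f64>().ok().filter(|v| v.is_finite())
}

/// Return true if `candidate` strictly beats `best` (assumption: ties are discarded).
/// With no prior best, any valid score is accepted (also an assumption).
fn is_improvement(direction: Direction, best: Option<f64>, candidate: f64) -> bool {
    match best {
        None => true,
        Some(b) => match direction {
            Direction::Min => candidate < b,
            Direction::Max => candidate > b,
        },
    }
}
```

For example, with `direction = "min"` and a best of 0.5, a new score of 0.4 would be committed while 0.5 or 0.6 would be discarded.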

The loop exits when the total deadline fires, max_iterations is hit, or max_consecutive_noops is reached.
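Here is a minimal sketch of the three exit conditions, plus turning a `total_budget` string such as `"4h"` into a deadline. It assumes budgets are a whole number followed by `s`, `m`, or `h`, as in the config examples; the real parser may accept more formats. `parse_budget`, `Limits`, `Progress`, and `stop_reason` are hypothetical names. The README only says `max_iterations = 0` means unbounded. Treating `max_consecutive_noops = 0` the same way is this sketch's assumption.

```rust
use std::time::{Duration, Instant};

/// Parse a budget like "60s", "5m", or "4h" (assumed format: integer + one unit char).
fn parse_budget(s: &str) -> Option<Duration> {
    let s = s.trim();
    let unit = s.chars().last()?;
    let n: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let secs = match unit {
        's' => n,
        'm' => n.checked_mul(60)?,
        'h' => n.checked_mul(3600)?,
        _ => return None,
    };
    Some(Duration::from_secs(secs))
}

struct Limits {
    deadline: Instant,
    max_iterations: u64,        // 0 = unbounded (per the config comment)
    max_consecutive_noops: u64, // 0 = disabled (assumption)
}

struct Progress {
    iterations_this_run: u64,
    consecutive_noops: u64,
}

#[derive(Debug, PartialEq)]
enum StopReason {
    Deadline,
    MaxIterations,
    NoopStreak,
}

/// Check the exit conditions before starting another iteration.
fn stop_reason(now: Instant, limits: &Limits, progress: &Progress) -> Option<StopReason> {
    if now >= limits.deadline {
        Some(StopReason::Deadline)
    } else if limits.max_iterations != 0 && progress.iterations_this_run >= limits.max_iterations {
        Some(StopReason::MaxIterations)
    } else if limits.max_consecutive_noops != 0
        && progress.consecutive_noops >= limits.max_consecutive_noops
    {
        Some(StopReason::NoopStreak)
    } else {
        None
    }
}
```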

Install

Supported platforms: Linux (x86_64-unknown-linux-gnu) and macOS (aarch64-apple-darwin, Apple Silicon).

From crates.io:

cargo install autorize

Prebuilt binary (from the latest GitHub Release):

# Pick your target:
TARGET=x86_64-unknown-linux-gnu       # or: aarch64-apple-darwin

# Resolve the latest tag, then download + extract:
TAG=$(curl -fsSL -o /dev/null -w '%{url_effective}' \
  https://github.com/wbbradley/autorize/releases/latest | sed 's#.*/tag/##')
curl -fsSL "https://github.com/wbbradley/autorize/releases/download/${TAG}/autorize-${TAG}-${TARGET}.tar.gz" \
  | tar -xz
./autorize --version

Or browse https://github.com/wbbradley/autorize/releases/latest and grab the archive for your target by hand.

From source:

cargo install --path .

Quickstart

# 1. Scaffold an experiment under .autorize/<name>/
autorize init myexp

# 2. Edit .autorize/myexp/config.toml and .autorize/myexp/program.md
#    - point `objective.command` at your scoring script
#    - point `agent.command` at your agent CLI
#    - set a deadline (`total_budget = "4h"` or `deadline = "..."`)

# 3. Commit your repo (autorize refuses dirty trees by default), then run:
autorize run myexp

# 4. Check progress from another shell:
autorize status myexp

# 5. If the loop dies, restart it:
autorize resume myexp

Use with Claude Code

This repo ships a Claude Code skill at skills/autorize/ that walks you through scaffolding an experiment — it asks about your objective, scoring command, agent CLI, and schedule, then drafts .autorize/<name>/config.toml, program.md, and any helper scoring script for your review before writing.

Install once (user-global, applies to every repo you open):

mkdir -p ~/.claude/skills
cp -r skills/autorize ~/.claude/skills/

Or per-project (only this repo):

mkdir -p .claude/skills
cp -r skills/autorize .claude/skills/

Then, from a Claude Code session in any repo with autorize on PATH, invoke /autorize. The skill runs autorize llms for context, interviews you, and stops at "ready to autorize run <name>"; it never starts the loop.

Subcommands

Command: what it does

autorize init <name>: Scaffold .autorize/<name>/{config.toml,program.md}.
autorize run <name>: Run the loop until deadline / cap / noop streak. --fresh starts another run building on the prior best.
autorize status <name>: One-shot summary from state.json + iterations.jsonl.
autorize list <name>: Dump every iteration as markdown (oldest-first, one section per iteration with its summary); colorized on a TTY, plain markdown when piped. --color <auto|always|never> overrides detection.
autorize tell <name> <message>: Append operator guidance; the running loop injects it into the next iteration's prompt (see below).
autorize resume <name>: Recover after a crash; any in-progress iter is recorded as killed and the loop continues.
autorize clean <name>: Tidy a finished/abandoned experiment: detach any worktree still holding the tracking branch checked out (the branch ref is preserved), drop stale staged indexes, prune dead worktree registrations (--remove-worktrees also deletes kept wt/ checkouts). Leaves the log and records intact.
autorize llms: Print an exhaustive agent-targeted markdown reference (config schema, on-disk layout, IterationRecord, state machine).

autorize run accepts --allow-dirty if you need to start with uncommitted changes outside .autorize/, and --fresh to start another run on a finished experiment (see below).

Starting another run

When a run finishes (deadline, max_iterations, or the consecutive-noop streak), re-running autorize run <name> is a no-op — it reloads the saved state and re-hits the same stop condition. To do another batch of work that builds on what you already have, pass --fresh:

autorize run myexp --fresh

--fresh recomputes the deadline from schedule, resets the per-run max_iterations budget and the consecutive-noop streak, and refreshes started_at — while preserving the prior best_score/best_iter, the autorize/<name> branch and its tip, and the full iterations.jsonl history. New iterations keep comparing against the prior best and keep numbering upward. It is a no-op on a never-run experiment, and is refused (use autorize resume) if an iteration is mid-flight. An already-past absolute schedule.deadline errors instead of looping; switch to total_budget or edit the deadline first.
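The reset-versus-preserve rules for --fresh can be sketched as a pure function over the loop state. Only best_score, best_iter, and started_at are named in the README. The other `RunState` fields, the `Schedule` and `FreshError` types, and the convention that `last_iter == 0` means "never run" are all hypothetical illustrations, not the real state.json schema.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical subset of state.json relevant to --fresh.
#[derive(Clone, Debug, PartialEq)]
struct RunState {
    best_score: Option<f64>,
    best_iter: Option<u32>,
    last_iter: u32, // highest iteration number so far; 0 = never run (assumption)
    started_at: SystemTime,
    deadline: SystemTime,
    iterations_this_run: u64, // counted against max_iterations
    consecutive_noops: u64,
    in_flight: Option<u32>, // iteration currently running, if any
}

/// Hypothetical mirror of [schedule]: exactly one of total_budget or deadline.
enum Schedule {
    TotalBudget(Duration),
    Deadline(SystemTime),
}

#[derive(Debug, PartialEq)]
enum FreshError {
    IterationInFlight, // use `autorize resume` instead
    DeadlineInPast,    // switch to total_budget or edit the deadline
}

fn apply_fresh(state: &RunState, now: SystemTime, schedule: &Schedule) -> Result<RunState, FreshError> {
    if state.last_iter == 0 {
        return Ok(state.clone()); // no-op on a never-run experiment
    }
    if state.in_flight.is_some() {
        return Err(FreshError::IterationInFlight);
    }
    let deadline = match schedule {
        Schedule::TotalBudget(budget) => now + *budget,
        Schedule::Deadline(at) if *at > now => *at,
        Schedule::Deadline(_) => return Err(FreshError::DeadlineInPast),
    };
    // Reset the per-run counters; keep best_score, best_iter, and iteration numbering.
    Ok(RunState {
        started_at: now,
        deadline,
        iterations_this_run: 0,
        consecutive_noops: 0,
        ..state.clone()
    })
}
```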

Steering a run

To redirect a run in flight — without stopping it — append operator guidance:

autorize tell myexp "stop tuning the series — try a spigot algorithm instead"

tell appends a structured entry to .autorize/myexp/guidance.jsonl. The running loop re-reads that file at the top of every iteration and injects all entries into a prominent ## Operator guidance section of the agent's prompt, framed as authoritative direction. A new message first appears in the next iteration's prompt, and in v1 every entry persists and is shown on every iteration after that. The file is also safe to hand-edit; a missing or empty file simply renders no section.
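The "missing or empty file renders no section" behavior can be sketched as follows. This is a simplification: the real guidance.jsonl holds structured JSON entries, and decoding them (for example with the serde_json crate) is skipped here. Each non-blank line is treated as an already-decoded message. `read_guidance_lines` and `render_guidance` are hypothetical names, and the bullet layout of the rendered section is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read non-blank lines from guidance.jsonl; a missing file yields no entries.
fn read_guidance_lines(path: &Path) -> io::Result<Vec<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(String::from)
            .collect()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}

/// Render the prompt section; no entries means no section at all.
fn render_guidance(messages: &[String]) -> String {
    if messages.is_empty() {
        return String::new();
    }
    let mut out = String::from("## Operator guidance\n\n");
    for m in messages {
        out.push_str("- ");
        out.push_str(m);
        out.push('\n');
    }
    out
}
```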

Config (.autorize/<name>/config.toml)

[experiment]
name = "myexp"
description = "..."

[objective]
command   = "bash score.sh"        # prints the score to stdout
direction = "min"                  # "min" | "max"
parse     = { kind = "float" }     # or { kind = "regex", pattern = "score=([0-9.]+)" }
                                   # or { kind = "jq",    path = ".metrics.loss" }
timeout   = "60s"
fail_mode = "invalid"              # "invalid" | "worst" | "abort"

[boundaries]
allow_paths = ["src/**/*.py"]      # prompt-only in v1
deny_paths  = [".autorize/**"]     # ENFORCED via diff

[setup]
command = ""
timeout = "5m"

[teardown]
command = ""
timeout = "1m"

[iteration]
budget                = "5m"
max_iterations        = 0          # 0 = unbounded
keep_worktrees        = false
max_consecutive_noops = 5

[schedule]
total_budget = "4h"                # OR (exactly one):
# deadline   = "2026-05-21T09:00:00-07:00"

[agent]
command     = "claude --print {prompt_file}"   # {prompt_file}, {workdir}, {iter}
workdir_var = "AUTORIZE_WORKDIR"
stdin       = "none"                            # "none" | "prompt"

[agent.env]
ANTHROPIC_API_KEY = "$ANTHROPIC_API_KEY"

[summarize]                                     # on by default; recap each
enabled = true                                  #   iteration with a cheap model
command = 'claude --model haiku --print --tools "" --system-prompt "You are a terse summarizer. Output exactly 1-2 sentences naming the change and why it moved the score. No preamble, no markdown, no questions, no offer of further help."'
timeout = "60s"
stdin   = "prompt"                              # prompt piped on stdin (default)
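Of the [boundaries] settings, only deny_paths is enforced: the staged diff is checked against it. A sketch of that check takes the changed paths (for instance, the output of `git diff --cached --name-only`) and reports any path that matches a deny glob. The matcher below is a hand-rolled, simplified one: `**` spans zero or more directories and `*` matches within a single path segment. autorize's actual glob semantics may differ, and `glob_match` and `denied` are hypothetical names.

```rust
/// Match one path segment against one pattern segment; `*` matches any run of
/// bytes inside the segment, everything else is literal.
fn match_segment(pat: &[u8], s: &[u8]) -> bool {
    match (pat.first().copied(), s.first().copied()) {
        (None, None) => true,
        (Some(b'*'), _) => {
            match_segment(&pat[1..], s) || (!s.is_empty() && match_segment(pat, &s[1..]))
        }
        (Some(p), Some(c)) if p == c => match_segment(&pat[1..], &s[1..]),
        _ => false,
    }
}

/// Match slash-separated segments; a `**` segment swallows zero or more whole segments.
fn match_segments(pat: &[&str], path: &[&str]) -> bool {
    match pat.split_first() {
        None => path.is_empty(),
        Some((&"**", rest)) => (0..=path.len()).any(|i| match_segments(rest, &path[i..])),
        Some((seg, rest)) => match path.split_first() {
            Some((first, tail)) => {
                match_segment(seg.as_bytes(), first.as_bytes()) && match_segments(rest, tail)
            }
            None => false,
        },
    }
}

fn glob_match(pattern: &str, path: &str) -> bool {
    let pat: Vec<&str> = pattern.split('/').collect();
    let segs: Vec<&str> = path.split('/').collect();
    match_segments(&pat, &segs)
}

/// Return every changed path that hits a deny glob; non-empty means "reject the iteration".
fn denied<'a>(changed: &[&'a str], deny_paths: &[&str]) -> Vec<&'a str> {
    changed
        .iter()
        .copied()
        .filter(|p| deny_paths.iter().any(|d| glob_match(d, p)))
        .collect()
}
```

With the default `deny_paths = [".autorize/**"]`, an agent edit to `.autorize/myexp/state.json` would be flagged, while `src/lib.rs` would pass.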

When [summarize] is enabled, each iteration's recap is surfaced to the agent in later prompts under ## Recent attempt summaries (so it can learn from discarded attempts). At the top of every autorize run / resume, autorize also backfills summaries for any records still missing one — those written before you enabled [summarize], or whose summarize step failed — by replaying the persisted iter-NNNN/ artifacts (changes.diff, agent.stdout, agent.stderr). It is best-effort and skips noops and records whose artifacts are gone; the first run after enabling summaries may therefore fire several one-time model calls.
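The backfill selection rule (no summary yet, not a noop, artifacts still on disk) can be sketched as a filter over the records. The `Outcome` variants, the `Record` shape, and the use of "the iter-NNNN/ directory exists" as a proxy for "artifacts are present" are hypothetical simplifications of the real IterationRecord.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical outcome labels, loosely following the README's wording.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Outcome {
    Improved,
    Worse,
    Noop,
    Denied,
    Invalid,
    Killed,
}

struct Record {
    iter: u32,
    outcome: Outcome,
    summary: Option<String>,
}

/// Artifact directory for an iteration, e.g. iter-0007/.
fn iter_dir(exp_dir: &Path, iter: u32) -> PathBuf {
    exp_dir.join(format!("iter-{:04}", iter))
}

/// Iterations that would get a one-time summarize call at startup.
fn backfill_candidates(exp_dir: &Path, records: &[Record]) -> Vec<u32> {
    records
        .iter()
        .filter(|r| r.summary.is_none())
        .filter(|r| r.outcome != Outcome::Noop)
        .filter(|r| iter_dir(exp_dir, r.iter).is_dir()) // skip records whose artifacts are gone
        .map(|r| r.iter)
        .collect()
}
```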

program.md lives next to config.toml and is freeform instructions for the agent — included verbatim at the top of every prompt.
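The prompt layout (program.md verbatim first, then the boundary rules, operator guidance, and the last 10 iteration records with their summaries) can be sketched as a string builder. The ## Operator guidance and ## Recent attempt summaries headings come from this README. The ## Boundaries and ## Recent iterations headings, the `PastAttempt` type, and the `build_prompt` name are assumptions, and the exact text autorize emits will differ.

```rust
use std::fmt::Write;

/// Hypothetical view of one past iteration record.
struct PastAttempt {
    iter: u32,
    outcome: String,
    reason: String,
    summary: Option<String>,
}

const HISTORY_WINDOW: usize = 10; // "the last 10 iteration records"

fn build_prompt(program_md: &str, boundaries: &str, guidance: &[String], history: &[PastAttempt]) -> String {
    // program.md goes first, verbatim.
    let mut out = String::from(program_md);
    if !out.ends_with('\n') {
        out.push('\n');
    }
    writeln!(out, "\n## Boundaries\n\n{}", boundaries.trim_end()).unwrap();
    if !guidance.is_empty() {
        writeln!(out, "\n## Operator guidance\n").unwrap();
        for g in guidance {
            writeln!(out, "- {}", g).unwrap();
        }
    }
    // Keep only the most recent records, oldest first.
    let recent = &history[history.len().saturating_sub(HISTORY_WINDOW)..];
    if !recent.is_empty() {
        writeln!(out, "\n## Recent iterations\n").unwrap();
        for a in recent {
            writeln!(out, "- iter {}: {} ({})", a.iter, a.outcome, a.reason).unwrap();
        }
        let summarized: Vec<(u32, &str)> = recent
            .iter()
            .filter_map(|a| a.summary.as_deref().map(|s| (a.iter, s)))
            .collect();
        if !summarized.is_empty() {
            writeln!(out, "\n## Recent attempt summaries\n").unwrap();
            for (iter, s) in summarized {
                writeln!(out, "- iter {}: {}", iter, s).unwrap();
            }
        }
    }
    out
}
```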

On-disk layout

<repo>/
  logs/autorize.log        # central append-only run log (narrative + teed child stdio)
  .autorize/<name>/
    config.toml
    program.md
    state.json             # atomic checkpoint of loop state
    iterations.jsonl       # durable append-only log
    guidance.jsonl         # operator guidance from `autorize tell` (hand-editable)
    iter-0001/
      prompt.md            # what the agent saw
      changes.diff         # captured diff
      agent.stdout
      agent.stderr
    iter-0002/
    ...
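The crash-safety claims above rest on two file-handling patterns: state.json is replaced atomically, and iterations.jsonl is only ever appended to. The sketch below shows the usual way to do both with the standard library: write a sibling temp file, fsync it, and rename it over the target (atomic on POSIX filesystems when both paths share a directory). It also appends one newline-terminated JSON line per record. The function names and the `.tmp` suffix are assumptions, not autorize's actual code, and fsyncing the parent directory is omitted for brevity.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` atomically: readers see either the old or the new contents, never a mix.
fn write_atomic(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("json.tmp"); // state.json -> state.json.tmp
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents.as_bytes())?;
        f.sync_all()?; // flush data to disk before the rename makes it visible
    }
    fs::rename(&tmp, path)
}

/// Append one record as a single JSON line to an append-only log.
fn append_record(path: &Path, json_line: &str) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    f.write_all(json_line.trim_end().as_bytes())?;
    f.write_all(b"\n")?;
    f.sync_data()
}

/// Per-iteration artifact directory name, zero-padded as in the layout above.
fn iter_dir_name(iter: u32) -> String {
    format!("iter-{:04}", iter)
}
```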

logs/ is created on startup (gitignore it). RUST_LOG tunes verbosity (default info). At info the log is a forensic audit trail — every git call, subprocess spawn, and filesystem mutation is recorded (dozens of lines per iteration; agent.env secrets are never logged). Use RUST_LOG=warn to quiet it (also hides the run narrative).

The tracking branch autorize/<name> records every merged iteration as a single commit, so git log autorize/<name> is your improvement history and git diff main..autorize/<name> is the cumulative change.

Example

See examples/pi-digits/ for an end-to-end demo where a mock agent nudges a number in value.txt toward π:

cp -r examples/pi-digits/. /tmp/pi-demo
cd /tmp/pi-demo
git init -b main
git add .
git -c user.email=a@b -c user.name=a commit -m init
autorize run pi

Status

v1 is feature-complete on Linux and macOS (Apple Silicon). Out of scope for v1: parallel iterations, Pareto scoring, web/TUI, token accounting, retry/backoff, remote storage, allow-path enforcement (allow_paths is prompt-only).

License

AGPL-3.0-or-later.

About

An autoresearch bootstrapping tool.
