Agent Forge

A sandboxed AI coding agent runtime that autonomously modifies codebases through LLM-driven reasoning and isolated tool execution.

Overview

Agent Forge implements the ReAct (Reasoning + Acting) pattern: an agent receives a coding task, iteratively reasons about what to do via an LLM, invokes tools inside ephemeral Docker containers, and loops until the task is complete.

graph TD
    CLI["CLI<br/>(Click commands)"]
    CLI -->|"agent-forge run"| Core

    subgraph Core["Agent Core"]
        Loop["ReAct Loop<br/>Observe → Reason → Act"]
        State["State Machine<br/>(PENDING → RUNNING → ...)"]
        Persist["Persistence<br/>(save/load runs)"]
    end

    Loop -->|"prompt + tools"| LLM
    LLM -->|"function calls"| Loop
    Loop -->|"execute tool"| Sandbox
    Sandbox -->|"result"| Loop

    subgraph LLM["LLM Provider"]
        Gemini["Gemini 3.1<br/>(primary)"]
    end

    subgraph Sandbox["Docker Sandbox"]
        Docker["Ephemeral Container<br/>(per-run, isolated)"]
    end

    subgraph Tools["Tool Registry"]
        read_file
        write_file
        edit_file
        list_directory
        run_shell
        search_codebase
    end

    Sandbox ---|"workspace<br/>bind-mount"| Tools

Key Features

🔒 Sandboxed Execution — Every tool invocation runs in an ephemeral Docker container with resource limits — never on the host.
🧠 Gemini 3.1 Ready — Full support for thought signatures, exponential backoff with jitter, and Retry-After header.
🔌 Extensible LLM Layer — Gemini adapter built, OpenAI and Anthropic interfaces defined for easy addition.
📊 Observability — Structured JSON logs, trace IDs, token/cost tracking on every run.
💾 Run Persistence — Every agent run is saved to disk with full conversation history and tool invocations.
🧩 Extensible Tools — Add new tools by implementing a simple Tool ABC and registering them.

Quick Start

✅ Verified end-to-end with gemini-3.1-flash-lite-preview on 2026-03-07 — 17/17 E2E tests passing.

Prerequisites

Python 3.11+
Docker
A Gemini API key (or OpenAI/Anthropic)

Installation

# Clone the repository
git clone https://github.com/akoita/agent-forge.git
cd agent-forge

# Install in development mode
pip install -e ".[dev]"

# Build the sandbox Docker image
make build-sandbox

# Set your API key
export GEMINI_API_KEY="your-key-here"

Usage

# Run an agent task (direct mode — default)
agent-forge run \
  --task "Add input validation to the /api/users endpoint" \
  --repo ./path/to/your/repo

# Run via queue → worker pipeline (in-memory)
agent-forge run \
  --task "Fix login bug" \
  --repo ./my-app \
  --queue memory

# Run via Redis queue (requires Redis)
agent-forge run \
  --task "Refactor auth module" \
  --repo ./my-app \
  --queue redis \
  --redis-url redis://localhost:6379/0 \
  --max-concurrent-runs 4

# Check run status
agent-forge status <run-id>

# List recent runs
agent-forge list

# View resolved configuration
agent-forge config

Demo

Run the demo locally

# Watch the simulated demo
bash scripts/demo.sh

# Record a new asciinema cast
bash scripts/record-demo.sh

Configuration

Agent Forge uses a layered configuration system (CLI flags > env vars > project config > user config > defaults).

Create an agent-forge.toml in your project root:

[agent]
max_iterations = 25
default_provider = "gemini"
default_model = "gemini-3.1-flash-lite-preview"

[sandbox]
memory_limit = "512m"
timeout_seconds = 300
network_enabled = false

See the Configuration Guide for full reference.

Development

# Install dev dependencies
pip install -e ".[dev,redis]"

# Run unit tests
make test-unit

# Run all tests (requires Docker)
make test

# Run e2e tests (requires GEMINI_API_KEY + Docker)
make test-e2e

# Lint & format
make lint
make format

Project Structure

agent_forge/
├── agent/         # ReAct loop, state machine, prompts, persistence
├── llm/           # LLM provider adapters (Gemini, OpenAI, Anthropic)
├── tools/         # Built-in tools (read_file, write_file, run_shell, etc.)
├── sandbox/       # Docker sandbox management
├── orchestration/ # Task queue, event bus, workers
├── observability/ # Structured logging, tracing, cost tracking
├── cli.py         # Click-based CLI entry point
└── config.py      # Layered configuration system

Documentation

Architecture — System design, layer responsibilities, ReAct loop sequence.
Configuration — Full config reference (TOML, env vars, CLI flags, precedence).
Testing — Running tests, writing new ones, CI workflows, coverage.
Extending — Adding tools, LLM providers, custom sandbox configs.
Technical Spec — Full specification with interface contracts and data models.

Roadmap

Phase	Focus	Status
1	Core Agent MVP — ReAct loop + Docker sandbox + CLI	✅ Complete
2	Production Hardening — Observability, multi-provider, Redis queue	🚧 In Progress
3	Git-Aware Agent & Plugin System	⬜ Planned
4	Web Dashboard & REST API	⬜ Planned
5	Multi-Agent Collaboration	⬜ Planned
6	Advanced Isolation & Scaling (microVMs, K8s)	⬜ Planned
7	Platform & Ecosystem (MCP, marketplace, IDE plugins)	⬜ Planned

See spec.md § Roadmap for detailed milestones.

Contributing

Contributions are welcome! Please read our Contributing Guide and Code of Conduct before submitting a pull request.

Security

If you discover a security vulnerability, please follow our Security Policy for responsible disclosure.

License

This project is licensed under the MIT License — see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
.agent/workflows		.agent/workflows
.github		.github
agent_forge		agent_forge
assets		assets
docs		docs
scripts		scripts
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
SECURITY.md		SECURITY.md
agent-forge.toml		agent-forge.toml
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml
vulture_whitelist.py		vulture_whitelist.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agent Forge

Overview

Key Features

Quick Start

Prerequisites

Installation

Usage

Demo

Configuration

Development

Project Structure

Documentation

Roadmap

Contributing

Security

License

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agent Forge

Overview

Key Features

Quick Start

Prerequisites

Installation

Usage

Demo

Configuration

Development

Project Structure

Documentation

Roadmap

Contributing

Security

License

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages