swarmweave

Multi-agent swarms you can actually see, debug, and improve.

Sequential or parallel multi-agent orchestration on top of LangGraph + the OpenAI API, with three capabilities that most multi-agent frameworks don't bundle together:

What this is not: not a single-agent memory plugin, not a production observability platform (see Langfuse), not a model router (see LiteLLM). swarmweave is a focused scaffold for building multi-agent pipelines where workers need to see each other's work — and you need to see what's happening.

Live terminal dashboard — swarmweave watch your_swarm.py opens a split-screen Rich TUI that streams worker status, tool calls, shared-context flow, and accumulating token cost in real time. No more print-statement debugging.
Self-improving via lessons — every successful run can distill 1-5 transferable lessons. The next similar run pre-loads them into shared context. Plain JSONL under ~/.swarmweave/lessons/, fully editable, no telemetry.
Shared-context substrate — workers in sequential pipelines read each other's findings via embedding-based retrieval (default text-embedding-3-small, automatic Jaccard fallback offline).

git clone https://github.com/manav8498/swarmweave
cd swarmweave
./init.sh
source .venv/bin/activate
export OPENAI_API_KEY=sk-...
swarmweave watch examples/03_support_triage_swarm/main.py

The 30-line quickstart

from swarmweave import Supervisor, Worker, SharedContext
from swarmweave.backends import LocalBackend

ctx = SharedContext(backend=LocalBackend())

supervisor = Supervisor(
    shared_context=ctx,
    model="gpt-4o-mini",
    mode="sequential",   # or "parallel"
    workers=[
        Worker(name="classifier", role="Categorize the ticket"),
        Worker(name="resolver",   role="Draft a reply using the category"),
        Worker(name="verifier",   role="Check the draft against policy"),
    ],
)

result = supervisor.run("Handle this support ticket: ...")
print(result.final_output)

The whole public API is three primitives — SharedContext, Supervisor, Worker. Optional Mentor for cross-run learning, EventBus for custom observability.

Should you use this?

Use swarmweave if you're building:

Sequential pipelines like classify → draft → verify, where later workers genuinely need earlier work
Multi-specialist reviewers (security / perf / style / coverage) that produce one prioritized output
Research-style fan-out with shared synthesis
Anything where you've been fighting LangGraph's state graph to express "worker B reads what worker A wrote"

Don't use it if:

Your task is a single LLM call — use the OpenAI SDK directly
You need production observability (traces, eval datasets, dashboards) — use Langfuse or LangSmith
You need cost/rate guardrails at the proxy level — use LiteLLM
You need Anthropic/Claude support today — v0.1 is OpenAI API only; Anthropic adapter is planned for v0.2
Your existing setup works and you're not feeling pain around worker coordination or debugging — switching cost isn't worth it

This library is a focused scaffold, not a replacement for the broader ecosystem.

Why this pattern

Most multi-agent systems coordinate through the supervisor. Workers run in private context windows; everything they know about each other has to fit in whatever the supervisor relays at handoff. Re-summarization is lossy, workers redo each other's work, and every additional worker makes the coordination problem worse.

swarmweave replaces that with a thin shared-context layer. In sequential mode, each worker's SharedContext.aread() returns every prior worker's findings — so verifier reads resolver's draft, fixer reads investigator's analysis, etc. In parallel mode, workers fan out simultaneously and the supervisor synthesizes from their collective trace.

Architecture

Class	Role
`SharedContext`	The substrate. `aread()` returns the slice of past observations most relevant to a task; `awrite()` appends a new one. Backed by a swappable `Backend`.
`Supervisor`	Decomposes the task, runs workers (parallel or sequential), synthesizes the final answer, finalizes the outcome.
`Worker`	One specialist. Reads context, autonomously calls tools, writes a finding. Has a forced wrap-up call so reasoning models actually commit to an answer when their tool budget runs out.
`EventBus`	Optional. Streams every event for live observability or custom dashboards.
`Mentor` / `LessonBook`	Optional. Persistent cross-run memory that makes the swarm self-improving.

Live TUI — `swarmweave watch`

┌────── workers ──────┐  ┌────────── shared-context flow ──────────┐
│ classifier  ✓ done  │  │ 14:22:01 [supervisor] decompose_done    │
│ resolver    ⠋ tool  │  │ 14:22:14 [classifier] WRITE → billing   │
│ verifier    · idle  │  │ 14:22:30 [resolver]   READ ← 1 prior    │
└─────────────────────┘  │ 14:23:11 [resolver]   tool_call: lookup │
                         └─────────────────────────────────────────┘
┌─ metrics ───────────────────────────────────────────────────────┐
│ elapsed 02:14 │ calls 12 │ in 8,420 │ out 23,991 │ ~$0.0040     │
└──────────────────────────────────────────────────────────────────┘

Built on Rich. 8 fps refresh. Opt-in via EventBus — zero overhead when unused.

Self-improving — `Mentor` + `LessonBook`

from swarmweave import Mentor

mentor = Mentor(book="customer_support")
await mentor.before_run(supervisor, user_task)   # preloads relevant past lessons
result = await supervisor.arun(user_task)
await mentor.after_run(supervisor, user_task, result)  # extracts new lessons

Lessons live as plain JSONL under ~/.swarmweave/lessons/<book>.jsonl. Fully readable, editable, deletable. No telemetry leaves your machine.

Run examples/04_self_improving_swarm/main.py twice — round 2 reads lessons from round 1. The improvement is demonstrable, not theoretical.

Examples

Four runnable examples under examples/. Each is ~80 lines.

Example	Pattern	What it shows
`01_research_swarm`	parallel fan-out	Three researchers + supervisor synthesis
`02_code_review_swarm`	parallel specialists	security / perf / style / coverage → prioritized review
`03_support_triage_swarm`	sequential pipeline	classify → resolve → verify with policy check
`04_self_improving_swarm`	sequential + Mentor	Same task run twice, second round reads first round's lessons

Watch any of them live in the TUI:

swarmweave watch examples/03_support_triage_swarm/main.py

Benchmark

A 20-task multi-hop benchmark comparing isolated-context (each worker has private memory) against shared-context (workers read each other's findings), sequential mode, same workers, same model, same tools — only the backend differs.

What shared context demonstrably changes in each run:

Worker B reads Worker A's reasoning, not just the original task — a resolver building on a classifier's output uses the actual classification rationale, not a re-statement of the user's query.
Verifiers correct based on actual prior work — a policy-check step reads what the resolver drafted and cites specific lines, instead of checking a hypothetical.
Lessons accumulate across runs — the second run of the same task pre-loads distilled findings from the first, compressing warm-up time.

On raw accuracy: across runs on this 20-task benchmark at gpt-4o-mini, both modes landed in the 85–100% range — essentially tied within model variance. Simple multi-hop factual accuracy is not where the coordination wins show up. The wins are in reasoning chain quality, less duplicated work between workers, and verifier precision — qualitative differences the TUI makes visible and the accuracy rubric only partially captures.

To see the difference concretely, run examples/03_support_triage_swarm/main.py and watch the shared-context flow panel. Or use scripts/real_world_review.py to A/B test both modes against your own codebase.

Reproduce with:

python benchmarks/run_benchmark.py

Full methodology, rubric, and per-task results are in benchmarks/README.md.

Installation

pip install swarmweave

Or from source:

git clone https://github.com/manav8498/swarmweave
cd swarmweave

# macOS / Linux
./init.sh && source .venv/bin/activate

# Windows (PowerShell)
./init.ps1

# Cross-platform alternative (works anywhere Python is)
python -m venv .venv && source .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
pip install -e ".[dev]"

export OPENAI_API_KEY=sk-...

Requirements: Python 3.11+, an OpenAI API key. Default model is gpt-4o-mini — cheap, fast, available to every account.

Cost note: The examples under examples/ and the benchmark under benchmarks/ call the real OpenAI API. With gpt-4o-mini defaults, a full example run costs roughly $0.001–$0.01; the full 20-task benchmark costs roughly $0.05–$0.20. No telemetry is sent anywhere else.

How this compares

	LangGraph	CrewAI	AutoGen	swarmweave
Graph orchestration	✅	partial	partial	✅ (built on LangGraph)
Sequential workers see each other's findings	manual	partial	manual	✅ via SharedContext
Live terminal UI	—	—	—	✅ `swarmweave watch`
Cross-run learning	—	partial	—	✅ Mentor + LessonBook
Embedding retrieval as default	—	configurable	—	✅ `text-embedding-3-small`
Forced wrap-up when reasoning models loop	—	—	—	✅
Apache-2.0, no telemetry	✅	✅	✅	✅

We sit on top of LangGraph, not against it. Bring an existing LangGraph app and compose build_swarm_graph(supervisor) as a sub-graph.

For persistent memory of a single agent's session, see also claude-mem. For cross-run learning in a different orchestration style, see agent-swarm. swarmweave differs in the retrieval model (embedding + recency vs. flat append) and the shared-context layer that connects workers within a run, not just across runs.

Documentation

docs/architecture.md — the thesis, why shared context matters
docs/api_reference.md — every public class and method
docs/custom_backends.md — bring your own retrieval backend in ~80 lines
CHANGELOG.md — release history

Roadmap

v0.1 (this release) — LocalBackend with embeddings + Jaccard fallback, LangGraph adapter, sequential/parallel modes, live TUI, Mentor/LessonBook, EventBus, four example swarms, reproducible benchmark.
v0.2 — OpenAI Agents SDK adapter and swarmweave replay <session> for post-run inspection.
v0.3 — Anthropic Agent SDK adapter, MCP-tool support in Worker, durable cross-process sessions, production-grade cost guardrails.

Contributing

PRs welcome. See CONTRIBUTING.md for dev setup and the lint/type/test gates, and CODE_OF_CONDUCT.md.

Security

Please report vulnerabilities per SECURITY.md. Do not open public issues for suspected security bugs.

License

Apache-2.0. See LICENSE.

Maintainer

swarmweave is an independent open-source project by @manav8498. Issues and PRs welcome.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

swarmweave

The 30-line quickstart

Should you use this?

Why this pattern

Architecture

Live TUI — `swarmweave watch`

Self-improving — `Mentor` + `LessonBook`

Examples

Benchmark

Installation

How this compares

Documentation

Roadmap

Contributing

Security

License

Maintainer

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github		.github
benchmarks		benchmarks
docs		docs
examples		examples
scripts		scripts
src/swarmweave		src/swarmweave
tests		tests
.editorconfig		.editorconfig
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
CITATION.cff		CITATION.cff
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
init.ps1		init.ps1
init.sh		init.sh
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

swarmweave

The 30-line quickstart

Should you use this?

Why this pattern

Architecture

Live TUI — swarmweave watch

Self-improving — Mentor + LessonBook

Examples

Benchmark

Installation

How this compares

Documentation

Roadmap

Contributing

Security

License

Maintainer

About

Topics

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Live TUI — `swarmweave watch`

Self-improving — `Mentor` + `LessonBook`

Packages