Single-binary autonomous AI agent platform
Go backend · Embedded Next.js UI · Single-binary deploy, optional local inference stack
Octopus runs a full ReAct (Reason + Act) cognitive loop, executes tools inside a sandboxed environment, streams every step to a real-time UI, and blocks unsafe actions behind a human-approval gate. The default deployment remains a single binary with an embedded frontend; the optional local inference stack adds ONNX Runtime and managed llama.cpp components when you build and enable them.
The Next.js Mission Control frontend is compiled directly into the Go binary via `go:embed`. Drop the file anywhere, run it, open the browser.
**Licensing Notice:** Octopus is source-available under the PolyForm Noncommercial License 1.0.0 and is not MIT-licensed. Commercial use is not permitted under the repository license. See LICENSE for the full terms.
- One binary, everything included — frontend is embedded, no separate server needed
- ReAct FSM — Reason + Act finite state machine with configurable iteration cap
- Local Thinking (RLoT) — platform-level multi-step reasoning engine, works with any LLM
- Native Thinking — pass-through to provider native CoT (Claude Extended Thinking, o1/o3, Qwen-thinking)
- Tool Search & Dynamic Palette — LLM discovers tools on demand, never wastes context on unused tools
- MCP Integration — local `stdio` and remote `http` / `streamable_http` / `sse` MCP servers
- HNSW Vector Memory — encrypted AES-256-GCM vector store with ONNX or API embeddings and optional reranking
- TurboQuant Compression — vector-store compression, embedding wrapper support, and managed llama-server KV-cache settings
- HITL Approval Gate — sensitive actions require explicit human approval over WebSocket
- Policy Engine — deny rules, auto-approve rules, dangerous-command checks before execution
- Transparent streaming — every thought, tool call, and result streams as a separate card
- Provider-agnostic — OpenAI, Anthropic, Groq, Mistral, OpenRouter, Gemini, Cohere, any OpenAI-compatible endpoint
- Service-friendly — runs under `systemd` with PID 1 lifecycle handling and journald logging
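The loop behind the ReAct FSM bullet can be sketched in Go: reason, act on a tool, feed the observation back, and stop at the iteration cap. Here `think` and `act` are hypothetical placeholders for the LLM call and sandboxed tool execution, not Octopus APIs.

```go
package main

import "fmt"

// step is one Reason + Act turn: the model either emits a final answer
// or names a tool action whose observation feeds the next turn.
type step struct {
	finalAnswer string // non-empty when the model decides to stop
	action      string // tool call to execute otherwise
}

// runReAct drives the loop for at most maxIters turns, mirroring the
// configurable iteration cap described above.
func runReAct(task string, think func(history []string) step, act func(action string) string, maxIters int) string {
	history := []string{task}
	for i := 0; i < maxIters; i++ {
		s := think(history)
		if s.finalAnswer != "" {
			return s.finalAnswer
		}
		obs := act(s.action)                     // execute the tool
		history = append(history, s.action, obs) // observation feeds the next turn
	}
	return "iteration cap reached"
}

func main() {
	think := func(h []string) step {
		if len(h) >= 3 { // after one observation, answer
			return step{finalAnswer: "done: " + h[len(h)-1]}
		}
		return step{action: "ls"}
	}
	act := func(a string) string { return "output of " + a }
	fmt.Println(runReAct("list files", think, act, 5)) // → done: output of ls
}
```

The iteration cap is the safety valve: a model that never emits a final answer cannot loop forever.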
✅ Implemented · 🔶 Partial / in roadmap · ❌ Not available · ⚠️ Requires separate service
| Feature | Octopus | OpenClo | Moltis | PicoClo | ZeroClo | NanoClo |
|---|---|---|---|---|---|---|
| **Deployment** | | | | | | |
| Single binary core deploy | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Embedded Web UI | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Docker-first workflow | 🔶 | ✅ | ✅ | 🔶 | ✅ | 🔶 |
| Cloud-managed option | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ |
| **Reasoning** | | | | | | |
| ReAct loop | ✅ | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| Native Thinking (o1, Claude, Qwen) | ✅ | ✅ | ✅ | 🔶 | 🔶 | ❌ |
| Local Thinking (RLoT, any model) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Multi-agent orchestration | ✅ | ✅ | ✅ | ❌ | 🔶 | ❌ |
| **Tools** | | | | | | |
| Tool execution sandboxing | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| HITL approval gate | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| Built-in Tool Search & Palette | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| MCP server support | ✅ | ✅ | ✅ | 🔶 | 🔶 | ❌ |
| Policy / deny rules engine | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| **Memory** | | | | | | |
| Encrypted HNSW vector store | ✅ | ❌ | ❌ | ❌ | | |
| Local ONNX embeddings | ✅ | ❌ | ❌ | ❌ | ❌ | |
| Persistent conversation history | ✅ | ✅ | ✅ | 🔶 | ✅ | ❌ |
| **Local Inference** | | | | | | |
| TurboQuant KV-cache compression | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Ollama / local vLLM integration | ✅ | ✅ | 🔶 | ✅ | 🔶 | ✅ |
| **LLM Routing** | | | | | | |
| OpenAI / Anthropic / Groq / Mistral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini / Cohere / Together / xAI | ✅ | ✅ | 🔶 | 🔶 | 🔶 | 🔶 |
| Custom OpenAI-compatible endpoint | ✅ | ✅ | ✅ | ✅ | ✅ | 🔶 |
Where Octopus leads: single-binary deployment, Local Thinking (platform-level RLoT on any LLM), built-in tool search, HITL approval gate, TurboQuant compression, and a local inference control plane that stays inside the same product surface.
Where competing platforms lead: OpenClo and Moltis offer stronger multi-agent orchestration and cloud-managed hosting. PicoClo, ZeroClo, and NanoClo have simpler operational footprints but more limited feature sets. None currently ship Local Thinking, built-in Tool Search, or TurboQuant.
Honest trade-offs:
- Octopus has no native multi-agent workflow support (single agent per session)
- The Web UI is functional but not as polished as the dashboards of Moltis or OpenClo
- Cloud deployment requires self-hosting (no managed SaaS option)
- ONNX chat execution is still not a production path; local chat currently leans on the managed `llama.cpp` server integration instead
Local Thinking (RLoT) is a platform-level multi-step reasoning engine (Decompose → Reason → Refine → Verify → Terminate) that adds structured chain-of-thought to any LLM — no provider-side reasoning support required.
Built-in Tool Search gives the model a single `tool_search` meta-tool on its first turn. It discovers and activates only the tools it needs, keeping the context window clean regardless of catalog size.
TurboQuant is now wired in three places: compressed vector memory, ONNX embedding wrappers, and managed llama.cpp server cache configuration. The implementation is broader than before, but the local chat story is still split across ONNX and llama.cpp paths.
```bash
curl -fsSL https://raw.githubusercontent.com/Wanderspool/Octopus/main/scripts/build.sh | bash
```

This clones the repository to the current directory, builds the Next.js frontend, embeds it, and compiles the Go binary to `bin/octopus`.
| Tool | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend compilation |
| Node.js | 20+ | Frontend build only |
| Linux kernel | 5.10+ | Sandbox tool isolation |
| git | any | Clone repository |
```bash
git clone https://github.com/Wanderspool/Octopus.git
cd Octopus

# Canonical full build: frontend → embed → ONNX-capable Go binary
make

# Or step by step:
make frontend   # npm ci + npm run build (Next.js static export)
make embed      # copies web/out → pkg/static/out
make build      # CGO_ENABLED=1 go build -tags "with_onnx embed_onnx embed_llamacpp" -o bin/octopus ./cmd/octopus

# Deployment rebuild helper used on the target host:
./web/build.sh
```

The binary is at `bin/octopus`. It contains the full UI — no separate server needed.
The default repository build is no longer a minimal `CGO_ENABLED=0` profile. It intentionally targets the current product surface:
- embedded frontend assets
- ONNX-enabled embeddings and reranking
- managed `llama.cpp` bundle support

If you need a narrower binary for debugging build variants, the Makefile still exposes `build-embed-onnx` and `build-embed-all`.
```bash
# The default build requires CGO and the repository-pinned Go toolchain path.
export PATH=/usr/local/go/bin:$PATH

# If the ONNX Runtime shared library is not discoverable automatically,
# point Octopus at it explicitly.
export ORT_LIB=/absolute/path/to/libonnxruntime.so

./bin/octopus
```

Current scope of the local inference stack:
- Embedding ONNX models: supported when built with `with_onnx`
- Reranker ONNX models: supported when built with `with_onnx`
- Chat ONNX models: metadata/download/config surfaces exist, but runtime chat execution is still incomplete
- Managed `llama.cpp` bundle lifecycle: supported from the Settings UI
- Managed `llama-server` lifecycle: start/stop/status supported from the Settings UI
- `llama-server` auto-start on Octopus boot: supported via persisted inference settings
- GGUF model discovery for managed `llama-server`: supported via filesystem scan + UI dropdown
- OpenAI-compatible local provider auto-config: supported when the managed `llama-server` starts successfully
- TurboQuant live stats push: supported via periodic WebSocket notifications to the Settings UI
Managed runtime bundles:
- Octopus now includes a managed ONNX Runtime bundle layer for tracking installed runtime versions, activating one runtime at a time, and downloading newer official community releases from the Settings UI.
- The runtime manager can auto-wire `ORT_LIB` from the currently active managed bundle.
- The embedded-bundle architecture is implemented, but the repository still does not ship a fully self-sufficient local inference payload for every path.
- In practice, the product is closest to "single-binary control plane + optional managed local runtimes", not a fully hermetic local inference appliance.
What works well:
- one Settings surface for ONNX runtime bundles, ONNX models, TurboQuant, and managed `llama.cpp`
- persistent inference configuration in `config/settings.json`
- managed `llama-server` bootstrapping with auto-created OpenAI-compatible endpoint config
- WebSocket-driven live refresh of TurboQuant stats in the UI
What is still uneven:
- the ONNX path is stronger for embeddings/reranking than for chat
- the `llama.cpp` path is better for local chat serving than for deep in-process integration
- a successful ONNX build still depends on CGO and a compatible runtime library
- local inference remains Linux-first operationally
Detailed status and trade-offs: docs/src/local-inference-status.md
Copy the example config and fill in your API key:

```bash
cp config/settings.example.json config/settings.json
```

Or skip the file entirely and use environment variables:

```bash
export OCTOPUS_LLM_API_KEY=sk-...   # your provider API key
export OCTOPUS_LLM_MODEL=gpt-4o     # or claude-sonnet-4-20250514, etc.

./bin/octopus
```

Open http://localhost:8080 — Mission Control is ready.
```bash
# OpenAI
OCTOPUS_LLM_API_KEY=sk-... ./bin/octopus

# Anthropic
OCTOPUS_LLM_PROVIDER=anthropic \
OCTOPUS_LLM_API_KEY=sk-ant-... \
OCTOPUS_LLM_MODEL=claude-sonnet-4-20250514 \
./bin/octopus

# Groq (OpenAI-compatible)
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=https://api.groq.com/openai/v1 \
OCTOPUS_LLM_API_KEY=gsk_... \
OCTOPUS_LLM_MODEL=llama-3.3-70b-versatile \
./bin/octopus

# Local vLLM
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=http://localhost:8000/v1 \
OCTOPUS_LLM_API_KEY=none \
OCTOPUS_LLM_MODEL=Qwen/Qwen2.5-7B-Instruct \
./bin/octopus
```

```bash
# Copy the binary
sudo cp bin/octopus /opt/Octopus/bin/octopus

# Install the service unit
sudo cp scripts/systemd/octopus.service /etc/systemd/system/octopus.service
sudo systemctl daemon-reload
sudo systemctl enable --now octopus

# Check status
sudo systemctl status octopus
sudo journalctl -u octopus -f
```

Operational note: the tracked unit currently uses `TimeoutStopSec=20`. Under heavy shutdown paths, especially after local inference activity, a restart can hit the timeout and be escalated to SIGKILL. This is a known operational gap, not a documentation omission.
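For orientation, a unit of this shape would reproduce the behaviour described above. Everything except `TimeoutStopSec=20` is an assumption for illustration; the real unit is `scripts/systemd/octopus.service`.

```ini
[Unit]
Description=Octopus autonomous agent platform
After=network-online.target

[Service]
ExecStart=/opt/Octopus/bin/octopus
WorkingDirectory=/opt/Octopus
Restart=on-failure
; Known operational gap: slow shutdowns after local inference activity
; can exceed this budget, and systemd then escalates to SIGKILL.
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
```

If clean shutdown matters more than restart latency on your host, raising `TimeoutStopSec` is the usual local mitigation.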
All settings have production-ready defaults. Override with environment variables:
| Variable | Default | Description |
|---|---|---|
| `OCTOPUS_ADDR` | `:8080` | HTTP / WebSocket listen address |
| `OCTOPUS_LLM_PROVIDER` | `openai` | Provider: `openai`, `anthropic` |
| `OCTOPUS_LLM_API_KEY` | (none) | Provider API key |
| `OCTOPUS_LLM_MODEL` | `gpt-4o` | Model identifier |
| `OCTOPUS_LLM_ENDPOINT` | (provider default) | Custom API base URL |
| `OCTOPUS_LOG_LEVEL` | `info` | `debug` · `info` · `warn` · `error` |
| `OCTOPUS_LOG_FORMAT` | `json` | `json` · `text` |
Runtime settings (thinking, generation controls, memory, MCP servers, tool toggles) are managed through
the Settings UI and persisted to config/settings.json.
Full reference → docs/src/configuration.md
```
                    main.go (DI wiring)
 Transport      │      Use Cases         │     Adapters
┌────────────┐  │  ┌─────────────────┐   │  ┌─────────────┐
│ WS Hub     │──┼─▶│ Orchestrator    │───┼─▶│ Providers   │
│ Dispatch   │  │  │ ReAct Loop      │   │  │ Tools       │
└────────────┘  │  │ Thinking Eng.   │   │  │ MCP Tools   │
                │  │ Conversation    │   │  │ Storage     │
                │  │ Policy          │   │  │ Approval    │
                │  └─────────────────┘   │  │ Embeddings  │
                │                        │  │ VectorStore │
                │                        │  │ TurboQuant  │
                │                        │  └─────────────┘
 Domain (zero deps)
 agent · tool · session · memory · message · llm · thinking
 embedder · vector (VectorStore, Embedder, Reranker)
```
Full design docs → docs/src/clean-architecture.md
Octopus has two independent reasoning modes that can run simultaneously:
Local Thinking (RLoT) — runs on the platform, works with any LLM:
```
User message → Decompose → Reason → Verify → Terminate → enriched prompt → ReAct loop
                             ↑         │
                             └─Refine──┘  (PRM rejection)
```
Native Thinking — delegates to the model's built-in CoT (Claude Extended Thinking, o1, Qwen).
Both can be enabled at the same time: RLoT pre-processes the input, then the inner ReAct loop runs with native thinking active on top.
Full details: docs/src/thinking-engine.md
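The RLoT stage sequence above can be sketched as a small state machine in Go. The stage names come from the text; `verify` stands in for the PRM check, and the string-building functions are placeholders rather than Octopus APIs:

```go
package main

import "fmt"

// runRLoT walks Decompose → Reason → Verify, looping back through a
// Refine pass whenever the verifier (a stand-in for the PRM check)
// rejects the draft, up to maxRefine attempts. It returns the enriched
// prompt that would be handed to the ReAct loop.
func runRLoT(input string, verify func(string) bool, maxRefine int) string {
	plan := "plan(" + input + ")"   // Decompose: break the task into steps
	draft := "reason(" + plan + ")" // Reason: produce a chain of thought
	for i := 0; i < maxRefine && !verify(draft); i++ {
		draft = "refine(" + draft + ")" // Refine: rework the rejected draft
	}
	return draft + " + " + input // Terminate: enriched prompt for ReAct
}

func main() {
	rejectedOnce := false
	verify := func(s string) bool { // toy PRM: reject only the first draft
		if !rejectedOnce {
			rejectedOnce = true
			return false
		}
		return true
	}
	fmt.Println(runRLoT("fix the build", verify, 3))
	// → refine(reason(plan(fix the build))) + fix the build
}
```

The cap on refinement attempts plays the same role as the ReAct iteration cap: a verifier that never accepts cannot stall the session.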
The model receives exactly one meta-tool on its first turn: `tool_search`. To use any other tool it must first discover it by search and explicitly activate it into the session palette. This keeps prompt token usage proportional to what the model actually needs, regardless of how many tools are installed.
```
Turn 1:  LLM sees:  [tool_search]
         LLM calls: tool_search("search", query="file editing")
         Returns:   file_read, file_write, file_patch

Turn 2:  LLM calls: tool_search("activate", tool_id="file_read")
         Palette:   [tool_search, file_read]

Turn 3:  LLM calls: file_read(path="/src/main.go")
```
Full details: docs/src/tool-search.md
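The palette mechanics in the transcript above can be sketched in Go. The catalog, substring matching, and `Session` type are illustrative stand-ins, not the actual Octopus implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// Session holds the dynamic tool palette. Only tool_search is visible
// on turn 1; every other tool must be discovered and activated.
type Session struct {
	catalog map[string]string // tool_id → description (full install base)
	palette []string          // tools currently visible to the model
}

func NewSession(catalog map[string]string) *Session {
	return &Session{catalog: catalog, palette: []string{"tool_search"}}
}

// Search returns the IDs of tools whose description mentions the query,
// standing in for tool_search("search", ...).
func (s *Session) Search(query string) []string {
	var hits []string
	for id, desc := range s.catalog {
		if strings.Contains(desc, query) {
			hits = append(hits, id)
		}
	}
	return hits
}

// Activate adds a discovered tool to the palette,
// standing in for tool_search("activate", ...).
func (s *Session) Activate(toolID string) {
	s.palette = append(s.palette, toolID)
}

func main() {
	sess := NewSession(map[string]string{
		"file_read":  "file reading",
		"file_write": "file editing",
		"web_fetch":  "HTTP fetching",
	})
	fmt.Println(sess.Search("file")) // discovery (map order varies)
	sess.Activate("file_read")
	fmt.Println(sess.palette) // → [tool_search file_read]
}
```

Prompt cost stays proportional to `len(palette)`, not to the size of the catalog, which is the point of the design.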
Encrypted HNSW vector memory with zero external dependencies:
- Pure-Go HNSW index (`M=16`, cosine similarity via L2-normalised inner product)
- AES-256-GCM at-rest encryption (auto-generated key, atomic writes)
- API embeddings: OpenAI (`text-embedding-3-small`/`-large`), Cohere (`embed-v3`)
- Local ONNX embeddings (build with `-tags with_onnx`)
- Optional reranking: Cohere Rerank v3 or local ONNX cross-encoder
- Graceful fallback to keyword store on any I/O error
Full details: docs/src/memory.md
TurboQuant in the current product:
- WHT rotation → decorrelate embedding dimensions
- Max-Lloyd codebooks → optimal scalar quantisation per channel
- QJL residual → quantised Johnson-Lindenstrauss residual coding
- 1-4 bit modes for vector compression depending on the path
- vector-store compression is implemented directly inside Octopus
- ONNX embedding compression is wired through `TurboQuantEmbedder`
- managed `llama-server` consumes persisted KV-cache settings from the same inference panel
Full details: docs/src/turboquant.md
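The first stage in the list above, WHT rotation, is a fast Walsh-Hadamard transform. A minimal in-place version for power-of-two lengths shows the rotation step in isolation; it is a textbook sketch, not the TurboQuant pipeline:

```go
package main

import "fmt"

// fwht applies an in-place fast Walsh-Hadamard transform to data,
// whose length must be a power of two. Each butterfly mixes a pair of
// coordinates (sum and difference), spreading energy across dimensions
// so the later per-channel quantiser sees decorrelated values.
func fwht(data []float64) {
	for h := 1; h < len(data); h *= 2 {
		for i := 0; i < len(data); i += 2 * h {
			for j := i; j < i+h; j++ {
				x, y := data[j], data[j+h]
				data[j], data[j+h] = x+y, x-y
			}
		}
	}
}

func main() {
	v := []float64{1, 0, 1, 0}
	fwht(v)
	fmt.Println(v) // → [2 2 0 0]
}
```

The transform is its own inverse up to a factor of the length, so the rotation can be undone exactly at decode time before the quantisation error is the only loss that remains.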
WebSocket endpoint: `ws://localhost:8080/ws` — JSON-RPC 2.0
| Method | Direction | Description |
|---|---|---|
| `session/initialize` | C→S | Create a new agent session |
| `session/message` | C→S | Send a message, trigger ReAct cycle |
| `agent/interrupt` | C→S | Abort the current reasoning loop |
| `agent/stream` | S→C | Streaming step deltas |
| `tool/requestApproval` | S→C | HITL approval request |
| `tool/approvalResponse` | C→S | Human approval decision |
| `tool/palette/search` | C→S | Search available tools |
| `tool/palette/activate` | C→S | Activate a tool into the session palette |
| `sandbox/telemetry` | S→C | Live stdout/stderr from tool execution |
| `models/list` | C→S | List all models with capability flags |
| `settings/agentloop/memory/get` | C→S | Get Dialog Memory configuration |
| `settings/agentloop/memory/set` | C→S | Update Dialog Memory configuration |
Full reference → docs/src/api.md
| Document | Description |
|---|---|
| Configuration | All environment variables and defaults |
| API Reference | Full JSON-RPC 2.0 WebSocket protocol |
| LLM Providers | Provider setup and model routing |
| Local Inference Status | Current state, strengths, limits, operational nuances |
| Agent Scheduling | Permanent agents, schedule kinds, Mission Control semantics |
| MCP Servers | Extending with external tool servers |
| Memory | HNSW vector retrieval, storage layout, encryption |
| Tool Search & Palette | Dynamic tool discovery |
| HITL Approval | Human-in-the-loop approval flow |
| Thinking Engine | RLoT + native model reasoning |
| TurboQuant Compression | KV-cache vector compression |
| Clean Architecture | Layer boundaries and DI design |
| ReAct & Lane Queue | FSM orchestration |
| Tool Execution | Policy-guarded execution pipeline |
| Operations | Build, service, runtime files, deployment |
| Streaming | Real-time step rendering |
| UI Design | Mission Control card system |
| Completion Status | Implemented areas, active gaps, current status snapshot |
Octopus is licensed under the PolyForm Noncommercial License 1.0.0. This repository is source-available for noncommercial use only and is not offered under the MIT license.
See LICENSE for the complete license text.
Required Notice: Copyright Wanderspool (https://github.com/Wanderspool)