Single-binary autonomous AI agent platform
Go backend · Embedded Next.js UI · Single-binary deploy, optional local inference stack
Octopus runs a full ReAct (Reason + Act) cognitive loop, executes tools inside a sandboxed environment, streams every step to a real-time UI, and blocks unsafe actions behind a human-approval gate. The default deployment remains a single binary with an embedded frontend; the optional local inference stack adds ONNX Runtime and managed llama.cpp components when you build and enable them.
The Next.js Mission Control frontend is compiled directly into the Go binary via `go:embed`. Drop the file anywhere, run it, open the browser.
**Licensing Notice:** Octopus is source-available under the PolyForm Noncommercial License 1.0.0 and is not MIT-licensed. Commercial use is not permitted under the repository license. See LICENSE for the full terms.
- One binary, everything included — frontend is embedded, no separate server needed
- ReAct FSM — Reason + Act finite state machine with configurable iteration cap
- Local Thinking (RLoT) — platform-level multi-step reasoning engine, works with any LLM
- Native Thinking — pass-through to provider native CoT (Claude Extended Thinking, o1/o3, Qwen-thinking)
- Tool Search & Dynamic Palette — LLM discovers tools on demand, never wastes context on unused tools
- MCP Integration — local `stdio` and remote `http` / `streamable_http` / `sse` MCP servers
- HNSW Vector Memory — encrypted AES-256-GCM vector store with ONNX or API embeddings and optional reranking
- TurboQuant Compression — vector-store compression, embedding wrapper support, and managed llama-server KV-cache settings
- HITL Approval Gate — sensitive actions require explicit human approval over WebSocket
- Policy Engine — deny rules, auto-approve rules, dangerous-command checks before execution
- Transparent streaming — every thought, tool call, and result streams as a separate card
- Provider-agnostic — OpenAI, Anthropic, Groq, Mistral, OpenRouter, Gemini, Cohere, any OpenAI-compatible endpoint
- Service-friendly — runs under `systemd` with PID 1 lifecycle handling and journald logging
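The loop behind the ReAct FSM bullet can be sketched in Go: reason, act on a tool, feed the observation back, and stop at the iteration cap. Here `think` and `act` are hypothetical placeholders for the LLM call and sandboxed tool execution, not Octopus APIs.

```go
package main

import "fmt"

// step is one Reason + Act turn: the model either emits a final answer
// or names a tool action whose observation feeds the next turn.
type step struct {
	finalAnswer string // non-empty when the model decides to stop
	action      string // tool call to execute otherwise
}

// runReAct drives the loop for at most maxIters turns, mirroring the
// configurable iteration cap described above.
func runReAct(task string, think func(history []string) step, act func(action string) string, maxIters int) string {
	history := []string{task}
	for i := 0; i < maxIters; i++ {
		s := think(history)
		if s.finalAnswer != "" {
			return s.finalAnswer
		}
		obs := act(s.action)                     // execute the tool
		history = append(history, s.action, obs) // observation feeds the next turn
	}
	return "iteration cap reached"
}

func main() {
	think := func(h []string) step {
		if len(h) >= 3 { // after one observation, answer
			return step{finalAnswer: "done: " + h[len(h)-1]}
		}
		return step{action: "ls"}
	}
	act := func(a string) string { return "output of " + a }
	fmt.Println(runReAct("list files", think, act, 5)) // → done: output of ls
}
```

The iteration cap is the safety valve: a model that never emits a final answer cannot loop forever.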
✅ Implemented · 🔶 Partial / in roadmap · ❌ Not available · ⚠️ Requires separate service
| Feature | Octopus | OpenClo | Moltis | PicoClo | ZeroClo | NanoClo |
|---|---|---|---|---|---|---|
| **Deployment** | | | | | | |
| Single binary core deploy | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Embedded Web UI | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Docker-first workflow | 🔶 | ✅ | ✅ | 🔶 | ✅ | 🔶 |
| Cloud-managed option | ❌ | ✅ | ✅ | ❌ | ✅ | ❌ |
| **Reasoning** | | | | | | |
| ReAct loop | ✅ | ✅ | ✅ | ✅ | 🔶 | 🔶 |
| Native Thinking (o1, Claude, Qwen) | ✅ | ✅ | ✅ | 🔶 | 🔶 | ❌ |
| Local Thinking (RLoT, any model) | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Multi-agent orchestration | ✅ | ✅ | ✅ | ❌ | 🔶 | ❌ |
| **Tools** | | | | | | |
| Tool execution sandboxing | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| HITL approval gate | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| Built-in Tool Search & Palette | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| MCP server support | ✅ | ✅ | ✅ | 🔶 | 🔶 | ❌ |
| Policy / deny rules engine | ✅ | 🔶 | ✅ | ❌ | ❌ | ❌ |
| **Memory** | | | | | | |
| Encrypted HNSW vector store | ✅ | ❌ | ❌ | ❌ | | |
| Local ONNX embeddings | ✅ | ❌ | ❌ | ❌ | ❌ | |
| Persistent conversation history | ✅ | ✅ | ✅ | 🔶 | ✅ | ❌ |
| **Local Inference** | | | | | | |
| TurboQuant KV-cache compression | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Ollama / local vLLM integration | ✅ | ✅ | 🔶 | ✅ | 🔶 | ✅ |
| **LLM Routing** | | | | | | |
| OpenAI / Anthropic / Groq / Mistral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini / Cohere / Together / xAI | ✅ | ✅ | 🔶 | 🔶 | 🔶 | 🔶 |
| Custom OpenAI-compatible endpoint | ✅ | ✅ | ✅ | ✅ | ✅ | 🔶 |
Where Octopus leads: single-binary deployment, Local Thinking (platform-level RLoT on any LLM), built-in tool search, HITL approval gate, TurboQuant compression, and a local inference control plane that stays inside the same product surface.
Where competing platforms lead: OpenClo and Moltis offer stronger multi-agent orchestration and cloud-managed hosting. PicoClo, ZeroClo, and NanoClo have simpler operational footprints but more limited feature sets. None currently ship Local Thinking, built-in Tool Search, or TurboQuant.
Honest trade-offs:
- Octopus has no native multi-agent workflow support (single agent per session)
- The Web UI is functional but not as polished as the dashboards of Moltis or OpenClo
- Cloud deployment requires self-hosting (no managed SaaS option)
- ONNX chat execution is still not a production path; local chat currently leans on the managed `llama.cpp` server integration instead
Local Thinking (RLoT) is a platform-level multi-step reasoning engine (Decompose → Reason → Refine → Verify → Terminate) that adds structured chain-of-thought to any LLM — no provider-side reasoning support required.
Built-in Tool Search gives the model a single `tool_search` meta-tool on its first turn. It discovers and activates only the tools it needs, keeping the context window clean regardless of catalog size.
TurboQuant is now wired in three places: compressed vector memory, ONNX embedding wrappers, and managed llama.cpp server cache configuration. The implementation is broader than before, but the local chat story is still split across ONNX and llama.cpp paths.
```bash
curl -fsSL https://raw.githubusercontent.com/Wanderspool/Octopus/main/scripts/build.sh | bash
```

This clones the repository to the current directory, builds the Next.js frontend, embeds it, and compiles the Go binary to `bin/octopus`.
| Tool | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend compilation |
| Node.js | 20+ | Frontend build only |
| Linux kernel | 5.10+ | Sandbox tool isolation |
| git | any | Clone repository |
```bash
git clone https://github.com/Wanderspool/Octopus.git
cd Octopus

# Canonical full build: frontend → embed → ONNX-capable Go binary
make

# Or step by step:
make frontend   # npm ci + npm run build (Next.js static export)
make embed      # copies web/out → pkg/static/out
make build      # CGO_ENABLED=1 go build -tags "with_onnx embed_onnx embed_llamacpp" -o bin/octopus ./cmd/octopus

# Deployment rebuild helper used on the target host:
./web/build.sh
```

The binary is at `bin/octopus`. It contains the full UI — no separate server needed.
The default repository build is no longer a minimal `CGO_ENABLED=0` profile. It intentionally targets the current product surface:
- embedded frontend assets
- ONNX-enabled embeddings and reranking
- managed `llama.cpp` bundle support

If you need a narrower binary for debugging build variants, the Makefile still exposes `build-embed-onnx` and `build-embed-all`.
```bash
# The default build requires CGO and the repository-pinned Go toolchain path.
export PATH=/usr/local/go/bin:$PATH

# If the ONNX Runtime shared library is not discoverable automatically,
# point Octopus at it explicitly.
export ORT_LIB=/absolute/path/to/libonnxruntime.so

./bin/octopus
```

Current scope of the local inference stack:
- Embedding ONNX models: supported when built with `with_onnx`
- Reranker ONNX models: supported when built with `with_onnx`
- Chat ONNX models: metadata/download/config surfaces exist, but runtime chat execution is still incomplete
- Managed `llama.cpp` bundle lifecycle: supported from the Settings UI
- Managed `llama-server` lifecycle: start/stop/status supported from the Settings UI
- `llama-server` auto-start on Octopus boot: supported via persisted inference settings
- GGUF model discovery for managed `llama-server`: supported via filesystem scan + UI dropdown
- OpenAI-compatible local provider auto-config: supported when the managed `llama-server` starts successfully
- TurboQuant live stats push: supported via periodic WebSocket notifications to the Settings UI
Managed runtime bundles:
- Octopus now includes a managed ONNX Runtime bundle layer for tracking installed runtime versions, activating one runtime at a time, and downloading newer official community releases from the Settings UI.
- The runtime manager can auto-wire `ORT_LIB` from the currently active managed bundle.
- The embedded-bundle architecture is implemented, but the repository still does not ship a fully self-sufficient local inference payload for every path.
- In practice, the product is closest to "single-binary control plane + optional managed local runtimes", not a fully hermetic local inference appliance.
What works well:
- one Settings surface for ONNX runtime bundles, ONNX models, TurboQuant, and managed `llama.cpp`
- persistent inference configuration in `config/settings.json`
- managed `llama-server` bootstrapping with auto-created OpenAI-compatible endpoint config
- WebSocket-driven live refresh of TurboQuant stats in the UI
What is still uneven:
- the ONNX path is stronger for embeddings/reranking than for chat
- the `llama.cpp` path is better for local chat serving than for deep in-process integration
- a successful ONNX build still depends on CGO and a compatible runtime library
- local inference remains Linux-first operationally
Detailed status and trade-offs: docs/src/local-inference-status.md
Copy the example config and fill in your API key:

```bash
cp config/settings.example.json config/settings.json
```

Or skip the file entirely and use environment variables:

```bash
export OCTOPUS_LLM_API_KEY=sk-...   # your provider API key
export OCTOPUS_LLM_MODEL=gpt-4o     # or claude-sonnet-4-20250514, etc.

./bin/octopus
```

Open http://localhost:8080 — Mission Control is ready.
```bash
# OpenAI
OCTOPUS_LLM_API_KEY=sk-... ./bin/octopus

# Anthropic
OCTOPUS_LLM_PROVIDER=anthropic \
OCTOPUS_LLM_API_KEY=sk-ant-... \
OCTOPUS_LLM_MODEL=claude-sonnet-4-20250514 \
./bin/octopus

# Groq (OpenAI-compatible)
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=https://api.groq.com/openai/v1 \
OCTOPUS_LLM_API_KEY=gsk_... \
OCTOPUS_LLM_MODEL=llama-3.3-70b-versatile \
./bin/octopus

# Local vLLM
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=http://localhost:8000/v1 \
OCTOPUS_LLM_API_KEY=none \
OCTOPUS_LLM_MODEL=Qwen/Qwen2.5-7B-Instruct \
./bin/octopus
```

```bash
# Copy the binary
sudo cp bin/octopus /opt/Octopus/bin/octopus

# Install the service unit
sudo cp scripts/systemd/octopus.service /etc/systemd/system/octopus.service
sudo systemctl daemon-reload
sudo systemctl enable --now octopus

# Check status
sudo systemctl status octopus
sudo journalctl -u octopus -f
```

Operational note: the tracked unit currently uses `TimeoutStopSec=20`. Under heavy shutdown paths, especially after local inference activity, a restart can hit the timeout and be escalated to SIGKILL. This is a known operational gap, not a documentation omission.
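For orientation, a unit of this shape would reproduce the behaviour described above. Everything except `TimeoutStopSec=20` is an assumption for illustration; the real unit is `scripts/systemd/octopus.service`.

```ini
[Unit]
Description=Octopus autonomous agent platform
After=network-online.target

[Service]
ExecStart=/opt/Octopus/bin/octopus
WorkingDirectory=/opt/Octopus
Restart=on-failure
; Known operational gap: slow shutdowns after local inference activity
; can exceed this budget, and systemd then escalates to SIGKILL.
TimeoutStopSec=20

[Install]
WantedBy=multi-user.target
```

If clean shutdown matters more than restart latency on your host, raising `TimeoutStopSec` is the usual local mitigation.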
All settings have production-ready defaults. Override with environment variables:
| Variable | Default | Description |
|---|---|---|
| `OCTOPUS_ADDR` | `:8080` | HTTP / WebSocket listen address |
| `OCTOPUS_LLM_PROVIDER` | `openai` | Provider: `openai`, `anthropic` |
| `OCTOPUS_LLM_API_KEY` | (none) | Provider API key |
| `OCTOPUS_LLM_MODEL` | `gpt-4o` | Model identifier |
| `OCTOPUS_LLM_ENDPOINT` | (provider default) | Custom API base URL |
| `OCTOPUS_LOG_LEVEL` | `info` | `debug` · `info` · `warn` · `error` |
| `OCTOPUS_LOG_FORMAT` | `json` | `json` · `text` |
Runtime settings (thinking, generation controls, memory, MCP servers, tool toggles) are managed through
the Settings UI and persisted to config/settings.json.
Full reference → docs/src/configuration.md
```
                    main.go (DI wiring)
 Transport      │      Use Cases         │     Adapters
┌────────────┐  │  ┌─────────────────┐   │  ┌─────────────┐
│ WS Hub     │──┼─▶│ Orchestrator    │───┼─▶│ Providers   │
│ Dispatch   │  │  │ ReAct Loop      │   │  │ Tools       │
└────────────┘  │  │ Thinking Eng.   │   │  │ MCP Tools   │
                │  │ Conversation    │   │  │ Storage     │
                │  │ Policy          │   │  │ Approval    │
                │  └─────────────────┘   │  │ Embeddings  │
                │                        │  │ VectorStore │
                │                        │  │ TurboQuant  │
                │                        │  └─────────────┘
 Domain (zero deps)
 agent · tool · session · memory · message · llm · thinking
 embedder · vector (VectorStore, Embedder, Reranker)
```
Full design docs → docs/src/clean-architecture.md
Octopus has two independent reasoning modes that can run simultaneously:
Local Thinking (RLoT) — runs on the platform, works with any LLM:
```
User message → Decompose → Reason → Verify → Terminate → enriched prompt → ReAct loop
                             ↑         │
                             └─Refine──┘  (PRM rejection)
```
Native Thinking — delegates to the model's built-in CoT (Claude Extended Thinking, o1, Qwen).
Both can be enabled at the same time: RLoT pre-processes the input, then the inner ReAct loop runs with native thinking active on top.
Full details: docs/src/thinking-engine.md
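The RLoT stage sequence above can be sketched as a small state machine in Go. The stage names come from the text; `verify` stands in for the PRM check, and the string-building functions are placeholders rather than Octopus APIs:

```go
package main

import "fmt"

// runRLoT walks Decompose → Reason → Verify, looping back through a
// Refine pass whenever the verifier (a stand-in for the PRM check)
// rejects the draft, up to maxRefine attempts. It returns the enriched
// prompt that would be handed to the ReAct loop.
func runRLoT(input string, verify func(string) bool, maxRefine int) string {
	plan := "plan(" + input + ")"   // Decompose: break the task into steps
	draft := "reason(" + plan + ")" // Reason: produce a chain of thought
	for i := 0; i < maxRefine && !verify(draft); i++ {
		draft = "refine(" + draft + ")" // Refine: rework the rejected draft
	}
	return draft + " + " + input // Terminate: enriched prompt for ReAct
}

func main() {
	rejectedOnce := false
	verify := func(s string) bool { // toy PRM: reject only the first draft
		if !rejectedOnce {
			rejectedOnce = true
			return false
		}
		return true
	}
	fmt.Println(runRLoT("fix the build", verify, 3))
	// → refine(reason(plan(fix the build))) + fix the build
}
```

The cap on refinement attempts plays the same role as the ReAct iteration cap: a verifier that never accepts cannot stall the session.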
The model receives exactly one meta-tool on its first turn: `tool_search`. To use any other tool it must first discover it by search and explicitly activate it into the session palette. This keeps prompt token usage proportional to what the model actually needs, regardless of how many tools are installed.
```
Turn 1:  LLM sees:  [tool_search]
         LLM calls: tool_search("search", query="file editing")
         Returns:   file_read, file_write, file_patch

Turn 2:  LLM calls: tool_search("activate", tool_id="file_read")
         Palette:   [tool_search, file_read]

Turn 3:  LLM calls: file_read(path="/src/main.go")
```
Full details: docs/src/tool-search.md
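The palette mechanics in the transcript above can be sketched in Go. The catalog, substring matching, and `Session` type are illustrative stand-ins, not the actual Octopus implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// Session holds the dynamic tool palette. Only tool_search is visible
// on turn 1; every other tool must be discovered and activated.
type Session struct {
	catalog map[string]string // tool_id → description (full install base)
	palette []string          // tools currently visible to the model
}

func NewSession(catalog map[string]string) *Session {
	return &Session{catalog: catalog, palette: []string{"tool_search"}}
}

// Search returns the IDs of tools whose description mentions the query,
// standing in for tool_search("search", ...).
func (s *Session) Search(query string) []string {
	var hits []string
	for id, desc := range s.catalog {
		if strings.Contains(desc, query) {
			hits = append(hits, id)
		}
	}
	return hits
}

// Activate adds a discovered tool to the palette,
// standing in for tool_search("activate", ...).
func (s *Session) Activate(toolID string) {
	s.palette = append(s.palette, toolID)
}

func main() {
	sess := NewSession(map[string]string{
		"file_read":  "file reading",
		"file_write": "file editing",
		"web_fetch":  "HTTP fetching",
	})
	fmt.Println(sess.Search("file")) // discovery (map order varies)
	sess.Activate("file_read")
	fmt.Println(sess.palette) // → [tool_search file_read]
}
```

Prompt cost stays proportional to `len(palette)`, not to the size of the catalog, which is the point of the design.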
Encrypted HNSW vector memory with zero external dependencies:
- Pure-Go HNSW index (`M=16`, cosine similarity via L2-normalised inner product)
- AES-256-GCM at-rest encryption (auto-generated key, atomic writes)
- API embeddings: OpenAI (`text-embedding-3-small`/`-large`), Cohere (`embed-v3`)
- Local ONNX embeddings (build with `-tags with_onnx`)
- Optional reranking: Cohere Rerank v3 or local ONNX cross-encoder
- Graceful fallback to keyword store on any I/O error
Full details: docs/src/memory.md
TurboQuant in the current product:
- WHT rotation → decorrelate embedding dimensions
- Max-Lloyd codebooks → optimal scalar quantisation per channel
- QJL residual → quantised Johnson-Lindenstrauss residual coding
- 1-4 bit modes for vector compression depending on the path
- vector-store compression is implemented directly inside Octopus
- ONNX embedding compression is wired through `TurboQuantEmbedder`
- managed `llama-server` consumes persisted KV-cache settings from the same inference panel
Full details: docs/src/turboquant.md
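The first stage in the list above, WHT rotation, is a fast Walsh-Hadamard transform. A minimal in-place version for power-of-two lengths shows the rotation step in isolation; it is a textbook sketch, not the TurboQuant pipeline:

```go
package main

import "fmt"

// fwht applies an in-place fast Walsh-Hadamard transform to data,
// whose length must be a power of two. Each butterfly mixes a pair of
// coordinates (sum and difference), spreading energy across dimensions
// so the later per-channel quantiser sees decorrelated values.
func fwht(data []float64) {
	for h := 1; h < len(data); h *= 2 {
		for i := 0; i < len(data); i += 2 * h {
			for j := i; j < i+h; j++ {
				x, y := data[j], data[j+h]
				data[j], data[j+h] = x+y, x-y
			}
		}
	}
}

func main() {
	v := []float64{1, 0, 1, 0}
	fwht(v)
	fmt.Println(v) // → [2 2 0 0]
}
```

The transform is its own inverse up to a factor of the length, so the rotation can be undone exactly at decode time before the quantisation error is the only loss that remains.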
WebSocket endpoint: `ws://localhost:8080/ws` — JSON-RPC 2.0
| Method | Direction | Description |
|---|---|---|
| `session/initialize` | C→S | Create a new agent session |
| `session/message` | C→S | Send a message, trigger ReAct cycle |
| `agent/interrupt` | C→S | Abort the current reasoning loop |
| `agent/stream` | S→C | Streaming step deltas |
| `tool/requestApproval` | S→C | HITL approval request |
| `tool/approvalResponse` | C→S | Human approval decision |
| `tool/palette/search` | C→S | Search available tools |
| `tool/palette/activate` | C→S | Activate a tool into the session palette |
| `sandbox/telemetry` | S→C | Live stdout/stderr from tool execution |
| `models/list` | C→S | List all models with capability flags |
| `settings/agentloop/memory/get` | C→S | Get Dialog Memory configuration |
| `settings/agentloop/memory/set` | C→S | Update Dialog Memory configuration |
Full reference → docs/src/api.md
| Document | Description |
|---|---|
| Configuration | All environment variables and defaults |
| API Reference | Full JSON-RPC 2.0 WebSocket protocol |
| LLM Providers | Provider setup and model routing |
| Local Inference Status | Current state, strengths, limits, operational nuances |
| Agent Scheduling | Permanent agents, schedule kinds, Mission Control semantics |
| MCP Servers | Extending with external tool servers |
| Memory | HNSW vector retrieval, storage layout, encryption |
| Tool Search & Palette | Dynamic tool discovery |
| HITL Approval | Human-in-the-loop approval flow |
| Thinking Engine | RLoT + native model reasoning |
| TurboQuant Compression | KV-cache vector compression |
| Clean Architecture | Layer boundaries and DI design |
| ReAct & Lane Queue | FSM orchestration |
| Tool Execution | Policy-guarded execution pipeline |
| Operations | Build, service, runtime files, deployment |
| Streaming | Real-time step rendering |
| UI Design | Mission Control card system |
| Completion Status | Implemented areas, active gaps, current status snapshot |
Octopus is licensed under the PolyForm Noncommercial License 1.0.0. This repository is source-available for noncommercial use only and is not offered under the MIT license.
See LICENSE for the complete license text.
Required Notice: Copyright Wanderspool (https://github.com/Wanderspool)