🐙 Octopus

Single-binary autonomous AI agent platform

Go backend · Embedded Next.js UI · Single-binary deploy, optional local inference stack


PolyForm Noncommercial License · Go 1.23+ · Next.js 15 · Local inference stack

Octopus runs a full ReAct (Reason + Act) cognitive loop, executes tools inside a sandboxed environment, streams every step to a real-time UI, and blocks unsafe actions behind a human-approval gate. The default deployment remains a single binary with an embedded frontend; the optional local inference stack adds ONNX Runtime and managed llama.cpp components when you build and enable them.

The Next.js Mission Control frontend is compiled directly into the Go binary via go:embed. Drop the file anywhere, run it, open the browser.

Licensing Notice: Octopus is source-available under the PolyForm Noncommercial License 1.0.0 and is not MIT-licensed. Commercial use is not permitted under the repository license. See LICENSE for the full terms.


✨ Features

  • One binary, everything included — frontend is embedded, no separate server needed
  • ReAct FSM — Reason + Act finite state machine with configurable iteration cap
  • Local Thinking (RLoT) — platform-level multi-step reasoning engine, works with any LLM
  • Native Thinking — pass-through to provider native CoT (Claude Extended Thinking, o1/o3, Qwen-thinking)
  • Tool Search & Dynamic Palette — LLM discovers tools on demand, never wastes context on unused tools
  • MCP Integration — local stdio, remote http / streamable_http / sse MCP servers
  • HNSW Vector Memory — encrypted AES-256-GCM vector store with ONNX or API embeddings and optional reranking
  • TurboQuant Compression — vector-store compression, embedding wrapper support, and managed llama-server KV-cache settings
  • HITL Approval Gate — sensitive actions require explicit human approval over WebSocket
  • Policy Engine — deny rules, auto-approve rules, dangerous-command checks before execution
  • Transparent streaming — every thought, tool call, and result streams as a separate card
  • Provider-agnostic — OpenAI, Anthropic, Groq, Mistral, OpenRouter, Gemini, Cohere, any OpenAI-compatible endpoint
  • Service-friendly — runs under systemd, PID 1 lifecycle, journald logging

🆚 Comparison

vs. Specialised Agent Platforms

✅ Implemented · 🔶 Partial / in roadmap · ❌ Not available · ⚠️ Requires separate service

Feature Octopus OpenClo Moltis PicoClo ZeroClo NanoClo
Deployment
Single binary core deploy
Embedded Web UI
Docker-first workflow 🔶 🔶 🔶
Cloud-managed option
Reasoning
ReAct loop 🔶 🔶
Native Thinking (o1, Claude, Qwen) 🔶 🔶
Local Thinking (RLoT, any model)
Multi-agent orchestration 🔶
Tools
Tool execution sandboxing 🔶
HITL approval gate 🔶
Built-in Tool Search & Palette
MCP server support 🔶 🔶
Policy / deny rules engine 🔶
Memory
Encrypted HNSW vector store ⚠️ ⚠️
Local ONNX embeddings ⚠️
Persistent conversation history 🔶
Local Inference
TurboQuant KV-cache compression
Ollama / local vLLM integration 🔶 🔶
LLM Routing
OpenAI / Anthropic / Groq / Mistral
Gemini / Cohere / Together / xAI 🔶 🔶 🔶 🔶
Custom OpenAI-compatible endpoint 🔶

Where Octopus leads: single-binary deployment, Local Thinking (platform-level RLoT on any LLM), built-in tool search, HITL approval gate, TurboQuant compression, and a local inference control plane that stays inside the same product surface.

Where competing platforms lead: OpenClo and Moltis offer stronger multi-agent orchestration and cloud-managed hosting. PicoClo, ZeroClo, and NanoClo have simpler operational footprints but more limited feature sets. None currently ship Local Thinking, built-in Tool Search, or TurboQuant.

Honest trade-offs:

  • Octopus has no native multi-agent workflow support (single agent per session)
  • The Web UI is functional but not as polished as Moltis's or OpenClo's dashboards
  • Cloud deployment requires self-hosting (no managed SaaS option)
  • ONNX chat execution is still not a production path; local chat currently leans on managed llama.cpp server integration instead

Local Thinking (RLoT) is a platform-level multi-step reasoning engine (Decompose → Reason → Refine → Verify → Terminate) that adds structured chain-of-thought to any LLM — no provider-side reasoning support required.

Built-in Tool Search gives the model a single tool_search meta-tool on its first turn. It discovers and activates only the tools it needs, keeping the context window clean regardless of catalog size.

TurboQuant is now wired in three places: compressed vector memory, ONNX embedding wrappers, and managed llama.cpp server cache configuration. The implementation is broader than before, but the local chat story is still split across ONNX and llama.cpp paths.


🚀 Installation

One-liner (Linux, recommended)

curl -fsSL https://raw.githubusercontent.com/Wanderspool/Octopus/main/scripts/build.sh | bash

This clones the repository to the current directory, builds the Next.js frontend, embeds it, and compiles the Go binary to bin/octopus.


Step-by-step setup

1. Prerequisites

| Tool | Version | Purpose |
|---|---|---|
| Go | 1.23+ | Backend compilation |
| Node.js | 20+ | Frontend build only |
| Linux kernel | 5.10+ | Sandbox tool isolation |
| git | any | Clone repository |

2. Clone the repository

git clone https://github.com/Wanderspool/Octopus.git
cd Octopus

3. Build

# Canonical full build: frontend → embed → ONNX-capable Go binary
make

# Or step by step:
make frontend   # npm ci + npm run build (Next.js static export)
make embed      # copies web/out → pkg/static/out
make build      # CGO_ENABLED=1 go build -tags "with_onnx embed_onnx embed_llamacpp" -o bin/octopus ./cmd/octopus

# Deployment rebuild helper used on the target host:
./web/build.sh

The binary is at bin/octopus. It contains the full UI — no separate server needed.

The default repository build is no longer a minimal CGO_ENABLED=0 profile. It intentionally targets the current product surface:

  • embedded frontend assets
  • ONNX-enabled embeddings and reranking
  • managed llama.cpp bundle support

If you need a narrower binary for debugging build variants, the Makefile still exposes build-embed-onnx and build-embed-all.

Build prerequisites for the default profile

# The default build requires CGO and the repository-pinned Go toolchain path.
export PATH=/usr/local/go/bin:$PATH

# If the ONNX runtime shared library is not discoverable automatically, point Octopus at it explicitly.
export ORT_LIB=/absolute/path/to/libonnxruntime.so
./bin/octopus

Current scope of the local inference stack:

  • Embedding ONNX models: supported when built with with_onnx
  • Reranker ONNX models: supported when built with with_onnx
  • Chat ONNX models: metadata/download/config surfaces exist, but runtime chat execution is still incomplete
  • Managed llama.cpp bundle lifecycle: supported from the Settings UI
  • Managed llama-server lifecycle: start/stop/status supported from the Settings UI
  • llama-server auto-start on Octopus boot: supported via persisted inference settings
  • GGUF model discovery for managed llama-server: supported via filesystem scan + UI dropdown
  • OpenAI-compatible local provider auto-config: supported when managed llama-server starts successfully
  • TurboQuant live stats push: supported via periodic WebSocket notifications to the Settings UI

Managed runtime bundles:

  • Octopus now includes a managed ONNX Runtime bundle layer for tracking installed runtime versions, activating one runtime at a time, and downloading newer official community releases from the Settings UI.
  • The runtime manager can auto-wire ORT_LIB from the currently active managed bundle.
  • The embedded-bundle architecture is implemented, but the repository still does not ship a fully self-sufficient local inference payload for every path.
  • In practice, the product is closest to "single-binary control plane + optional managed local runtimes", not a fully hermetic local inference appliance.

Current local inference status

What works well:

  • one Settings surface for ONNX runtime bundles, ONNX models, TurboQuant, and managed llama.cpp
  • persistent inference configuration in config/settings.json
  • managed llama-server bootstrapping with auto-created OpenAI-compatible endpoint config
  • WebSocket-driven live refresh of TurboQuant stats in the UI

What is still uneven:

  • the ONNX path is stronger for embeddings/reranking than for chat
  • the llama.cpp path is better for local chat serving than for deep in-process integration
  • a successful ONNX build still depends on CGO and a compatible runtime library
  • local inference remains Linux-first operationally

Detailed status and trade-offs: docs/src/local-inference-status.md

4. Configure

Copy the example config and fill in your API key:

cp config/settings.example.json config/settings.json

Or skip the file entirely and use environment variables:

export OCTOPUS_LLM_API_KEY=sk-...           # your provider API key
export OCTOPUS_LLM_MODEL=gpt-4o             # or claude-sonnet-4-20250514, etc.

5. Run

./bin/octopus

Open http://localhost:8080 — Mission Control is ready.


Provider quick-start

# OpenAI
OCTOPUS_LLM_API_KEY=sk-... ./bin/octopus

# Anthropic
OCTOPUS_LLM_PROVIDER=anthropic \
OCTOPUS_LLM_API_KEY=sk-ant-... \
OCTOPUS_LLM_MODEL=claude-sonnet-4-20250514 \
./bin/octopus

# Groq (OpenAI-compatible)
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=https://api.groq.com/openai/v1 \
OCTOPUS_LLM_API_KEY=gsk_... \
OCTOPUS_LLM_MODEL=llama-3.3-70b-versatile \
./bin/octopus

# Local vLLM
OCTOPUS_LLM_PROVIDER=openai \
OCTOPUS_LLM_ENDPOINT=http://localhost:8000/v1 \
OCTOPUS_LLM_API_KEY=none \
OCTOPUS_LLM_MODEL=Qwen/Qwen2.5-7B-Instruct \
./bin/octopus

Run as a systemd service

# Copy the binary
sudo cp bin/octopus /opt/Octopus/bin/octopus

# Install the service unit
sudo cp scripts/systemd/octopus.service /etc/systemd/system/octopus.service
sudo systemctl daemon-reload
sudo systemctl enable --now octopus

# Check status
sudo systemctl status octopus
sudo journalctl -u octopus -f

Operational note: the tracked unit currently uses TimeoutStopSec=20. Under heavy shutdown paths, especially after local inference activity, a restart can hit the timeout and be escalated to SIGKILL. This is a known operational gap, not a documentation omission.


⚙️ Configuration

All settings have production-ready defaults. Override with environment variables:

| Variable | Default | Description |
|---|---|---|
| `OCTOPUS_ADDR` | `:8080` | HTTP / WebSocket listen address |
| `OCTOPUS_LLM_PROVIDER` | `openai` | Provider: `openai`, `anthropic` |
| `OCTOPUS_LLM_API_KEY` | (none) | Provider API key |
| `OCTOPUS_LLM_MODEL` | `gpt-4o` | Model identifier |
| `OCTOPUS_LLM_ENDPOINT` | (provider default) | Custom API base URL |
| `OCTOPUS_LOG_LEVEL` | `info` | `debug` · `info` · `warn` · `error` |
| `OCTOPUS_LOG_FORMAT` | `json` | `json` · `text` |

Runtime settings (thinking, generation controls, memory, MCP servers, tool toggles) are managed through the Settings UI and persisted to config/settings.json.

Full reference → docs/src/configuration.md


🏗 Architecture


                       main.go  (DI wiring)

    Transport     │      Use Cases        │     Adapters
  ┌────────────┐  │  ┌─────────────────┐  │  ┌─────────────┐
  │  WS Hub    │──┼─▶│  Orchestrator   │──┼─▶│  Providers  │
  │  Dispatch  │  │  │  ReAct Loop     │  │  │  Tools      │
  └────────────┘  │  │  Thinking Eng.  │  │  │  MCP Tools  │
                  │  │  Conversation   │  │  │  Storage    │
                  │  │  Policy         │  │  │  Approval   │
                  │  └─────────────────┘  │  │  Embeddings │
                  │                       │  │  VectorStore│
                  │                       │  │  TurboQuant │
                  │                       │  └─────────────┘

                      Domain  (zero deps)
   agent · tool · session · memory · message · llm · thinking
   embedder · vector (VectorStore, Embedder, Reranker)

Full design docs → docs/src/clean-architecture.md


🧠 Thinking Engine

Octopus has two independent reasoning modes that can run simultaneously:

Local Thinking (RLoT) — runs on the platform, works with any LLM:

User message → Decompose → Reason → Verify → Terminate → enriched prompt → ReAct loop
                                         ↑         │
                                         └─Refine──┘ (PRM rejection)

Native Thinking — delegates to the model's built-in CoT (Claude Extended Thinking, o1, Qwen).

Both can be enabled at the same time: RLoT pre-processes the input, then the inner ReAct loop runs with native thinking active on top.
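The staged RLoT loop above can be sketched as a tiny state machine. The stage names follow the diagram; everything else (function names, the PRM signal as a boolean) is an assumption for illustration, not the Octopus implementation.

```go
package main

import "fmt"

type stage int

const (
	decompose stage = iota
	reason
	verify
	refine
	terminate
)

var names = [...]string{"Decompose", "Reason", "Verify", "Refine", "Terminate"}

// step advances the pipeline one stage; a PRM rejection at Verify
// routes back through Refine instead of terminating.
func step(s stage, prmAccepted bool) stage {
	switch s {
	case decompose:
		return reason
	case reason:
		return verify
	case verify:
		if prmAccepted {
			return terminate
		}
		return refine
	case refine:
		return verify
	default:
		return terminate
	}
}

func main() {
	s := decompose
	accepts := []bool{false, true} // first verification rejected by the PRM
	i := 0
	for s != terminate {
		ok := false
		if s == verify {
			ok = accepts[i]
			i++
		}
		next := step(s, ok)
		fmt.Println(names[s], "->", names[next])
		s = next
	}
}
```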

Full details: docs/src/thinking-engine.md


🔍 Tool Search & Dynamic Palette

The model receives exactly one meta-tool on its first turn: tool_search. To use any other tool it must first discover it by search and explicitly activate it into the session palette. This keeps prompt token usage proportional to what the model actually needs, regardless of how many tools are installed.

Turn 1:  LLM sees:    [tool_search]
         LLM calls:   tool_search("search", query="file editing")
         Returns:     file_read, file_write, file_patch

Turn 2:  LLM calls:   tool_search("activate", tool_id="file_read")
         Palette:     [tool_search, file_read]

Turn 3:  LLM calls:   file_read(path="/src/main.go")

Full details: docs/src/tool-search.md


🧠 Memory (Vector Search)

Encrypted HNSW vector memory with zero external dependencies:

  • Pure-Go HNSW index (M=16, cosine similarity via L2-normalised inner product)
  • AES-256-GCM at-rest encryption (auto-generated key, atomic writes)
  • API embeddings: OpenAI (text-embedding-3-small/large), Cohere (embed-v3)
  • Local ONNX embeddings (build with -tags with_onnx)
  • Optional reranking: Cohere Rerank v3 or local ONNX cross-encoder
  • Graceful fallback to keyword store on any I/O error

Full details: docs/src/memory.md


⚡ TurboQuant Compression

TurboQuant in the current product:

  • WHT rotation → decorrelate embedding dimensions
  • Max-Lloyd codebooks → optimal scalar quantisation per channel
  • QJL residual → quantised Johnson-Lindenstrauss residual coding
  • 1-4 bit modes for vector compression depending on the path
  • vector-store compression is implemented directly inside Octopus
  • ONNX embedding compression is wired through TurboQuantEmbedder
  • managed llama-server consumes persisted KV cache settings from the same inference panel
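The first step of that pipeline, the WHT rotation, can be sketched with a fast Walsh-Hadamard transform. This shows the rotation idea only (the Max-Lloyd codebooks and QJL residual stages are not reproduced here), and it is an illustration rather than the TurboQuant implementation:

```go
package main

import "fmt"

// fwht applies an in-place, unnormalized fast Walsh-Hadamard
// transform — the kind of cheap orthogonal rotation used to
// decorrelate embedding dimensions before scalar quantization.
// len(v) must be a power of two.
func fwht(v []float64) {
	for h := 1; h < len(v); h *= 2 {
		for i := 0; i < len(v); i += 2 * h {
			for j := i; j < i+h; j++ {
				x, y := v[j], v[j+h]
				v[j], v[j+h] = x+y, x-y
			}
		}
	}
}

func main() {
	v := []float64{1, 0, 1, 0}
	fwht(v)
	fmt.Println(v) // [2 2 0 0]
}
```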

Full details: docs/src/turboquant.md


📡 API

WebSocket endpoint: ws://localhost:8080/ws — JSON-RPC 2.0

| Method | Direction | Description |
|---|---|---|
| session/initialize | C→S | Create a new agent session |
| session/message | C→S | Send a message, trigger ReAct cycle |
| agent/interrupt | C→S | Abort the current reasoning loop |
| agent/stream | S→C | Streaming step deltas |
| tool/requestApproval | S→C | HITL approval request |
| tool/approvalResponse | C→S | Human approval decision |
| tool/palette/search | C→S | Search available tools |
| tool/palette/activate | C→S | Activate a tool into the session palette |
| sandbox/telemetry | S→C | Live stdout/stderr from tool execution |
| models/list | C→S | List all models with capability flags |
| settings/agentloop/memory/get | C→S | Get Dialog Memory configuration |
| settings/agentloop/memory/set | C→S | Update Dialog Memory configuration |

Full reference → docs/src/api.md


📚 Documentation

| Document | Description |
|---|---|
| Configuration | All environment variables and defaults |
| API Reference | Full JSON-RPC 2.0 WebSocket protocol |
| LLM Providers | Provider setup and model routing |
| Local Inference Status | Current state, strengths, limits, operational nuances |
| Agent Scheduling | Permanent agents, schedule kinds, Mission Control semantics |
| MCP Servers | Extending with external tool servers |
| Memory | HNSW vector retrieval, storage layout, encryption |
| Tool Search & Palette | Dynamic tool discovery |
| HITL Approval | Human-in-the-loop approval flow |
| Thinking Engine | RLoT + native model reasoning |
| TurboQuant Compression | KV-cache vector compression |
| Clean Architecture | Layer boundaries and DI design |
| ReAct & Lane Queue | FSM orchestration |
| Tool Execution | Policy-guarded execution pipeline |
| Operations | Build, service, runtime files, deployment |
| Streaming | Real-time step rendering |
| UI Design | Mission Control card system |
| Completion Status | Implemented areas, active gaps, current status snapshot |

📄 License

Octopus is licensed under the PolyForm Noncommercial License 1.0.0. This repository is source-available for noncommercial use only and is not offered under the MIT license.

See LICENSE for the complete license text.

Required Notice: Copyright Wanderspool (https://github.com/Wanderspool)

About

Platform for multi-instance AI inference and agents, pre-alpha. Includes tool search, Local Thinking (through RLoT and PRM), and TurboQuant inference technology.
