Quickstart: Deploy Fortemi

Deploy a fully functional Fortemi instance using published container images from GHCR. This guide covers three progressive tiers — each self-contained, each building on the previous:

  1. Core — Full-text search, tagging, graph linking (no AI, no GPU)
  2. +AI — Add Ollama for semantic search, auto-linking, and NLP extraction
  3. +Full Stack — Add extraction sidecars (GLiNER NER, Whisper transcription)

Time estimate: 2 minutes for Core (one command), 15-20 minutes through Full Stack with AI.

This guide is designed for both humans and AI agents. Agent-parseable markers (<!-- agent:step -->) annotate each step with check commands, expected output, and failure actions.
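
For illustration only, such a marker might look like the sketch below. The attribute layout here is hypothetical, not the canonical schema; the check commands and expected outputs in each step of this guide are the authoritative source.

```html
<!-- agent:step
  check: curl -sf http://localhost:3000/health
  expect: "status":"healthy"
  on-failure: docker compose -f docker-compose.bundle.yml logs matric
-->
```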


Prerequisites

Required

Docker Engine 24.0+ with Compose v2

docker --version
# Expected: Docker version 24.x.x or higher

docker compose version
# Expected: Docker Compose version v2.x.x or higher

On failure: Install Docker Engine from https://docs.docker.com/engine/install/

curl

curl --version
# Expected: curl 7.x or 8.x

On failure: Install via your package manager (apt install curl, brew install curl, etc.)

Ports 3000 and 3001 available

# Linux (ss is not available on macOS)
ss -tlnp | grep -E ':300[01]\b' || echo "Ports available"

# Alternative (any OS)
curl -sf http://localhost:3000 > /dev/null 2>&1 && echo "FAIL: Port 3000 in use" || echo "OK: Port 3000 free"
curl -sf http://localhost:3001 > /dev/null 2>&1 && echo "FAIL: Port 3001 in use" || echo "OK: Port 3001 free"

On failure: Stop the service occupying the port, or change the port mapping in docker-compose.bundle.yml.
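
If you would rather remap than stop the conflicting service, a Compose override file keeps the bundle file untouched. A sketch, assuming the service is named matric with container ports 3000/3001 as described below; the !override merge tag requires a recent Compose v2:

```yaml
# docker-compose.ports.yml -- hypothetical override remapping host ports
services:
  matric:
    ports: !override
      - "8080:3000"   # host 8080 -> API
      - "8081:3001"   # host 8081 -> MCP
```

Start with both files: docker compose -f docker-compose.bundle.yml -f docker-compose.ports.yml up -d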

10 GB free disk space (minimum; 20 GB+ recommended with AI models)

df -h / | awk 'NR==2 {print $4}'
# Expected: 10G or more

On failure: Free disk space before proceeding.

4 GB RAM (minimum; 8 GB+ recommended)

# Linux
free -g | awk '/^Mem:/ {print $2 "GB total"}'

# macOS
sysctl -n hw.memsize | awk '{print int($1/1024/1024/1024) "GB total"}'

On failure: Fortemi Core runs in 4 GB. AI features need 8 GB+. Upgrade RAM or use a larger machine.

Optional (for AI features)

NVIDIA GPU detection

nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null \
  && echo "GPU: detected" \
  || echo "GPU: not detected (CPU-only mode)"

NVIDIA Container Toolkit (only if GPU detected)

docker info 2>/dev/null | grep -i nvidia \
  && echo "NVIDIA Container Toolkit: installed" \
  || echo "NVIDIA Container Toolkit: not installed"

On failure (with GPU): Install from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Ollama (for AI features in Tier 2+)

ollama --version 2>/dev/null \
  && echo "Ollama: installed" \
  || echo "Ollama: not installed"

Step 1: Download and Start

Download the compose file and start Fortemi:

mkdir -p fortemi && cd fortemi

# Download compose file
curl -fsSL -o docker-compose.bundle.yml \
  https://raw.githubusercontent.com/fortemi/fortemi/main/docker-compose.bundle.yml

Alternative — clone the repository:

git clone https://github.com/fortemi/fortemi.git
cd fortemi

Verify the compose file is valid:

docker compose -f docker-compose.bundle.yml config --quiet \
  && echo "OK: compose file valid" \
  || echo "FAIL: compose file invalid"

On failure: Re-download the file. If using a proxy, ensure it's not modifying the download.

No .env file required for basic usage. The compose file defaults to pulling published images from GHCR (ghcr.io/fortemi/fortemi). All sidecars (GLiNER, Whisper, pyannote) are optional and won't block startup.

Optional: Configure Environment

Create a .env file only if you need to customize settings:

# Download the template (optional)
curl -fsSL -o .env.example \
  https://raw.githubusercontent.com/fortemi/fortemi/main/.env.example
cp .env.example .env

Common customizations:

| Setting | When to configure |
| --- | --- |
| ISSUER_URL=https://your-domain.com | Deploying behind a reverse proxy or domain |
| OLLAMA_VISION_MODEL= (empty) | CPU-only host (disables vision model) |
| FORTEMI_REGISTRY=git.integrolabs.net | Internal development (Gitea registry) |

GPU or CPU-only?

GPU detected — no changes needed. The default compose file enables GPU acceleration for Whisper and pyannote sidecars via deploy.resources.reservations.devices.

No GPU detected — that's fine. Sidecars requiring GPU (Whisper, pyannote) are marked required: false and won't block startup. Fortemi Core and GLiNER work on CPU. For CPU transcription, use:

docker compose -f docker-compose.bundle.yml --profile whisper-cpu up -d

Step 2: Start Core Services

Start the core stack (Fortemi + Redis). Sidecars (Whisper, GLiNER, pyannote) start automatically but are non-blocking — Fortemi works without them.

docker compose -f docker-compose.bundle.yml up -d matric redis

This pulls ~1 GB of images on first run and starts:

  • PostgreSQL 18 with pgvector + PostGIS (bundled in the matric container)
  • The API server on port 3000
  • The MCP server on port 3001
  • Redis for search query caching

Wait for healthy

Poll until the health check passes (allows up to 90 seconds for first-time initialization):

# Poll health endpoint
for i in $(seq 1 18); do
  status=$(curl -sf http://localhost:3000/health | grep -o '"status":"[^"]*"' | head -1)
  if echo "$status" | grep -q "healthy"; then
    echo "OK: Fortemi is healthy"
    break
  fi
  echo "Waiting... ($i/18)"
  sleep 5
done
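
This same poll pattern recurs for the sidecars later in this guide; it can be factored into one helper. A sketch, not part of the official tooling:

```shell
# wait_for URL [ATTEMPTS] [DELAY]: poll a health endpoint until it responds.
wait_for() {
  url=$1; attempts=${2:-18}; delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf "$url" > /dev/null 2>&1; then
      echo "OK: $url healthy"
      return 0
    fi
    echo "Waiting for $url... ($i/$attempts)"
    sleep "$delay"
    i=$((i + 1))
  done
  echo "FAIL: $url not healthy after $((attempts * delay))s"
  return 1
}

# Usage:
# wait_for http://localhost:3000/health 18 5
```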

Or just wait and check once:

sleep 30 && curl -s http://localhost:3000/health

Expected output includes:

{
  "status": "healthy",
  "database": "connected"
}

On failure:

  • Check logs: docker compose -f docker-compose.bundle.yml logs matric
  • Check container status: docker compose -f docker-compose.bundle.yml ps
  • Verify port isn't in use: ss -tlnp | grep :3000

Verify endpoints

# API docs (Swagger UI)
curl -sf http://localhost:3000/docs > /dev/null \
  && echo "OK: API docs available at http://localhost:3000/docs" \
  || echo "FAIL: API docs not reachable"

# MCP endpoint (a plain connection check; the endpoint may not return 2xx on GET)
curl -s -o /dev/null http://localhost:3001/ \
  && echo "OK: MCP server reachable on port 3001" \
  || echo "FAIL: MCP server not reachable on port 3001"

At this point, Tier 1 (Core) is complete. You have full-text search, tagging, collections, version history, graph linking, and the MCP server. No AI/GPU required.


Step 3: Add AI Features (Optional)

This section adds Ollama for semantic search, embeddings, auto-linking, and NLP extraction.

Install Ollama

Skip if Ollama is already installed (check: ollama --version).

curl -fsSL https://ollama.ai/install.sh | sh

Verify:

ollama --version
# Expected: ollama version 0.x.x

# Ensure the service is running
if ollama list > /dev/null 2>&1; then
  echo "OK: Ollama running"
else
  echo "Starting Ollama..."
  ollama serve &
fi

Detect Hardware and Select Models

Detect available VRAM and RAM to select appropriate models:

# Detect VRAM (GB)
VRAM=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null | head -1)
if [ -n "$VRAM" ]; then
  VRAM_GB=$((VRAM / 1024))
  echo "GPU VRAM: ${VRAM_GB}GB"
else
  VRAM_GB=0
  echo "GPU VRAM: none"
fi

# Detect RAM (GB)
if [ -f /proc/meminfo ]; then
  RAM_GB=$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)
else
  RAM_GB=$(sysctl -n hw.memsize 2>/dev/null | awk '{print int($1/1024/1024/1024)}')
fi
echo "System RAM: ${RAM_GB}GB"

Use this table to select models based on your hardware:

| VRAM | RAM | Generation | Fast | Vision | Embedding |
| --- | --- | --- | --- | --- | --- |
| 24 GB+ | any | qwen3.5:27b | qwen3.5:9b | qwen3.5:9b | nomic-embed-text |
| 12-23 GB | any | qwen2.5:7b | qwen3.5:9b | (disable) | nomic-embed-text |
| 6-11 GB | any | llama3.2:3b | (disable) | (disable) | nomic-embed-text |
| none | 32 GB+ | qwen2.5:7b | qwen3.5:9b | (disable) | nomic-embed-text |
| none | 16 GB+ | qwen2.5:7b | llama3.2:3b | (disable) | nomic-embed-text |
| none | 8-15 GB | llama3.2:3b | (disable) | (disable) | nomic-embed-text |

(disable) means set the corresponding env var to empty string. See Hardware Planning for full quality benchmarks and cost analysis.
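
The table's selection logic fits in a few lines of shell. A sketch that consumes the VRAM_GB and RAM_GB values detected above (model names mirror the table; "-" means disable):

```shell
#!/bin/sh
# Map detected VRAM/RAM (GB) to the model tier from the table above.
# Prints: GENERATION FAST VISION. Embedding is always nomic-embed-text.
select_models() {
  vram=$1; ram=$2
  if   [ "$vram" -ge 24 ]; then echo "qwen3.5:27b qwen3.5:9b qwen3.5:9b"
  elif [ "$vram" -ge 12 ]; then echo "qwen2.5:7b qwen3.5:9b -"
  elif [ "$vram" -ge 6  ]; then echo "llama3.2:3b - -"
  elif [ "$ram"  -ge 32 ]; then echo "qwen2.5:7b qwen3.5:9b -"
  elif [ "$ram"  -ge 16 ]; then echo "qwen2.5:7b llama3.2:3b -"
  else                          echo "llama3.2:3b - -"
  fi
}

# Example: no GPU, 16 GB RAM
select_models 0 16
# -> qwen2.5:7b llama3.2:3b -
```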

Pull Models

Pull the models you selected. At minimum, pull the embedding model:

# Always needed for semantic search
ollama pull nomic-embed-text

# Generation model (example for the 12-23 GB VRAM tier)
ollama pull qwen2.5:7b

# Fast model, if your tier includes one. qwen3.5:9b is natively multimodal,
# so on the 24 GB+ tier it doubles as the vision model.
ollama pull qwen3.5:9b

# Vision model: only pull a separate one if your tier lists a dedicated
# vision model; where vision is enabled, qwen3.5:9b above already covers it.

Verify models are available:

ollama list
# Expected: nomic-embed-text and your selected models listed

Update .env with Model Selections

Create a .env file (if you don't already have one) with your model selections. Example for a 12-23 GB VRAM system:

cat >> .env << 'EOF'

# ── Ollama Model Configuration ──────────────────────────────────────────
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_VISION_MODEL=
EOF

Adjust the model names to match your hardware tier from the table above. For models marked (disable), set the variable to empty (e.g., MATRIC_FAST_GEN_MODEL=).

Restart and Verify

Restart the matric service to pick up new model configuration:

docker compose -f docker-compose.bundle.yml up -d matric

Wait for healthy (same poll as Step 2), then verify Ollama connectivity:

# Check Fortemi can reach Ollama
curl -s http://localhost:3000/health | grep -o '"ollama[^}]*}'

Test embedding generation:

# Create a test note
NOTE_ID=$(curl -sf -X POST http://localhost:3000/api/v1/notes \
  -H "Content-Type: application/json" \
  -d '{"content":"Test note for embedding verification."}' \
  | grep -o '"id":"[^"]*"' | cut -d'"' -f4)

echo "Created note: $NOTE_ID"

# Wait for background embedding job (5-10 seconds)
sleep 10

# Verify embeddings exist
curl -sf "http://localhost:3000/api/v1/notes/$NOTE_ID" \
  | grep -o '"has_embedding":[^,]*'
# Expected: "has_embedding":true

On failure: Check that Ollama is reachable from Docker. The compose file uses host.docker.internal — on Linux, this requires the extra_hosts mapping (already configured in the compose file). Verify with:

docker exec $(docker compose -f docker-compose.bundle.yml ps -q matric) \
  curl -sf http://host.docker.internal:11434/api/tags > /dev/null \
  && echo "OK: Ollama reachable from container" \
  || echo "FAIL: Cannot reach Ollama from container"
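
The grep/cut extraction used above for the note ID recurs whenever you script against the API; it can be wrapped in a small helper. A sketch; it only handles flat string fields, so prefer jq for anything nested:

```shell
# json_str FIELD: extract a flat string field from JSON on stdin.
json_str() {
  grep -o "\"$1\":\"[^\"]*\"" | head -1 | cut -d'"' -f4
}

# Example with a canned response:
echo '{"id":"note-123","content":"hello"}' | json_str id
# -> note-123
```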

Tier 2 (+AI) is complete. You now have semantic search, auto-linking, and NLP extraction.


Step 4: Enable Extraction Sidecars (Optional)

Sidecars provide specialized NLP capabilities that run as separate containers alongside Fortemi.

GLiNER (Named Entity Recognition)

GLiNER is a zero-shot NER model that extracts entities from text. It's CPU-only and adds rich concept tagging to the extraction pipeline.

# Start GLiNER alongside existing services
docker compose -f docker-compose.bundle.yml up -d gliner

Wait for GLiNER to be healthy (first start downloads the model, ~1-2 minutes):

for i in $(seq 1 12); do
  curl -sf http://localhost:8090/health > /dev/null 2>&1 \
    && echo "OK: GLiNER healthy" && break
  echo "Waiting for GLiNER... ($i/12)"
  sleep 10
done

Whisper (Audio Transcription)

Whisper transcribes audio and video attachments. Choose GPU or CPU mode:

GPU mode (default, fast, requires NVIDIA Container Toolkit):

docker compose -f docker-compose.bundle.yml up -d whisper

CPU mode (slower, works everywhere):

docker compose -f docker-compose.bundle.yml --profile whisper-cpu up -d

Wait for Whisper (first start downloads the model, ~2-5 minutes):

for i in $(seq 1 30); do
  curl -sf http://localhost:8000/health > /dev/null 2>&1 \
    && echo "OK: Whisper healthy" && break
  echo "Waiting for Whisper... ($i/30)"
  sleep 10
done

pyannote (Speaker Diarization)

pyannote identifies and labels individual speakers in audio. Requires a HuggingFace token for the gated pyannote model.

# Add your HuggingFace token (required for model download)
echo 'HF_TOKEN=hf_your_token_here' >> .env

# GPU mode (default, requires NVIDIA Container Toolkit):
docker compose -f docker-compose.bundle.yml up -d pyannote

# CPU mode (slower, works everywhere):
docker compose -f docker-compose.bundle.yml --profile pyannote-cpu up -d

Wait for pyannote (first start downloads the model, ~2-5 minutes):

for i in $(seq 1 30); do
  curl -sf http://localhost:8001/health > /dev/null 2>&1 \
    && echo "OK: pyannote healthy" && break
  echo "Waiting for pyannote... ($i/30)"
  sleep 10
done

Verify Capabilities

Check which extraction strategies are active:

curl -s http://localhost:3000/health | python3 -m json.tool 2>/dev/null || curl -s http://localhost:3000/health

The capabilities.extraction_strategies array in the health response lists all registered adapters. Expected entries depend on which sidecars you enabled:

| Sidecar | Extraction strategy |
| --- | --- |
| GLiNER | gliner_ner |
| Whisper | audio_transcription |
| Ollama Vision | image_vision, video_multimodal |

On failure: Restart the matric service to re-detect sidecars: docker compose -f docker-compose.bundle.yml restart matric
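
Without python3, the strategy list can be pulled out of the health payload with basic tools. A sketch against a canned sample payload, whose shape is assumed from the table above:

```shell
# Extract capabilities.extraction_strategies from a health response.
payload='{"status":"healthy","capabilities":{"extraction_strategies":["gliner_ner","audio_transcription"]}}'

echo "$payload" \
  | grep -o '"extraction_strategies":\[[^]]*\]' \
  | grep -o '"[a-z_]*"' \
  | tr -d '"' \
  | tail -n +2
# -> gliner_ner
#    audio_transcription
```

Against a live instance, replace the canned payload with the output of curl -s http://localhost:3000/health.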

Tier 3 (+Full Stack) is complete.


Step 5: Connect an AI Agent (MCP)

Fortemi's MCP server enables AI agents (Claude Code, etc.) to read, search, and manage your knowledge base.

Claude Code

Add to your project's .mcp.json (or ~/.claude/mcp.json for global access):

{
  "mcpServers": {
    "fortemi": {
      "url": "http://localhost:3001/mcp"
    }
  }
}

For remote deployments behind a domain:

{
  "mcpServers": {
    "fortemi": {
      "url": "https://memory.example.com/mcp"
    }
  }
}
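
If you script your agent setup, the same config can be written from the shell (URL as in the local example above; use ~/.claude/mcp.json instead of .mcp.json for global access):

```shell
# Generate a project-local .mcp.json pointing at the local MCP endpoint.
cat > .mcp.json << 'EOF'
{
  "mcpServers": {
    "fortemi": {
      "url": "http://localhost:3001/mcp"
    }
  }
}
EOF
```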

Verify MCP Tools

After restarting Claude Code, verify tools are available:

# Quick test: list notes via MCP-backed API
curl -sf http://localhost:3000/api/v1/notes | head -c 200

In Claude Code, the fortemi MCP tools (e.g., capture_knowledge, search, manage_tags) should appear in the tool list. See MCP Server for full tool documentation.


Verification Checklist

| Feature | Check command | Expected result |
| --- | --- | --- |
| API health | curl -sf http://localhost:3000/health | "status":"healthy" |
| API docs | curl -sf http://localhost:3000/docs -o /dev/null -w '%{http_code}' | 200 |
| MCP server | curl -sf http://localhost:3001/ -o /dev/null -w '%{http_code}' | 200 or connection accepted |
| Full-text search | curl -sf 'http://localhost:3000/api/v1/search?q=test' | JSON response with results array |
| Ollama connectivity | curl -sf http://localhost:11434/api/tags | JSON with model list |
| Embeddings working | Create note, wait 10s, check has_embedding | true |
| GLiNER healthy | curl -sf http://localhost:8090/health | 200 |
| Whisper healthy | curl -sf http://localhost:8000/health | 200 |
| Extraction strategies | curl -sf http://localhost:3000/health \| grep extraction | Lists active strategies |

Troubleshooting

Container fails to start on CPU-only host

Symptom: docker compose up fails with nvidia runtime error.

Cause: A sidecar (whisper, pyannote) requests GPU resources via deploy.resources.reservations.devices.

Fix: These sidecars are already required: false in the compose file. Start only the services you need:

docker compose -f docker-compose.bundle.yml up -d matric redis

Or use CPU profiles for transcription:

docker compose -f docker-compose.bundle.yml --profile whisper-cpu up -d

Port 3000 already in use

# Find what's using the port
ss -tlnp | grep :3000
# Kill it or change the port mapping in docker-compose.bundle.yml

Ollama not reachable from container

Symptom: Health shows ollama: disconnected or embeddings never generate.

Fix: Verify host.docker.internal resolves inside the container:

docker exec $(docker compose -f docker-compose.bundle.yml ps -q matric) \
  getent hosts host.docker.internal

If it doesn't resolve (some older Docker versions on Linux), add to your .env:

OLLAMA_BASE=http://172.17.0.1:11434
OLLAMA_HOST=http://172.17.0.1:11434
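
To script this fallback: 172.17.0.1 is the usual default bridge gateway, but verify it on your host before hardcoding.

```shell
# Verify the bridge gateway first (commonly 172.17.0.1):
#   docker network inspect bridge --format '{{(index .IPAM.Config 0).Gateway}}'
BRIDGE_IP=172.17.0.1

cat >> .env << EOF
OLLAMA_BASE=http://${BRIDGE_IP}:11434
OLLAMA_HOST=http://${BRIDGE_IP}:11434
EOF
```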

Slow first startup

First-time initialization runs all database migrations and creates extensions. This can take 30-60 seconds. Subsequent starts are faster (~10 seconds).

MCP tools not loading in Claude Code

  1. Verify MCP server is running: curl -sf http://localhost:3001/
  2. Check .mcp.json syntax (must be valid JSON)
  3. Restart Claude Code after editing .mcp.json
  4. See MCP Troubleshooting for detailed diagnostics

Image pull fails from GHCR

# Verify GHCR is reachable
docker pull ghcr.io/fortemi/fortemi:bundle-latest

Public GHCR images do not require login. If you see authentication errors, remove any stale ghcr.io credentials (docker logout ghcr.io), then check your Docker daemon configuration and network connectivity.

Data persistence across restarts

All data is stored in Docker volumes (matric-pgdata, matric-files, matric-backups, matric-redis). Stopping and starting containers preserves data. Only docker compose down -v deletes volumes.


What's Next?

| Goal | Guide |
| --- | --- |
| Explore features (notes, search, tags, graph) | Getting Started |
| Configure search and AI in depth | Search Guide, Inference Backends |
| Plan hardware for production | Hardware Planning |
| Set up OAuth authentication | Authentication |
| Configure reverse proxy (nginx) | Deployment and Migrations |
| Connect AI assistants | MCP Server |
| Manage multiple memories | Multi-Memory Guide |
| Troubleshoot MCP issues | MCP Troubleshooting |