This document is the single source of truth for all Fortémi configuration options. It covers environment variables, TOML configuration files, feature flags, and deployment-specific settings.
Fortemi uses a layered configuration approach:
- TOML configuration files - Structured config for inference backends
- Environment variables - Runtime settings, overrides, and secrets
- Built-in defaults - Sensible defaults for most use cases
Environment variables take precedence over TOML files, which take precedence over defaults.
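The precedence rule amounts to a three-step lookup. A minimal sketch of the resolution order (the `resolve_setting` helper is illustrative, not part of Fortemi's actual API):

```python
import os

# Illustration of the precedence rule: env var > TOML value > built-in default.
def resolve_setting(name, toml_values, default, env=os.environ):
    if name in env:
        return env[name]          # environment variable wins
    if name in toml_values:
        return toml_values[name]  # then the TOML file
    return default                # then the built-in default

toml_values = {"PORT": "8080"}
print(resolve_setting("PORT", toml_values, "3000", env={}))                # 8080
print(resolve_setting("PORT", toml_values, "3000", env={"PORT": "9090"}))  # 9090
```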
| File | Location | Purpose |
|---|---|---|
| `.env` | Project root | Environment variables for local development |
| `inference.toml` | Config directory | Inference backend configuration |
| `docker-compose.bundle.yml` | Project root | Docker environment variables |
In Docker bundle deployments:
- Environment variables are set in `docker-compose.bundle.yml` or the `.env` file
- The API container reads environment variables on startup
- Changes require a container restart: `docker compose -f docker-compose.bundle.yml down && docker compose -f docker-compose.bundle.yml up -d`
- Use `host.docker.internal` to access services on the Docker host (e.g., Ollama)
- Use `172.17.0.1` on Linux when `host.docker.internal` is unavailable
| Variable | Type | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | String | `postgres://matric:matric@localhost:5432/matric` | PostgreSQL connection URL with user, password, host, port, and database name |

Example:

```
DATABASE_URL=postgres://myuser:mypass@db.example.com:5432/matric_prod
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `HOST` | String | `0.0.0.0` | IP address to bind the API server (`0.0.0.0` = all interfaces) |
| `PORT` | Integer | `3000` | Port number for the HTTP API server |
| `ALLOWED_ORIGINS` | String | `http://localhost:3000` | Comma-separated list of allowed CORS origins |
| `MATRIC_MAX_BODY_SIZE_BYTES` | Integer | `2147483648` | Maximum request body size in bytes (default: 2 GB, needed for database backup uploads) |
| `MATRIC_MAX_UPLOAD_SIZE_BYTES` | Integer | `52428800` | Maximum file upload size in bytes (default: 50 MB). Enforced at the multipart upload route and validated per file. |
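Both limits are raw byte counts using binary units. The defaults and the common override values work out as follows:

```python
# Byte arithmetic behind the size limits above (binary units).
MB = 1024 * 1024
GB = 1024 * MB

print(50 * MB)   # 52428800   -> default MATRIC_MAX_UPLOAD_SIZE_BYTES (50 MB)
print(100 * MB)  # 104857600  -> the 100 MB example value
print(2 * GB)    # 2147483648 -> default MATRIC_MAX_BODY_SIZE_BYTES (2 GB)
```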
Example:

```
HOST=127.0.0.1                          # Localhost only
PORT=8080                               # Custom port
ALLOWED_ORIGINS=https://memory.example.com,http://localhost:3000
MATRIC_MAX_BODY_SIZE_BYTES=2147483648
MATRIC_MAX_UPLOAD_SIZE_BYTES=104857600  # 100 MB
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `REQUIRE_AUTH` | Boolean | `false` | Require authentication on all `/api/v1/*` endpoints. When false, all endpoints are publicly accessible. |
| `ISSUER_URL` | String | `http://<HOST>:<PORT>` | External base URL for OAuth discovery and MCP (e.g., `https://memory.example.com`). Required for OAuth/MCP. |
| `OAUTH_TOKEN_LIFETIME_SECS` | Integer | `3600` | OAuth access token lifetime in seconds (1 hour). Shorter = more secure; longer = less re-authentication friction. |
| `OAUTH_MCP_TOKEN_LIFETIME_SECS` | Integer | `86400` | MCP OAuth access token lifetime in seconds (24 hours). MCP sessions are interactive; shorter tokens cause mid-session disconnects. |

Example (Personal Use):

```
REQUIRE_AUTH=false
```

Example (Team Deployment):

```
REQUIRE_AUTH=true
ISSUER_URL=https://memory.team.com
OAUTH_TOKEN_LIFETIME_SECS=3600
OAUTH_MCP_TOKEN_LIFETIME_SECS=86400
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RATE_LIMIT_ENABLED` | Boolean | `false` | Enable rate limiting for API endpoints |
| `RATE_LIMIT_REQUESTS` | Integer | `100` | Maximum requests per time window |
| `RATE_LIMIT_PERIOD_SECS` | Integer | `60` | Rate limit time window in seconds |
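The two knobs define a simple requests-per-window budget. A sketch of those semantics (illustrative only; Fortemi's actual limiter implementation is not documented here):

```python
import time

# Fixed-window limiter matching the RATE_LIMIT_* semantics:
# at most `requests` calls per `period_secs` window.
class FixedWindowLimiter:
    def __init__(self, requests, period_secs, clock=time.monotonic):
        self.requests = requests
        self.period_secs = period_secs
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.period_secs:
            self.window_start = now  # new window: reset the counter
            self.count = 0
        if self.count < self.requests:
            self.count += 1
            return True
        return False                 # the caller would answer HTTP 429

limiter = FixedWindowLimiter(requests=3, period_secs=60, clock=lambda: 0.0)
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
```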
Example:

```
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_PERIOD_SECS=60
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RUST_LOG` | String | `info` | Log level and filtering for Rust components (uses `env_logger` syntax) |
| `LOG_FORMAT` | String | `pretty` | Log output format: `pretty`, `json`, or `compact` |
| `LOG_FILE` | String | None | Path to log file (logs to stdout if not set) |
| `LOG_ANSI` | Boolean | `true` | Enable ANSI color codes in logs |
Common Configurations:
Production (default):

```
RUST_LOG=info
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false
```

API debugging:

```
RUST_LOG=matric_api=debug,info
```

Inference debugging:

```
RUST_LOG=matric_inference=debug,info
```

Search debugging:

```
RUST_LOG=matric_db=debug,matric_search=debug,info
```

Full debug (verbose):

```
RUST_LOG=debug
```

Specific module debugging:

```
RUST_LOG=matric_api::routes::search=trace,info
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `WORKER_ENABLED` | Boolean | `true` | Enable background job processing (embeddings, linking, cleanup). Alias: `JOB_WORKER_ENABLED`. |
| `JOB_WORKER_ENABLED` | Boolean | `true` | Enable/disable job processing in the worker process (takes precedence when set). |
| `WORKER_THREADS` | Integer | CPU cores | Number of Tokio worker threads for background jobs |
| `JOB_POLL_INTERVAL_MS` | Integer | `60000` | Safety-net polling interval in milliseconds. The worker is event-driven (woken by NOTIFY); this interval only triggers as a fallback for crash recovery and race conditions. |
| `JOB_MAX_CONCURRENT` | Integer | `4` | Maximum number of jobs that can run concurrently in the worker |
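The event-driven-with-fallback behavior of `JOB_POLL_INTERVAL_MS` can be sketched as a blocking wait with a timeout (illustrative only; Fortemi listens for PostgreSQL NOTIFY, not a Python queue):

```python
import queue

# Block on a notification; only fall back to a poll when nothing
# arrives within the safety-net interval.
def wait_for_work(notifications, poll_interval_ms):
    try:
        notifications.get(timeout=poll_interval_ms / 1000.0)
        return "notified"  # woken by an event (the normal path)
    except queue.Empty:
        return "poll"      # safety-net poll fires instead

notifications = queue.Queue()
notifications.put("job.created")
print(wait_for_work(notifications, poll_interval_ms=60000))  # notified
print(wait_for_work(notifications, poll_interval_ms=10))     # poll
```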
Example:

```
WORKER_ENABLED=true
WORKER_THREADS=4
JOB_POLL_INTERVAL_MS=60000
JOB_MAX_CONCURRENT=4
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `CHAT_MAX_CONCURRENT` | Integer | `1` | Maximum concurrent chat requests hitting the GPU. Chat bypasses the job queue and calls Ollama directly via a `tokio::Semaphore`. When all permits are in use, new requests return 503. Increase if VRAM allows parallel inference. |
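The permit semantics can be sketched with a non-blocking semaphore acquire (illustrative; the real gate is a `tokio::Semaphore` in Rust):

```python
import threading

chat_permits = threading.Semaphore(1)  # CHAT_MAX_CONCURRENT=1

def handle_chat_request():
    if not chat_permits.acquire(blocking=False):
        return 503                     # all permits in use
    try:
        return 200                     # inference would run here
    finally:
        chat_permits.release()

print(handle_chat_request())  # 200 (permit free)
chat_permits.acquire()        # simulate another request holding the permit
print(handle_chat_request())  # 503 (permit held elsewhere)
chat_permits.release()
```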
Example:

```
# Allow 2 concurrent chat sessions (requires sufficient VRAM)
CHAT_MAX_CONCURRENT=2
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `MATRIC_EVENT_BUS_CAPACITY` | Integer | `256` | Broadcast channel capacity for the internal event bus. Increase for high-traffic deployments. |
| `SSE_REPLAY_BUFFER_SIZE` | Integer | `1024` | Number of past events retained in the SSE replay buffer for `Last-Event-ID` reconnection support. |
| `SSE_COALESCE_WINDOW_MS` | Integer | `500` | Deduplication window in milliseconds for low-priority SSE events (e.g., `job.progress`). Events with the same coalescing key are deduplicated within this window, keeping only the latest. Set to 0 to disable. |
| `MATRIC_WEBHOOK_TIMEOUT_SECS` | Integer | `10` | Timeout in seconds for outgoing webhook HTTP requests. |
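The coalescing behavior keeps only the newest event per key within a window. A sketch, assuming a per-event coalescing key (the key format used here is hypothetical):

```python
# Deduplicate events sharing a coalescing key, keeping only the latest,
# as SSE_COALESCE_WINDOW_MS does for events inside one window.
def coalesce(events, window_ms):
    if window_ms == 0:
        return events                 # coalescing disabled
    latest = {}
    for event in events:              # events assumed to fall in one window
        latest[event["key"]] = event  # later events overwrite earlier ones
    return list(latest.values())

events = [
    {"key": "job.progress:42", "pct": 10},
    {"key": "job.progress:42", "pct": 55},  # supersedes pct=10
    {"key": "job.progress:99", "pct": 5},
]
print(coalesce(events, window_ms=500))  # two events survive, pct 55 and 5
```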
Example:

```
MATRIC_EVENT_BUS_CAPACITY=512
SSE_REPLAY_BUFFER_SIZE=2048
SSE_COALESCE_WINDOW_MS=500
MATRIC_WEBHOOK_TIMEOUT_SECS=10
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `FILE_STORAGE_PATH` | String | `/var/lib/matric/files` | Directory for storing uploaded file attachments on disk |

Example:

```
FILE_STORAGE_PATH=/mnt/data/matric/files
```

Settings for the tus v1.0.0 resumable upload protocol. See the File Attachments Guide for protocol details.

| Variable | Type | Default | Description |
|---|---|---|---|
| `TUS_UPLOAD_EXPIRY_HOURS` | Integer | `24` | Hours before an incomplete tus upload session expires and is cleaned up |
| `TUS_CHUNK_MAX_SIZE` | Integer | `52428800` | Maximum chunk size in bytes for tus PATCH requests (default: 50 MB) |

Example:

```
TUS_UPLOAD_EXPIRY_HOURS=48     # 2 days for slow connections
TUS_CHUNK_MAX_SIZE=104857600   # 100 MB chunks for fast networks
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `MAX_MEMORIES` | Integer | `10` | Maximum number of live memory archives in the database |
| `DEFAULT_ARCHIVE_CACHE_TTL` | Integer | `60` | Cache TTL in seconds for the default archive lookup. Reduces database lookups for the default memory on high-traffic deployments. |
| `DISABLE_SUPPORT_MEMORY` | Boolean | `false` | Set to true to skip automatic loading of the built-in fortemi-docs support archive on first boot. |

Example:

```
# Scale with your hardware (see capacity formula below)
MAX_MEMORIES=50    # 16GB RAM, 100GB disk
MAX_MEMORIES=200   # 32GB RAM, 500GB disk
MAX_MEMORIES=500   # 64GB+ RAM, 1TB+ disk
```

Capacity Planning:

Each empty memory adds ~1 MB schema overhead (41 tables + indexes). The real cost is data growth within each memory. Average storage per note (with a 20% attachment rate):
| Component | Per Note | Per 1,000 Notes |
|---|---|---|
| Note metadata + content | ~11 KB | 11 MB |
| Embeddings (768-dim) | ~3 KB | 3 MB |
| Attachments (avg 500KB, 20% rate) | ~100 KB | 100 MB |
| Thumbnails (100KB, 20% rate) | ~20 KB | 20 MB |
| Total average | ~134 KB | ~134 MB |
Capacity formula:

```
max_total_notes = available_storage / 134 KB
MAX_MEMORIES    = max_total_notes / target_notes_per_memory
```
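A worked example with Tier 2 numbers (100 GB of storage, 20,000 notes per memory). Note that the formula is stricter than the tier table below, since it assumes every memory fills all the way to its target:

```python
# Capacity formula with the ~134 KB average per note.
note_kb = 134
available_storage_kb = 100 * 1024 * 1024  # 100 GB expressed in KB (binary units)
target_notes_per_memory = 20_000

max_total_notes = available_storage_kb // note_kb
max_memories = max_total_notes // target_notes_per_memory

print(max_total_notes)  # 782519
print(max_memories)     # 39
```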
Recommended limits by hardware tier:
| Tier | RAM | Storage | MAX_MEMORIES | Notes per Memory | Total Notes |
|---|---|---|---|---|---|
| Tier 1 (Minimum) | 8 GB | 10 GB | 10 | ~5,000 | ~50,000 |
| Tier 2 (Standard) | 16 GB | 100 GB | 50 | ~20,000 | ~1,000,000 |
| Tier 3 (Performance) | 32 GB | 500 GB | 200 | ~50,000 | ~10,000,000 |
| Tier 4 (Professional) | 64 GB+ | 1 TB+ | 500 | ~50,000 | ~25,000,000 |
Memory Limits:
- `MAX_MEMORIES` limits live memories (schemas in the database), not the total number you can ever create
- Export memories as shards (`POST /api/v1/shards/export`), delete them to free slots, and re-import later; there is no limit on the number of archived shards you can store on disk
- Attempting to create memories beyond `MAX_MEMORIES` returns HTTP 400
- Check current usage via `GET /api/v1/memories/overview`
- Each memory adds minimal overhead (<1 MB metadata + indexes); data growth is the real constraint
- Notes without attachments are much smaller (~14 KB each); adjust estimates for your workload
| Header | Values | Description |
|---|---|---|
| `X-Fortemi-Memory` | Memory name | Routes the request to the specified memory (default: `default`) |
| `Authorization` | Bearer token | API authentication (when `REQUIRE_AUTH=true`) |
The X-Fortemi-Memory header routes all API requests to a specific memory archive. Without this header, requests operate on the default memory. See the Multi-Memory Guide for details.
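A request targeting a specific memory just needs these two headers. A sketch of building them (memory name and token are placeholders):

```python
# Build the routing/auth headers from the table above.
def fortemi_headers(memory="default", token=None):
    headers = {"X-Fortemi-Memory": memory}
    if token:  # Authorization is only needed when REQUIRE_AUTH=true
        headers["Authorization"] = f"Bearer {token}"
    return headers

print(fortemi_headers("research", token="abc123"))
```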
Ollama is the default inference backend for local LLM inference without API costs.
| Variable | Type | Default | Description |
|---|---|---|---|
| `OLLAMA_BASE` | String | `http://127.0.0.1:11434` | Ollama API endpoint URL (primary variable read by the backend) |
| `OLLAMA_URL` | String | `http://127.0.0.1:11434` | Alias for `OLLAMA_BASE` (checked as a fallback by the vision handler and content summarizer) |
| `OLLAMA_HOST` | String | `http://localhost:11434` | Alias used by the Ollama discovery service |
| `OLLAMA_EMBED_MODEL` | String | `nomic-embed-text` | Model name for generating embeddings |
| `OLLAMA_GEN_MODEL` | String | `qwen3.5:27b` | Model name for text generation (standard/failover tier) |
| `OLLAMA_EMBED_DIM` | Integer | `768` | Vector dimensionality for embeddings. Must match the model's output dimension. |
| `MATRIC_EMBED_TIMEOUT_SECS` | Integer | `30` | Timeout in seconds for embedding requests to Ollama |
| `MATRIC_GEN_TIMEOUT_SECS` | Integer | `120` | Timeout in seconds for generation requests to Ollama |
| `MATRIC_OLLAMA_URL` | String | `http://127.0.0.1:11434` | Ollama URL used by the TOML-based inference config path |
| `MATRIC_OLLAMA_EMBEDDING_MODEL` | String | `nomic-embed-text` | Embedding model used by the TOML-based inference config path |
| `MATRIC_OLLAMA_GENERATION_MODEL` | String | `qwen3.5:27b` | Generation model used by the TOML-based inference config path |

Example (Docker Desktop - macOS/Windows):

```
OLLAMA_BASE=http://host.docker.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=llama3.2:3b
OLLAMA_EMBED_DIM=768
```

Example (Linux with Docker):

```
OLLAMA_BASE=http://172.17.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
OLLAMA_EMBED_DIM=768
```

Example (Performance Tuning):

```
OLLAMA_BASE=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
MATRIC_EMBED_TIMEOUT_SECS=30
MATRIC_GEN_TIMEOUT_SECS=180
```

The OpenAI backend supports OpenAI's cloud API and any OpenAI-compatible endpoint (Azure OpenAI, vLLM, LocalAI, LM Studio, etc.).
| Variable | Type | Default | Description |
|---|---|---|---|
| `INFERENCE_BACKEND` | String | `ollama` | Backend selection: `ollama` or `openai` |
| `OPENAI_API_KEY` | String | None | API key for OpenAI cloud (required for OpenAI cloud) |
| `OPENAI_BASE_URL` | String | `https://api.openai.com/v1` | OpenAI API base URL or compatible endpoint |
| `OPENAI_EMBED_MODEL` | String | `text-embedding-3-small` | Model name for embeddings |
| `OPENAI_GEN_MODEL` | String | `gpt-oss:20b` | Model name for text generation |
| `OPENAI_EMBED_DIM` | Integer | `1536` | Vector dimensionality for embeddings |
| `OPENAI_TIMEOUT` | Integer | `30` | Request timeout in seconds |
| `OPENAI_SKIP_TLS_VERIFY` | Boolean | `false` | Disable TLS certificate verification (insecure, for testing only) |
| `OPENAI_HTTP_REFERER` | String | None | Optional HTTP-Referer header sent with requests (useful for OpenRouter and compatible proxies) |
| `OPENAI_X_TITLE` | String | None | Optional X-Title header for identification in compatible API dashboards |
| `MATRIC_OPENAI_URL` | String | `https://api.openai.com/v1` | OpenAI URL used by the TOML-based inference config path |
| `MATRIC_OPENAI_API_KEY` | String | None | API key used by the TOML-based inference config path |
| `MATRIC_OPENAI_EMBEDDING_MODEL` | String | `text-embedding-3-small` | Embedding model used by the TOML-based inference config path |
| `MATRIC_OPENAI_GENERATION_MODEL` | String | `gpt-4o-mini` | Generation model used by the TOML-based inference config path |

Example (OpenAI Cloud):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_EMBED_MODEL=text-embedding-3-small
OPENAI_GEN_MODEL=gpt-4o-mini
OPENAI_EMBED_DIM=1536
OPENAI_TIMEOUT=120
```

Example (Azure OpenAI):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=your-azure-key
OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment
OPENAI_EMBED_MODEL=text-embedding-ada-002
OPENAI_GEN_MODEL=gpt-4
```

Example (vLLM Self-Hosted):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=token
OPENAI_BASE_URL=http://vllm-server:8000/v1
OPENAI_GEN_MODEL=meta-llama/Llama-3.1-8B-Instruct
OPENAI_TIMEOUT=180
```

Example (LocalAI):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=localai
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_EMBED_MODEL=text-embedding-ada-002
OPENAI_GEN_MODEL=gpt-3.5-turbo
```

The MCP (Model Context Protocol) server provides Claude/AI integration.
| Variable | Type | Default | Description |
|---|---|---|---|
| `ISSUER_URL` | String | None | External base URL for OAuth and MCP discovery (required for MCP) |
| `MCP_CLIENT_ID` | String | None | OAuth client ID for token introspection (required for MCP auth) |
| `MCP_CLIENT_SECRET` | String | None | OAuth client secret for token introspection (required for MCP auth) |
| `MCP_BASE_URL` | String | `${ISSUER_URL}/mcp` | MCP protected resource URL (derived from `ISSUER_URL`) |
| `MCP_TRANSPORT` | String | `http` | Transport mode: `stdio` (direct process) or `http` (network) |
| `MCP_PORT` | Integer | `3001` | Port for the MCP HTTP server (when transport=http) |
| `MCP_BASE_PATH` | String | `/mcp` | URL path prefix for the MCP server (when transport=http) |
| `MATRIC_API_URL` | String | `http://localhost:3000` | API server URL for the MCP server to connect to. Alias: `FORTEMI_URL`. |
| `FORTEMI_URL` | String | `http://localhost:3000` | Alias for `MATRIC_API_URL`. Used in Docker bundle deployments. |
| `FORTEMI_API_KEY` | String | None | API key for the MCP server to authenticate with the Fortemi API (when `REQUIRE_AUTH=true`). |

Example (Docker Bundle):

```
ISSUER_URL=https://memory.example.com
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://memory.example.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
```

Example (Claude Desktop - stdio):

```
MCP_TRANSPORT=stdio
MATRIC_API_URL=http://localhost:3000
```

OAuth Client Registration:

Before configuring MCP, register an OAuth client for token introspection:

```
curl -X POST http://localhost:3000/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "MCP Server",
    "grant_types": ["client_credentials"],
    "scope": "mcp read"
  }'
```

Save the returned `client_id` and `client_secret` to `MCP_CLIENT_ID` and `MCP_CLIENT_SECRET`.
These feature flags control advanced search capabilities. They are disabled by default because they increase database complexity and require specific PostgreSQL extensions.
| Variable | Type | Default | Description |
|---|---|---|---|
| `FTS_WEBSEARCH_TO_TSQUERY` | Boolean | `true` | Enable search operators (OR, NOT, phrase search with quotes) |
| `FTS_SCRIPT_DETECTION` | Boolean | `false` | Auto-detect query language/script for optimal tokenization |
| `FTS_TRIGRAM_FALLBACK` | Boolean | `false` | Enable emoji and symbol search via pg_trgm trigram indexes |
| `FTS_BIGRAM_CJK` | Boolean | `false` | Enable optimized CJK (Chinese/Japanese/Korean) search via pg_bigm |
| `FTS_MULTILINGUAL_CONFIGS` | Boolean | `false` | Enable language-specific FTS configurations for stemming |
Why These Are Disabled by Default:
- FTS_SCRIPT_DETECTION: Adds complexity to query processing; only needed for mixed-language queries
- FTS_TRIGRAM_FALLBACK: Requires pg_trgm extension; only needed for emoji/symbol search
- FTS_BIGRAM_CJK: Requires pg_bigm extension (not installed by default); only for CJK languages
- FTS_MULTILINGUAL_CONFIGS: Requires multiple FTS dictionaries; increases storage and index size
Example (Minimal - English Only):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=false
FTS_TRIGRAM_FALLBACK=false
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=falseExample (Multilingual Team):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=trueExample (Full CJK Support):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=true
FTS_MULTILINGUAL_CONFIGS=truePerformance Impact:
Enabling all flags increases:
- Index storage by approximately 30-50%
- Index build time by 2-3x
- Query planning overhead by 10-20ms per query
For small installations (< 10,000 notes), enable only the features you need. For large installations (> 100,000 notes), test performance impact before enabling.
These variables control the multi-tier concept extraction cascade: GLiNER (tier 0, CPU-based NER) → fast model (tier 1) → standard model (tier 2).
| Variable | Type | Default | Description |
|---|---|---|---|
| `GLINER_BASE_URL` | String | `http://gliner:8090` (Docker bundle) | GLiNER NER service URL for CPU-based entity extraction (tier 0). Set to empty to disable. |
| `GLINER_MODEL` | String | (set by GLiNER sidecar) | GLiNER model name, consumed by the GLiNER sidecar container (e.g., `urchade/gliner_large-v2.1`). |
| `GLINER_THRESHOLD` | Float | (set by GLiNER sidecar) | Entity confidence threshold for the GLiNER sidecar (e.g., 0.3). |
| `EXTRACTION_TARGET_CONCEPTS` | Integer | `5` | Target number of concepts to extract per note. GLiNER-to-fast-model escalation triggers when below this threshold; fast-to-standard escalation triggers at < target/2 (i.e., fewer than 3 with the default of 5). |
| `MATRIC_FAST_GEN_MODEL` | String | `qwen3.5:9b` | Fast generation model (tier 1) used for concept tagging and reference extraction when GLiNER yields too few results. Large documents are automatically chunked. Set to empty to disable. |
| `MATRIC_FAST_GEN_TIMEOUT_SECS` | Integer | `60` | Timeout in seconds for fast-model generation requests. |
| `OLLAMA_GEN_MODEL` | String | `qwen3.5:27b` | Standard generation model (tier 2) used as failover when the fast model also yields insufficient concepts. |
Extraction cascade:

```
GLiNER (tier 0, ~300ms, CPU)
  → if concepts < EXTRACTION_TARGET_CONCEPTS
    → MATRIC_FAST_GEN_MODEL (tier 1, chunked)
      → if concepts < EXTRACTION_TARGET_CONCEPTS / 2
        → OLLAMA_GEN_MODEL (tier 2, full context)
```
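The two escalation checks can be written out explicitly (a sketch of the decision logic, not the actual implementation):

```python
# Tier 0 result below target: escalate to the fast model.
def next_tier(concept_count, target=5):
    if concept_count < target:
        return "fast_model"
    return "done"

# Fast-model result below target/2: fail over to the standard model.
def after_fast_model(concept_count, target=5):
    if concept_count < target / 2:
        return "standard_model"
    return "done"

print(next_tier(3))          # fast_model (3 < 5)
print(after_fast_model(2))   # standard_model (2 < 2.5)
print(after_fast_model(3))   # done (3 >= 2.5)
```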
Example (Docker bundle defaults):

```
GLINER_BASE_URL=http://gliner:8090
EXTRACTION_TARGET_CONCEPTS=5
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

Example (disable GLiNER, LLM-only extraction):

```
GLINER_BASE_URL=
EXTRACTION_TARGET_CONCEPTS=5
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

Example (higher concept density for rich taxonomies):

```
EXTRACTION_TARGET_CONCEPTS=10
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `EMBED_CONCEPT_MAX_DOC_FREQ` | Float | `0.8` | Maximum document frequency ratio for concepts included in embedding text enrichment. Concepts appearing in more than this fraction of notes are treated as "stopwords" and excluded. Range: 0.01–1.0. |
| `EMBED_INSTRUCTION_PREFIX` | String | `clustering: ` | Instruction prefix prepended to embedding text. nomic-embed-text supports `clustering: `, `search_document: `, and `classification: `. Set to an empty string to disable. |
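A sketch of how the two settings combine when building the text that gets embedded (the function name and inputs are illustrative):

```python
# Drop over-frequent "stopword" concepts, then prepend the instruction prefix.
def build_embedding_text(content, concepts, doc_freqs, max_doc_freq=0.8,
                         prefix="clustering: "):
    kept = [c for c in concepts if doc_freqs.get(c, 0.0) <= max_doc_freq]
    return prefix + content + (" " + " ".join(kept) if kept else "")

doc_freqs = {"note": 0.95, "rust": 0.12}  # "note" appears in 95% of notes
print(build_embedding_text("Tokio runtime tuning", ["note", "rust"], doc_freqs))
# clustering: Tokio runtime tuning rust
```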
Example:

```
EMBED_CONCEPT_MAX_DOC_FREQ=0.8
EMBED_INSTRUCTION_PREFIX=clustering:
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `OLLAMA_VISION_MODEL` | String | `qwen3.5:9b` | Ollama vision model for image description and 3D model rendering. Set to empty to disable image extraction. Requires Ollama with a vision-capable model pulled. qwen3.5:9b is natively multimodal (unified generation and vision). |

Example:

```
OLLAMA_VISION_MODEL=qwen3.5:9b   # Natively multimodal; also used as the fast gen model
# OLLAMA_VISION_MODEL=llava:7b   # Alternative (vision-only model)
# OLLAMA_VISION_MODEL=           # Disable
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `WHISPER_BASE_URL` | String | `http://localhost:8000` | URL for the Whisper-compatible transcription service. Set to empty to disable audio transcription. Deploy via `docker-compose.whisper.yml`. |
| `WHISPER_MODEL` | String | `Systran/faster-distil-whisper-large-v3` | Whisper model name to use for transcription. |

Example:

```
WHISPER_BASE_URL=http://host.docker.internal:8000
WHISPER_MODEL=Systran/faster-distil-whisper-large-v3
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RENDERER_URL` | String | `http://localhost:8080` | URL for the Open3D renderer used for GLB/3D model multi-view rendering. The Docker bundle includes the renderer at this default address. Set to a custom URL for external renderer deployments. |

Example:

```
RENDERER_URL=http://localhost:8080
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `OCR_ENABLED` | Boolean | `false` | Enable OCR-based text extraction for scanned PDFs and images. Requires LibreOffice and Tesseract. |
| `LIBREOFFICE_PATH` | String | `/usr/bin/libreoffice` | Path to the LibreOffice binary for document conversion (DOCX, XLSX, PPTX to PDF). |

Example:

```
OCR_ENABLED=true
LIBREOFFICE_PATH=/usr/bin/libreoffice
```

These variables tune the knowledge graph structure. All graph variables are read at job execution time; no restart is required for changes.
| Variable | Type | Default | Description |
|---|---|---|---|
| `GRAPH_LINKING_STRATEGY` | String | `hnsw_heuristic` | Linking strategy: `hnsw_heuristic` (HNSW Algorithm 4, diverse neighbor selection; recommended) or `threshold` (legacy epsilon-threshold). |
| `GRAPH_K_NEIGHBORS` | Integer | `0` (adaptive) | Maximum neighbors per node (M in HNSW). 0 enables adaptive mode: k = log₂(N) clamped to [5, 15]. Set explicitly (e.g., 8) to override adaptive computation. |
| `GRAPH_MIN_SIMILARITY` | Float | `0.5` | Absolute similarity floor: no edges are created below this cosine similarity regardless of strategy. Range: 0.0–1.0. |
| `GRAPH_EXTEND_CANDIDATES` | Boolean | `false` | Extend the HNSW candidate set with neighbors-of-neighbors (Algorithm 4 option). Increases recall at the cost of more comparisons. |
| `GRAPH_KEEP_PRUNED` | Boolean | `false` | Fill remaining neighbor slots from pruned candidates when the candidate set is exhausted (Algorithm 4 option). |
| `GRAPH_TAG_BOOST_WEIGHT` | Float | `0.3` | Weight for SKOS tag overlap in the blended linking score: blended = (embedding_sim * (1 - w)) + (tag_overlap * w). Set to 0.0 to disable the tag-based boost. Range: 0.0–1.0. |
| `GRAPH_NORMALIZATION_GAMMA` | Float | `1.0` | Gamma exponent for edge weight normalization during graph traversal, applied as normalized = ((score - min) / (max - min)) ^ gamma. Values >1.0 amplify top-end differences; <1.0 compress them. Range: 0.1–5.0. |
| `GRAPH_SNN_THRESHOLD` | Float | `0.10` | Shared Nearest Neighbor pruning threshold. Edges with an SNN score below this are pruned during recompute_snn_scores. SNN(A,B) = \|kNN(A) ∩ kNN(B)\| / k. Range: 0.0–1.0. |
| `GRAPH_COMMUNITY_RESOLUTION` | Float | `1.0` | Louvain community detection resolution parameter. Higher = more, smaller communities; lower = fewer, larger communities. Range: 0.1–10.0. |
| `GRAPH_PFNET_Q` | Integer | `2` | PFNET graph sparsification q parameter. q=2 is equivalent to the Relative Neighborhood Graph (Toussaint 1980). Higher q produces sparser graphs approaching the MST. Range: 2–10. |
| `GRAPH_STRUCTURAL_SCORE` | Float | `0.5` | Edge score assigned to structural (same-collection) edges. Controls the "gravity well" strength pulling exploration toward notes in the same collection. Range: 0.0–1.0. |
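Three of the formulas from the table, written out as code for intuition (a sketch, not the Rust implementation):

```python
# Blended linking score with GRAPH_TAG_BOOST_WEIGHT = w.
def blended_score(embedding_sim, tag_overlap, w=0.3):
    return embedding_sim * (1 - w) + tag_overlap * w

# Edge weight normalization with GRAPH_NORMALIZATION_GAMMA.
def normalize(score, lo, hi, gamma=1.0):
    return ((score - lo) / (hi - lo)) ** gamma

# Shared-nearest-neighbor score: overlap of the two k-NN lists over k.
def snn(knn_a, knn_b, k):
    return len(set(knn_a) & set(knn_b)) / k

print(blended_score(0.8, 0.5))         # ≈ 0.71
print(normalize(0.75, 0.5, 1.0, 2.0))  # 0.25 (gamma=2 compresses mid-range scores)
print(snn([1, 2, 3], [2, 3, 4], k=3))  # ≈ 0.667 (2 shared neighbors of 3)
```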
Example (defaults — suitable for most deployments):
GRAPH_LINKING_STRATEGY=hnsw_heuristic
GRAPH_K_NEIGHBORS=0
GRAPH_MIN_SIMILARITY=0.5
GRAPH_EXTEND_CANDIDATES=false
GRAPH_KEEP_PRUNED=false
GRAPH_TAG_BOOST_WEIGHT=0.3
GRAPH_NORMALIZATION_GAMMA=1.0
GRAPH_SNN_THRESHOLD=0.10
GRAPH_COMMUNITY_RESOLUTION=1.0
GRAPH_PFNET_Q=2
GRAPH_STRUCTURAL_SCORE=0.5Example (denser graph for tightly-related content):
GRAPH_LINKING_STRATEGY=hnsw_heuristic
GRAPH_K_NEIGHBORS=12
GRAPH_MIN_SIMILARITY=0.6
GRAPH_TAG_BOOST_WEIGHT=0.4
GRAPH_NORMALIZATION_GAMMA=1.5OpenRouter provides access to 100+ LLMs via a single API. It is opt-in: the OPENROUTER_API_KEY variable activates the provider.
| Variable | Type | Default | Description |
|---|---|---|---|
| `OPENROUTER_API_KEY` | String | None | OpenRouter API key. Setting this variable enables the OpenRouter provider for generation tasks. |
| `OPENROUTER_BASE_URL` | String | `https://openrouter.ai/api/v1` | OpenRouter API base URL. |
| `OPENROUTER_TIMEOUT` | Integer | `300` | Request timeout in seconds for OpenRouter calls. |
| `OPENROUTER_HTTP_REFERER` | String | None | Optional HTTP-Referer header sent to OpenRouter for attribution and rate limit exemptions. |
| `OPENROUTER_X_TITLE` | String | None | Optional X-Title header sent to OpenRouter for display in the OpenRouter dashboard. |

Example:

```
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_TIMEOUT=300
OPENROUTER_HTTP_REFERER=https://memory.example.com
OPENROUTER_X_TITLE=Matric Memory
```

These variables are set automatically by the CI/CD pipeline and are read-only at runtime. They are exposed via the `/health` endpoint for build tracing.
| Variable | Type | Default | Description |
|---|---|---|---|
| `MATRIC_GIT_SHA` | String | `unknown` | Git commit SHA of the running build. Set by CI during image build. |
| `MATRIC_BUILD_DATE` | String | `unknown` | Build timestamp. Set by CI during image build. |
The inference.toml file provides structured configuration for inference backends. It supports both Ollama and OpenAI configurations, with the ability to use different backends for different operations.
```
# Development
./inference.toml

# Production (Docker)
/app/inference.toml
```

```
# =============================================================================
# Inference Backend Configuration
# =============================================================================
# This file configures LLM inference backends for embeddings and generation.
# Supports Ollama (local) and OpenAI (cloud or compatible APIs).

[inference]
# Backend selection: "ollama" or "openai"
# Can be overridden by the INFERENCE_BACKEND environment variable
backend = "ollama"

# =============================================================================
# Ollama Configuration (Local LLM)
# =============================================================================
[inference.ollama]
# Ollama API endpoint
url = "http://localhost:11434"

# Model for generating embeddings
# Recommended: nomic-embed-text (768d), mxbai-embed-large (1024d)
embedding_model = "nomic-embed-text"

# Model for text generation (optional)
# Recommended: llama3.2:3b (fast), qwen2.5:7b (quality), llama3.1:8b (balanced)
generation_model = "llama3.2:3b"

# Embedding vector dimensionality
# Must match the model's output dimension
embedding_dimension = 768

# Context window size in tokens (optional)
# Larger values allow more context but use more memory
# num_ctx = 8192

# GPU layers to offload (optional)
# 99 = all layers on GPU (recommended for dedicated GPU)
# 0 = CPU only
# num_gpu = 99

# Concurrent request processing (optional)
# Higher values improve throughput but increase memory usage
# num_parallel = 1

# =============================================================================
# OpenAI Configuration (Cloud or Compatible APIs)
# =============================================================================
[inference.openai]
# OpenAI API base URL
# OpenAI cloud: https://api.openai.com/v1
# Azure: https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT
# vLLM: http://localhost:8000/v1
# LocalAI: http://localhost:8080/v1
base_url = "https://api.openai.com/v1"

# API key (can use environment variable reference)
# For cloud: sk-proj-xxxxx
# For local servers: any value (usually ignored)
api_key = "${OPENAI_API_KEY}"

# Model for generating embeddings
# OpenAI: text-embedding-3-small, text-embedding-3-large
# Azure: text-embedding-ada-002
embedding_model = "text-embedding-3-small"

# Model for text generation
# OpenAI: gpt-4o-mini, gpt-4o, gpt-4-turbo
# Azure: gpt-4, gpt-35-turbo
generation_model = "gpt-4o-mini"

# Embedding vector dimensionality
# text-embedding-3-small: 1536
# text-embedding-3-large: 3072
embedding_dimension = 1536

# Request timeout in seconds (optional)
# timeout = 120

# Maximum retry attempts (optional)
# max_retries = 3

# Disable TLS verification (insecure, testing only)
# skip_tls_verify = false
```

The `[inference]` section controls which backend is used at runtime:
```
[inference]
backend = "ollama"   # Use Ollama
```

```
[inference]
backend = "openai"   # Use OpenAI
```

This can be overridden by the `INFERENCE_BACKEND` environment variable:

```
export INFERENCE_BACKEND=openai
```

You can configure different backends for embeddings vs generation by using both configurations and selecting models:
Use local Ollama for embeddings, cloud OpenAI for generation:

```
[inference]
backend = "ollama"  # Default to Ollama

[inference.ollama]
url = "http://localhost:11434"
embedding_model = "nomic-embed-text"
embedding_dimension = 768
# No generation_model specified

[inference.openai]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"
generation_model = "gpt-4o-mini"
```

The system will use Ollama for embeddings (cost-free, private) and OpenAI for generation (higher quality).
To implement fallback behavior (try local first, fall back to cloud):
- Configure both backends in `inference.toml`
- Set the primary backend: `backend = "ollama"`
- When Ollama fails (connection refused, model not found), manually switch to OpenAI via an API retry or a configuration update
Current implementation does not support automatic fallback. For high availability, consider deploying multiple Ollama instances with load balancing.
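A client-side wrapper can approximate fallback today (a sketch; the backend callables here are injected placeholders, not Fortemi APIs):

```python
# Try the primary backend; switch to the secondary only on connection failure.
def generate_with_fallback(prompt, primary, secondary):
    try:
        return primary(prompt)
    except ConnectionError:
        return secondary(prompt)  # e.g., retry against the OpenAI backend

def ollama_down(prompt):
    raise ConnectionError("connection refused")

print(generate_with_fallback("hi", ollama_down, lambda p: f"cloud: {p}"))
# cloud: hi
```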
The MCP server enables integration with Claude Desktop, Claude Code, and other MCP-compatible clients.
Before using the MCP server, you must configure OAuth:
Step 1: Set ISSUER_URL
The ISSUER_URL is the external base URL where your Fortémi API is accessible:
```
# .env
ISSUER_URL=https://memory.example.com
```

This URL is used for OAuth discovery, token verification, and MCP resource identification.
Step 2: Register OAuth Client
Register a client for the MCP server to introspect tokens:
```
curl -X POST https://memory.example.com/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "MCP Server",
    "grant_types": ["client_credentials"],
    "scope": "mcp read"
  }'
```

Response:
```
{
  "client_id": "mm_xxxxxxxxxxxxx",
  "client_secret": "xxxxxxxxxxxxx",
  "client_name": "MCP Server",
  "grant_types": ["client_credentials"],
  "scope": "mcp read"
}
```

Step 3: Configure MCP Credentials
Add the credentials to .env:
```
# .env
ISSUER_URL=https://memory.example.com
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
```

Step 4: Restart Services
```
docker compose -f docker-compose.bundle.yml down
docker compose -f docker-compose.bundle.yml up -d
```

Step 5: Verify
```
curl https://memory.example.com/mcp/.well-known/oauth-protected-resource
```

Expected response:
{
"resource": "https://memory.example.com/mcp",
"authorization_servers": ["https://memory.example.com"],
"scopes_supported": ["mcp", "read", "write"]
}

The MCP server supports two transport modes:
Used for Docker bundle deployments and network access:
# .env
MCP_TRANSPORT=http
MCP_PORT=3001
MCP_BASE_URL=https://memory.example.com/mcp

The MCP server listens on port 3001 and is accessible via HTTP. Configure nginx to proxy /mcp to localhost:3001.
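One way to wire that proxy is a `location` block like the following. This is a minimal sketch, assuming nginx terminates TLS for `memory.example.com` and the MCP container listens on the same host; adapt the upstream address to your deployment:

```nginx
location /mcp {
    proxy_pass http://127.0.0.1:3001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_buffering off;  # keep streaming/SSE responses unbuffered
}
```

Disabling `proxy_buffering` matters if the MCP server streams responses; with buffering on, nginx may hold events until the response completes.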
Used for Claude Desktop when running as a direct subprocess:
# .env
MCP_TRANSPORT=stdio
MATRIC_API_URL=http://localhost:3000

The MCP server communicates via stdin/stdout instead of HTTP. This is configured in Claude Desktop's configuration file.
To connect to Fortémi via stdio transport:

Configuration File Location:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Configuration:
{
"mcpServers": {
"fortemi": {
"command": "node",
"args": [
"/absolute/path/to/Fortémi/mcp-server/build/index.js"
],
"env": {
"MCP_TRANSPORT": "stdio",
"MATRIC_API_URL": "http://localhost:3000"
}
}
}
}

Restart Claude Desktop to load the configuration.
Claude Code uses URL-based MCP transport:
Project .mcp.json:
{
"mcpServers": {
"fortemi": {
"url": "https://memory.example.com/mcp"
}
}
}

Claude Code will authenticate using OAuth and connect to the MCP server via HTTPS.
For personal use with local Ollama and no authentication:
# .env
DATABASE_URL=postgres://matric:matric@localhost/matric
OLLAMA_BASE=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_EMBED_DIM=768
RUST_LOG=info
REQUIRE_AUTH=false
RATE_LIMIT_ENABLED=false

This provides:
- Full-text search (immediate)
- Semantic search (after embedding generation)
- No rate limiting or authentication
- Local inference (no API costs)
For team deployment with authentication and rate limiting:
# .env
DATABASE_URL=postgres://matric:matric@db.internal:5432/matric_prod
HOST=0.0.0.0
PORT=3000
RUST_LOG=info
# Authentication
REQUIRE_AUTH=true
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_PERIOD_SECS=60
ISSUER_URL=https://memory.team.com
# MCP (for Claude integration)
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://memory.team.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
# Ollama (local inference)
OLLAMA_BASE=http://ollama.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
OLLAMA_EMBED_DIM=768
# Background worker
WORKER_ENABLED=true
WORKER_THREADS=8
JOB_POLL_INTERVAL_MS=60000
# Logging
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false

This provides:
- OAuth authentication for all API endpoints
- Rate limiting (1000 requests/minute per user)
- MCP integration for Claude
- Optimized Ollama configuration for performance
- Structured JSON logging for analysis
For large enterprise deployment with multilingual search, cloud AI, and monitoring:
# .env
DATABASE_URL=postgres://matric:matric@db-cluster.internal:5432/matric_prod
HOST=0.0.0.0
PORT=3000
RUST_LOG=matric_api=info,matric_db=warn,matric_inference=info
# Authentication and rate limiting
REQUIRE_AUTH=true
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=10000
RATE_LIMIT_PERIOD_SECS=60
ISSUER_URL=https://knowledge.corp.com
# MCP server
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://knowledge.corp.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
# Hybrid inference: Local embeddings + Cloud generation
INFERENCE_BACKEND=ollama
OLLAMA_BASE=http://ollama-cluster.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_EMBED_DIM=768
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_GEN_MODEL=gpt-4o
OPENAI_TIMEOUT=180
# Multilingual full-text search
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=true
# Background worker optimization
WORKER_ENABLED=true
WORKER_THREADS=16
JOB_POLL_INTERVAL_MS=60000
# Production logging
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false
# Backup configuration
BACKUP_DEST=/var/backups/Fortémi
BACKUP_SCRIPT_PATH=/app/scripts/backup.sh

This provides:
- Enterprise-grade authentication and rate limiting
- Hybrid inference (local embeddings for privacy, cloud generation for quality)
- Full multilingual search support
- Optimized worker configuration for high throughput
- Structured logging for monitoring and analysis
- Automated backup configuration
When running Fortémi in Docker and accessing services on the host machine:
macOS and Windows (Docker Desktop):
# Use host.docker.internal to access host services
OLLAMA_BASE=http://host.docker.internal:11434

Linux:
# Use Docker bridge network gateway IP
OLLAMA_BASE=http://172.17.0.1:11434
# Or use host network mode in docker-compose.bundle.yml:
# network_mode: "host"

Docker Compose loads .env automatically from the project root. Variables set in docker-compose.bundle.yml take precedence over .env.
Precedence (highest to lowest):
- Environment variables set in shell
- Environment variables in `docker-compose.bundle.yml`
- Variables in the `.env` file
- Built-in defaults
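The precedence order above amounts to a first-match lookup across layers. A small sketch (layer names and example values are illustrative):

```python
from typing import Mapping, Optional

def resolve(key: str, *layers: Mapping[str, str]) -> Optional[str]:
    """Return the value from the first (highest-precedence) layer that defines key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    return None

shell_env = {}                               # highest precedence
compose_env = {"PORT": "8080"}               # docker-compose.bundle.yml
dotenv = {"PORT": "3000", "HOST": "0.0.0.0"} # .env file
defaults = {"PORT": "3000"}                  # built-in defaults, lowest precedence

print(resolve("PORT", shell_env, compose_env, dotenv, defaults))  # 8080
print(resolve("HOST", shell_env, compose_env, dotenv, defaults))  # 0.0.0.0
```

A practical consequence: a stale value in `docker-compose.bundle.yml` silently shadows anything you change in `.env`, so check the compose file first when an override appears to have no effect.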
Environment variable changes require container restart:
docker compose -f docker-compose.bundle.yml down
docker compose -f docker-compose.bundle.yml up -d

Configuration changes take effect immediately on startup.
- Inference Backends - Detailed backend documentation and model selection
- Operations Guide - Deployment, monitoring, and troubleshooting
- Getting Started - Quick start guide for new users
- MCP Server - Claude integration and MCP protocol details
- Multilingual FTS - Search feature flags and language support
- Authentication - OAuth setup and user management
- Multi-Memory Guide - Parallel memory archives and federated search
- Search Guide - Search modes and query syntax
- Hardware Planning - Capacity planning and performance optimization