This document is the single source of truth for all Fortémi configuration options. It covers environment variables, TOML configuration files, feature flags, and deployment-specific settings.
Fortemi uses a layered configuration approach:
- TOML configuration files - Structured config for inference backends
- Environment variables - Runtime settings, overrides, and secrets
- Built-in defaults - Sensible defaults for most use cases
Environment variables take precedence over TOML files, which take precedence over defaults.
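The precedence rule amounts to a three-step lookup. A minimal sketch of the resolution order (the `resolve_setting` helper is illustrative, not part of Fortemi's actual API):

```python
import os

# Illustration of the precedence rule: env var > TOML value > built-in default.
def resolve_setting(name, toml_values, default, env=os.environ):
    if name in env:
        return env[name]          # environment variable wins
    if name in toml_values:
        return toml_values[name]  # then the TOML file
    return default                # then the built-in default

toml_values = {"PORT": "8080"}
print(resolve_setting("PORT", toml_values, "3000", env={}))                # 8080
print(resolve_setting("PORT", toml_values, "3000", env={"PORT": "9090"}))  # 9090
```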
| File | Location | Purpose |
|---|---|---|
| `.env` | Project root | Environment variables for local development |
| `inference.toml` | Config directory | Inference backend configuration |
| `docker-compose.bundle.yml` | Project root | Docker environment variables |
In Docker bundle deployments:
- Environment variables are set in `docker-compose.bundle.yml` or the `.env` file
- The API container reads environment variables on startup
- Changes require a container restart: `docker compose -f docker-compose.bundle.yml down && docker compose -f docker-compose.bundle.yml up -d`
- Use `host.docker.internal` to access services on the Docker host (e.g., Ollama)
- Use `172.17.0.1` on Linux when `host.docker.internal` is unavailable
| Variable | Type | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | String | `postgres://matric:matric@localhost:5432/matric` | PostgreSQL connection URL with user, password, host, port, and database name |

Example:

```
DATABASE_URL=postgres://myuser:mypass@db.example.com:5432/matric_prod
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `HOST` | String | `0.0.0.0` | IP address to bind the API server (`0.0.0.0` = all interfaces) |
| `PORT` | Integer | `3000` | Port number for the HTTP API server |
| `ALLOWED_ORIGINS` | String | `http://localhost:3000` | Comma-separated list of allowed CORS origins |
| `MATRIC_MAX_BODY_SIZE_BYTES` | Integer | `2147483648` | Maximum request body size in bytes (default: 2 GB, needed for database backup uploads) |
| `MATRIC_MAX_UPLOAD_SIZE_BYTES` | Integer | `52428800` | Maximum file upload size in bytes (default: 50 MB). Enforced at the multipart upload route and validated per file. |
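Both limits are raw byte counts using binary units. The defaults and the common override values work out as follows:

```python
# Byte arithmetic behind the size limits above (binary units).
MB = 1024 * 1024
GB = 1024 * MB

print(50 * MB)   # 52428800   -> default MATRIC_MAX_UPLOAD_SIZE_BYTES (50 MB)
print(100 * MB)  # 104857600  -> the 100 MB example value
print(2 * GB)    # 2147483648 -> default MATRIC_MAX_BODY_SIZE_BYTES (2 GB)
```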
Example:

```
HOST=127.0.0.1                          # Localhost only
PORT=8080                               # Custom port
ALLOWED_ORIGINS=https://memory.example.com,http://localhost:3000
MATRIC_MAX_BODY_SIZE_BYTES=2147483648
MATRIC_MAX_UPLOAD_SIZE_BYTES=104857600  # 100 MB
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `REQUIRE_AUTH` | Boolean | `false` | Require authentication on all `/api/v1/*` endpoints. When false, all endpoints are publicly accessible. |
| `ISSUER_URL` | String | `http://<HOST>:<PORT>` | External base URL for OAuth discovery and MCP (e.g., `https://memory.example.com`). Required for OAuth/MCP. |
| `OAUTH_TOKEN_LIFETIME_SECS` | Integer | `3600` | OAuth access token lifetime in seconds (1 hour). Shorter = more secure; longer = less re-authentication friction. |
| `OAUTH_MCP_TOKEN_LIFETIME_SECS` | Integer | `86400` | MCP OAuth access token lifetime in seconds (24 hours). MCP sessions are interactive; shorter tokens cause mid-session disconnects. |

Example (Personal Use):

```
REQUIRE_AUTH=false
```

Example (Team Deployment):

```
REQUIRE_AUTH=true
ISSUER_URL=https://memory.team.com
OAUTH_TOKEN_LIFETIME_SECS=3600
OAUTH_MCP_TOKEN_LIFETIME_SECS=86400
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RATE_LIMIT_ENABLED` | Boolean | `false` | Enable rate limiting for API endpoints |
| `RATE_LIMIT_REQUESTS` | Integer | `100` | Maximum requests per time window |
| `RATE_LIMIT_PERIOD_SECS` | Integer | `60` | Rate limit time window in seconds |
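The two knobs define a simple requests-per-window budget. A sketch of those semantics (illustrative only; Fortemi's actual limiter implementation is not documented here):

```python
import time

# Fixed-window limiter matching the RATE_LIMIT_* semantics:
# at most `requests` calls per `period_secs` window.
class FixedWindowLimiter:
    def __init__(self, requests, period_secs, clock=time.monotonic):
        self.requests = requests
        self.period_secs = period_secs
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        now = self.clock()
        if now - self.window_start >= self.period_secs:
            self.window_start = now  # new window: reset the counter
            self.count = 0
        if self.count < self.requests:
            self.count += 1
            return True
        return False                 # the caller would answer HTTP 429

limiter = FixedWindowLimiter(requests=3, period_secs=60, clock=lambda: 0.0)
print([limiter.allow() for _ in range(4)])  # [True, True, True, False]
```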
Example:

```
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_PERIOD_SECS=60
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RUST_LOG` | String | `info` | Log level and filtering for Rust components (uses `env_logger` syntax) |
| `LOG_FORMAT` | String | `pretty` | Log output format: `pretty`, `json`, or `compact` |
| `LOG_FILE` | String | None | Path to log file (logs to stdout if not set) |
| `LOG_ANSI` | Boolean | `true` | Enable ANSI color codes in logs |
Common Configurations:
Production (default):

```
RUST_LOG=info
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false
```

API debugging:

```
RUST_LOG=matric_api=debug,info
```

Inference debugging:

```
RUST_LOG=matric_inference=debug,info
```

Search debugging:

```
RUST_LOG=matric_db=debug,matric_search=debug,info
```

Full debug (verbose):

```
RUST_LOG=debug
```

Specific module debugging:

```
RUST_LOG=matric_api::routes::search=trace,info
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `WORKER_ENABLED` | Boolean | `true` | Enable background job processing (embeddings, linking, cleanup). Alias: `JOB_WORKER_ENABLED`. |
| `JOB_WORKER_ENABLED` | Boolean | `true` | Enable/disable job processing in the worker process (takes precedence when set). |
| `WORKER_THREADS` | Integer | CPU cores | Number of Tokio worker threads for background jobs |
| `JOB_POLL_INTERVAL_MS` | Integer | `60000` | Safety-net polling interval in milliseconds. The worker is event-driven (woken by NOTIFY); this interval only triggers as a fallback for crash recovery and race conditions. |
| `JOB_MAX_CONCURRENT` | Integer | `4` | Maximum number of jobs that can run concurrently in the worker |
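The event-driven-with-fallback behavior of `JOB_POLL_INTERVAL_MS` can be sketched as a blocking wait with a timeout (illustrative only; Fortemi listens for PostgreSQL NOTIFY, not a Python queue):

```python
import queue

# Block on a notification; only fall back to a poll when nothing
# arrives within the safety-net interval.
def wait_for_work(notifications, poll_interval_ms):
    try:
        notifications.get(timeout=poll_interval_ms / 1000.0)
        return "notified"  # woken by an event (the normal path)
    except queue.Empty:
        return "poll"      # safety-net poll fires instead

notifications = queue.Queue()
notifications.put("job.created")
print(wait_for_work(notifications, poll_interval_ms=60000))  # notified
print(wait_for_work(notifications, poll_interval_ms=10))     # poll
```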
Example:

```
WORKER_ENABLED=true
WORKER_THREADS=4
JOB_POLL_INTERVAL_MS=60000
JOB_MAX_CONCURRENT=4
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `CHAT_MAX_CONCURRENT` | Integer | `1` | Maximum concurrent chat requests hitting the GPU. Chat bypasses the job queue and calls Ollama directly via a `tokio::Semaphore`. When all permits are in use, new requests return 503. Increase if VRAM allows parallel inference. |
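The permit semantics can be sketched with a non-blocking semaphore acquire (illustrative; the real gate is a `tokio::Semaphore` in Rust):

```python
import threading

chat_permits = threading.Semaphore(1)  # CHAT_MAX_CONCURRENT=1

def handle_chat_request():
    if not chat_permits.acquire(blocking=False):
        return 503                     # all permits in use
    try:
        return 200                     # inference would run here
    finally:
        chat_permits.release()

print(handle_chat_request())  # 200 (permit free)
chat_permits.acquire()        # simulate another request holding the permit
print(handle_chat_request())  # 503 (permit held elsewhere)
chat_permits.release()
```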
Example:

```
# Allow 2 concurrent chat sessions (requires sufficient VRAM)
CHAT_MAX_CONCURRENT=2
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `MATRIC_EVENT_BUS_CAPACITY` | Integer | `256` | Broadcast channel capacity for the internal event bus. Increase for high-traffic deployments. |
| `SSE_REPLAY_BUFFER_SIZE` | Integer | `1024` | Number of past events retained in the SSE replay buffer for `Last-Event-ID` reconnection support. |
| `SSE_COALESCE_WINDOW_MS` | Integer | `500` | Deduplication window in milliseconds for low-priority SSE events (e.g., `job.progress`). Events with the same coalescing key are deduplicated within this window, keeping only the latest. Set to 0 to disable. |
| `MATRIC_WEBHOOK_TIMEOUT_SECS` | Integer | `10` | Timeout in seconds for outgoing webhook HTTP requests. |
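The coalescing behavior keeps only the newest event per key within a window. A sketch, assuming a per-event coalescing key (the key format used here is hypothetical):

```python
# Deduplicate events sharing a coalescing key, keeping only the latest,
# as SSE_COALESCE_WINDOW_MS does for events inside one window.
def coalesce(events, window_ms):
    if window_ms == 0:
        return events                 # coalescing disabled
    latest = {}
    for event in events:              # events assumed to fall in one window
        latest[event["key"]] = event  # later events overwrite earlier ones
    return list(latest.values())

events = [
    {"key": "job.progress:42", "pct": 10},
    {"key": "job.progress:42", "pct": 55},  # supersedes pct=10
    {"key": "job.progress:99", "pct": 5},
]
print(coalesce(events, window_ms=500))  # two events survive, pct 55 and 5
```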
Example:

```
MATRIC_EVENT_BUS_CAPACITY=512
SSE_REPLAY_BUFFER_SIZE=2048
SSE_COALESCE_WINDOW_MS=500
MATRIC_WEBHOOK_TIMEOUT_SECS=10
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `FILE_STORAGE_PATH` | String | `/var/lib/matric/files` | Directory for storing uploaded file attachments on disk |

Example:

```
FILE_STORAGE_PATH=/mnt/data/matric/files
```

Settings for the tus v1.0.0 resumable upload protocol. See the File Attachments Guide for protocol details.

| Variable | Type | Default | Description |
|---|---|---|---|
| `TUS_UPLOAD_EXPIRY_HOURS` | Integer | `24` | Hours before an incomplete tus upload session expires and is cleaned up |
| `TUS_CHUNK_MAX_SIZE` | Integer | `52428800` | Maximum chunk size in bytes for tus PATCH requests (default: 50 MB) |

Example:

```
TUS_UPLOAD_EXPIRY_HOURS=48     # 2 days for slow connections
TUS_CHUNK_MAX_SIZE=104857600   # 100 MB chunks for fast networks
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `MAX_MEMORIES` | Integer | `10` | Maximum number of live memory archives in the database |
| `DEFAULT_ARCHIVE_CACHE_TTL` | Integer | `60` | Cache TTL in seconds for the default archive lookup. Reduces database lookups for the default memory on high-traffic deployments. |
| `DISABLE_SUPPORT_MEMORY` | Boolean | `false` | Set to true to skip automatic loading of the built-in fortemi-docs support archive on first boot. |

Example:

```
# Scale with your hardware (see capacity formula below)
MAX_MEMORIES=50    # 16GB RAM, 100GB disk
MAX_MEMORIES=200   # 32GB RAM, 500GB disk
MAX_MEMORIES=500   # 64GB+ RAM, 1TB+ disk
```

Capacity Planning:

Each empty memory adds ~1 MB schema overhead (41 tables + indexes). The real cost is data growth within each memory. Average storage per note (with a 20% attachment rate):
| Component | Per Note | Per 1,000 Notes |
|---|---|---|
| Note metadata + content | ~11 KB | 11 MB |
| Embeddings (768-dim) | ~3 KB | 3 MB |
| Attachments (avg 500KB, 20% rate) | ~100 KB | 100 MB |
| Thumbnails (100KB, 20% rate) | ~20 KB | 20 MB |
| Total average | ~134 KB | ~134 MB |
Capacity formula:

```
max_total_notes = available_storage / 134 KB
MAX_MEMORIES    = max_total_notes / target_notes_per_memory
```
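A worked example with Tier 2 numbers (100 GB of storage, 20,000 notes per memory). Note that the formula is stricter than the tier table below, since it assumes every memory fills all the way to its target:

```python
# Capacity formula with the ~134 KB average per note.
note_kb = 134
available_storage_kb = 100 * 1024 * 1024  # 100 GB expressed in KB (binary units)
target_notes_per_memory = 20_000

max_total_notes = available_storage_kb // note_kb
max_memories = max_total_notes // target_notes_per_memory

print(max_total_notes)  # 782519
print(max_memories)     # 39
```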
Recommended limits by hardware tier:
| Tier | RAM | Storage | MAX_MEMORIES | Notes per Memory | Total Notes |
|---|---|---|---|---|---|
| Tier 1 (Minimum) | 8 GB | 10 GB | 10 | ~5,000 | ~50,000 |
| Tier 2 (Standard) | 16 GB | 100 GB | 50 | ~20,000 | ~1,000,000 |
| Tier 3 (Performance) | 32 GB | 500 GB | 200 | ~50,000 | ~10,000,000 |
| Tier 4 (Professional) | 64 GB+ | 1 TB+ | 500 | ~50,000 | ~25,000,000 |
Memory Limits:
- `MAX_MEMORIES` limits live memories (schemas in the database), not the total number you can ever create
- Export memories as shards (`POST /api/v1/shards/export`), delete them to free slots, and re-import later; there is no limit on the number of archived shards you can store on disk
- Attempting to create memories beyond `MAX_MEMORIES` returns HTTP 400
- Check current usage via `GET /api/v1/memories/overview`
- Each memory adds minimal overhead (<1 MB metadata + indexes); data growth is the real constraint
- Notes without attachments are much smaller (~14 KB each); adjust estimates for your workload
| Header | Values | Description |
|---|---|---|
| `X-Fortemi-Memory` | Memory name | Routes the request to the specified memory (default: `default`) |
| `Authorization` | Bearer token | API authentication (when `REQUIRE_AUTH=true`) |
The X-Fortemi-Memory header routes all API requests to a specific memory archive. Without this header, requests operate on the default memory. See the Multi-Memory Guide for details.
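A request targeting a specific memory just needs these two headers. A sketch of building them (memory name and token are placeholders):

```python
# Build the routing/auth headers from the table above.
def fortemi_headers(memory="default", token=None):
    headers = {"X-Fortemi-Memory": memory}
    if token:  # Authorization is only needed when REQUIRE_AUTH=true
        headers["Authorization"] = f"Bearer {token}"
    return headers

print(fortemi_headers("research", token="abc123"))
```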
Ollama is the default inference backend for local LLM inference without API costs.
| Variable | Type | Default | Description |
|---|---|---|---|
| `OLLAMA_BASE` | String | `http://127.0.0.1:11434` | Ollama API endpoint URL (primary variable read by the backend) |
| `OLLAMA_URL` | String | `http://127.0.0.1:11434` | Alias for `OLLAMA_BASE` (checked as a fallback by the vision handler and content summarizer) |
| `OLLAMA_HOST` | String | `http://localhost:11434` | Alias used by the Ollama discovery service |
| `OLLAMA_EMBED_MODEL` | String | `nomic-embed-text` | Model name for generating embeddings |
| `OLLAMA_GEN_MODEL` | String | `qwen3.5:27b` | Model name for text generation (standard/failover tier) |
| `OLLAMA_EMBED_DIM` | Integer | `768` | Vector dimensionality for embeddings. Must match the model's output dimension. |
| `MATRIC_EMBED_TIMEOUT_SECS` | Integer | `30` | Timeout in seconds for embedding requests to Ollama |
| `MATRIC_GEN_TIMEOUT_SECS` | Integer | `120` | Timeout in seconds for generation requests to Ollama |
| `MATRIC_OLLAMA_URL` | String | `http://127.0.0.1:11434` | Ollama URL used by the TOML-based inference config path |
| `MATRIC_OLLAMA_EMBEDDING_MODEL` | String | `nomic-embed-text` | Embedding model used by the TOML-based inference config path |
| `MATRIC_OLLAMA_GENERATION_MODEL` | String | `qwen3.5:27b` | Generation model used by the TOML-based inference config path |

Example (Docker Desktop - macOS/Windows):

```
OLLAMA_BASE=http://host.docker.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=llama3.2:3b
OLLAMA_EMBED_DIM=768
```

Example (Linux with Docker):

```
OLLAMA_BASE=http://172.17.0.1:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
OLLAMA_EMBED_DIM=768
```

Example (Performance Tuning):

```
OLLAMA_BASE=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
MATRIC_EMBED_TIMEOUT_SECS=30
MATRIC_GEN_TIMEOUT_SECS=180
```

The OpenAI backend supports OpenAI's cloud API and any OpenAI-compatible endpoint (Azure OpenAI, vLLM, LocalAI, LM Studio, etc.).
| Variable | Type | Default | Description |
|---|---|---|---|
| `INFERENCE_BACKEND` | String | `ollama` | Backend selection: `ollama` or `openai` |
| `OPENAI_API_KEY` | String | None | API key for OpenAI cloud (required for OpenAI cloud) |
| `OPENAI_BASE_URL` | String | `https://api.openai.com/v1` | OpenAI API base URL or compatible endpoint |
| `OPENAI_EMBED_MODEL` | String | `text-embedding-3-small` | Model name for embeddings |
| `OPENAI_GEN_MODEL` | String | `gpt-oss:20b` | Model name for text generation |
| `OPENAI_EMBED_DIM` | Integer | `1536` | Vector dimensionality for embeddings |
| `OPENAI_TIMEOUT` | Integer | `30` | Request timeout in seconds |
| `OPENAI_SKIP_TLS_VERIFY` | Boolean | `false` | Disable TLS certificate verification (insecure, for testing only) |
| `OPENAI_HTTP_REFERER` | String | None | Optional HTTP-Referer header sent with requests (useful for OpenRouter and compatible proxies) |
| `OPENAI_X_TITLE` | String | None | Optional X-Title header for identification in compatible API dashboards |
| `MATRIC_OPENAI_URL` | String | `https://api.openai.com/v1` | OpenAI URL used by the TOML-based inference config path |
| `MATRIC_OPENAI_API_KEY` | String | None | API key used by the TOML-based inference config path |
| `MATRIC_OPENAI_EMBEDDING_MODEL` | String | `text-embedding-3-small` | Embedding model used by the TOML-based inference config path |
| `MATRIC_OPENAI_GENERATION_MODEL` | String | `gpt-4o-mini` | Generation model used by the TOML-based inference config path |

Example (OpenAI Cloud):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_EMBED_MODEL=text-embedding-3-small
OPENAI_GEN_MODEL=gpt-4o-mini
OPENAI_EMBED_DIM=1536
OPENAI_TIMEOUT=120
```

Example (Azure OpenAI):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=your-azure-key
OPENAI_BASE_URL=https://your-resource.openai.azure.com/openai/deployments/your-deployment
OPENAI_EMBED_MODEL=text-embedding-ada-002
OPENAI_GEN_MODEL=gpt-4
```

Example (vLLM Self-Hosted):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=token
OPENAI_BASE_URL=http://vllm-server:8000/v1
OPENAI_GEN_MODEL=meta-llama/Llama-3.1-8B-Instruct
OPENAI_TIMEOUT=180
```

Example (LocalAI):

```
INFERENCE_BACKEND=openai
OPENAI_API_KEY=localai
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_EMBED_MODEL=text-embedding-ada-002
OPENAI_GEN_MODEL=gpt-3.5-turbo
```

The MCP (Model Context Protocol) server provides Claude/AI integration.
| Variable | Type | Default | Description |
|---|---|---|---|
| `ISSUER_URL` | String | None | External base URL for OAuth and MCP discovery (required for MCP) |
| `MCP_CLIENT_ID` | String | None | OAuth client ID for token introspection (required for MCP auth) |
| `MCP_CLIENT_SECRET` | String | None | OAuth client secret for token introspection (required for MCP auth) |
| `MCP_BASE_URL` | String | `${ISSUER_URL}/mcp` | MCP protected resource URL (derived from `ISSUER_URL`) |
| `MCP_TRANSPORT` | String | `http` | Transport mode: `stdio` (direct process) or `http` (network) |
| `MCP_PORT` | Integer | `3001` | Port for the MCP HTTP server (when transport=http) |
| `MCP_BASE_PATH` | String | `/mcp` | URL path prefix for the MCP server (when transport=http) |
| `MATRIC_API_URL` | String | `http://localhost:3000` | API server URL for the MCP server to connect to. Alias: `FORTEMI_URL`. |
| `FORTEMI_URL` | String | `http://localhost:3000` | Alias for `MATRIC_API_URL`. Used in Docker bundle deployments. |
| `FORTEMI_API_KEY` | String | None | API key for the MCP server to authenticate with the Fortemi API (when `REQUIRE_AUTH=true`). |

Example (Docker Bundle):

```
ISSUER_URL=https://memory.example.com
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://memory.example.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
```

Example (Claude Desktop - stdio):

```
MCP_TRANSPORT=stdio
MATRIC_API_URL=http://localhost:3000
```

OAuth Client Registration:

Before configuring MCP, register an OAuth client for token introspection:

```
curl -X POST http://localhost:3000/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "MCP Server",
    "grant_types": ["client_credentials"],
    "scope": "mcp read"
  }'
```

Save the returned `client_id` and `client_secret` to `MCP_CLIENT_ID` and `MCP_CLIENT_SECRET`.
These feature flags control advanced search capabilities. They are disabled by default because they increase database complexity and require specific PostgreSQL extensions.
| Variable | Type | Default | Description |
|---|---|---|---|
| `FTS_WEBSEARCH_TO_TSQUERY` | Boolean | `true` | Enable search operators (OR, NOT, phrase search with quotes) |
| `FTS_SCRIPT_DETECTION` | Boolean | `false` | Auto-detect query language/script for optimal tokenization |
| `FTS_TRIGRAM_FALLBACK` | Boolean | `false` | Enable emoji and symbol search via pg_trgm trigram indexes |
| `FTS_BIGRAM_CJK` | Boolean | `false` | Enable optimized CJK (Chinese/Japanese/Korean) search via pg_bigm |
| `FTS_MULTILINGUAL_CONFIGS` | Boolean | `false` | Enable language-specific FTS configurations for stemming |
Why These Are Disabled by Default:
- FTS_SCRIPT_DETECTION: Adds complexity to query processing; only needed for mixed-language queries
- FTS_TRIGRAM_FALLBACK: Requires pg_trgm extension; only needed for emoji/symbol search
- FTS_BIGRAM_CJK: Requires pg_bigm extension (not installed by default); only for CJK languages
- FTS_MULTILINGUAL_CONFIGS: Requires multiple FTS dictionaries; increases storage and index size
Example (Minimal - English Only):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=false
FTS_TRIGRAM_FALLBACK=false
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=falseExample (Multilingual Team):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=trueExample (Full CJK Support):
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=true
FTS_MULTILINGUAL_CONFIGS=truePerformance Impact:
Enabling all flags increases:
- Index storage by approximately 30-50%
- Index build time by 2-3x
- Query planning overhead by 10-20ms per query
For small installations (< 10,000 notes), enable only the features you need. For large installations (> 100,000 notes), test performance impact before enabling.
These variables control the multi-tier concept extraction cascade: GLiNER (tier 0, CPU-based NER) → fast model (tier 1) → standard model (tier 2).
| Variable | Type | Default | Description |
|---|---|---|---|
| `GLINER_BASE_URL` | String | `http://gliner:8090` (Docker bundle) | GLiNER NER service URL for CPU-based entity extraction (tier 0). Set to empty to disable. |
| `GLINER_MODEL` | String | (set by GLiNER sidecar) | GLiNER model name, consumed by the GLiNER sidecar container (e.g., `urchade/gliner_large-v2.1`). |
| `GLINER_THRESHOLD` | Float | (set by GLiNER sidecar) | Entity confidence threshold for the GLiNER sidecar (e.g., 0.3). |
| `EXTRACTION_TARGET_CONCEPTS` | Integer | `5` | Target number of concepts to extract per note. GLiNER-to-fast-model escalation triggers when below this threshold; fast-to-standard escalation triggers at < target/2 (i.e., fewer than 3 with the default of 5). |
| `MATRIC_FAST_GEN_MODEL` | String | `qwen3.5:9b` | Fast generation model (tier 1) used for concept tagging and reference extraction when GLiNER yields too few results. Large documents are automatically chunked. Set to empty to disable. |
| `MATRIC_FAST_GEN_TIMEOUT_SECS` | Integer | `60` | Timeout in seconds for fast-model generation requests. |
| `OLLAMA_GEN_MODEL` | String | `qwen3.5:27b` | Standard generation model (tier 2) used as failover when the fast model also yields insufficient concepts. |
Extraction cascade:

```
GLiNER (tier 0, ~300ms, CPU)
  → if concepts < EXTRACTION_TARGET_CONCEPTS
    → MATRIC_FAST_GEN_MODEL (tier 1, chunked)
      → if concepts < EXTRACTION_TARGET_CONCEPTS / 2
        → OLLAMA_GEN_MODEL (tier 2, full context)
```
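The two escalation checks can be written out explicitly (a sketch of the decision logic, not the actual implementation):

```python
# Tier 0 result below target: escalate to the fast model.
def next_tier(concept_count, target=5):
    if concept_count < target:
        return "fast_model"
    return "done"

# Fast-model result below target/2: fail over to the standard model.
def after_fast_model(concept_count, target=5):
    if concept_count < target / 2:
        return "standard_model"
    return "done"

print(next_tier(3))          # fast_model (3 < 5)
print(after_fast_model(2))   # standard_model (2 < 2.5)
print(after_fast_model(3))   # done (3 >= 2.5)
```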
Example (Docker bundle defaults):

```
GLINER_BASE_URL=http://gliner:8090
EXTRACTION_TARGET_CONCEPTS=5
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

Example (disable GLiNER, LLM-only extraction):

```
GLINER_BASE_URL=
EXTRACTION_TARGET_CONCEPTS=5
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

Example (higher concept density for rich taxonomies):

```
EXTRACTION_TARGET_CONCEPTS=10
MATRIC_FAST_GEN_MODEL=qwen3.5:9b
OLLAMA_GEN_MODEL=qwen3.5:27b
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `EMBED_CONCEPT_MAX_DOC_FREQ` | Float | `0.8` | Maximum document frequency ratio for concepts included in embedding text enrichment. Concepts appearing in more than this fraction of notes are treated as "stopwords" and excluded. Range: 0.01–1.0. |
| `EMBED_INSTRUCTION_PREFIX` | String | `clustering: ` | Instruction prefix prepended to embedding text. nomic-embed-text supports `clustering: `, `search_document: `, and `classification: `. Set to an empty string to disable. |
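A sketch of how the two settings combine when building the text that gets embedded (the function name and inputs are illustrative):

```python
# Drop over-frequent "stopword" concepts, then prepend the instruction prefix.
def build_embedding_text(content, concepts, doc_freqs, max_doc_freq=0.8,
                         prefix="clustering: "):
    kept = [c for c in concepts if doc_freqs.get(c, 0.0) <= max_doc_freq]
    return prefix + content + (" " + " ".join(kept) if kept else "")

doc_freqs = {"note": 0.95, "rust": 0.12}  # "note" appears in 95% of notes
print(build_embedding_text("Tokio runtime tuning", ["note", "rust"], doc_freqs))
# clustering: Tokio runtime tuning rust
```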
Example:

```
EMBED_CONCEPT_MAX_DOC_FREQ=0.8
EMBED_INSTRUCTION_PREFIX=clustering:
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `OLLAMA_VISION_MODEL` | String | `qwen3.5:9b` | Ollama vision model for image description and 3D model rendering. Set to empty to disable image extraction. Requires Ollama with a vision-capable model pulled. qwen3.5:9b is natively multimodal (unified generation and vision). |

Example:

```
OLLAMA_VISION_MODEL=qwen3.5:9b   # Natively multimodal; also used as the fast gen model
# OLLAMA_VISION_MODEL=llava:7b   # Alternative (vision-only model)
# OLLAMA_VISION_MODEL=           # Disable
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `WHISPER_BASE_URL` | String | `http://localhost:8000` | URL for the Whisper-compatible transcription service. Set to empty to disable audio transcription. Deploy via `docker-compose.whisper.yml`. |
| `WHISPER_MODEL` | String | `Systran/faster-distil-whisper-large-v3` | Whisper model name to use for transcription. |

Example:

```
WHISPER_BASE_URL=http://host.docker.internal:8000
WHISPER_MODEL=Systran/faster-distil-whisper-large-v3
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `RENDERER_URL` | String | `http://localhost:8080` | URL for the Open3D renderer used for GLB/3D model multi-view rendering. The Docker bundle includes the renderer at this default address. Set to a custom URL for external renderer deployments. |

Example:

```
RENDERER_URL=http://localhost:8080
```

| Variable | Type | Default | Description |
|---|---|---|---|
| `OCR_ENABLED` | Boolean | `false` | Enable OCR-based text extraction for scanned PDFs and images. Requires LibreOffice and Tesseract. |
| `LIBREOFFICE_PATH` | String | `/usr/bin/libreoffice` | Path to the LibreOffice binary for document conversion (DOCX, XLSX, PPTX to PDF). |

Example:

```
OCR_ENABLED=true
LIBREOFFICE_PATH=/usr/bin/libreoffice
```

These variables tune the knowledge graph structure. All graph variables are read at job execution time; no restart is required for changes.
| Variable | Type | Default | Description |
|---|---|---|---|
| `GRAPH_LINKING_STRATEGY` | String | `hnsw_heuristic` | Linking strategy: `hnsw_heuristic` (HNSW Algorithm 4, diverse neighbor selection; recommended) or `threshold` (legacy epsilon-threshold). |
| `GRAPH_K_NEIGHBORS` | Integer | `0` (adaptive) | Maximum neighbors per node (M in HNSW). 0 enables adaptive mode: k = log₂(N) clamped to [5, 15]. Set explicitly (e.g., 8) to override adaptive computation. |
| `GRAPH_MIN_SIMILARITY` | Float | `0.5` | Absolute similarity floor: no edges are created below this cosine similarity regardless of strategy. Range: 0.0–1.0. |
| `GRAPH_EXTEND_CANDIDATES` | Boolean | `false` | Extend the HNSW candidate set with neighbors-of-neighbors (Algorithm 4 option). Increases recall at the cost of more comparisons. |
| `GRAPH_KEEP_PRUNED` | Boolean | `false` | Fill remaining neighbor slots from pruned candidates when the candidate set is exhausted (Algorithm 4 option). |
| `GRAPH_TAG_BOOST_WEIGHT` | Float | `0.3` | Weight for SKOS tag overlap in the blended linking score: blended = (embedding_sim * (1 - w)) + (tag_overlap * w). Set to 0.0 to disable the tag-based boost. Range: 0.0–1.0. |
| `GRAPH_NORMALIZATION_GAMMA` | Float | `1.0` | Gamma exponent for edge weight normalization during graph traversal, applied as normalized = ((score - min) / (max - min)) ^ gamma. Values >1.0 amplify top-end differences; <1.0 compress them. Range: 0.1–5.0. |
| `GRAPH_SNN_THRESHOLD` | Float | `0.10` | Shared Nearest Neighbor pruning threshold. Edges with an SNN score below this are pruned during recompute_snn_scores. SNN(A,B) = \|kNN(A) ∩ kNN(B)\| / k. Range: 0.0–1.0. |
| `GRAPH_COMMUNITY_RESOLUTION` | Float | `1.0` | Louvain community detection resolution parameter. Higher = more, smaller communities; lower = fewer, larger communities. Range: 0.1–10.0. |
| `GRAPH_PFNET_Q` | Integer | `2` | PFNET graph sparsification q parameter. q=2 is equivalent to the Relative Neighborhood Graph (Toussaint 1980). Higher q produces sparser graphs approaching the MST. Range: 2–10. |
| `GRAPH_STRUCTURAL_SCORE` | Float | `0.5` | Edge score assigned to structural (same-collection) edges. Controls the "gravity well" strength pulling exploration toward notes in the same collection. Range: 0.0–1.0. |
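Three of the formulas from the table, written out as code for intuition (a sketch, not the Rust implementation):

```python
# Blended linking score with GRAPH_TAG_BOOST_WEIGHT = w.
def blended_score(embedding_sim, tag_overlap, w=0.3):
    return embedding_sim * (1 - w) + tag_overlap * w

# Edge weight normalization with GRAPH_NORMALIZATION_GAMMA.
def normalize(score, lo, hi, gamma=1.0):
    return ((score - lo) / (hi - lo)) ** gamma

# Shared-nearest-neighbor score: overlap of the two k-NN lists over k.
def snn(knn_a, knn_b, k):
    return len(set(knn_a) & set(knn_b)) / k

print(blended_score(0.8, 0.5))         # ≈ 0.71
print(normalize(0.75, 0.5, 1.0, 2.0))  # 0.25 (gamma=2 compresses mid-range scores)
print(snn([1, 2, 3], [2, 3, 4], k=3))  # ≈ 0.667 (2 shared neighbors of 3)
```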
Example (defaults — suitable for most deployments):
GRAPH_LINKING_STRATEGY=hnsw_heuristic
GRAPH_K_NEIGHBORS=0
GRAPH_MIN_SIMILARITY=0.5
GRAPH_EXTEND_CANDIDATES=false
GRAPH_KEEP_PRUNED=false
GRAPH_TAG_BOOST_WEIGHT=0.3
GRAPH_NORMALIZATION_GAMMA=1.0
GRAPH_SNN_THRESHOLD=0.10
GRAPH_COMMUNITY_RESOLUTION=1.0
GRAPH_PFNET_Q=2
GRAPH_STRUCTURAL_SCORE=0.5Example (denser graph for tightly-related content):
GRAPH_LINKING_STRATEGY=hnsw_heuristic
GRAPH_K_NEIGHBORS=12
GRAPH_MIN_SIMILARITY=0.6
GRAPH_TAG_BOOST_WEIGHT=0.4
GRAPH_NORMALIZATION_GAMMA=1.5OpenRouter provides access to 100+ LLMs via a single API. It is opt-in: the OPENROUTER_API_KEY variable activates the provider.
| Variable | Type | Default | Description |
|---|---|---|---|
| `OPENROUTER_API_KEY` | String | None | OpenRouter API key. Setting this variable enables the OpenRouter provider for generation tasks. |
| `OPENROUTER_BASE_URL` | String | `https://openrouter.ai/api/v1` | OpenRouter API base URL. |
| `OPENROUTER_TIMEOUT` | Integer | `300` | Request timeout in seconds for OpenRouter calls. |
| `OPENROUTER_HTTP_REFERER` | String | None | Optional HTTP-Referer header sent to OpenRouter for attribution and rate limit exemptions. |
| `OPENROUTER_X_TITLE` | String | None | Optional X-Title header sent to OpenRouter for display in the OpenRouter dashboard. |

Example:

```
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxx
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_TIMEOUT=300
OPENROUTER_HTTP_REFERER=https://memory.example.com
OPENROUTER_X_TITLE=Matric Memory
```

These variables are set automatically by the CI/CD pipeline and are read-only at runtime. They are exposed via the `/health` endpoint for build tracing.
| Variable | Type | Default | Description |
|---|---|---|---|
| `MATRIC_GIT_SHA` | String | `unknown` | Git commit SHA of the running build. Set by CI during image build. |
| `MATRIC_BUILD_DATE` | String | `unknown` | Build timestamp. Set by CI during image build. |
The inference.toml file provides structured configuration for inference backends. It supports both Ollama and OpenAI configurations, with the ability to use different backends for different operations.
```
# Development
./inference.toml

# Production (Docker)
/app/inference.toml
```

```
# =============================================================================
# Inference Backend Configuration
# =============================================================================
# This file configures LLM inference backends for embeddings and generation.
# Supports Ollama (local) and OpenAI (cloud or compatible APIs).

[inference]
# Backend selection: "ollama" or "openai"
# Can be overridden by the INFERENCE_BACKEND environment variable
backend = "ollama"

# =============================================================================
# Ollama Configuration (Local LLM)
# =============================================================================
[inference.ollama]
# Ollama API endpoint
url = "http://localhost:11434"

# Model for generating embeddings
# Recommended: nomic-embed-text (768d), mxbai-embed-large (1024d)
embedding_model = "nomic-embed-text"

# Model for text generation (optional)
# Recommended: llama3.2:3b (fast), qwen2.5:7b (quality), llama3.1:8b (balanced)
generation_model = "llama3.2:3b"

# Embedding vector dimensionality
# Must match the model's output dimension
embedding_dimension = 768

# Context window size in tokens (optional)
# Larger values allow more context but use more memory
# num_ctx = 8192

# GPU layers to offload (optional)
# 99 = all layers on GPU (recommended for dedicated GPU)
# 0 = CPU only
# num_gpu = 99

# Concurrent request processing (optional)
# Higher values improve throughput but increase memory usage
# num_parallel = 1

# =============================================================================
# OpenAI Configuration (Cloud or Compatible APIs)
# =============================================================================
[inference.openai]
# OpenAI API base URL
# OpenAI cloud: https://api.openai.com/v1
# Azure: https://YOUR-RESOURCE.openai.azure.com/openai/deployments/YOUR-DEPLOYMENT
# vLLM: http://localhost:8000/v1
# LocalAI: http://localhost:8080/v1
base_url = "https://api.openai.com/v1"

# API key (can use environment variable reference)
# For cloud: sk-proj-xxxxx
# For local servers: any value (usually ignored)
api_key = "${OPENAI_API_KEY}"

# Model for generating embeddings
# OpenAI: text-embedding-3-small, text-embedding-3-large
# Azure: text-embedding-ada-002
embedding_model = "text-embedding-3-small"

# Model for text generation
# OpenAI: gpt-4o-mini, gpt-4o, gpt-4-turbo
# Azure: gpt-4, gpt-35-turbo
generation_model = "gpt-4o-mini"

# Embedding vector dimensionality
# text-embedding-3-small: 1536
# text-embedding-3-large: 3072
embedding_dimension = 1536

# Request timeout in seconds (optional)
# timeout = 120

# Maximum retry attempts (optional)
# max_retries = 3

# Disable TLS verification (insecure, testing only)
# skip_tls_verify = false
```

The `[inference]` section controls which backend is used at runtime:
```
[inference]
backend = "ollama"   # Use Ollama
```

```
[inference]
backend = "openai"   # Use OpenAI
```

This can be overridden by the `INFERENCE_BACKEND` environment variable:

```
export INFERENCE_BACKEND=openai
```

You can configure different backends for embeddings vs generation by using both configurations and selecting models:
Use local Ollama for embeddings, cloud OpenAI for generation:

```
[inference]
backend = "ollama"  # Default to Ollama

[inference.ollama]
url = "http://localhost:11434"
embedding_model = "nomic-embed-text"
embedding_dimension = 768
# No generation_model specified

[inference.openai]
base_url = "https://api.openai.com/v1"
api_key = "${OPENAI_API_KEY}"
generation_model = "gpt-4o-mini"
```

The system will use Ollama for embeddings (cost-free, private) and OpenAI for generation (higher quality).
To implement fallback behavior (try local first, fall back to cloud):
- Configure both backends in `inference.toml`
- Set the primary backend: `backend = "ollama"`
- When Ollama fails (connection refused, model not found), manually switch to OpenAI via an API retry or a configuration update
Current implementation does not support automatic fallback. For high availability, consider deploying multiple Ollama instances with load balancing.
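A client-side wrapper can approximate fallback today (a sketch; the backend callables here are injected placeholders, not Fortemi APIs):

```python
# Try the primary backend; switch to the secondary only on connection failure.
def generate_with_fallback(prompt, primary, secondary):
    try:
        return primary(prompt)
    except ConnectionError:
        return secondary(prompt)  # e.g., retry against the OpenAI backend

def ollama_down(prompt):
    raise ConnectionError("connection refused")

print(generate_with_fallback("hi", ollama_down, lambda p: f"cloud: {p}"))
# cloud: hi
```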
The MCP server enables integration with Claude Desktop, Claude Code, and other MCP-compatible clients.
Before using the MCP server, you must configure OAuth:
Step 1: Set ISSUER_URL
The ISSUER_URL is the external base URL where your Fortémi API is accessible:
```
# .env
ISSUER_URL=https://memory.example.com
```

This URL is used for OAuth discovery, token verification, and MCP resource identification.
Step 2: Register OAuth Client
Register a client for the MCP server to introspect tokens:
```
curl -X POST https://memory.example.com/oauth/register \
  -H "Content-Type: application/json" \
  -d '{
    "client_name": "MCP Server",
    "grant_types": ["client_credentials"],
    "scope": "mcp read"
  }'
```

Response:
```
{
  "client_id": "mm_xxxxxxxxxxxxx",
  "client_secret": "xxxxxxxxxxxxx",
  "client_name": "MCP Server",
  "grant_types": ["client_credentials"],
  "scope": "mcp read"
}
```

Step 3: Configure MCP Credentials
Add the credentials to .env:
```
# .env
ISSUER_URL=https://memory.example.com
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
```

Step 4: Restart Services
```
docker compose -f docker-compose.bundle.yml down
docker compose -f docker-compose.bundle.yml up -d
```

Step 5: Verify
```
curl https://memory.example.com/mcp/.well-known/oauth-protected-resource
```

Expected response:
{
"resource": "https://memory.example.com/mcp",
"authorization_servers": ["https://memory.example.com"],
"scopes_supported": ["mcp", "read", "write"]
}

The MCP server supports two transport modes:
Used for Docker bundle deployments and network access:
# .env
MCP_TRANSPORT=http
MCP_PORT=3001
MCP_BASE_URL=https://memory.example.com/mcp

The MCP server listens on port 3001 and is accessible via HTTP. Configure nginx to proxy /mcp to localhost:3001.
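One way to wire that proxy is a `location` block like the following. This is a minimal sketch, assuming nginx terminates TLS for `memory.example.com` and the MCP container listens on the same host; adapt the upstream address to your deployment:

```nginx
location /mcp {
    proxy_pass http://127.0.0.1:3001;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_buffering off;  # keep streaming/SSE responses unbuffered
}
```

Disabling `proxy_buffering` matters if the MCP server streams responses; with buffering on, nginx may hold events until the response completes.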
Used for Claude Desktop when running as a direct subprocess:
# .env
MCP_TRANSPORT=stdio
MATRIC_API_URL=http://localhost:3000

The MCP server communicates via stdin/stdout instead of HTTP. This is configured in Claude Desktop's configuration file.
To connect to Fortémi via stdio transport:

Configuration File Location:
- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`
Configuration:
{
"mcpServers": {
"fortemi": {
"command": "node",
"args": [
"/absolute/path/to/Fortémi/mcp-server/build/index.js"
],
"env": {
"MCP_TRANSPORT": "stdio",
"MATRIC_API_URL": "http://localhost:3000"
}
}
}
}

Restart Claude Desktop to load the configuration.
Claude Code uses URL-based MCP transport:
Project .mcp.json:
{
"mcpServers": {
"fortemi": {
"url": "https://memory.example.com/mcp"
}
}
}

Claude Code will authenticate using OAuth and connect to the MCP server via HTTPS.
For personal use with local Ollama and no authentication:
# .env
DATABASE_URL=postgres://matric:matric@localhost/matric
OLLAMA_BASE=http://localhost:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_EMBED_DIM=768
RUST_LOG=info
REQUIRE_AUTH=false
RATE_LIMIT_ENABLED=false

This provides:
- Full-text search (immediate)
- Semantic search (after embedding generation)
- No rate limiting or authentication
- Local inference (no API costs)
For team deployment with authentication and rate limiting:
# .env
DATABASE_URL=postgres://matric:matric@db.internal:5432/matric_prod
HOST=0.0.0.0
PORT=3000
RUST_LOG=info
# Authentication
REQUIRE_AUTH=true
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_PERIOD_SECS=60
ISSUER_URL=https://memory.team.com
# MCP (for Claude integration)
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://memory.team.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
# Ollama (local inference)
OLLAMA_BASE=http://ollama.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_GEN_MODEL=qwen2.5:7b
OLLAMA_EMBED_DIM=768
# Background worker
WORKER_ENABLED=true
WORKER_THREADS=8
JOB_POLL_INTERVAL_MS=60000
# Logging
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false

This provides:
- OAuth authentication for all API endpoints
- Rate limiting (1000 requests/minute per user)
- MCP integration for Claude
- Optimized Ollama configuration for performance
- Structured JSON logging for analysis
For large enterprise deployment with multilingual search, cloud AI, and monitoring:
# .env
DATABASE_URL=postgres://matric:matric@db-cluster.internal:5432/matric_prod
HOST=0.0.0.0
PORT=3000
RUST_LOG=matric_api=info,matric_db=warn,matric_inference=info
# Authentication and rate limiting
REQUIRE_AUTH=true
RATE_LIMIT_ENABLED=true
RATE_LIMIT_REQUESTS=10000
RATE_LIMIT_PERIOD_SECS=60
ISSUER_URL=https://knowledge.corp.com
# MCP server
MCP_CLIENT_ID=mm_xxxxxxxxxxxxx
MCP_CLIENT_SECRET=xxxxxxxxxxxxx
MCP_BASE_URL=https://knowledge.corp.com/mcp
MCP_TRANSPORT=http
MCP_PORT=3001
# Hybrid inference: Local embeddings + Cloud generation
INFERENCE_BACKEND=ollama
OLLAMA_BASE=http://ollama-cluster.internal:11434
OLLAMA_EMBED_MODEL=nomic-embed-text
OLLAMA_EMBED_DIM=768
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxx
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_GEN_MODEL=gpt-4o
OPENAI_TIMEOUT=180
# Multilingual full-text search
FTS_WEBSEARCH_TO_TSQUERY=true
FTS_SCRIPT_DETECTION=true
FTS_TRIGRAM_FALLBACK=true
FTS_BIGRAM_CJK=false
FTS_MULTILINGUAL_CONFIGS=true
# Background worker optimization
WORKER_ENABLED=true
WORKER_THREADS=16
JOB_POLL_INTERVAL_MS=60000
# Production logging
LOG_FORMAT=json
LOG_FILE=/var/log/matric/api.log
LOG_ANSI=false
# Backup configuration
BACKUP_DEST=/var/backups/Fortémi
BACKUP_SCRIPT_PATH=/app/scripts/backup.sh

This provides:
- Enterprise-grade authentication and rate limiting
- Hybrid inference (local embeddings for privacy, cloud generation for quality)
- Full multilingual search support
- Optimized worker configuration for high throughput
- Structured logging for monitoring and analysis
- Automated backup configuration
When running Fortémi in Docker and accessing services on the host machine:
macOS and Windows (Docker Desktop):
# Use host.docker.internal to access host services
OLLAMA_BASE=http://host.docker.internal:11434

Linux:
# Use Docker bridge network gateway IP
OLLAMA_BASE=http://172.17.0.1:11434
# Or use host network mode in docker-compose.bundle.yml:
# network_mode: "host"

Docker Compose loads .env automatically from the project root. Variables set in docker-compose.bundle.yml take precedence over .env.
Precedence (highest to lowest):
- Environment variables set in shell
- Environment variables in `docker-compose.bundle.yml`
- Variables in the `.env` file
- Built-in defaults
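The precedence order above amounts to a first-match lookup across layers. A small sketch (layer names and example values are illustrative):

```python
from typing import Mapping, Optional

def resolve(key: str, *layers: Mapping[str, str]) -> Optional[str]:
    """Return the value from the first (highest-precedence) layer that defines key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    return None

shell_env = {}                               # highest precedence
compose_env = {"PORT": "8080"}               # docker-compose.bundle.yml
dotenv = {"PORT": "3000", "HOST": "0.0.0.0"} # .env file
defaults = {"PORT": "3000"}                  # built-in defaults, lowest precedence

print(resolve("PORT", shell_env, compose_env, dotenv, defaults))  # 8080
print(resolve("HOST", shell_env, compose_env, dotenv, defaults))  # 0.0.0.0
```

A practical consequence: a stale value in `docker-compose.bundle.yml` silently shadows anything you change in `.env`, so check the compose file first when an override appears to have no effect.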
Environment variable changes require container restart:
docker compose -f docker-compose.bundle.yml down
docker compose -f docker-compose.bundle.yml up -d

Configuration changes take effect immediately on startup.
- Inference Backends - Detailed backend documentation and model selection
- Operations Guide - Deployment, monitoring, and troubleshooting
- Getting Started - Quick start guide for new users
- MCP Server - Claude integration and MCP protocol details
- Multilingual FTS - Search feature flags and language support
- Authentication - OAuth setup and user management
- Multi-Memory Guide - Parallel memory archives and federated search
- Search Guide - Search modes and query syntax
- Hardware Planning - Capacity planning and performance optimization