enhancement: evaluate larger embedding model to improve retrieval quality #26

@RutgerBos

Problem

The system uses all-MiniLM-L6-v2 as its embedding model (hardcoded default in retrievers.py). This is a 6-layer, 22M parameter model optimised for speed. It performs reasonably on general semantic similarity but struggles with:

  • Short, terse, factual text (typical of memory notes)
  • Domain-specific terminology
  • Nuanced queries that require deeper semantic understanding

Options worth evaluating

| Model | Params | Notes |
|---|---|---|
| all-MiniLM-L6-v2 (current) | 22M | Fast, low quality ceiling |
| all-MiniLM-L12-v2 | 33M | Same family, 2× layers, meaningful quality bump for low cost |
| all-mpnet-base-v2 | 109M | Best general-purpose SentenceTransformer, strong on short texts |
| nomic-embed-text (via Ollama) | | Keeps everything local and on-GPU, fits the project's local-only stance |
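
One way to compare the candidates above is a small, model-agnostic retrieval eval. The sketch below is only an illustration: `evaluate`, `toy_embed`, `NOTES`, and the metric names are hypothetical and not part of the project. It takes any function that maps texts to vectors and reports recall@k and mean reciprocal rank (MRR) over hand-labelled (query, relevant note) pairs. The toy embedder is a stand-in so the sketch runs without downloading a model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Any callable that turns a list of texts into a list of vectors.
Embedder = Callable[[List[str]], List[List[float]]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate(embed: Embedder, notes: List[str],
             queries: List[Tuple[str, int]], k: int = 3) -> Dict[str, float]:
    """Score one embedder.

    `queries` holds (query_text, index_of_the_relevant_note) pairs.
    """
    note_vecs = embed(notes)
    query_vecs = embed([text for text, _ in queries])
    hits = 0
    reciprocal_rank_sum = 0.0
    for (_, gold), qv in zip(queries, query_vecs):
        # Rank all notes by similarity to the query, best first.
        ranked = sorted(range(len(notes)),
                        key=lambda i: cosine(qv, note_vecs[i]),
                        reverse=True)
        rank = ranked.index(gold) + 1
        if rank <= k:
            hits += 1
        reciprocal_rank_sum += 1.0 / rank
    n = len(queries)
    return {"recall_at_k": hits / n, "mrr": reciprocal_rank_sum / n}


# Stand-in embedder (assumption): word counts over a tiny fixed vocabulary.
VOCAB = ("dark", "mode", "port", "server", "deploy", "friday")


def toy_embed(texts: List[str]) -> List[List[float]]:
    return [[float(t.lower().split().count(w)) for w in VOCAB] for t in texts]


NOTES = [
    "user prefers dark mode",
    "server runs on port 8080",
    "deploy happens every friday",
]
QUERIES = [("which port does the server use", 1), ("when do we deploy", 2)]

scores = evaluate(toy_embed, NOTES, QUERIES, k=1)
```

For a real comparison, the toy embedder could be swapped for something like `lambda texts: SentenceTransformer(name).encode(texts).tolist()` from the `sentence-transformers` package, with the notes and labelled queries drawn from the existing memory store.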

Suggested approach

  1. Fix the L2 → cosine metric bug first (#24: the ChromaDB collection uses L2 distance instead of cosine, which degrades semantic search quality), so that benchmarks are meaningful
  2. Run a small retrieval eval against the existing memory store with each model
  3. Make the model name configurable via the AMEM_EMBEDDING_MODEL env var (it is already a constructor parameter and only needs wiring to the environment)
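
Step 3 could be wired up roughly as follows. This is a minimal sketch under assumptions: `resolve_embedding_model` and its precedence order (explicit constructor argument, then `AMEM_EMBEDDING_MODEL`, then the current default) are hypothetical and not taken from retrievers.py.

```python
import os
from typing import Mapping, Optional

DEFAULT_EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # current hardcoded default
ENV_VAR = "AMEM_EMBEDDING_MODEL"


def resolve_embedding_model(explicit: Optional[str] = None,
                            env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the model name: explicit argument > env var > default.

    `env` is injectable so the function can be tested without touching
    the real process environment.
    """
    if explicit:
        return explicit
    source = os.environ if env is None else env
    value = source.get(ENV_VAR, "").strip()
    # An unset or blank variable falls back to the default.
    return value or DEFAULT_EMBEDDING_MODEL
```

The retriever's constructor could then call `resolve_embedding_model(model_name)` in place of the literal default, so existing callers that pass a name explicitly keep their behaviour.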

Note

Switching models on an existing persistent collection requires rebuilding the index (same migration caveat as #24). The MCP server's in-memory collection rebuilds fresh each session, so it's unaffected.
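
To make the rebuild requirement explicit rather than silent, the persistent collection could record which model and metric produced its vectors and compare them at startup. The sketch below is an assumption-laden illustration: `needs_rebuild` and the metadata keys `embedding_model` and `distance_metric` are hypothetical, and the stored metadata is modelled as a plain mapping rather than any specific ChromaDB call.

```python
from typing import Mapping, Optional


def needs_rebuild(stored_meta: Optional[Mapping[str, str]],
                  model: str, metric: str = "cosine") -> bool:
    """Return True when the stored index cannot be reused as-is.

    Vectors from different models (or indexes built with a different
    distance metric) are not comparable, so any mismatch means a rebuild.
    Missing metadata is treated as a mismatch because the provenance of
    the vectors is unknown.
    """
    if not stored_meta:
        return True
    return (stored_meta.get("embedding_model") != model
            or stored_meta.get("distance_metric") != metric)
```

With a check like this, changing `AMEM_EMBEDDING_MODEL` against an old persistent store could trigger a re-embed (or at least a clear error) instead of mixing vectors from two models in one index.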
