Problem
All ChromaDB collections are created without specifying a distance metric, so they default to L2 (squared Euclidean). Sentence embedding models (including the all-MiniLM-L6-v2 used here) are trained to produce vectors where cosine similarity is the meaningful distance measure. Using L2 treats vector magnitude as significant when it isn't, producing lower-quality nearest-neighbour retrieval.
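The magnitude sensitivity described above can be illustrated with a small, dependency-free sketch. The vectors are made-up two-dimensional toy values, not real all-MiniLM-L6-v2 output, and the helper names are hypothetical; the point is only that the two metrics can pick different nearest neighbours when vector lengths differ.

```python
import math

def squared_l2(a, b):
    # Squared Euclidean distance: sensitive to both direction and magnitude.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity: depends only on the angle between the vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
candidates = {
    "same_direction_long": [5.0, 0.0],   # same direction as the query, larger magnitude
    "different_direction": [0.6, 0.8],   # unit length, but pointing elsewhere
}

def nearest(metric):
    # Name of the candidate with the smallest distance to the query.
    return min(candidates, key=lambda name: metric(query, candidates[name]))

print("squared L2 nearest:", nearest(squared_l2))
print("cosine nearest:    ", nearest(cosine_distance))
```

With these toy values, squared L2 prefers the vector that is close in position, while cosine prefers the one that points in the same direction as the query.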
Where it happens
retrievers.py — both ChromaRetriever (line 59) and PersistentChromaRetriever (line 203):
    self.collection = self.client.get_or_create_collection(
        name=collection_name, embedding_function=self.embedding_function
        # ← no metadata={"hnsw:space": "cosine"}
    )
Fix
    self.collection = self.client.get_or_create_collection(
        name=collection_name,
        embedding_function=self.embedding_function,
        metadata={"hnsw:space": "cosine"},
    )
Apply the same change to all three get_or_create_collection calls in retrievers.py.
Migration note
The hnsw:space setting is locked at collection creation time and cannot be changed on an existing collection. Any existing persistent ChromaDB collection will need to be recreated (export documents → delete collection → recreate with cosine → re-import). The in-memory ChromaRetriever used by the MCP server is unaffected since it starts fresh each session.
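The export → delete → recreate → re-import sequence could look roughly like the sketch below. The function name `recreate_with_cosine` is hypothetical and not part of retrievers.py. It assumes a ChromaDB client exposing `get_collection`, `delete_collection` and `create_collection`, and a collection small enough to export in a single `get()` call. Documents are re-embedded on import using the embedding function you pass in.

```python
def recreate_with_cosine(client, name, embedding_function):
    """Rebuild collection `name` so that its HNSW index uses cosine distance."""
    old = client.get_collection(name=name, embedding_function=embedding_function)

    # Export: get() without filters returns every record in the collection.
    exported = old.get(include=["documents", "metadatas"])

    # Keep any existing collection-level metadata, overriding only the metric.
    metadata = {**(old.metadata or {}), "hnsw:space": "cosine"}

    # Delete and recreate; the metric is fixed at creation time.
    client.delete_collection(name=name)
    new = client.create_collection(
        name=name,
        embedding_function=embedding_function,
        metadata=metadata,
    )

    # Re-import. Assumption: if no record carries metadata, omit the argument
    # rather than passing a list of None values.
    if exported["ids"]:
        metadatas = exported.get("metadatas")
        if metadatas is not None and all(m is None for m in metadatas):
            metadatas = None
        new.add(
            ids=exported["ids"],
            documents=exported["documents"],
            metadatas=metadatas,
        )
    return new
```

Because the old collection is deleted before the new one is populated, it is worth backing up the persistence directory first. Very large collections may also need to be exported and re-added in batches.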