bug: search score field is L2 distance, not similarity — semantics are inverted #23

@RutgerBos

Problem

search_memories returns a score field that is actually ChromaDB's raw L2 distance (lower = more similar). The field name implies higher = better, which is the opposite of what the value means.

Where it happens

In memory_system.py, search_agentic() (line 595) and search() (line 493) both assign:

memory_dict['score'] = results['distances'][0][i]

ChromaDB returns distances in ascending order of L2 distance: 0.0 means identical, and larger values mean less similar.

Impact

  • The MCP search_memories tool exposes this raw distance as score to all callers
  • Result ordering is currently correct (ChromaDB pre-sorts ascending), so end-users aren't seeing inverted results today
  • Any caller who filters by score threshold (score > 0.5) or sorts by score descending will retrieve the least relevant memories
  • The bug becomes critical the moment any re-ranking, threshold filtering, or score-based logic is added
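
The failure mode in the bullets above can be reproduced without a real collection. This is a minimal sketch that assumes a hand-written dictionary shaped like a ChromaDB query result; the document texts and distance values are invented for illustration and do not come from the project.

```python
# Hypothetical query result, shaped like ChromaDB's output: one inner list
# per query, with distances sorted ascending (closest match first).
results = {
    "documents": [["closest match", "middling match", "distant match"]],
    "distances": [[0.12, 0.48, 1.37]],
}

# Mirrors the current assignment: the raw distance is stored as 'score'.
memories = [
    {"content": doc, "score": dist}
    for doc, dist in zip(results["documents"][0], results["distances"][0])
]

# A caller who assumes "higher score = more relevant" filters like this...
filtered = [m for m in memories if m["score"] > 0.5]

# ...or takes the top hit after sorting by score descending.
top_by_score = sorted(memories, key=lambda m: m["score"], reverse=True)[0]

print([m["content"] for m in filtered])  # only the distant match survives
print(top_by_score["content"])           # the distant match ranks first
```

Both the threshold filter and the descending sort select the memory that is farthest from the query, which is the inversion described in this issue.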

Fix

Convert distance to a similarity value before exposing it. For L2 distance a simple inversion works: score = 1 / (1 + distance), giving values in (0, 1] where 1.0 = identical. Better still, switch the distance metric to cosine (see related issue) and compute score = 1 - distance, which recovers the cosine similarity in [-1, 1] because ChromaDB's cosine distance is 1 minus the cosine similarity.

If conversion is deferred, rename the field (for example to distance) or add a docstring stating that lower values mean more similar.
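
The two conversions proposed in this section could look roughly like the sketch below. The helper names l2_distance_to_score and cosine_distance_to_score are hypothetical and do not exist in memory_system.py; the cosine variant assumes the collection has already been switched to the cosine metric.

```python
def l2_distance_to_score(distance: float) -> float:
    """Map an L2 distance in [0, inf) to a similarity in (0, 1].

    Hypothetical helper: 0.0 (identical) maps to 1.0, and the score
    decreases monotonically toward 0 as the distance grows.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


def cosine_distance_to_score(distance: float) -> float:
    """Map a cosine distance (1 - cosine similarity) back to similarity.

    Hypothetical helper: only meaningful if the collection uses the
    cosine metric, in which case the result lies in [-1, 1].
    """
    return 1.0 - distance


# Example with made-up distances, closest match first.
distances = [0.0, 1.0, 3.0]
scores = [l2_distance_to_score(d) for d in distances]
print(scores)  # [1.0, 0.5, 0.25]
```

With a helper like this, the assignment in search() and search_agentic() would become memory_dict['score'] = l2_distance_to_score(results['distances'][0][i]), so that sorting by score descending agrees with ChromaDB's own ordering.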
