SEP-065: FTS5 index for scouting RAG retrieval #31

@samudzi

Summary

Replace the naive token-overlap heuristic in scouting_knowledge_sources() with SQLite FTS5 full-text search using BM25 ranking for matching deep research queries against scouting feasibility findings.

Motivation

The current keyword-overlap algorithm in aggregator.py (token intersection with a 3-character minimum) misses relevant findings when the research query and the stored feasibility text use different word forms. FTS5 with Porter stemming and BM25 ranking should improve recall for such queries without adding dependencies, since Zorora already uses SQLite (provided the SQLite build in use has FTS5 enabled).
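To make the failure mode concrete, here is a minimal sketch of a token-intersection heuristic. It is a hypothetical stand-in, not the actual code in aggregator.py; the function name `token_overlap`, the tokenization regex, and the sample query and finding are all made up for illustration.

```python
import re


def token_overlap(query: str, text: str, min_len: int = 3) -> set[str]:
    """Hypothetical stand-in for the current heuristic:
    lowercase tokens of at least `min_len` characters shared by both strings."""

    def tokens(s: str) -> set[str]:
        return {t for t in re.findall(r"[a-z0-9]+", s.lower()) if len(t) >= min_len}

    return tokens(query) & tokens(text)


# Made-up example: the finding is clearly relevant, but every shared
# concept appears in a different word form, so the intersection is empty.
query = "drilling costs for wells"
finding = "Each geothermal well was drilled at a cost of 4 million USD"

print(token_overlap(query, finding))
```

With exact-token matching, "drilling"/"drilled", "costs"/"cost", and "wells"/"well" never coincide, so the finding is not retrieved.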

Proposed Change

  • Add an FTS5 virtual table indexing feasibility findings (key_finding, conclusion, tab, asset_name, technology, country)
  • Replace token-overlap logic in scouting_knowledge_sources() with FTS5 MATCH query
  • Create the FTS5 table on first use and populate via triggers or on-demand rebuild
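The steps above could be sketched roughly as follows, using only the standard-library `sqlite3` module. The table name `feasibility_fts` and the helpers `ensure_fts`, `to_match_expr`, and `search_findings` are assumptions for illustration, as is the sample data; the real schema and the signature of scouting_knowledge_sources() may differ. Population via triggers or rebuild is omitted here because it depends on the source table layout.

```python
import re
import sqlite3

# Columns named in the proposal; the table name is hypothetical.
FTS_COLUMNS = ("key_finding", "conclusion", "tab", "asset_name", "technology", "country")


def ensure_fts(conn: sqlite3.Connection) -> None:
    """Create the FTS5 table on first use, with Porter stemming."""
    cols = ", ".join(FTS_COLUMNS)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS feasibility_fts "
        f"USING fts5({cols}, tokenize='porter unicode61')"
    )


def to_match_expr(query: str) -> str:
    """Quote each token so free text is not parsed as FTS5 query syntax,
    and join with OR so any matching term can surface a finding."""
    return " OR ".join(f'"{t}"' for t in re.findall(r"\w+", query.lower()))


def search_findings(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[tuple]:
    expr = to_match_expr(query)
    if not expr:
        return []
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT rowid, key_finding FROM feasibility_fts "
        "WHERE feasibility_fts MATCH ? "
        "ORDER BY bm25(feasibility_fts) LIMIT ?",
        (expr, limit),
    ).fetchall()


# Usage with made-up rows in an in-memory database.
conn = sqlite3.connect(":memory:")
ensure_fts(conn)
conn.executemany(
    "INSERT INTO feasibility_fts VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Each geothermal well was drilled at a cost of 4 million USD",
         "Feasible", "production", "Asset A", "geothermal", "Kenya"),
        ("Solar irradiance is high year-round",
         "Viable", "resource", "Asset B", "solar", "Kenya"),
    ],
)
results = search_findings(conn, "drilling costs for wells")
print(results)
```

Quoting each token is a defensive choice: research queries may contain characters such as `-` or `:` that FTS5 would otherwise interpret as operators. Joining with OR favors recall and leaves the ordering to BM25.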

Acceptance Criteria

  • scouting_knowledge_sources() uses FTS5 MATCH instead of token intersection
  • Relevant feasibility findings are returned for queries that share no exact tokens with the stored text but match after stemming (e.g., "drilling" vs. "drilled")
  • Existing tests updated to verify FTS5 path
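A test for the stemming criterion might look something like the sketch below. It exercises a stand-alone FTS5 table rather than scouting_knowledge_sources() itself, since that function's signature is not shown here; the table name `findings_fts`, the class name, and the sample rows are assumptions. The test is skipped when the local SQLite build lacks FTS5.

```python
import sqlite3
import unittest


def fts5_available() -> bool:
    """FTS5 is a compile-time option of SQLite; probe for it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()


@unittest.skipUnless(fts5_available(), "SQLite build lacks FTS5")
class StemmedMatchTest(unittest.TestCase):
    def setUp(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE VIRTUAL TABLE findings_fts "
            "USING fts5(key_finding, tokenize='porter unicode61')"
        )
        self.conn.executemany(
            "INSERT INTO findings_fts (key_finding) VALUES (?)",
            [
                ("The well was drilled to 2 km depth",),
                ("Grid connection is available nearby",),
            ],
        )

    def tearDown(self) -> None:
        self.conn.close()

    def test_query_matches_different_word_form(self) -> None:
        # "drilling" never appears verbatim; the Porter stemmer reduces both
        # "drilling" and "drilled" to the same stem, so the row still matches.
        rows = self.conn.execute(
            "SELECT key_finding FROM findings_fts WHERE findings_fts MATCH ?",
            ('"drilling"',),
        ).fetchall()
        self.assertEqual(len(rows), 1)
        self.assertIn("drilled", rows[0][0])
```

Run it with `python -m unittest` from the file that contains it.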

References

  • Inspired by Sift memory architecture (Edmonds, 2026) — SQLite FTS5 + BM25 as embedding-free retrieval
  • Foundation for SEP-066 (anticipated queries)

Labels: enhancement (New feature or request)
