Summary
Replace the naive token-overlap heuristic in scouting_knowledge_sources() with SQLite FTS5 full-text search using BM25 ranking for matching deep research queries against scouting feasibility findings.
Motivation
The current keyword-overlap algorithm (token intersection with 3-char minimum) in aggregator.py misses relevant findings when the research query and the stored feasibility text use different forms of the same word (for example, "drilling" in the query and "drilled" in the finding). FTS5 with Porter stemming and BM25 ranking should improve recall in these cases and adds no new dependencies, since Zorora already uses SQLite.
Proposed Change
- Add an FTS5 virtual table indexing feasibility findings (key_finding, conclusion, tab, asset_name, technology, country)
- Replace token-overlap logic in scouting_knowledge_sources() with FTS5 MATCH query
- Create the FTS5 table on first use and populate via triggers or on-demand rebuild
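The steps above could be sketched roughly as follows. This is a minimal illustration, not Zorora's actual schema: the base table name feasibility_findings, the index name feasibility_fts, the trigger name, and the helper functions ensure_fts_index() and search_findings() are all hypothetical. Only the indexed column names come from this proposal. It also assumes the Python sqlite3 module is linked against an SQLite build with FTS5 enabled.

```python
import re
import sqlite3

# Column names are taken from the proposal; every other name is hypothetical.
COLUMNS = "key_finding, conclusion, tab, asset_name, technology, country"
NEW_COLUMNS = ", ".join(f"new.{c.strip()}" for c in COLUMNS.split(","))


def ensure_fts_index(conn: sqlite3.Connection) -> None:
    """Create the (assumed) base table, the FTS5 index and an insert trigger."""
    conn.executescript(f"""
        CREATE TABLE IF NOT EXISTS feasibility_findings (
            id INTEGER PRIMARY KEY,
            key_finding TEXT, conclusion TEXT, tab TEXT,
            asset_name TEXT, technology TEXT, country TEXT
        );
        -- External-content FTS5 table with Porter stemming on top of unicode61.
        CREATE VIRTUAL TABLE IF NOT EXISTS feasibility_fts USING fts5(
            {COLUMNS},
            content='feasibility_findings', content_rowid='id',
            tokenize='porter unicode61'
        );
        -- Keep the index in sync for newly inserted findings.
        CREATE TRIGGER IF NOT EXISTS feasibility_fts_ai
        AFTER INSERT ON feasibility_findings BEGIN
            INSERT INTO feasibility_fts(rowid, {COLUMNS})
            VALUES (new.id, {NEW_COLUMNS});
        END;
    """)


def rebuild_fts_index(conn: sqlite3.Connection) -> None:
    """On-demand rebuild of the index from the content table."""
    conn.execute("INSERT INTO feasibility_fts(feasibility_fts) VALUES('rebuild')")


def search_findings(conn: sqlite3.Connection, query: str, limit: int = 10):
    """Return (id, key_finding, score) rows; lower BM25 score means a better match."""
    # Quote each word so user input cannot be parsed as FTS5 query syntax.
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    match_expr = " OR ".join(f'"{t}"' for t in terms)
    return conn.execute(
        """
        SELECT rowid, key_finding, bm25(feasibility_fts) AS score
        FROM feasibility_fts
        WHERE feasibility_fts MATCH ?
        ORDER BY score
        LIMIT ?
        """,
        (match_expr, limit),
    ).fetchall()
```

Joining the query terms with OR keeps recall high and leaves ordering to BM25; switching to AND would trade recall for precision. Update and delete triggers are omitted here, so the rebuild helper is the fallback for keeping the index consistent after such changes.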
Acceptance Criteria
- scouting_knowledge_sources() uses FTS5 MATCH instead of token intersection
- Relevant feasibility findings are returned for queries that share no exact tokens with the finding but share a word stem (for example, "permit" matching "permitting")
- Existing tests updated to verify FTS5 path
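One way the stemming criterion might be checked is sketched below. Since the signature of scouting_knowledge_sources() is not given here, the test exercises a standalone FTS5 table instead. The token_overlap() helper is only an approximation of the current heuristic as described in the Motivation, and the table name findings_fts and the sample text are made up for illustration.

```python
import re
import sqlite3


def token_overlap(query: str, text: str, min_len: int = 3) -> set:
    """Approximation of the current heuristic: exact-token intersection."""
    def tokens(s: str) -> set:
        return {t for t in re.findall(r"\w+", s.lower()) if len(t) >= min_len}
    return tokens(query) & tokens(text)


def fts_match_ids(conn: sqlite3.Connection, query: str) -> list:
    """Return rowids matching any query word, best BM25 score first."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return []
    expr = " OR ".join(f'"{t}"' for t in terms)
    rows = conn.execute(
        "SELECT rowid FROM findings_fts WHERE findings_fts MATCH ? "
        "ORDER BY bm25(findings_fts)",
        (expr,),
    ).fetchall()
    return [r[0] for r in rows]


def test_stemming_recovers_finding_missed_by_token_overlap():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE findings_fts "
        "USING fts5(key_finding, conclusion, tokenize='porter unicode61')"
    )
    finding = "Permitting delays slowed the wind project"
    conn.execute(
        "INSERT INTO findings_fts(rowid, key_finding, conclusion) VALUES (1, ?, ?)",
        (finding, "Proceed with caution"),
    )
    query = "permit delay"

    # The exact-token heuristic finds nothing in common...
    assert token_overlap(query, finding) == set()
    # ...while the stemmed FTS5 index still returns the finding.
    assert fts_match_ids(conn, query) == [1]
```

In the real test suite, the same pattern could be pointed at scouting_knowledge_sources() once its inputs and return shape are known.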
References
- Inspired by Sift memory architecture (Edmonds, 2026) — SQLite FTS5 + BM25 as embedding-free retrieval
- Foundation for SEP-066 (anticipated queries)