Test#2
Conversation
- Implemented ONNX Runtime CPU-optimized embedding backend - Added Reciprocal Rank Fusion (RRF) for blended sparse+dense retrieval - ONNX backend with graceful fallback to hash embeddings - RRF provides deterministic fusion of BM25 and vector search results - Added onnxruntime and tokenizers to optional dependencies - Configured RRF with k=60 (industry standard) - ONNX uses mean pooling and L2 normalization Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review Summary by QodoAdd ONNX embedding backend and Reciprocal Rank Fusion
WalkthroughsDescription• Implemented ONNX Runtime CPU-optimized embedding backend with graceful fallback • Added Reciprocal Rank Fusion (RRF) for deterministic hybrid retrieval fusion • Changed default embedding backend from hash to ONNX with improved model • Added ONNX session caching and RRF configuration options • Updated optional dependencies with onnxruntime, tokenizers, huggingface_hub Diagramflowchart LR
A["Retrieval Request"] --> B["FTS5/BM25 Results"]
A --> C["Vector Search Results"]
B --> D["RRF Fusion"]
C --> D
D --> E["Ranked Candidates"]
F["ONNX Backend"] -.->|"Embeddings"| C
G["Hash Backend"] -.->|"Fallback"| F
File Changes1. scripts/token_reducer/config.py
|
Code Review by Qodo
1. RRF scores filtered out
|
CI Feedback 🧐A test triggered by this PR failed. Here is an AI-generated analysis of the failure:
|
| for chunk_id, candidate in candidates.items(): | ||
| candidate.final_score = rrf_scores[chunk_id] | ||
|
|
||
| # Sort by RRF score descending | ||
| ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True) | ||
| return ranked | ||
|
|
||
|
|
||
| def rerank_candidates( | ||
| query: str, | ||
| fts_hits: list[Candidate], | ||
| vector_hits: list[Candidate], | ||
| top_k: int, | ||
| ) -> tuple[list[Candidate], list[Candidate]]: | ||
| from .config import should_use_rrf | ||
|
|
||
| # Use RRF if enabled, otherwise fall back to weighted scoring | ||
| if should_use_rrf() and vector_hits: | ||
| ranked = reciprocal_rank_fusion(fts_hits, vector_hits) | ||
| return ranked[:top_k], ranked |
There was a problem hiding this comment.
1. Rrf scores filtered out 🐞 Bug ≡ Correctness
With RRF enabled, reciprocal_rank_fusion() assigns Candidate.final_score to an RRF value (~1/(k+rank)), which is always far below the default relevance_floor=0.15, so compress_candidates() stops immediately and returns no bullets/packet content when vector retrieval is used.
Agent Prompt
## Issue description
RRF ranking currently overwrites `Candidate.final_score` with very small reciprocal-rank values. Downstream compression uses `final_score` with a default `relevance_floor=0.15`, causing the compressor to break immediately and produce empty results when RRF is active.
## Issue Context
- RRF scores for k=60 are ~0.016 (rank=1) per system, far below the 0.15 floor.
- `compress_candidates()` assumes `final_score` is on the existing weighted-score scale.
## Fix Focus Areas
- scripts/token_reducer/retriever.py[324-429]
- scripts/token_reducer/compressor.py[232-263]
- scripts/token_reducer/config.py[17-34]
## What to change
Choose one consistent approach:
1) **Use RRF for ordering only**: compute an `rrf_score` for sorting, but still compute the existing weighted `final_score` (and overlap/fts/vector scores) before passing candidates to the compressor; or
2) **Adjust relevance floor for RRF**: when RRF is enabled, either disable the relevance-floor early-break or use a much lower floor appropriate to RRF (and ensure the floor is configurable);
3) If you keep RRF as `final_score`, **rescale/normalize** RRF scores to the same magnitude expected by `relevance_floor`.
Add/adjust a small unit-style check (if present in repo) or a lightweight assertion path to prevent returning an empty packet solely due to scoring-scale mismatch.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Pull request overview
This PR enhances the token_reducer retrieval pipeline by adding Reciprocal Rank Fusion (RRF) for hybrid reranking and introducing an ONNX Runtime embedding backend (with caching), along with related configuration and optional dependencies.
Changes:
- Add RRF-based fusion for combining FTS/BM25 and vector retrieval results.
- Add ONNX Runtime embedding backend with session/tokenizer caching and hub/local model resolution.
- Update defaults/config to enable ONNX + RRF and adjust optional requirements.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/token_reducer/retriever.py | Adds RRF fusion and routes reranking through it when enabled. |
| scripts/token_reducer/embeddings.py | Adds ONNX session loading/caching and ONNX embedding generation + backend dispatch support. |
| scripts/token_reducer/db.py | Fixes dependency indexing count to reflect actual inserted rows (INSERT OR IGNORE). |
| scripts/token_reducer/config.py | Adds ONNX/RRF defaults and runtime configuration helpers/caches. |
| requirements-optional.txt | Adds optional ONNX-related dependencies. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from .config import get_rrf_k | ||
|
|
||
| if k <= 0: | ||
| k = get_rrf_k() | ||
|
|
There was a problem hiding this comment.
reciprocal_rank_fusion() imports get_rrf_k(), but with the current defaults k is always 60 and get_rrf_k() is never used (since rerank_candidates() calls reciprocal_rank_fusion(...) without passing k). This makes configure_rrf(k=...) ineffective. Consider defaulting k from get_rrf_k() when k is None/omitted, or explicitly passing k=get_rrf_k() from rerank_candidates().
|
|
||
| try: | ||
| from huggingface_hub import hf_hub_download # type: ignore | ||
|
|
||
| for candidate in _ONNX_CANDIDATE_FILENAMES: | ||
| try: | ||
| onnx_path = hf_hub_download(repo_id=model_path, filename=candidate) | ||
| break | ||
| except Exception: | ||
| continue | ||
|
|
||
| if onnx_path is None: | ||
| raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'") | ||
|
|
||
| tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json") | ||
|
|
||
| except Exception as hf_err: | ||
| model_dir = Path(model_path) | ||
| if not model_dir.exists(): | ||
| raise RuntimeError( | ||
| f"ONNX model not found at '{model_path}'. " | ||
| "Provide a valid HuggingFace model ID or local directory path." | ||
| ) from hf_err | ||
|
|
||
| for candidate in _ONNX_CANDIDATE_FILENAMES: | ||
| p = model_dir / candidate | ||
| if p.exists(): | ||
| onnx_path = str(p) | ||
| break | ||
|
|
||
| if onnx_path is None: | ||
| raise RuntimeError( | ||
| f"No ONNX model file found locally in '{model_path}'" | ||
| ) from hf_err | ||
|
|
||
| tokenizer_path = str(model_dir / "tokenizer.json") | ||
|
|
There was a problem hiding this comment.
The HuggingFace download try covers both the ONNX model and tokenizer.json, but any failure (including a missing tokenizer.json in the repo, a transient HF error, or ImportError for huggingface_hub) falls into the local-path fallback branch. This can produce misleading errors (treating a valid HF model ID as a missing local path) and can discard a successfully downloaded onnx_path. Consider handling ImportError for huggingface_hub separately, and distinguishing “repo exists but required files missing” from “local path missing”.
| try: | |
| from huggingface_hub import hf_hub_download # type: ignore | |
| for candidate in _ONNX_CANDIDATE_FILENAMES: | |
| try: | |
| onnx_path = hf_hub_download(repo_id=model_path, filename=candidate) | |
| break | |
| except Exception: | |
| continue | |
| if onnx_path is None: | |
| raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'") | |
| tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json") | |
| except Exception as hf_err: | |
| model_dir = Path(model_path) | |
| if not model_dir.exists(): | |
| raise RuntimeError( | |
| f"ONNX model not found at '{model_path}'. " | |
| "Provide a valid HuggingFace model ID or local directory path." | |
| ) from hf_err | |
| for candidate in _ONNX_CANDIDATE_FILENAMES: | |
| p = model_dir / candidate | |
| if p.exists(): | |
| onnx_path = str(p) | |
| break | |
| if onnx_path is None: | |
| raise RuntimeError( | |
| f"No ONNX model file found locally in '{model_path}'" | |
| ) from hf_err | |
| tokenizer_path = str(model_dir / "tokenizer.json") | |
| hf_import_err: Exception | None = None | |
| hf_onnx_err: Exception | None = None | |
| try: | |
| from huggingface_hub import hf_hub_download # type: ignore | |
| except ImportError as exc: | |
| hf_hub_download = None | |
| hf_import_err = exc | |
| if hf_hub_download is not None: | |
| for candidate in _ONNX_CANDIDATE_FILENAMES: | |
| try: | |
| onnx_path = hf_hub_download(repo_id=model_path, filename=candidate) | |
| break | |
| except Exception as exc: | |
| hf_onnx_err = exc | |
| continue | |
| if onnx_path is not None: | |
| try: | |
| tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json") | |
| except Exception as exc: | |
| raise RuntimeError( | |
| f"Found ONNX model in HuggingFace repo '{model_path}', " | |
| "but required file 'tokenizer.json' is missing or could not be downloaded." | |
| ) from exc | |
| if onnx_path is None: | |
| model_dir = Path(model_path) | |
| if model_dir.exists(): | |
| for candidate in _ONNX_CANDIDATE_FILENAMES: | |
| p = model_dir / candidate | |
| if p.exists(): | |
| onnx_path = str(p) | |
| break | |
| if onnx_path is None: | |
| raise RuntimeError( | |
| f"No ONNX model file found locally in '{model_path}'" | |
| ) from hf_onnx_err | |
| tokenizer_path = str(model_dir / "tokenizer.json") | |
| elif hf_import_err is not None: | |
| raise RuntimeError( | |
| "ONNX model not found locally and huggingface_hub is not installed. " | |
| "Provide a local directory path or install huggingface_hub to load a HuggingFace model ID." | |
| ) from hf_import_err | |
| elif hf_onnx_err is not None: | |
| raise RuntimeError( | |
| f"Unable to download an ONNX model file from HuggingFace repo '{model_path}', " | |
| "and no local directory exists at that path." | |
| ) from hf_onnx_err | |
| else: | |
| raise RuntimeError( | |
| f"ONNX model not found at '{model_path}'. " | |
| "Provide a valid HuggingFace model ID or local directory path." | |
| ) |
| tokenizer_path = str(model_dir / "tokenizer.json") | ||
|
|
There was a problem hiding this comment.
In the local-path fallback, tokenizer_path is set to <model_dir>/tokenizer.json but its existence is not checked before calling Tokenizer.from_file(). If the tokenizer file is missing, the error will be a low-level file error rather than a clear message about the required file layout. Consider validating tokenizer_path exists and raising a RuntimeError with actionable guidance.
| tokenizer_path = str(model_dir / "tokenizer.json") | |
| local_tokenizer_path = model_dir / "tokenizer.json" | |
| if not local_tokenizer_path.exists(): | |
| raise RuntimeError( | |
| f"Missing tokenizer file in local ONNX model directory '{model_path}'. " | |
| "Expected 'tokenizer.json' alongside the ONNX model files " | |
| f"({', '.join(_ONNX_CANDIDATE_FILENAMES)}). " | |
| "Provide a valid HuggingFace model ID or a local directory with the required files." | |
| ) from hf_err | |
| tokenizer_path = str(local_tokenizer_path) |
| # Model path can be local file or HuggingFace hub model ID | ||
| DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2" |
There was a problem hiding this comment.
DEFAULT_ONNX_MODEL_PATH is introduced but (based on a repo-wide search) is not referenced anywhere else. If the intended default is already covered by DEFAULT_EMBEDDING_MODEL, consider removing this constant or wiring it into the CLI/config flow so it doesn’t drift unused.
| # Model path can be local file or HuggingFace hub model ID | |
| DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2" | |
| # The default model identifier is provided by DEFAULT_EMBEDDING_MODEL. |
| @@ -74,6 +192,17 @@ def resolve_embedding_backend( | |||
| if backend == "hash": | |||
| return "hash", None | |||
|
|
|||
| if backend == "onnx": | |||
| try: | |||
| get_onnx_session(requested_model) | |||
| return "onnx", requested_model | |||
| except Exception as exc: | |||
| print( | |||
| f"[warn] ONNX embedding backend unavailable ({exc}). Falling back to hash embeddings.", | |||
| file=sys.stderr, | |||
| ) | |||
| return "hash", None | |||
|
|
|||
There was a problem hiding this comment.
New ONNX support was added to resolve_embedding_backend() / embed_text(), but tests/test_embeddings.py does not currently exercise the onnx backend branch (even if only to assert graceful fallback when optional deps/models aren’t available). Adding coverage here would help prevent regressions in the default-backend selection and fallback behavior.
No description provided.