Test by Madhan230205 · Pull Request #2 · Madhan230205/token-reducer

Madhan230205 · 2026-04-03T02:59:30Z

No description provided.

- Implemented ONNX Runtime CPU-optimized embedding backend
- Added Reciprocal Rank Fusion (RRF) for blended sparse+dense retrieval
- ONNX backend with graceful fallback to hash embeddings
- RRF provides deterministic fusion of BM25 and vector search results
- Added onnxruntime and tokenizers to optional dependencies
- Configured RRF with k=60 (industry standard)
- ONNX uses mean pooling and L2 normalization

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

qodo-code-review · 2026-04-03T02:59:47Z

Review Summary by Qodo

Add ONNX embedding backend and Reciprocal Rank Fusion

✨ Enhancement

Walkthroughs

Description

• Implemented ONNX Runtime CPU-optimized embedding backend with graceful fallback
• Added Reciprocal Rank Fusion (RRF) for deterministic hybrid retrieval fusion
• Changed default embedding backend from hash to ONNX with improved model
• Added ONNX session caching and RRF configuration options
• Updated optional dependencies with onnxruntime, tokenizers, huggingface_hub

Diagram

flowchart LR
  A["Retrieval Request"] --> B["FTS5/BM25 Results"]
  A --> C["Vector Search Results"]
  B --> D["RRF Fusion"]
  C --> D
  D --> E["Ranked Candidates"]
  F["ONNX Backend"] -.->|"Embeddings"| C
  G["Hash Backend"] -.->|"Fallback"| F

File Changes

1. scripts/token_reducer/config.py ⚙️ Configuration changes +33/-2

Add ONNX and RRF configuration defaults

• Changed default embedding backend from hash to onnx and model to
 sentence-transformers/all-MiniLM-L6-v2
• Added ONNX Runtime configuration constants: DEFAULT_ONNX_MODEL_PATH, DEFAULT_ONNX_MAX_LENGTH
• Added RRF configuration: DEFAULT_RRF_K=60 and DEFAULT_USE_RRF=True
• Added _ONNX_SESSION_CACHE for caching ONNX sessions and RRF configuration functions
• Implemented should_use_rrf(), get_rrf_k(), and configure_rrf() functions

scripts/token_reducer/config.py

2. scripts/token_reducer/embeddings.py ✨ Enhancement +140/-1

Implement ONNX embedding generation and session management

• Implemented get_onnx_session() to load and cache ONNX Runtime sessions with HuggingFace hub
 support
• Implemented embed_text_onnx() using attention-mask-weighted mean pooling and L2 normalization
• Added ONNX backend resolution in resolve_embedding_backend() with fallback to hash embeddings
• Added ONNX embedding path in embed_text() function with error handling and fallback
• Supports quantized (model_quantized.onnx) and full-precision ONNX models

scripts/token_reducer/embeddings.py
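
The pooling step described for `embed_text_onnx()` can be illustrated without ONNX Runtime. The following is a minimal pure-Python sketch, assuming the model output is a list of per-token vectors and the attention mask is a list of 0/1 integers. `masked_mean_pool` and `l2_normalize` are hypothetical names for illustration, not functions from the PR:

```python
import math


def masked_mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors, counting only positions where the mask is 1."""
    dim = len(token_vectors[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-zero mask to avoid division by zero.
    count = max(sum(attention_mask), 1)
    return [value / count for value in summed]


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(value * value for value in vector))
    return vector if norm == 0.0 else [value / norm for value in vector]


# Two real tokens and one padding token (mask 0) that must not affect the result.
tokens = [[1.0, 0.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = masked_mean_pool(tokens, mask)
embedding = l2_normalize(pooled)
```

The padding vector is ignored because its mask entry is 0, so only the first two tokens contribute to the average before normalization.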

3. scripts/token_reducer/retriever.py ✨ Enhancement +61/-0

Add Reciprocal Rank Fusion for hybrid retrieval

• Implemented reciprocal_rank_fusion() function combining FTS5 and vector results using RRF
 scoring
• RRF score formula: sum(1 / (k + rank)) across retrieval systems with configurable k constant
• Modified rerank_candidates() to use RRF when enabled, with fallback to weighted scoring
• Merges candidate information from both retrieval systems maintaining vector rank and score

scripts/token_reducer/retriever.py


4. scripts/token_reducer/db.py 🐞 Bug fix +3/-2

Fix dependency indexing count tracking

• Fixed index_file_dependencies() to check cursor rowcount instead of always incrementing count
• Changed to capture cursor result from conn.execute() for accurate insertion tracking

scripts/token_reducer/db.py
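
The counting fix relies on `cursor.rowcount` being 0 when `INSERT OR IGNORE` skips a duplicate row. A small sketch of that behavior with the standard `sqlite3` module follows; the `deps` table and its columns are assumptions for illustration, since the PR's actual schema is not shown here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the PR's actual dependency table is not shown.
conn.execute("CREATE TABLE deps (src TEXT, dst TEXT, UNIQUE (src, dst))")

edges = [("a.py", "b.py"), ("a.py", "b.py"), ("a.py", "c.py")]
inserted = 0
for src, dst in edges:
    cursor = conn.execute(
        "INSERT OR IGNORE INTO deps (src, dst) VALUES (?, ?)", (src, dst)
    )
    # rowcount is 1 for a new row and 0 when the insert was ignored.
    inserted += cursor.rowcount

print(inserted)
```

Incrementing unconditionally would report three insertions here, although only two rows exist in the table.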

5. requirements-optional.txt Dependencies +5/-0

Add ONNX and HuggingFace dependencies

• Added onnxruntime>=1.17.0 for fast CPU-based inference
• Added tokenizers>=0.15.0 for ONNX model tokenization
• Added huggingface_hub>=0.20.0 for downloading ONNX models from HuggingFace

requirements-optional.txt

qodo-code-review · 2026-04-03T02:59:48Z

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

1. RRF scores filtered out 🐞 Bug ≡ Correctness

Description

With RRF enabled, reciprocal_rank_fusion() assigns Candidate.final_score to an RRF value
(~1/(k+rank)), which is always far below the default relevance_floor=0.15, so
compress_candidates() stops immediately and returns no bullets/packet content when vector
retrieval is used.

Code

scripts/token_reducer/retriever.py[R369-388]

+    for chunk_id, candidate in candidates.items():
+        candidate.final_score = rrf_scores[chunk_id]
+
+    # Sort by RRF score descending
+    ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True)
+    return ranked
+
+
def rerank_candidates(
    query: str,
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    top_k: int,
) -> tuple[list[Candidate], list[Candidate]]:
+    from .config import should_use_rrf
+
+    # Use RRF if enabled, otherwise fall back to weighted scoring
+    if should_use_rrf() and vector_hits:
+        ranked = reciprocal_rank_fusion(fts_hits, vector_hits)
+        return ranked[:top_k], ranked

Evidence
RRF assigns final_score to values around 1/(60+rank); even the best-case sum across two systems is
~0.033, but compression stops when final_score < relevance_floor and the default floor is 0.15, so
the loop breaks before producing output.
scripts/token_reducer/retriever.py[324-389]
scripts/token_reducer/compressor.py[232-263]
scripts/token_reducer/config.py[17-34]
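
The scale mismatch in the evidence above can be checked with a few lines of arithmetic. This is a sketch using only the numbers quoted in the review (k=60, floor 0.15); `rrf_score` is a hypothetical helper, not code from the PR:

```python
K = 60                  # RRF constant used by the PR
RELEVANCE_FLOOR = 0.15  # default floor cited in the review


def rrf_score(ranks: list[int], k: int = K) -> float:
    """Sum 1 / (k + rank) over the systems that returned the item."""
    return sum(1.0 / (k + rank) for rank in ranks)


single_best = rrf_score([1])     # top hit in one system
double_best = rrf_score([1, 1])  # top hit in both systems

print(f"{single_best:.4f} {double_best:.4f}")
print(double_best < RELEVANCE_FLOOR)
```

Even a chunk ranked first by both FTS5 and vector search scores 2/61, which is well under the floor, so no candidate can pass the check.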

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
RRF ranking currently overwrites `Candidate.final_score` with very small reciprocal-rank values. Downstream compression uses `final_score` with a default `relevance_floor=0.15`, causing the compressor to break immediately and produce empty results when RRF is active.

## Issue Context
- RRF scores for k=60 are ~0.016 (rank=1) per system, far below the 0.15 floor.
- `compress_candidates()` assumes `final_score` is on the existing weighted-score scale.

## Fix Focus Areas
- scripts/token_reducer/retriever.py[324-429]
- scripts/token_reducer/compressor.py[232-263]
- scripts/token_reducer/config.py[17-34]

## What to change
Choose one consistent approach:
1) **Use RRF for ordering only**: compute an `rrf_score` for sorting, but still compute the existing weighted `final_score` (and overlap/fts/vector scores) before passing candidates to the compressor; or
2) **Adjust relevance floor for RRF**: when RRF is enabled, either disable the relevance-floor early-break or use a much lower floor appropriate to RRF (and ensure the floor is configurable);
3) If you keep RRF as `final_score`, **rescale/normalize** RRF scores to the same magnitude expected by `relevance_floor`.

Add/adjust a small unit-style check (if present in repo) or a lightweight assertion path to prevent returning an empty packet solely due to scoring-scale mismatch.
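
As an illustration of option 3, one possible rescaling divides each raw score by the best attainable score, `num_systems / (k + 1)`, so that a chunk ranked first everywhere maps to 1.0. This is a sketch under that assumption, not the fix adopted in the PR; `normalize_rrf` and the chunk IDs are hypothetical:

```python
def normalize_rrf(scores: dict[int, float], k: int = 60, num_systems: int = 2) -> dict[int, float]:
    """Divide each raw RRF score by the best attainable score, num_systems / (k + 1)."""
    best_possible = num_systems / (k + 1)
    return {chunk_id: score / best_possible for chunk_id, score in scores.items()}


# Raw scores for three hypothetical chunks.
raw = {
    101: 1 / 61 + 1 / 61,  # rank 1 in both systems
    102: 1 / 61,           # rank 1 in one system only
    103: 1 / 70,           # rank 10 in one system only
}
scaled = normalize_rrf(raw)
```

With this scaling a rank-10 hit from a single system lands near 0.44, so the 0.15 floor would pass most candidates. Whether the floor still filters usefully after rescaling is a separate tuning question.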


2. RRF k config ignored 🐞 Bug ≡ Correctness

Description

configure_rrf(k=...) does not affect ranking because reciprocal_rank_fusion() only reads
get_rrf_k() when the caller passes k<=0, but rerank_candidates() always calls it with the
default k=60.

Code

scripts/token_reducer/retriever.py[R324-388]

+def reciprocal_rank_fusion(
+    fts_hits: list[Candidate],
+    vector_hits: list[Candidate],
+    k: int = 60,
+) -> list[Candidate]:
+    """Combine retrieval results using Reciprocal Rank Fusion (RRF).
+
+    RRF score = sum(1 / (k + rank)) across all retrieval systems.
+    This is a parameter-free, deterministic fusion method that works well
+    for combining BM25 and semantic search.
+
+    Args:
+        fts_hits: Candidates from FTS5/BM25 retrieval (already ranked)
+        vector_hits: Candidates from vector/semantic retrieval (already ranked)
+        k: RRF constant (typically 60). Higher values reduce top position impact.
+
+    Returns:
+        Merged and re-ranked candidates using RRF scores.
+    """
+    from .config import get_rrf_k
+
+    if k <= 0:
+        k = get_rrf_k()
+
+    rrf_scores: dict[int, float] = {}
+    candidates: dict[int, Candidate] = {}
+
+    # Add FTS5/BM25 ranks
+    for rank, candidate in enumerate(fts_hits, start=1):
+        rrf_scores[candidate.chunk_id] = 1.0 / (k + rank)
+        candidates[candidate.chunk_id] = candidate
+
+    # Add vector ranks
+    for rank, candidate in enumerate(vector_hits, start=1):
+        chunk_id = candidate.chunk_id
+        rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0.0) + (1.0 / (k + rank))
+
+        # Merge candidate info
+        if chunk_id in candidates:
+            candidates[chunk_id].vector_rank = candidate.vector_rank
+            candidates[chunk_id].vector_score = candidate.vector_score
+        else:
+            candidates[chunk_id] = candidate
+
+    # Assign final RRF scores
+    for chunk_id, candidate in candidates.items():
+        candidate.final_score = rrf_scores[chunk_id]
+
+    # Sort by RRF score descending
+    ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True)
+    return ranked
+
+
def rerank_candidates(
    query: str,
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    top_k: int,
) -> tuple[list[Candidate], list[Candidate]]:
+    from .config import should_use_rrf
+
+    # Use RRF if enabled, otherwise fall back to weighted scoring
+    if should_use_rrf() and vector_hits:
+        ranked = reciprocal_rank_fusion(fts_hits, vector_hits)
+        return ranked[:top_k], ranked

Evidence
The RRF function signature defaults k to 60 and the config value is only used behind a k<=0
guard; the call site passes no k, so configuration is ignored on the normal path.
scripts/token_reducer/retriever.py[324-388]
scripts/token_reducer/config.py[111-126]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The configured RRF constant `k` (set via `configure_rrf()` / returned by `get_rrf_k()`) is not applied because the default call path always uses `k=60`.

## Issue Context
`reciprocal_rank_fusion()` only calls `get_rrf_k()` when `k <= 0`, but `rerank_candidates()` calls `reciprocal_rank_fusion(fts_hits, vector_hits)` without passing `k`.

## Fix Focus Areas
- scripts/token_reducer/retriever.py[324-389]
- scripts/token_reducer/config.py[111-126]

## What to change
- In `rerank_candidates()`, pass `k=get_rrf_k()` (and optionally make `reciprocal_rank_fusion()` default `k` be `-1` or `None` so config is used by default).
- Keep behavior deterministic and ensure `k` validation (must be >0) is centralized.
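
One way to apply the suggestion above is to default `k` to `None` and resolve it from configuration inside the function. The sketch below is self-contained, so `get_rrf_k()` and `Candidate` are simplified stand-ins for the PR's versions, and the merging of `vector_rank`/`vector_score` is omitted:

```python
from dataclasses import dataclass
from typing import Optional

_RRF_K = 60  # stand-in for the value that configure_rrf() would set


def get_rrf_k() -> int:
    """Stand-in for the config accessor named in the review."""
    return _RRF_K


@dataclass
class Candidate:
    chunk_id: int
    final_score: float = 0.0


def reciprocal_rank_fusion(
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    k: Optional[int] = None,
) -> list[Candidate]:
    # Resolve k from config when the caller does not pass one, and validate once here.
    if k is None:
        k = get_rrf_k()
    if k <= 0:
        raise ValueError(f"RRF k must be > 0, got {k}")

    scores: dict[int, float] = {}
    merged: dict[int, Candidate] = {}
    for hits in (fts_hits, vector_hits):
        for rank, candidate in enumerate(hits, start=1):
            scores[candidate.chunk_id] = scores.get(candidate.chunk_id, 0.0) + 1.0 / (k + rank)
            merged.setdefault(candidate.chunk_id, candidate)

    for chunk_id, candidate in merged.items():
        candidate.final_score = scores[chunk_id]
    return sorted(merged.values(), key=lambda c: c.final_score, reverse=True)


fts = [Candidate(1), Candidate(2)]
vec = [Candidate(2), Candidate(3)]
ranked = reciprocal_rank_fusion(fts, vec)
```

Chunk 2 appears in both lists, so it accumulates two reciprocal-rank terms and sorts first. Because the default is `None` rather than 60, a call without `k` always reflects the configured value.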


3. ONNX file path broken 🐞 Bug ≡ Correctness

Description

The config/docs state the ONNX model path can be a local file, but get_onnx_session() treats any
non-HF path as a directory and only searches for onnx/model*.onnx under it, so passing a direct
.onnx file path will fail with “No ONNX model file found locally”.

Code

scripts/token_reducer/embeddings.py[R112-132]

+    except Exception as hf_err:
+        model_dir = Path(model_path)
+        if not model_dir.exists():
+            raise RuntimeError(
+                f"ONNX model not found at '{model_path}'. "
+                "Provide a valid HuggingFace model ID or local directory path."
+            ) from hf_err
+
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            p = model_dir / candidate
+            if p.exists():
+                onnx_path = str(p)
+                break
+
+        if onnx_path is None:
+            raise RuntimeError(
+                f"No ONNX model file found locally in '{model_path}'"
+            ) from hf_err
+
+        tokenizer_path = str(model_dir / "tokenizer.json")
+

Evidence

Config comments claim local file support, but the local-path fallback uses Path(model_path) and
constructs model_dir / candidate paths, which is only correct when model_path is a directory
(not a file).

scripts/token_reducer/config.py[24-27]
scripts/token_reducer/embeddings.py[75-137]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`get_onnx_session()` does not support `model_path` pointing directly to a `.onnx` file, despite configuration comments indicating that a local file path is acceptable.

## Issue Context
The local fallback path assumes `model_path` is a directory and searches for fixed relative candidates; if a user supplies `/path/to/model.onnx`, the code searches `/path/to/model.onnx/onnx/model.onnx` and fails.

## Fix Focus Areas
- scripts/token_reducer/embeddings.py[75-137]
- scripts/token_reducer/config.py[24-27]

## What to change
- If `Path(model_path).is_file()`: treat it as the ONNX model file directly.
- Define how to locate `tokenizer.json` for the file-path case (e.g., sibling file next to the model, or allow an explicit tokenizer path).
- Alternatively (or additionally), update the config comments/docs to say "local directory" if file support is not intended.
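
A sketch of the file-or-directory handling described above, using only `pathlib`. `resolve_local_model` is a hypothetical helper, the candidate list is an assumed approximation of `_ONNX_CANDIDATE_FILENAMES`, and the sibling-file convention for `tokenizer.json` is one of the options the review mentions rather than confirmed behavior:

```python
from pathlib import Path

# Assumed to mirror _ONNX_CANDIDATE_FILENAMES; the PR's actual list is not shown in full.
CANDIDATE_FILENAMES = ("onnx/model_quantized.onnx", "onnx/model.onnx")


def resolve_local_model(model_path: str) -> tuple[str, str]:
    """Return (onnx_path, tokenizer_path) for a local .onnx file or model directory."""
    path = Path(model_path)
    if path.is_file():
        # Direct file: assume tokenizer.json sits next to the model file.
        onnx_file, tokenizer = path, path.parent / "tokenizer.json"
    elif path.is_dir():
        onnx_file = next(
            (path / name for name in CANDIDATE_FILENAMES if (path / name).is_file()),
            None,
        )
        if onnx_file is None:
            raise RuntimeError(f"No ONNX model file found locally in '{model_path}'")
        tokenizer = path / "tokenizer.json"
    else:
        raise RuntimeError(f"ONNX model not found at '{model_path}'")
    if not tokenizer.is_file():
        raise RuntimeError(f"Missing tokenizer.json for '{model_path}' (expected at '{tokenizer}')")
    return str(onnx_file), str(tokenizer)
```

Checking `is_file()` first means a path such as `/path/to/model.onnx` is used directly instead of being treated as a directory to search.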


4. CLI help omits onnx 🐞 Bug ⚙ Maintainability

Description

The CLI --embedding-backend help text still advertises only "hash | ml" even though the PR adds
and defaults to the onnx backend, making --help misleading.

Code

scripts/token_reducer/config.py[R17-18]

+DEFAULT_EMBEDDING_BACKEND = "onnx"
+DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

Evidence
The default embedding backend is now onnx, but the CLI option help string does not include onnx,
so users may not discover or trust the new backend option from the CLI interface.
scripts/token_reducer/config.py[15-19]
scripts/token_reducer/cli.py[69-80]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
CLI help text for `--embedding-backend` is outdated and does not mention `onnx`, despite `onnx` being supported and the default.

## Fix Focus Areas
- scripts/token_reducer/cli.py[69-80]
- scripts/token_reducer/config.py[15-19]

## What to change
- Update the help string to include `onnx` (e.g., `help="hash | ml | onnx"`).
- Optionally add a short note that `onnx` requires optional dependencies.
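
For illustration, the updated option might look like the following. This sketch assumes the CLI is built on `argparse`, which the review does not confirm; the `prog` name and the use of `choices` are likewise assumptions:

```python
import argparse

# Hypothetical parser; the real definition lives in scripts/token_reducer/cli.py.
parser = argparse.ArgumentParser(prog="token-reducer")
parser.add_argument(
    "--embedding-backend",
    choices=["hash", "ml", "onnx"],
    default="onnx",
    help="hash | ml | onnx (onnx requires the optional onnxruntime, tokenizers, and huggingface_hub packages)",
)

args = parser.parse_args(["--embedding-backend", "hash"])
```

Listing the backends in `choices` also makes `argparse` reject unknown values, so the help text and the accepted values cannot drift apart.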


qodo-code-review · 2026-04-03T02:59:52Z

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Lint (ruff)
Failed stage: Run ruff format --check scripts/ tests/ [❌]
Failed test name: ""
Failure summary: The action failed at the formatting check step `ruff format --check scripts/ tests/`. `ruff` reported that `scripts/token_reducer/embeddings.py` is not properly formatted (`Would reformat:` `scripts/token_reducer/embeddings.py`), so the `--check` run exited with code `1`.
Relevant error logs:
1: ##[group]Runner Image Provisioner
2: Hosted Compute Agent
...
151: ##[endgroup]
152: All checks passed!
153: ##[group]Run ruff format --check scripts/ tests/
154: ruff format --check scripts/ tests/
155: shell: /usr/bin/bash -e {0}
156: env:
157: pythonLocation: /opt/hostedtoolcache/Python/3.11.15/x64
158: PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.11.15/x64/lib/pkgconfig
159: Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
160: Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
161: Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
162: LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.11.15/x64/lib
163: ##[endgroup]
164: Would reformat: scripts/token_reducer/embeddings.py
165: 1 file would be reformatted, 16 files already formatted
166: ##[error]Process completed with exit code 1.
167: Post job cleanup.

qodo-code-review · 2026-04-03T03:03:11Z

+    for chunk_id, candidate in candidates.items():
+        candidate.final_score = rrf_scores[chunk_id]
+
+    # Sort by RRF score descending
+    ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True)
+    return ranked
+
+
 def rerank_candidates(
    query: str,
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    top_k: int,
 ) -> tuple[list[Candidate], list[Candidate]]:
+    from .config import should_use_rrf
+
+    # Use RRF if enabled, otherwise fall back to weighted scoring
+    if should_use_rrf() and vector_hits:
+        ranked = reciprocal_rank_fusion(fts_hits, vector_hits)
+        return ranked[:top_k], ranked


1. RRF scores filtered out 🐞 Bug ≡ Correctness

With RRF enabled, reciprocal_rank_fusion() assigns Candidate.final_score to an RRF value (~1/(k+rank)), which is always far below the default relevance_floor=0.15, so compress_candidates() stops immediately and returns no bullets/packet content when vector retrieval is used.


Copilot

Pull request overview

This PR enhances the token_reducer retrieval pipeline by adding Reciprocal Rank Fusion (RRF) for hybrid reranking and introducing an ONNX Runtime embedding backend (with caching), along with related configuration and optional dependencies.

Changes:

- Add RRF-based fusion for combining FTS/BM25 and vector retrieval results.
- Add ONNX Runtime embedding backend with session/tokenizer caching and hub/local model resolution.
- Update defaults/config to enable ONNX + RRF and adjust optional requirements.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.


| File | Description |
| --- | --- |
| scripts/token_reducer/retriever.py | Adds RRF fusion and routes reranking through it when enabled. |
| scripts/token_reducer/embeddings.py | Adds ONNX session loading/caching and ONNX embedding generation + backend dispatch support. |
| scripts/token_reducer/db.py | Fixes dependency indexing count to reflect actual inserted rows (INSERT OR IGNORE). |
| scripts/token_reducer/config.py | Adds ONNX/RRF defaults and runtime configuration helpers/caches. |
| requirements-optional.txt | Adds optional ONNX-related dependencies. |


Copilot · 2026-04-03T03:04:31Z

+    from .config import get_rrf_k
+
+    if k <= 0:
+        k = get_rrf_k()
+


reciprocal_rank_fusion() imports get_rrf_k(), but with the current defaults k is always 60 and get_rrf_k() is never used (since rerank_candidates() calls reciprocal_rank_fusion(...) without passing k). This makes configure_rrf(k=...) ineffective. Consider defaulting k from get_rrf_k() when k is None/omitted, or explicitly passing k=get_rrf_k() from rerank_candidates().

Copilot · 2026-04-03T03:04:31Z

+
+    try:
+        from huggingface_hub import hf_hub_download  # type: ignore
+
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            try:
+                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
+                break
+            except Exception:
+                continue
+
+        if onnx_path is None:
+            raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'")
+
+        tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")
+
+    except Exception as hf_err:
+        model_dir = Path(model_path)
+        if not model_dir.exists():
+            raise RuntimeError(
+                f"ONNX model not found at '{model_path}'. "
+                "Provide a valid HuggingFace model ID or local directory path."
+            ) from hf_err
+
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            p = model_dir / candidate
+            if p.exists():
+                onnx_path = str(p)
+                break
+
+        if onnx_path is None:
+            raise RuntimeError(
+                f"No ONNX model file found locally in '{model_path}'"
+            ) from hf_err
+
+        tokenizer_path = str(model_dir / "tokenizer.json")
+


The HuggingFace download try covers both the ONNX model and tokenizer.json, but any failure (including a missing tokenizer.json in the repo, a transient HF error, or ImportError for huggingface_hub) falls into the local-path fallback branch. This can produce misleading errors (treating a valid HF model ID as a missing local path) and can discard a successfully downloaded onnx_path. Consider handling ImportError for huggingface_hub separately, and distinguishing “repo exists but required files missing” from “local path missing”.

Suggested change

-    try:
-        from huggingface_hub import hf_hub_download  # type: ignore
-
-        for candidate in _ONNX_CANDIDATE_FILENAMES:
-            try:
-                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
-                break
-            except Exception:
-                continue
-
-        if onnx_path is None:
-            raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'")
-
-        tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")
-
-    except Exception as hf_err:
-        model_dir = Path(model_path)
-        if not model_dir.exists():
-            raise RuntimeError(
-                f"ONNX model not found at '{model_path}'. "
-                "Provide a valid HuggingFace model ID or local directory path."
-            ) from hf_err
-
-        for candidate in _ONNX_CANDIDATE_FILENAMES:
-            p = model_dir / candidate
-            if p.exists():
-                onnx_path = str(p)
-                break
-
-        if onnx_path is None:
-            raise RuntimeError(
-                f"No ONNX model file found locally in '{model_path}'"
-            ) from hf_err
-
-        tokenizer_path = str(model_dir / "tokenizer.json")
+    hf_import_err: Exception | None = None
+    hf_onnx_err: Exception | None = None
+
+    try:
+        from huggingface_hub import hf_hub_download  # type: ignore
+    except ImportError as exc:
+        hf_hub_download = None
+        hf_import_err = exc
+
+    if hf_hub_download is not None:
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            try:
+                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
+                break
+            except Exception as exc:
+                hf_onnx_err = exc
+                continue
+
+        if onnx_path is not None:
+            try:
+                tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Found ONNX model in HuggingFace repo '{model_path}', "
+                    "but required file 'tokenizer.json' is missing or could not be downloaded."
+                ) from exc
+
+    if onnx_path is None:
+        model_dir = Path(model_path)
+        if model_dir.exists():
+            for candidate in _ONNX_CANDIDATE_FILENAMES:
+                p = model_dir / candidate
+                if p.exists():
+                    onnx_path = str(p)
+                    break
+
+            if onnx_path is None:
+                raise RuntimeError(
+                    f"No ONNX model file found locally in '{model_path}'"
+                ) from hf_onnx_err
+
+            tokenizer_path = str(model_dir / "tokenizer.json")
+        elif hf_import_err is not None:
+            raise RuntimeError(
+                "ONNX model not found locally and huggingface_hub is not installed. "
+                "Provide a local directory path or install huggingface_hub to load a HuggingFace model ID."
+            ) from hf_import_err
+        elif hf_onnx_err is not None:
+            raise RuntimeError(
+                f"Unable to download an ONNX model file from HuggingFace repo '{model_path}', "
+                "and no local directory exists at that path."
+            ) from hf_onnx_err
+        else:
+            raise RuntimeError(
+                f"ONNX model not found at '{model_path}'. "
+                "Provide a valid HuggingFace model ID or local directory path."
+            )

Copilot · 2026-04-03T03:04:31Z

+        tokenizer_path = str(model_dir / "tokenizer.json")
+


In the local-path fallback, tokenizer_path is set to <model_dir>/tokenizer.json but its existence is not checked before calling Tokenizer.from_file(). If the tokenizer file is missing, the error will be a low-level file error rather than a clear message about the required file layout. Consider validating tokenizer_path exists and raising a RuntimeError with actionable guidance.

Suggested change

-        tokenizer_path = str(model_dir / "tokenizer.json")
+        local_tokenizer_path = model_dir / "tokenizer.json"
+        if not local_tokenizer_path.exists():
+            raise RuntimeError(
+                f"Missing tokenizer file in local ONNX model directory '{model_path}'. "
+                "Expected 'tokenizer.json' alongside the ONNX model files "
+                f"({', '.join(_ONNX_CANDIDATE_FILENAMES)}). "
+                "Provide a valid HuggingFace model ID or a local directory with the required files."
+            ) from hf_err
+        tokenizer_path = str(local_tokenizer_path)

Copilot · 2026-04-03T03:04:31Z

+# Model path can be local file or HuggingFace hub model ID
+DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"


DEFAULT_ONNX_MODEL_PATH is introduced but (based on a repo-wide search) is not referenced anywhere else. If the intended default is already covered by DEFAULT_EMBEDDING_MODEL, consider removing this constant or wiring it into the CLI/config flow so it doesn’t drift unused.

Suggested change

-# Model path can be local file or HuggingFace hub model ID
-DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"
+# The default model identifier is provided by DEFAULT_EMBEDDING_MODEL.

Copilot · 2026-04-03T03:04:32Z

@@ -74,6 +192,17 @@ def resolve_embedding_backend(
    if backend == "hash":
        return "hash", None

+    if backend == "onnx":
+        try:
+            get_onnx_session(requested_model)
+            return "onnx", requested_model
+        except Exception as exc:
+            print(
+                f"[warn] ONNX embedding backend unavailable ({exc}). Falling back to hash embeddings.",
+                file=sys.stderr,
+            )
+            return "hash", None
+


New ONNX support was added to resolve_embedding_backend() / embed_text(), but tests/test_embeddings.py does not currently exercise the onnx backend branch (even if only to assert graceful fallback when optional deps/models aren’t available). Adding coverage here would help prevent regressions in the default-backend selection and fallback behavior.
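
A sketch of what such fallback coverage could look like. To stay self-contained, `resolve_backend` below is a stand-in that mirrors the quoted `onnx` branch with the session loader injected as a parameter. In the repository's own tests the same effect would more likely come from patching `get_onnx_session`, which is an assumption about the test setup:

```python
import sys
from typing import Callable, Optional


def resolve_backend(
    backend: str,
    requested_model: str,
    load_session: Callable[[str], object],
) -> tuple[str, Optional[str]]:
    """Stand-in mirroring the quoted onnx branch, with the session loader injected."""
    if backend == "hash":
        return "hash", None
    if backend == "onnx":
        try:
            load_session(requested_model)
            return "onnx", requested_model
        except Exception as exc:
            print(
                f"[warn] ONNX embedding backend unavailable ({exc}). Falling back to hash embeddings.",
                file=sys.stderr,
            )
            return "hash", None
    raise ValueError(f"unknown backend: {backend}")


def failing_loader(model: str) -> object:
    # Simulates missing optional dependencies or an unavailable model.
    raise RuntimeError("onnxruntime is not installed")


def test_onnx_falls_back_to_hash() -> None:
    assert resolve_backend("onnx", "some/model", failing_loader) == ("hash", None)


def test_onnx_selected_when_loader_succeeds() -> None:
    assert resolve_backend("onnx", "some/model", lambda model: object()) == ("onnx", "some/model")
```

Both tests run without `onnxruntime` installed, which matters because the dependency is optional.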

Madhan230205 and others added 2 commits April 3, 2026 08:24

Architectural change

7cb3231

Copilot AI review requested due to automatic review settings April 3, 2026 02:59

Copilot started reviewing on behalf of Madhan230205 April 3, 2026 03:00 View session

Madhan230205 merged commit 1db0f40 into main Apr 3, 2026
4 of 5 checks passed

Madhan230205 merged 2 commits into main from test
