Test #2

Merged
Madhan230205 merged 2 commits into main from test on Apr 3, 2026
Conversation

@Madhan230205

Owner

No description provided.

Madhan230205 and others added 2 commits April 3, 2026 08:24
- Implemented ONNX Runtime CPU-optimized embedding backend
- Added Reciprocal Rank Fusion (RRF) for blended sparse+dense retrieval
- ONNX backend with graceful fallback to hash embeddings
- RRF provides deterministic fusion of BM25 and vector search results
- Added onnxruntime and tokenizers to optional dependencies
- Configured RRF with k=60 (industry standard)
- ONNX uses mean pooling and L2 normalization

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 3, 2026 02:59
@qodo-code-review


Review Summary by Qodo

Add ONNX embedding backend and Reciprocal Rank Fusion

✨ Enhancement

Walkthroughs

Description
• Implemented ONNX Runtime CPU-optimized embedding backend with graceful fallback
• Added Reciprocal Rank Fusion (RRF) for deterministic hybrid retrieval fusion
• Changed default embedding backend from hash to ONNX with improved model
• Added ONNX session caching and RRF configuration options
• Updated optional dependencies with onnxruntime, tokenizers, huggingface_hub
Diagram
flowchart LR
  A["Retrieval Request"] --> B["FTS5/BM25 Results"]
  A --> C["Vector Search Results"]
  B --> D["RRF Fusion"]
  C --> D
  D --> E["Ranked Candidates"]
  F["ONNX Backend"] -.->|"Embeddings"| C
  G["Hash Backend"] -.->|"Fallback"| F

File Changes

1. scripts/token_reducer/config.py ⚙️ Configuration changes +33/-2

Add ONNX and RRF configuration defaults

• Changed default embedding backend from hash to onnx and model to
 sentence-transformers/all-MiniLM-L6-v2
• Added ONNX Runtime configuration constants: DEFAULT_ONNX_MODEL_PATH, DEFAULT_ONNX_MAX_LENGTH
• Added RRF configuration: DEFAULT_RRF_K=60 and DEFAULT_USE_RRF=True
• Added _ONNX_SESSION_CACHE for caching ONNX sessions and RRF configuration functions
• Implemented should_use_rrf(), get_rrf_k(), and configure_rrf() functions

scripts/token_reducer/config.py


2. scripts/token_reducer/embeddings.py ✨ Enhancement +140/-1

Implement ONNX embedding generation and session management

• Implemented get_onnx_session() to load and cache ONNX Runtime sessions with HuggingFace hub
 support
• Implemented embed_text_onnx() using attention-mask-weighted mean pooling and L2 normalization
• Added ONNX backend resolution in resolve_embedding_backend() with fallback to hash embeddings
• Added ONNX embedding path in embed_text() function with error handling and fallback
• Supports quantized (model_quantized.onnx) and full-precision ONNX models

scripts/token_reducer/embeddings.py
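
The pooling step named above (attention-mask-weighted mean pooling followed by L2 normalization) can be sketched in plain Python. This is an illustration only, not the code in `embed_text_onnx()`; the function name `mean_pool_l2` and the list-based inputs are assumptions, and a real implementation would more likely operate on NumPy arrays returned by the ONNX session.

```python
import math


def mean_pool_l2(token_embeddings: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the embeddings of unmasked tokens, then scale to unit length.

    Hypothetical helper for illustration; names are not taken from the PR.
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:  # padding tokens (mask == 0) contribute nothing
            for i, value in enumerate(vector):
                total[i] += value

    # Guard against an all-zero mask so we never divide by zero.
    count = max(sum(attention_mask), 1)
    pooled = [value / count for value in total]

    # L2 normalization; fall back to 1.0 if the pooled vector is all zeros.
    norm = math.sqrt(sum(value * value for value in pooled)) or 1.0
    return [value / norm for value in pooled]


# The third token is padding and is ignored by the mask.
print(mean_pool_l2([[3.0, 0.0], [0.0, 4.0], [100.0, 100.0]], [1, 1, 0]))
```

Unit-length vectors make cosine similarity equal to a plain dot product, which is the usual reason for normalizing at this stage.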


3. scripts/token_reducer/retriever.py ✨ Enhancement +61/-0

Add Reciprocal Rank Fusion for hybrid retrieval

• Implemented reciprocal_rank_fusion() function combining FTS5 and vector results using RRF
 scoring
• RRF score formula: sum(1 / (k + rank)) across retrieval systems with configurable k constant
• Modified rerank_candidates() to use RRF when enabled, with fallback to weighted scoring
• Merges candidate information from both retrieval systems maintaining vector rank and score

scripts/token_reducer/retriever.py


4. scripts/token_reducer/db.py 🐞 Bug fix +3/-2

Fix dependency indexing count tracking

• Fixed index_file_dependencies() to check cursor rowcount instead of always incrementing count
• Changed to capture cursor result from conn.execute() for accurate insertion tracking

scripts/token_reducer/db.py
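
Why `rowcount` matters here can be shown with a small `sqlite3` sketch. The table and column names below are hypothetical and not taken from `db.py`; the point is only that `INSERT OR IGNORE` reports a `rowcount` of 0 when the row already exists, so summing `rowcount` counts real insertions.

```python
import sqlite3


def insert_dependencies(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> int:
    """Insert (source, target) pairs and return how many rows were actually added."""
    inserted = 0
    for source, target in rows:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO deps (source, target) VALUES (?, ?)",
            (source, target),
        )
        # rowcount is 1 for a new row and 0 when the insert was ignored.
        inserted += cursor.rowcount
    return inserted


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the UNIQUE constraint is what makes duplicates ignorable.
conn.execute("CREATE TABLE deps (source TEXT, target TEXT, UNIQUE (source, target))")
rows = [("a.py", "b.py"), ("a.py", "b.py"), ("a.py", "c.py")]
print(insert_dependencies(conn, rows))  # prints 2
```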


5. requirements-optional.txt Dependencies +5/-0

Add ONNX and HuggingFace dependencies

• Added onnxruntime>=1.17.0 for fast CPU-based inference
• Added tokenizers>=0.15.0 for ONNX model tokenization
• Added huggingface_hub>=0.20.0 for downloading ONNX models from HuggingFace

requirements-optional.txt



@qodo-code-review

qodo-code-review Bot commented Apr 3, 2026


Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)


Action required

1. RRF scores filtered out 🐞 Bug ≡ Correctness
Description
With RRF enabled, reciprocal_rank_fusion() assigns Candidate.final_score to an RRF value
(~1/(k+rank)), which is always far below the default relevance_floor=0.15, so
compress_candidates() stops immediately and returns no bullets/packet content when vector
retrieval is used.
Code

scripts/token_reducer/retriever.py[R369-388]

+    for chunk_id, candidate in candidates.items():
+        candidate.final_score = rrf_scores[chunk_id]
+
+    # Sort by RRF score descending
+    ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True)
+    return ranked
+
+
def rerank_candidates(
    query: str,
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    top_k: int,
) -> tuple[list[Candidate], list[Candidate]]:
+    from .config import should_use_rrf
+
+    # Use RRF if enabled, otherwise fall back to weighted scoring
+    if should_use_rrf() and vector_hits:
+        ranked = reciprocal_rank_fusion(fts_hits, vector_hits)
+        return ranked[:top_k], ranked
Evidence
RRF assigns final_score to values around 1/(60+rank); even the best-case sum across two systems is
~0.033, but compression stops when final_score < relevance_floor and the default floor is 0.15, so
the loop breaks before producing output.

scripts/token_reducer/retriever.py[324-389]
scripts/token_reducer/compressor.py[232-263]
scripts/token_reducer/config.py[17-34]
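
The arithmetic behind this evidence can be checked directly. The sketch below assumes two retrieval systems and the default values quoted in the review (k=60, relevance_floor=0.15); `rrf_upper_bound` is a hypothetical helper, not a function in the repository.

```python
def rrf_upper_bound(k: int, systems: int = 2) -> float:
    """Best possible RRF score: the candidate is ranked first in every list."""
    return systems / (k + 1)


RELEVANCE_FLOOR = 0.15  # default floor quoted in the review

single = 1 / (60 + 1)           # rank 1 in one system only
best = rrf_upper_bound(k=60)    # rank 1 in both systems

print(f"{single:.4f}")          # 0.0164
print(f"{best:.4f}")            # 0.0328
print(best < RELEVANCE_FLOOR)   # True: even the best candidate is below the floor
```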

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
RRF ranking currently overwrites `Candidate.final_score` with very small reciprocal-rank values. Downstream compression uses `final_score` with a default `relevance_floor=0.15`, causing the compressor to break immediately and produce empty results when RRF is active.

## Issue Context
- RRF scores for k=60 are ~0.016 (rank=1) per system, far below the 0.15 floor.
- `compress_candidates()` assumes `final_score` is on the existing weighted-score scale.

## Fix Focus Areas
- scripts/token_reducer/retriever.py[324-429]
- scripts/token_reducer/compressor.py[232-263]
- scripts/token_reducer/config.py[17-34]

## What to change
Choose one consistent approach:
1) **Use RRF for ordering only**: compute an `rrf_score` for sorting, but still compute the existing weighted `final_score` (and overlap/fts/vector scores) before passing candidates to the compressor; or
2) **Adjust relevance floor for RRF**: when RRF is enabled, either disable the relevance-floor early-break or use a much lower floor appropriate to RRF (and ensure the floor is configurable);
3) If you keep RRF as `final_score`, **rescale/normalize** RRF scores to the same magnitude expected by `relevance_floor`.

Add/adjust a small unit-style check (if present in repo) or a lightweight assertion path to prevent returning an empty packet solely due to scoring-scale mismatch.
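
As one possible reading of option 3, a raw RRF score could be divided by its theoretical maximum so that it lands in (0, 1]. This is a sketch under that assumption; `normalize_rrf` is a hypothetical name and nothing here is taken from the repository.

```python
def normalize_rrf(score: float, k: int = 60, systems: int = 2) -> float:
    """Map a raw RRF score into (0, 1] by dividing by its upper bound."""
    if k <= 0 or systems <= 0:
        raise ValueError("k and systems must be positive")
    upper_bound = systems / (k + 1)  # rank 1 in every system
    return score / upper_bound


top_in_both = 1 / 61 + 1 / 61  # rank 1 in FTS and in vector search
top_in_one = 1 / 61            # rank 1 in a single system

print(round(normalize_rrf(top_in_both), 3))  # 1.0
print(round(normalize_rrf(top_in_one), 3))   # 0.5
```

Note that after this rescaling a hit found by only one system stays above 0.15 until roughly rank 140, so the floor would filter very little and would probably need retuning if this option were chosen.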




Remediation recommended

2. RRF k config ignored 🐞 Bug ≡ Correctness
Description
configure_rrf(k=...) does not affect ranking because reciprocal_rank_fusion() only reads
get_rrf_k() when the caller passes k<=0, but rerank_candidates() always calls it with the
default k=60.
Code

scripts/token_reducer/retriever.py[R324-388]

+def reciprocal_rank_fusion(
+    fts_hits: list[Candidate],
+    vector_hits: list[Candidate],
+    k: int = 60,
+) -> list[Candidate]:
+    """Combine retrieval results using Reciprocal Rank Fusion (RRF).
+
+    RRF score = sum(1 / (k + rank)) across all retrieval systems.
+    This is a parameter-free, deterministic fusion method that works well
+    for combining BM25 and semantic search.
+
+    Args:
+        fts_hits: Candidates from FTS5/BM25 retrieval (already ranked)
+        vector_hits: Candidates from vector/semantic retrieval (already ranked)
+        k: RRF constant (typically 60). Higher values reduce top position impact.
+
+    Returns:
+        Merged and re-ranked candidates using RRF scores.
+    """
+    from .config import get_rrf_k
+
+    if k <= 0:
+        k = get_rrf_k()
+
+    rrf_scores: dict[int, float] = {}
+    candidates: dict[int, Candidate] = {}
+
+    # Add FTS5/BM25 ranks
+    for rank, candidate in enumerate(fts_hits, start=1):
+        rrf_scores[candidate.chunk_id] = 1.0 / (k + rank)
+        candidates[candidate.chunk_id] = candidate
+
+    # Add vector ranks
+    for rank, candidate in enumerate(vector_hits, start=1):
+        chunk_id = candidate.chunk_id
+        rrf_scores[chunk_id] = rrf_scores.get(chunk_id, 0.0) + (1.0 / (k + rank))
+
+        # Merge candidate info
+        if chunk_id in candidates:
+            candidates[chunk_id].vector_rank = candidate.vector_rank
+            candidates[chunk_id].vector_score = candidate.vector_score
+        else:
+            candidates[chunk_id] = candidate
+
+    # Assign final RRF scores
+    for chunk_id, candidate in candidates.items():
+        candidate.final_score = rrf_scores[chunk_id]
+
+    # Sort by RRF score descending
+    ranked = sorted(candidates.values(), key=lambda c: c.final_score, reverse=True)
+    return ranked
+
+
def rerank_candidates(
    query: str,
    fts_hits: list[Candidate],
    vector_hits: list[Candidate],
    top_k: int,
) -> tuple[list[Candidate], list[Candidate]]:
+    from .config import should_use_rrf
+
+    # Use RRF if enabled, otherwise fall back to weighted scoring
+    if should_use_rrf() and vector_hits:
+        ranked = reciprocal_rank_fusion(fts_hits, vector_hits)
+        return ranked[:top_k], ranked
Evidence
The RRF function signature defaults k to 60 and the config value is only used behind a k<=0
guard; the call site passes no k, so configuration is ignored on the normal path.

scripts/token_reducer/retriever.py[324-388]
scripts/token_reducer/config.py[111-126]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The configured RRF constant `k` (set via `configure_rrf()` / returned by `get_rrf_k()`) is not applied because the default call path always uses `k=60`.

## Issue Context
`reciprocal_rank_fusion()` only calls `get_rrf_k()` when `k <= 0`, but `rerank_candidates()` calls `reciprocal_rank_fusion(fts_hits, vector_hits)` without passing `k`.

## Fix Focus Areas
- scripts/token_reducer/retriever.py[324-389]
- scripts/token_reducer/config.py[111-126]

## What to change
- In `rerank_candidates()`, pass `k=get_rrf_k()` (and optionally make `reciprocal_rank_fusion()` default `k` be `-1` or `None` so config is used by default).
- Keep behavior deterministic and ensure `k` validation (must be >0) is centralized.
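
A minimal sketch of that change, assuming a `None` default that defers to configuration. The `configure_rrf()` and `get_rrf_k()` bodies below are stand-ins written for this example (the real ones live in `config.py` and may differ), and `rrf_scores` works on plain chunk IDs rather than `Candidate` objects to keep the example self-contained.

```python
from __future__ import annotations

_RRF_K = 60  # stand-in for the module-level setting in config.py


def configure_rrf(k: int) -> None:
    """Stand-in setter: validation of k is centralized here."""
    global _RRF_K
    if k <= 0:
        raise ValueError("RRF k must be > 0")
    _RRF_K = k


def get_rrf_k() -> int:
    return _RRF_K


def rrf_scores(fts_ids: list[int], vector_ids: list[int], k: int | None = None) -> dict[int, float]:
    """Sum 1 / (k + rank) over both ranked lists; k defaults to the configured value."""
    if k is None:
        k = get_rrf_k()
    elif k <= 0:
        raise ValueError("RRF k must be > 0")

    scores: dict[int, float] = {}
    for ranked_ids in (fts_ids, vector_ids):
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return scores


configure_rrf(10)
print(rrf_scores([7, 8], [8, 9]))  # chunk 8 scores 1/12 + 1/11 with the configured k=10
```

With a `None` default, callers that omit `k` pick up the configured value, while an explicit `k` still overrides it.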



3. ONNX file path broken 🐞 Bug ≡ Correctness
Description
The config/docs state the ONNX model path can be a local file, but get_onnx_session() treats any
non-HF path as a directory and only searches for onnx/model*.onnx under it, so passing a direct
.onnx file path will fail with “No ONNX model file found locally”.
Code

scripts/token_reducer/embeddings.py[R112-132]

+    except Exception as hf_err:
+        model_dir = Path(model_path)
+        if not model_dir.exists():
+            raise RuntimeError(
+                f"ONNX model not found at '{model_path}'. "
+                "Provide a valid HuggingFace model ID or local directory path."
+            ) from hf_err
+
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            p = model_dir / candidate
+            if p.exists():
+                onnx_path = str(p)
+                break
+
+        if onnx_path is None:
+            raise RuntimeError(
+                f"No ONNX model file found locally in '{model_path}'"
+            ) from hf_err
+
+        tokenizer_path = str(model_dir / "tokenizer.json")
+
Evidence
Config comments claim local file support, but the local-path fallback uses Path(model_path) and
constructs model_dir / candidate paths, which is only correct when model_path is a directory
(not a file).

scripts/token_reducer/config.py[24-27]
scripts/token_reducer/embeddings.py[75-137]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`get_onnx_session()` does not support `model_path` pointing directly to a `.onnx` file, despite configuration comments indicating that a local file path is acceptable.

## Issue Context
The local fallback path assumes `model_path` is a directory and searches for fixed relative candidates; if a user supplies `/path/to/model.onnx`, the code searches `/path/to/model.onnx/onnx/model.onnx` and fails.

## Fix Focus Areas
- scripts/token_reducer/embeddings.py[75-137]
- scripts/token_reducer/config.py[24-27]

## What to change
- If `Path(model_path).is_file()`: treat it as the ONNX model file directly.
- Define how to locate `tokenizer.json` for the file-path case (e.g., sibling file next to the model, or allow an explicit tokenizer path).
- Alternatively (or additionally), update the config comments/docs to say "local directory" if file support is not intended.
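
The first two points could look roughly like the sketch below. It assumes `tokenizer.json` sits next to the model when a file path is given, and the contents of `CANDIDATE_FILENAMES` are a guess at what `_ONNX_CANDIDATE_FILENAMES` holds; `resolve_local_model` is a hypothetical helper, not a function in the PR.

```python
from __future__ import annotations

from pathlib import Path

# Assumed candidate list; the real _ONNX_CANDIDATE_FILENAMES may differ.
CANDIDATE_FILENAMES = ("onnx/model_quantized.onnx", "onnx/model.onnx")


def resolve_local_model(model_path: str) -> tuple[Path, Path]:
    """Return (onnx_file, tokenizer_file) for a local file or directory path."""
    path = Path(model_path)

    if path.is_file():
        # Direct .onnx file: assume the tokenizer is a sibling file.
        onnx_file = path
        tokenizer_file = path.parent / "tokenizer.json"
    elif path.is_dir():
        # Directory: probe the known relative locations in order.
        onnx_file = next(
            (path / name for name in CANDIDATE_FILENAMES if (path / name).is_file()),
            None,
        )
        if onnx_file is None:
            raise RuntimeError(f"No ONNX model file found locally in '{model_path}'")
        tokenizer_file = path / "tokenizer.json"
    else:
        raise RuntimeError(f"ONNX model not found at '{model_path}'")

    if not tokenizer_file.is_file():
        raise RuntimeError(f"Expected tokenizer file at '{tokenizer_file}'")
    return onnx_file, tokenizer_file
```

Checking for the tokenizer up front gives a clear error about the expected layout instead of a low-level file error later.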




Advisory comments

4. CLI help omits onnx 🐞 Bug ⚙ Maintainability
Description
The CLI --embedding-backend help text still advertises only "hash | ml" even though the PR adds
and defaults to the onnx backend, making --help misleading.
Code

scripts/token_reducer/config.py[R17-18]

+DEFAULT_EMBEDDING_BACKEND = "onnx"
+DEFAULT_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
Evidence
The default embedding backend is now onnx, but the CLI option help string does not include onnx,
so users may not discover or trust the new backend option from the CLI interface.

scripts/token_reducer/config.py[15-19]
scripts/token_reducer/cli.py[69-80]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
CLI help text for `--embedding-backend` is outdated and does not mention `onnx`, despite `onnx` being supported and the default.

## Fix Focus Areas
- scripts/token_reducer/cli.py[69-80]
- scripts/token_reducer/config.py[15-19]

## What to change
- Update the help string to include `onnx` (e.g., `help="hash | ml | onnx"`).
- Optionally add a short note that `onnx` requires optional dependencies.
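
If the CLI is built on `argparse` (an assumption; `cli.py` is not shown in this review), deriving the help text from a single tuple of backends keeps the two from drifting apart. The program name and `build_parser` below are illustrative only.

```python
import argparse

BACKENDS = ("hash", "ml", "onnx")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --embedding-backend option is sketched."""
    parser = argparse.ArgumentParser(prog="token_reducer")
    parser.add_argument(
        "--embedding-backend",
        choices=BACKENDS,
        default="onnx",
        # Build the help string from BACKENDS so new backends show up automatically.
        help=" | ".join(BACKENDS) + " (onnx requires optional dependencies)",
    )
    return parser


args = build_parser().parse_args(["--embedding-backend", "hash"])
print(args.embedding_backend)  # hash
```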


@qodo-code-review


CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Lint (ruff)

Failed stage: Run ruff format --check scripts/ tests/ [❌]

Failed test name: ""

Failure summary:

The action failed at the formatting check step ruff format --check scripts/ tests/. ruff reported that scripts/token_reducer/embeddings.py is not properly formatted (Would reformat: scripts/token_reducer/embeddings.py), so the --check run exited with code 1.

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

151:  ##[endgroup]
152:  All checks passed!
153:  ##[group]Run ruff format --check scripts/ tests/
154:  ruff format --check scripts/ tests/
155:  shell: /usr/bin/bash -e {0}
156:  env:
157:  pythonLocation: /opt/hostedtoolcache/Python/3.11.15/x64
158:  PKG_CONFIG_PATH: /opt/hostedtoolcache/Python/3.11.15/x64/lib/pkgconfig
159:  Python_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
160:  Python2_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
161:  Python3_ROOT_DIR: /opt/hostedtoolcache/Python/3.11.15/x64
162:  LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.11.15/x64/lib
163:  ##[endgroup]
164:  Would reformat: scripts/token_reducer/embeddings.py
165:  1 file would be reformatted, 16 files already formatted
166:  ##[error]Process completed with exit code 1.
167:  Post job cleanup.

Madhan230205 merged commit 1db0f40 into main on Apr 3, 2026
4 of 5 checks passed

Copilot AI left a comment


Pull request overview

This PR enhances the token_reducer retrieval pipeline by adding Reciprocal Rank Fusion (RRF) for hybrid reranking and introducing an ONNX Runtime embedding backend (with caching), along with related configuration and optional dependencies.

Changes:

  • Add RRF-based fusion for combining FTS/BM25 and vector retrieval results.
  • Add ONNX Runtime embedding backend with session/tokenizer caching and hub/local model resolution.
  • Update defaults/config to enable ONNX + RRF and adjust optional requirements.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| scripts/token_reducer/retriever.py | Adds RRF fusion and routes reranking through it when enabled. |
| scripts/token_reducer/embeddings.py | Adds ONNX session loading/caching and ONNX embedding generation + backend dispatch support. |
| scripts/token_reducer/db.py | Fixes dependency indexing count to reflect actual inserted rows (INSERT OR IGNORE). |
| scripts/token_reducer/config.py | Adds ONNX/RRF defaults and runtime configuration helpers/caches. |
| requirements-optional.txt | Adds optional ONNX-related dependencies. |


Comment on lines +343 to +347
    from .config import get_rrf_k

    if k <= 0:
        k = get_rrf_k()

Copilot AI Apr 3, 2026


reciprocal_rank_fusion() imports get_rrf_k(), but with the current defaults k is always 60 and get_rrf_k() is never used (since rerank_candidates() calls reciprocal_rank_fusion(...) without passing k). This makes configure_rrf(k=...) ineffective. Consider defaulting k from get_rrf_k() when k is None/omitted, or explicitly passing k=get_rrf_k() from rerank_candidates().

Comment on lines +96 to +132

    try:
        from huggingface_hub import hf_hub_download  # type: ignore

        for candidate in _ONNX_CANDIDATE_FILENAMES:
            try:
                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
                break
            except Exception:
                continue

        if onnx_path is None:
            raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'")

        tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")

    except Exception as hf_err:
        model_dir = Path(model_path)
        if not model_dir.exists():
            raise RuntimeError(
                f"ONNX model not found at '{model_path}'. "
                "Provide a valid HuggingFace model ID or local directory path."
            ) from hf_err

        for candidate in _ONNX_CANDIDATE_FILENAMES:
            p = model_dir / candidate
            if p.exists():
                onnx_path = str(p)
                break

        if onnx_path is None:
            raise RuntimeError(
                f"No ONNX model file found locally in '{model_path}'"
            ) from hf_err

        tokenizer_path = str(model_dir / "tokenizer.json")

Copilot AI Apr 3, 2026


The HuggingFace download try covers both the ONNX model and tokenizer.json, but any failure (including a missing tokenizer.json in the repo, a transient HF error, or ImportError for huggingface_hub) falls into the local-path fallback branch. This can produce misleading errors (treating a valid HF model ID as a missing local path) and can discard a successfully downloaded onnx_path. Consider handling ImportError for huggingface_hub separately, and distinguishing “repo exists but required files missing” from “local path missing”.

Suggested change
-    try:
-        from huggingface_hub import hf_hub_download  # type: ignore
-        for candidate in _ONNX_CANDIDATE_FILENAMES:
-            try:
-                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
-                break
-            except Exception:
-                continue
-        if onnx_path is None:
-            raise RuntimeError(f"No ONNX model file found in HuggingFace repo '{model_path}'")
-        tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")
-    except Exception as hf_err:
-        model_dir = Path(model_path)
-        if not model_dir.exists():
-            raise RuntimeError(
-                f"ONNX model not found at '{model_path}'. "
-                "Provide a valid HuggingFace model ID or local directory path."
-            ) from hf_err
-        for candidate in _ONNX_CANDIDATE_FILENAMES:
-            p = model_dir / candidate
-            if p.exists():
-                onnx_path = str(p)
-                break
-        if onnx_path is None:
-            raise RuntimeError(
-                f"No ONNX model file found locally in '{model_path}'"
-            ) from hf_err
-        tokenizer_path = str(model_dir / "tokenizer.json")
+    hf_import_err: Exception | None = None
+    hf_onnx_err: Exception | None = None
+    try:
+        from huggingface_hub import hf_hub_download  # type: ignore
+    except ImportError as exc:
+        hf_hub_download = None
+        hf_import_err = exc
+    if hf_hub_download is not None:
+        for candidate in _ONNX_CANDIDATE_FILENAMES:
+            try:
+                onnx_path = hf_hub_download(repo_id=model_path, filename=candidate)
+                break
+            except Exception as exc:
+                hf_onnx_err = exc
+                continue
+        if onnx_path is not None:
+            try:
+                tokenizer_path = hf_hub_download(repo_id=model_path, filename="tokenizer.json")
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Found ONNX model in HuggingFace repo '{model_path}', "
+                    "but required file 'tokenizer.json' is missing or could not be downloaded."
+                ) from exc
+    if onnx_path is None:
+        model_dir = Path(model_path)
+        if model_dir.exists():
+            for candidate in _ONNX_CANDIDATE_FILENAMES:
+                p = model_dir / candidate
+                if p.exists():
+                    onnx_path = str(p)
+                    break
+            if onnx_path is None:
+                raise RuntimeError(
+                    f"No ONNX model file found locally in '{model_path}'"
+                ) from hf_onnx_err
+            tokenizer_path = str(model_dir / "tokenizer.json")
+        elif hf_import_err is not None:
+            raise RuntimeError(
+                "ONNX model not found locally and huggingface_hub is not installed. "
+                "Provide a local directory path or install huggingface_hub to load a HuggingFace model ID."
+            ) from hf_import_err
+        elif hf_onnx_err is not None:
+            raise RuntimeError(
+                f"Unable to download an ONNX model file from HuggingFace repo '{model_path}', "
+                "and no local directory exists at that path."
+            ) from hf_onnx_err
+        else:
+            raise RuntimeError(
+                f"ONNX model not found at '{model_path}'. "
+                "Provide a valid HuggingFace model ID or local directory path."
+            )

Comment on lines +131 to +132
tokenizer_path = str(model_dir / "tokenizer.json")

Copilot AI Apr 3, 2026


In the local-path fallback, tokenizer_path is set to <model_dir>/tokenizer.json but its existence is not checked before calling Tokenizer.from_file(). If the tokenizer file is missing, the error will be a low-level file error rather than a clear message about the required file layout. Consider validating tokenizer_path exists and raising a RuntimeError with actionable guidance.

Suggested change
-        tokenizer_path = str(model_dir / "tokenizer.json")
+        local_tokenizer_path = model_dir / "tokenizer.json"
+        if not local_tokenizer_path.exists():
+            raise RuntimeError(
+                f"Missing tokenizer file in local ONNX model directory '{model_path}'. "
+                "Expected 'tokenizer.json' alongside the ONNX model files "
+                f"({', '.join(_ONNX_CANDIDATE_FILENAMES)}). "
+                "Provide a valid HuggingFace model ID or a local directory with the required files."
+            ) from hf_err
+        tokenizer_path = str(local_tokenizer_path)

Comment on lines +25 to +26
# Model path can be local file or HuggingFace hub model ID
DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"

Copilot AI Apr 3, 2026


DEFAULT_ONNX_MODEL_PATH is introduced but (based on a repo-wide search) is not referenced anywhere else. If the intended default is already covered by DEFAULT_EMBEDDING_MODEL, consider removing this constant or wiring it into the CLI/config flow so it doesn’t drift unused.

Suggested change
-# Model path can be local file or HuggingFace hub model ID
-DEFAULT_ONNX_MODEL_PATH = "sentence-transformers/all-MiniLM-L6-v2"
+# The default model identifier is provided by DEFAULT_EMBEDDING_MODEL.

Comment on lines 187 to +205
@@ -74,6 +192,17 @@ def resolve_embedding_backend(
    if backend == "hash":
        return "hash", None

    if backend == "onnx":
        try:
            get_onnx_session(requested_model)
            return "onnx", requested_model
        except Exception as exc:
            print(
                f"[warn] ONNX embedding backend unavailable ({exc}). Falling back to hash embeddings.",
                file=sys.stderr,
            )
            return "hash", None

Copilot AI Apr 3, 2026


New ONNX support was added to resolve_embedding_backend() / embed_text(), but tests/test_embeddings.py does not currently exercise the onnx backend branch (even if only to assert graceful fallback when optional deps/models aren’t available). Adding coverage here would help prevent regressions in the default-backend selection and fallback behavior.
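
A fallback test along these lines might look like the sketch below. To stay self-contained it uses a stand-in `resolve_backend()` that takes the session loader as a parameter, which the real `resolve_embedding_backend()` does not; a test in `tests/test_embeddings.py` would more likely patch `get_onnx_session` instead. All names here are assumptions for illustration.

```python
from __future__ import annotations

import sys
from typing import Callable


def resolve_backend(
    backend: str,
    model: str,
    load_session: Callable[[str], object],
) -> tuple[str, str | None]:
    """Stand-in mirroring the onnx branch: fall back to hash if loading fails."""
    if backend == "onnx":
        try:
            load_session(model)
            return "onnx", model
        except Exception as exc:
            print(f"[warn] ONNX backend unavailable ({exc}). Falling back to hash.", file=sys.stderr)
            return "hash", None
    return "hash", None


def failing_loader(model: str) -> object:
    # Simulates missing optional dependencies or a missing model.
    raise RuntimeError("onnxruntime is not installed")


def test_onnx_backend_falls_back_to_hash() -> None:
    assert resolve_backend("onnx", "any-model", failing_loader) == ("hash", None)


def test_onnx_backend_selected_when_session_loads() -> None:
    assert resolve_backend("onnx", "any-model", lambda m: object()) == ("onnx", "any-model")
```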
