feat: support ONNX models that require token_type_ids #607

Draft
j-sperling wants to merge 3 commits into zilliztech:main from j-sperling:feat/onnx-token-type-ids

@j-sperling j-sperling commented Jul 5, 2026

Contributor

Summary

  • BERT-family ONNX exports declare a token_type_ids input, and session.run() requires every declared input in the feed, so models like Xenova/all-MiniLM-L6-v2 and Xenova/bge-small-en-v1.5 currently fail with Required inputs (['token_type_ids']) are missing. This PR detects the declared inputs once at init and feeds all-zero segment ids when the model requires them, which is correct for single-sequence embedding. XLM-R-family models (the bge-m3 default) are unaffected.
  • Why it matters: it unlocks a small-model tier for bulk ingest without touching the default. Verified locally: Xenova/bge-small-en-v1.5 (33M parameters, CLS-pooling-native, matching this provider's CLS pooling) loads, passes a semantic sanity check, and embeds a 64-text batch in ~0.02s vs ~6.6s for the 568M-parameter int8 bge-m3 default on the same CPU. Corpus-scale context: indexing 186K chunks with the default took >4.5h on an M-series CPU; a small-tier model brings that to minutes via the embedding.model config, with no code change.
  • Deliberately not changing DEFAULT_MODELS["onnx"]: swapping the default changes the embedding dimension (1024 → 384) and would break existing Milvus collections on upgrade. Users opt in per install via memsearch config set embedding.model.
  • Caveat worth documenting: Xenova/multilingual-e5-small now loads but is not recommended. e5 models expect query:/passage: prefixes and mean pooling, and this model scored poorly on the sanity check under CLS pooling.
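
For reviewers, a minimal sketch of the detect-once, feed-zeros approach described in the first bullet. The helper names (declared_input_names, build_feed) are hypothetical and are not the identifiers used in this PR; the only ONNX Runtime behaviour assumed is that a session's get_inputs() returns objects with a .name attribute. A stand-in object replaces a real session so the snippet runs without a model download.

```python
from types import SimpleNamespace

import numpy as np


def declared_input_names(session):
    """Collect the input names a session declares (intended to run once, at init)."""
    return frozenset(node.name for node in session.get_inputs())


def build_feed(input_ids, attention_mask, declared):
    """Build the feed dict, adding all-zero segment ids only when declared."""
    feed = {"input_ids": input_ids, "attention_mask": attention_mask}
    if "token_type_ids" in declared:
        # Single-sequence embedding: every token belongs to segment 0.
        feed["token_type_ids"] = np.zeros_like(input_ids)
    return feed


# Stand-in for a BERT-family session (hypothetical; a real one would be an
# onnxruntime.InferenceSession).
bert_like = SimpleNamespace(
    get_inputs=lambda: [
        SimpleNamespace(name=n)
        for n in ("input_ids", "attention_mask", "token_type_ids")
    ]
)

ids = np.array([[101, 7592, 102]], dtype=np.int64)  # illustrative token ids
mask = np.ones_like(ids)
feed = build_feed(ids, mask, declared_input_names(bert_like))
```

Using np.zeros_like keeps the segment ids matched to input_ids in both shape and dtype, so the extra input follows whatever padding the tokenizer applied to the batch.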

Test plan

  • tests/test_embeddings_onnx_inputs.py: stub-session tests verifying that token_type_ids is fed as zeros (shape-matched to input_ids) when declared and omitted when not; no model download needed
  • Full suite: 238 passed, 7 skipped
  • ruff check / ruff format --check clean
  • Live load + embed + semantic sanity for bge-small-en-v1.5, all-MiniLM-L6-v2, multilingual-e5-small; default bge-m3 path unchanged
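
A rough illustration of what stub-session tests of this kind can look like. This is not the content of tests/test_embeddings_onnx_inputs.py: StubSession and run_with_declared_inputs are hypothetical names, and the stub only imitates the two session behaviours the tests rely on (declaring inputs and rejecting a feed that omits one).

```python
from types import SimpleNamespace

import numpy as np


class StubSession:
    """Stands in for an ONNX Runtime session and records the feed passed to run()."""

    def __init__(self, input_names):
        self._inputs = [SimpleNamespace(name=n) for n in input_names]
        self.last_feed = None

    def get_inputs(self):
        return self._inputs

    def run(self, output_names, feed):
        # Imitate the runtime's refusal to run with a declared input missing.
        missing = [i.name for i in self._inputs if i.name not in feed]
        if missing:
            raise ValueError(f"Required inputs ({missing}) are missing")
        self.last_feed = feed
        batch, seq = feed["input_ids"].shape
        return [np.zeros((batch, seq, 4), dtype=np.float32)]


def run_with_declared_inputs(session, input_ids, attention_mask):
    """Hypothetical code under test: feed zeros for token_type_ids if declared."""
    declared = {i.name for i in session.get_inputs()}
    feed = {"input_ids": input_ids, "attention_mask": attention_mask}
    if "token_type_ids" in declared:
        feed["token_type_ids"] = np.zeros_like(input_ids)
    return session.run(None, feed)


def test_token_type_ids_fed_as_zeros_when_declared():
    session = StubSession(["input_ids", "attention_mask", "token_type_ids"])
    ids = np.array([[101, 2023, 102], [101, 102, 0]], dtype=np.int64)
    run_with_declared_inputs(session, ids, (ids != 0).astype(np.int64))
    segment_ids = session.last_feed["token_type_ids"]
    assert segment_ids.shape == ids.shape
    assert not segment_ids.any()


def test_token_type_ids_omitted_when_not_declared():
    session = StubSession(["input_ids", "attention_mask"])
    ids = np.array([[0, 9, 2]], dtype=np.int64)
    run_with_declared_inputs(session, ids, np.ones_like(ids))
    assert "token_type_ids" not in session.last_feed
```

Because the stub raises on a missing declared input, the first test would also fail loudly if the zero-filled input were ever dropped, mirroring the error quoted in the summary.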

BERT-family ONNX exports (Xenova/all-MiniLM-L6-v2, Xenova/bge-small-en-v1.5)
declare a token_type_ids input, and session.run() requires every declared
input to be fed, so such models fail to embed with "Required inputs
(['token_type_ids']) are missing". Detect the declared inputs once at init
and feed all-zero segment ids when required (single-sequence embedding).

This unlocks a small-model tier for bulk ingest: bge-small-en-v1.5 (33M,
CLS-pooling-native, matching this provider's pooling) embeds a 64-text
batch in ~0.02s vs ~6.6s for the 568M default on the same CPU.