perf: raise ONNX default batch size to 64 by j-sperling · Pull Request #608 · zilliztech/memsearch

j-sperling · 2026-07-05T06:39:22Z

Summary

Raises OnnxEmbedding._DEFAULT_BATCH_SIZE from 32 to 64. Measured on the default gpahal/bge-m3-onnx-int8 model (CPU, Apple M-series, 128 texts x ~230 tokens): 14.1s @ batch 16, 13.2s @ 32, 11.7s @ 64, 10.6s @ 128, so 64 takes ~11% less time than the current default of 32. On corpus-scale indexing (observed >4.5h for 186K chunks with this model), that amounts to tens of minutes, roughly half an hour if the gain carries over at scale.
128 is deliberately not the default: the tokenizer pads every text in a batch to the longest one, up to max_length=8192, so a worst-case batch of 128 long texts materializes multi-GB activation tensors. 64 keeps the worst case bounded while capturing most of the win; users can still set embedding.batch_size = 128 in config where memory allows.
The embedding.batch_size config setting continues to override the default; 0 still means "provider default".
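To make the override rule and the worst-case padding argument concrete, here is a minimal sketch. The function names are hypothetical and not part of memsearch. The hidden size of 1024 and float32 activations are assumptions about the model, and the estimate covers only a single hidden-state tensor, not total memory use.

```python
def resolve_batch_size(configured: int, provider_default: int = 64) -> int:
    """Hypothetical helper: 0 means 'use the provider default'."""
    if configured == 0:
        return provider_default
    return configured


def padded_hidden_state_bytes(
    batch_size: int,
    padded_len: int,
    hidden_size: int = 1024,   # assumption: hidden width of the model
    bytes_per_value: int = 4,  # assumption: float32 activations
) -> int:
    """Size of one [batch, seq, hidden] tensor when every text is padded to padded_len."""
    return batch_size * padded_len * hidden_size * bytes_per_value


GIB = 2**30
for bs in (32, 64, 128):
    size = padded_hidden_state_bytes(bs, 8192)
    print(f"batch {bs}: {size / GIB:.1f} GiB per hidden-state tensor")
```

Under these assumptions a single fully padded hidden-state tensor is 2 GiB at batch 64 and 4 GiB at batch 128, before counting attention buffers or other intermediates.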

Test plan

Full suite passes (no test pins the ONNX default batch size)
ruff check / ruff format --check clean
Timing measurements above reproduce with a 4-text warmup + time.perf_counter around embed()
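The timing method described in the last item can be sketched roughly as follows. This is an illustration only: fake_embed is a hypothetical stand-in for the real provider's embed() (which may be async and have a different signature), and the sample texts are placeholders rather than the benchmark corpus.

```python
import time
from typing import Callable, Sequence


def time_embed(
    embed: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    warmup: int = 4,
) -> float:
    """Return wall-clock seconds for one embed() call after a small warmup."""
    embed(texts[:warmup])  # warmup run, excluded from the measurement
    start = time.perf_counter()
    embed(texts)
    return time.perf_counter() - start


def fake_embed(texts: Sequence[str]) -> list:
    # Hypothetical stand-in: returns one small zero vector per input text.
    return [[0.0] * 8 for _ in texts]


texts = ["lorem ipsum dolor sit amet " * 40] * 128  # placeholder inputs
elapsed = time_embed(fake_embed, texts)
print(f"{elapsed:.3f}s for {len(texts)} texts")
```

To compare batch sizes, the same function would be called once per batch-size setting with the real embedder swapped in for fake_embed.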

Measured on the default int8 bge-m3 model (CPU, Apple M-series, 128 texts): 13.2s at batch 32, 11.7s at 64, 10.6s at 128. 64 cuts embedding time by ~11% relative to the old default; 128 is left to explicit configuration because worst-case padded batches of 8192-token inputs materialize multi-GB activation tensors.

j-sperling wants to merge 1 commit into zilliztech:main from j-sperling:perf/onnx-default-batch-size
