Improve BufferedUnionScorer performance by ~9% by PSeitz · Pull Request #2942 · quickwit-oss/tantivy

PSeitz · 2026-05-31T19:14:24Z

Improves batch handling in BufferedUnionScorer for better performance.

BufferedUnionScorer now has two refill paths.
The old path scores one doc at a time, always from the scorer's current doc.

The new path is for scorers that can score a doc passed in explicitly, which can work on batches:

fill a buffer with docs below the current horizon
fetch term frequencies for those docs
insert the buffered docs into the union window
score them with score_doc(doc, term_freq)

score_doc(doc, term_freq) is different because the doc being scored is passed in explicitly and is not necessarily the scorer's current doc.
This change is required for a more batch-oriented approach to scoring.
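The refill steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Postings`, `fill_buffer_up_to`, `refill_window`, and the toy scoring formula are hypothetical stand-ins, and only the `score_doc(doc, term_freq)` shape comes from the description.

```rust
/// Hypothetical stand-in for a posting list: sorted doc ids plus term frequencies.
pub struct Postings {
    docs: Vec<u32>,
    term_freqs: Vec<u32>,
    cursor: usize,
}

impl Postings {
    pub fn new(docs: Vec<u32>, term_freqs: Vec<u32>) -> Self {
        assert_eq!(docs.len(), term_freqs.len());
        Postings { docs, term_freqs, cursor: 0 }
    }

    /// Steps 1 and 2: copy every remaining doc below `horizon`, together with
    /// its term frequency, into the caller's buffers. Returns the number copied.
    pub fn fill_buffer_up_to(
        &mut self,
        horizon: u32,
        docs: &mut Vec<u32>,
        tfs: &mut Vec<u32>,
    ) -> usize {
        let start = self.cursor;
        while self.cursor < self.docs.len() && self.docs[self.cursor] < horizon {
            self.cursor += 1;
        }
        docs.extend_from_slice(&self.docs[start..self.cursor]);
        tfs.extend_from_slice(&self.term_freqs[start..self.cursor]);
        self.cursor - start
    }
}

/// Toy scoring function (an assumption, not BM25). The doc is passed in
/// explicitly, so the posting cursor may already be far ahead of it.
pub fn score_doc(_doc: u32, term_freq: u32) -> f32 {
    let tf = term_freq as f32;
    tf / (tf + 1.0)
}

/// Steps 3 and 4: insert the buffered docs into the window and score them.
/// Assumes windows are refilled in increasing order, so every buffered doc
/// is >= `window_start`.
pub fn refill_window(
    postings: &mut Postings,
    window_start: u32,
    window: &mut [Option<f32>],
) -> usize {
    let horizon = window_start + window.len() as u32;
    let mut docs = Vec::new();
    let mut tfs = Vec::new();
    let copied = postings.fill_buffer_up_to(horizon, &mut docs, &mut tfs);
    for (&doc, &tf) in docs.iter().zip(tfs.iter()) {
        let slot = &mut window[(doc - window_start) as usize];
        *slot = Some(slot.unwrap_or(0.0) + score_doc(doc, tf));
    }
    copied
}
```

Because scoring happens after the posting cursor has already moved past the buffered docs, a scorer that can only score its current doc cannot take part in this loop.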

TermScorer implements score_doc, but most other scorers, such as a union scorer, will not.
We check upfront whether a scorer implements score_doc and, if it does not, fall back to the old path.
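One way to express such an upfront check is a capability method with a conservative default. The sketch below is an assumption about the shape, not tantivy's actual trait: `supports_score_doc`, `RefillPath`, `choose_refill_path`, `TermLike`, and `UnionLike` are all hypothetical names, and requiring every child scorer to support `score_doc` is also an assumption.

```rust
/// Simplified, hypothetical scorer trait (not tantivy's `Scorer`).
pub trait Scorer {
    fn doc(&self) -> u32;
    fn score(&mut self) -> f32;

    /// Hypothetical capability flag. Defaults to `false`, so scorers opt in.
    fn supports_score_doc(&self) -> bool {
        false
    }

    /// Only meaningful when `supports_score_doc()` returns `true`.
    fn score_doc(&mut self, _doc: u32, _term_freq: u32) -> f32 {
        panic!("score_doc called on a scorer that does not support it")
    }
}

#[derive(Default)]
pub struct TermLike {
    doc: u32,
}

impl Scorer for TermLike {
    fn doc(&self) -> u32 {
        self.doc
    }
    fn score(&mut self) -> f32 {
        1.0
    }
    fn supports_score_doc(&self) -> bool {
        true
    }
    fn score_doc(&mut self, _doc: u32, term_freq: u32) -> f32 {
        term_freq as f32
    }
}

/// Stand-in for a composite scorer that keeps the default (no `score_doc`).
#[derive(Default)]
pub struct UnionLike {
    doc: u32,
}

impl Scorer for UnionLike {
    fn doc(&self) -> u32 {
        self.doc
    }
    fn score(&mut self) -> f32 {
        1.0
    }
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RefillPath {
    Batched,
    OneByOne,
}

/// Decide once, before refilling, which path to take.
pub fn choose_refill_path(scorers: &[Box<dyn Scorer>]) -> RefillPath {
    if scorers.iter().all(|s| s.supports_score_doc()) {
        RefillPath::Batched
    } else {
        RefillPath::OneByOne
    }
}
```

Making the decision once per union keeps the branch out of the per-document loop.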

BM25 fieldnorm score caches are also reused per thread.
This reduces the number of distinct memory addresses we need to access.
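A per-thread cache of this kind could look roughly like the sketch below. It is illustrative only: the capacity, the key choice (the bit pattern of the average fieldnorm), the `shared_cache` name, and the simplified fieldnorm decoding (`id as f32`) are all assumptions, not taken from the PR.

```rust
use std::cell::RefCell;
use std::rc::Rc;

pub const K1: f32 = 1.2;
pub const B: f32 = 0.75;
/// Assumed bound on the number of caches kept per thread.
const MAX_CACHED: usize = 4;

/// One TF normalization value per fieldnorm id.
pub type TfNormCache = Rc<[f32; 256]>;

thread_local! {
    // Least recently used entry first, most recently used entry last.
    static CACHES: RefCell<Vec<(u32, TfNormCache)>> = RefCell::new(Vec::new());
}

fn compute_cache(average_fieldnorm: f32) -> [f32; 256] {
    let mut cache = [0.0f32; 256];
    for (id, slot) in cache.iter_mut().enumerate() {
        // Simplification: treat the fieldnorm id as the fieldnorm itself.
        let fieldnorm = id as f32;
        *slot = K1 * (1.0 - B + B * fieldnorm / average_fieldnorm);
    }
    cache
}

/// Returns the cache for `average_fieldnorm`, computing it at most once per
/// thread while it stays among the `MAX_CACHED` most recently used entries.
pub fn shared_cache(average_fieldnorm: f32) -> TfNormCache {
    let key = average_fieldnorm.to_bits();
    CACHES.with(|caches| {
        let mut caches = caches.borrow_mut();
        if let Some(pos) = caches.iter().position(|(k, _)| *k == key) {
            // Hit: move the entry to the most recently used position.
            let entry = caches.remove(pos);
            let cache = Rc::clone(&entry.1);
            caches.push(entry);
            return cache;
        }
        if caches.len() == MAX_CACHED {
            // Miss on a full cache: evict the least recently used entry.
            caches.remove(0);
        }
        let cache: TfNormCache = Rc::new(compute_cache(average_fieldnorm));
        caches.push((key, Rc::clone(&cache)));
        cache
    })
}
```

Since the storage is thread-local, lookups need no locking, and terms on the same field share one 256-entry table instead of each holding its own copy.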

Benchmark

Caveat

The performance improvement mostly affects terms with many hits. Terms with few hits get slower (this should be fixable).

Future Work

The biggest consumer, at 17%, is BufferedUnionScorer::advance_buffered. A collect_block for scores may help.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 672bf45235

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10955437d2


Add horizon-limited buffering APIs for docsets and scorers so buffered union can refill from block-oriented postings while preserving term frequencies. This lets term scorers score buffered docs directly and reduces per-document refill overhead for dense unions.

Reuse BM25 TF normalization caches for weights with the same average fieldnorm using a bounded thread-local LRU. This avoids recomputing and duplicating the cache for many terms on the same field without adding cross-thread contention.

BufferedUnionScorer can use score_doc during refill only when the score combiner needs scores. DoNothingCombiner now advertises that scoring is unnecessary, preserving the no-score path for count collectors and avoiding wasted score_doc calls. Add a regression test that verifies DoNothingCombiner does not invoke score() or score_doc() while counting a buffered union.
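The no-score path described in this commit message can be illustrated with a small sketch. The trait below is a simplified assumption, not tantivy's `ScoreCombiner`: `requires_scoring`, `SumCombiner`, `CountingScorer`, and `refill` are hypothetical names, and the call counter mirrors the idea of the regression test rather than its actual code.

```rust
use std::cell::Cell;

/// Simplified, hypothetical combiner trait.
pub trait ScoreCombiner {
    /// Hypothetical flag. Combiners that ignore scores return `false`.
    fn requires_scoring(&self) -> bool {
        true
    }
    fn update(&mut self, score: f32);
}

pub struct SumCombiner(pub f32);

impl ScoreCombiner for SumCombiner {
    fn update(&mut self, score: f32) {
        self.0 += score;
    }
}

pub struct DoNothingCombiner;

impl ScoreCombiner for DoNothingCombiner {
    fn requires_scoring(&self) -> bool {
        false
    }
    fn update(&mut self, _score: f32) {}
}

/// Scorer stand-in that records how often `score_doc` is invoked.
pub struct CountingScorer {
    score_doc_calls: Cell<usize>,
}

impl CountingScorer {
    pub fn new() -> Self {
        CountingScorer { score_doc_calls: Cell::new(0) }
    }

    pub fn calls(&self) -> usize {
        self.score_doc_calls.get()
    }

    pub fn score_doc(&self, _doc: u32, term_freq: u32) -> f32 {
        self.score_doc_calls.set(self.score_doc_calls.get() + 1);
        term_freq as f32
    }
}

/// Refill that skips scoring entirely when the combiner does not need scores.
/// Returns the number of docs processed.
pub fn refill<C: ScoreCombiner>(
    scorer: &CountingScorer,
    docs: &[u32],
    tfs: &[u32],
    combiner: &mut C,
) -> usize {
    if combiner.requires_scoring() {
        for (&doc, &tf) in docs.iter().zip(tfs) {
            combiner.update(scorer.score_doc(doc, tf));
        }
    }
    docs.len()
}
```

With a count-only combiner, the docs are still counted but the scorer's call counter stays at zero, which is the property a regression test of this kind would assert.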

chatgpt-codex-connector Bot reviewed May 31, 2026


Comment thread src/query/exclude.rs Outdated

PSeitz force-pushed the faster_union branch 2 times, most recently from 51902f6 to 1095543 Compare May 31, 2026 19:23

chatgpt-codex-connector Bot reviewed May 31, 2026


Comment thread src/query/union/buffered_union.rs

PSeitz force-pushed the faster_union branch 2 times, most recently from 679dc20 to 74f37f0 Compare May 31, 2026 19:58

PSeitz changed the title ~~Improve scored unions by ~9%~~ Improve scored and count-only unions by ~9% May 31, 2026

PSeitz added 7 commits June 12, 2026 09:29

Defer terminated scorer removal during buffered refill

cc28fb7

Split buffered refill from scorer removal

eb1aabf

Share BM25 fieldnorm caches per thread

8e20e59


Clarify postings copy variable names

0cbed3d

cargo fmt, remove impl

91db5c5

PSeitz-dd force-pushed the faster_union branch from 74f37f0 to 72a1d4a Compare June 12, 2026 07:29

PSeitz changed the title ~~Improve scored and count-only unions by ~9%~~ Improve unions by ~9% Jun 12, 2026

PSeitz changed the title ~~Improve unions by ~9%~~ Improve unions performance by ~9% Jun 12, 2026

PSeitz changed the title ~~Improve unions performance by ~9%~~ Improve BufferedUnionScorer performance by ~9% Jun 12, 2026

PSeitz wants to merge 7 commits into main from faster_union
