Improve BufferedUnionScorer performance by ~9%#2942
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 672bf45235
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
51902f6 to
1095543
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10955437d2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
679dc20 to
74f37f0
Compare
Add horizon-limited buffering APIs for docsets and scorers so buffered union can refill from block-oriented postings while preserving term frequencies. This lets term scorers score buffered docs directly and reduces per-document refill overhead for dense unions.
Reuse BM25 TF normalization caches for weights with the same average fieldnorm using a bounded thread-local LRU. This avoids recomputing and duplicating the cache for many terms on the same field without adding cross-thread contention.
BufferedUnionScorer can use score_doc during refill only when the score combiner needs scores. DoNothingCombiner now advertises that scoring is unnecessary, preserving the no-score path for count collectors and avoiding wasted score_doc calls. Add a regression test that verifies DoNothingCombiner does not invoke score() or score_doc() while counting a buffered union.
Improves Batch-handling of
BufferedUnionScorerfor better performance.BufferedUnionScorernow has two refill paths.The old path scores from the scorer's current doc one by one.
The new path is for scorers that can score a doc passed in explicitly, which can work on batches:
score_doc(doc, term_freq)score_doc(doc, term_freq)is differnt because the scorer's current doc is not the current doc being scored.This direction is required for a more batch oriented approach to scoring.
TermScorerimplements score_doc, but most scorers like a union scorer will not.For those scorers we will check upfront if they implement
score_docand if not we will fallback to the old path.BM25 fieldnorm score caches are also reused per thread.
This reduces the different memory addresses we need to access.
Benchmark
Caveat
The performance improvement mostly affects terms with many hits. Low number of hits degrade in perforamnce. (this should be fixable)
Future Work
The biggest consumer with 17% is
BufferedUnionScorer::advance_buffered. A collect_block for scores may help