Speed up range-query intersections via seek_danger on RangeDocSet (up to ~50x faster)#2963

Open
PSeitz-dd wants to merge 1 commit into main from seek_danger_range_doc_set

Conversation

@PSeitz-dd

Contributor

A regular seek on RangeDocSet is costly: on a miss it fetches blocks and
scans the column forward to materialize the next matching doc. As a
non-leading docset in an intersection that work is wasted — the driver only
asks "does this candidate match?". seek_danger answers that with a cheap
point lookup via Column::values_for_doc, returning a lower bound on a miss
and leaving forward progress to the caller.
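
To make the difference concrete, here is a minimal sketch of the two paths on a toy docset. It is not the tantivy implementation: `ToyRangeDocSet`, its dense `values` vector, and the `Found` variant name are assumptions for illustration. Only `SeekLowerBound(target + 1)` is taken from the diff.

```rust
type DocId = u32;

/// Two outcomes of a point check. Variant names are illustrative;
/// the real enum lives in tantivy and may differ.
#[derive(Debug, PartialEq, Eq)]
enum SeekDangerResult {
    Found,
    SeekLowerBound(DocId),
}

/// Hypothetical stand-in for a range docset over a dense, single-valued column.
struct ToyRangeDocSet {
    values: Vec<u64>, // values[doc] is the column value of `doc`
    low: u64,         // inclusive lower bound of the range
    high: u64,        // inclusive upper bound of the range
}

impl ToyRangeDocSet {
    /// Point lookup: does `doc` exist and fall inside the range?
    fn matches(&self, doc: DocId) -> bool {
        self.values
            .get(doc as usize)
            .map_or(false, |v| (self.low..=self.high).contains(v))
    }

    /// Regular seek: scans forward until the next matching doc (None when exhausted).
    fn seek(&self, target: DocId) -> Option<DocId> {
        (target..self.values.len() as DocId).find(|&doc| self.matches(doc))
    }

    /// seek_danger: a single point lookup; on a miss only a lower bound is returned.
    fn seek_danger(&self, target: DocId) -> SeekDangerResult {
        if self.matches(target) {
            SeekDangerResult::Found
        } else {
            SeekDangerResult::SeekLowerBound(target + 1)
        }
    }
}
```

On a miss, `seek` touches every doc up to the next match, while `seek_danger` touches exactly one and hands the decision about where to go next back to the intersection driver.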

Forward seek_danger through ConstScorer.
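
Why the forwarding matters, as a rough sketch: if the trait's default `seek_danger` falls back to `seek` (as the review comment further down describes), a wrapper that does not override it silently bypasses the cheap path of the docset it wraps. The trait, `ToyWrapper`, `CheapDocSet` and the `TERMINATED` sentinel below are hypothetical stand-ins, not tantivy's actual `DocSet` or `ConstScorer`.

```rust
type DocId = u32;

/// Toy sentinel for "no more docs"; an assumption of this sketch.
const TERMINATED: DocId = DocId::MAX;

#[derive(Debug, PartialEq, Eq)]
enum SeekDangerResult {
    Found,
    SeekLowerBound(DocId),
}

trait ToyDocSet {
    /// Returns the first doc >= `target`, or TERMINATED.
    fn seek(&mut self, target: DocId) -> DocId;

    /// Default: fall back to a full seek.
    fn seek_danger(&mut self, target: DocId) -> SeekDangerResult {
        let doc = self.seek(target);
        if doc == target {
            SeekDangerResult::Found
        } else {
            SeekDangerResult::SeekLowerBound(doc)
        }
    }
}

/// Docset with a cheap point check; counts how often the full seek runs.
struct CheapDocSet {
    docs: Vec<DocId>, // sorted
    full_seeks: usize,
}

impl ToyDocSet for CheapDocSet {
    fn seek(&mut self, target: DocId) -> DocId {
        self.full_seeks += 1;
        let pos = self.docs.partition_point(|&d| d < target);
        self.docs.get(pos).copied().unwrap_or(TERMINATED)
    }

    fn seek_danger(&mut self, target: DocId) -> SeekDangerResult {
        if self.docs.binary_search(&target).is_ok() {
            SeekDangerResult::Found
        } else {
            SeekDangerResult::SeekLowerBound(target + 1)
        }
    }
}

/// Plays the role a scorer wrapper plays around the inner docset.
struct ToyWrapper<D> {
    inner: D,
}

impl<D: ToyDocSet> ToyDocSet for ToyWrapper<D> {
    fn seek(&mut self, target: DocId) -> DocId {
        self.inner.seek(target)
    }

    // Without this override the trait default would call `seek`
    // and never reach the inner docset's cheap point check.
    fn seek_danger(&mut self, target: DocId) -> SeekDangerResult {
        self.inner.seek_danger(target)
    }
}
```

With the override in place, calling `seek_danger` on the wrapper leaves `full_seeks` at zero; deleting the override would route every call through `seek`.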

Benchmarks (bool_queries_with_range, _all_results / DocSetCollector):

dense and 0.1% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.0827ms (-4.60%)     Median: 0.0825ms (-4.82%)     [0.0809ms .. 0.0891ms]    Output: 43
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.1937ms (-3.70%)     Median: 0.1930ms (-3.59%)     [0.1806ms .. 0.2044ms]    Output: 100
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 0.0367ms (-92.67%)    Median: 0.0365ms (-92.65%)    [0.0340ms .. 0.0398ms]    Output: 43
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 0.1052ms (-98.05%)    Median: 0.1050ms (-97.98%)    [0.1009ms .. 0.1117ms]    Output: 100
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.7147ms (-51.42%)    Median: 2.7075ms (-49.58%)    [2.6806ms .. 2.7799ms]    Output: 968
dense and 1% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.4373ms (-9.71%)     Median: 0.4357ms (-10.12%)    [0.4117ms .. 0.4711ms]    Output: 463
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.2342ms (-2.50%)     Median: 0.2338ms (-2.56%)     [0.2247ms .. 0.2452ms]    Output: 1_054
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 0.3956ms (-82.86%)    Median: 0.3943ms (-82.90%)    [0.3815ms .. 0.4119ms]    Output: 463
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 0.4896ms (-91.16%)    Median: 0.4862ms (-90.81%)    [0.4797ms .. 0.5084ms]    Output: 1_054
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.7108ms (-50.81%)    Median: 2.6925ms (-49.51%)    [2.6688ms .. 2.7868ms]    Output: 968
dense and 10% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.9869ms (-3.71%)     Median: 0.9833ms (-3.83%)     [0.9518ms .. 1.1218ms]    Output: 4_914
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.6352ms (-3.74%)     Median: 0.6363ms (-3.32%)     [0.6158ms .. 0.6488ms]    Output: 10_152
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 3.1264ms (+0.39%)     Median: 3.1466ms (+1.34%)     [3.0261ms .. 3.2051ms]    Output: 4_914
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 4.1547ms (-31.12%)    Median: 4.0933ms (-28.55%)    [3.7648ms .. 4.7600ms]    Output: 10_152
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.6973ms (-52.30%)    Median: 2.6901ms (-49.86%)    [2.6689ms .. 2.7677ms]    Output: 968

Gains are largest when the range query is the non-leading docset of a low-cardinality intersection.
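
For reading the table against the "~50x" in the title: assuming the percentages are relative changes in runtime versus the baseline run (the benchmark output does not say so explicitly), a change of p% corresponds to a speedup factor of 1 / (1 + p/100). The helper name below is made up for this note.

```rust
/// Converts a relative runtime change in percent (e.g. -98.05)
/// into a speedup factor. Assumes the percentage is measured against
/// the baseline runtime.
fn speedup_factor(percent_change: f64) -> f64 {
    1.0 / (1.0 + percent_change / 100.0)
}
```

Under that reading, the -98.05% row (`a_AND_num_asc_fast`, 0.1% a) works out to roughly 51x, the -92.67% row to roughly 14x, and the rows around -50% to roughly 2x.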

@PSeitz-dd PSeitz-dd force-pushed the seek_danger_range_doc_set branch from 882ea34 to 7577a0b Compare June 17, 2026 10:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 882ea344ed


Comment on lines +213 to +215
// `target` is not in the docset. The next match is strictly greater than `target`, so
// `target + 1` is a valid lower bound. We may leave the docset in an invalid state.
SeekDangerResult::SeekLowerBound(target + 1)


P2: Avoid returning only target+1 on sparse misses

When a sparse fast-field range is the non-leading side of an intersection whose lead scorer is moderately dense, Intersection::advance uses this returned lower bound as the next candidate, so returning only target + 1 forces a point lookup for nearly every lead posting until the next range hit. Before this override, the default seek_danger called seek, which let RangeDocSet use its batched range scan to jump to the next matching range doc; this change can therefore turn sparse-range intersections such as a ~50%-matching term AND a handful-of-docs range into millions of per-doc lookups.
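
The concern can be illustrated with a toy intersection loop that counts how often the non-leading side is probed. This is a sketch under simplifying assumptions, not tantivy's `Intersection`: both sides are sorted slices, `intersect` and `exact_next` are hypothetical names, and a probe is modelled as a lookup in `range_hits`.

```rust
type DocId = u32;

/// Intersects `lead` with `range_hits` and counts probes of the range side.
/// `exact_next = false`: a miss returns `candidate + 1` (point-lookup style).
/// `exact_next = true`: a miss returns the next range hit (seek style).
fn intersect(lead: &[DocId], range_hits: &[DocId], exact_next: bool) -> (Vec<DocId>, usize) {
    let mut probes = 0;
    let mut out = Vec::new();
    let mut i = 0;
    while i < lead.len() {
        let candidate = lead[i];
        probes += 1;
        // Index of the first range hit >= candidate.
        let pos = range_hits.partition_point(|&d| d < candidate);
        let lower_bound = match range_hits.get(pos) {
            Some(&hit) if hit == candidate => {
                out.push(candidate);
                candidate + 1
            }
            Some(&hit) if exact_next => hit,
            None if exact_next => break, // range side exhausted
            _ => candidate + 1,
        };
        // The lead seeks to the first posting >= lower_bound.
        i += lead[i..].partition_point(|&d| d < lower_bound);
    }
    (out, probes)
}
```

With a lead matching every second doc in 0..1000 and range hits `[10, 900]`, the `candidate + 1` strategy probes 500 times and the exact-next strategy 5 times, with identical results. The probe count is only half of the picture: in the real docset each exact-next probe pays for a forward column scan, which is the cost the PR is trying to avoid, so which strategy wins depends on how sparse the range is relative to the lead.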


@PSeitz PSeitz force-pushed the seek_danger_range_doc_set branch from 7577a0b to a398adc Compare June 22, 2026 10:47
@PSeitz PSeitz force-pushed the seek_danger_range_doc_set branch from a398adc to ca0ed87 Compare June 22, 2026 10:47

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca0ed87d79


Comment thread benches/bool_queries_with_range.rs