Speed up range-query intersections via seek_danger on RangeDocSet (up to ~50x faster) by PSeitz-dd · Pull Request #2963 · quickwit-oss/tantivy

PSeitz-dd · 2026-06-17T10:02:33Z

A regular seek on RangeDocSet is costly: on a miss it fetches blocks and
scans the column forward to materialize the next matching doc. As a
non-leading docset in an intersection that work is wasted — the driver only
asks "does this candidate match?". seek_danger answers that with a cheap
point lookup via Column::values_for_doc, returning a lower bound on a miss
and leaving forward progress to the caller.
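
The difference between the two calls can be sketched with a toy model. This is not tantivy's code: `ToyRangeDocSet`, its `lookups` counter, the `TERMINATED` sentinel, and both `intersect_*` drivers are hypothetical names for illustration, and the column is assumed to be a plain single-valued `Vec<u64>` that is scanned one doc at a time instead of in blocks. Only the `SeekDangerResult::SeekLowerBound(target + 1)` shape is taken from the diff in this PR.

```rust
type DocId = u32;
// Sentinel for "no more docs" in this toy model only.
const TERMINATED: DocId = DocId::MAX;

#[derive(Debug, PartialEq, Eq)]
enum SeekDangerResult {
    Found,
    SeekLowerBound(DocId),
}

/// Hypothetical stand-in for a range docset: `values[doc]` is the column value.
struct ToyRangeDocSet {
    values: Vec<u64>,
    lo: u64,
    hi: u64,
    lookups: usize, // number of column reads performed so far
}

impl ToyRangeDocSet {
    fn new(values: Vec<u64>, lo: u64, hi: u64) -> Self {
        ToyRangeDocSet { values, lo, hi, lookups: 0 }
    }

    fn matches(&mut self, doc: DocId) -> bool {
        self.lookups += 1;
        let value = self.values[doc as usize];
        self.lo <= value && value <= self.hi
    }

    /// Regular seek: scan forward until the next matching doc is materialized.
    fn seek(&mut self, target: DocId) -> DocId {
        let mut doc = target;
        while (doc as usize) < self.values.len() {
            if self.matches(doc) {
                return doc;
            }
            doc += 1;
        }
        TERMINATED
    }

    /// Point lookup: answer only "does `target` match?", else return a lower bound.
    fn seek_danger(&mut self, target: DocId) -> SeekDangerResult {
        if (target as usize) < self.values.len() && self.matches(target) {
            SeekDangerResult::Found
        } else {
            SeekDangerResult::SeekLowerBound(target.saturating_add(1))
        }
    }
}

/// Intersection driver that calls `seek` on the non-leading range docset.
fn intersect_with_seek(lead: &[DocId], range: &mut ToyRangeDocSet) -> Vec<DocId> {
    let mut hits = Vec::new();
    let mut i = 0;
    while i < lead.len() {
        let candidate = lead[i];
        let next = range.seek(candidate);
        if next == candidate {
            hits.push(candidate);
            i += 1;
        } else {
            // Skip lead postings below the next range match.
            while i < lead.len() && lead[i] < next {
                i += 1;
            }
        }
    }
    hits
}

/// Intersection driver that calls `seek_danger` instead.
fn intersect_with_seek_danger(lead: &[DocId], range: &mut ToyRangeDocSet) -> Vec<DocId> {
    let mut hits = Vec::new();
    let mut i = 0;
    while i < lead.len() {
        let candidate = lead[i];
        match range.seek_danger(candidate) {
            SeekDangerResult::Found => {
                hits.push(candidate);
                i += 1;
            }
            SeekDangerResult::SeekLowerBound(bound) => {
                // The caller makes the forward progress.
                while i < lead.len() && lead[i] < bound {
                    i += 1;
                }
            }
        }
    }
    hits
}
```

With 1000 docs whose value is `doc % 100`, the range `[0, 9]`, and a lead of `[5, 250, 505, 750]`, both drivers return `[5, 505]`. In this model the `seek` driver performs 104 column reads and the `seek_danger` driver performs 4.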

Forward seek_danger through ConstScorer.

Benchmarks (bool_queries_with_range, _all_results / DocSetCollector):

dense and 0.1% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.0827ms (-4.60%)     Median: 0.0825ms (-4.82%)     [0.0809ms .. 0.0891ms]    Output: 43
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.1937ms (-3.70%)     Median: 0.1930ms (-3.59%)     [0.1806ms .. 0.2044ms]    Output: 100
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 0.0367ms (-92.67%)    Median: 0.0365ms (-92.65%)    [0.0340ms .. 0.0398ms]    Output: 43
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 0.1052ms (-98.05%)    Median: 0.1050ms (-97.98%)    [0.1009ms .. 0.1117ms]    Output: 100
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.7147ms (-51.42%)    Median: 2.7075ms (-49.58%)    [2.6806ms .. 2.7799ms]    Output: 968
dense and 1% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.4373ms (-9.71%)     Median: 0.4357ms (-10.12%)    [0.4117ms .. 0.4711ms]    Output: 463
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.2342ms (-2.50%)     Median: 0.2338ms (-2.56%)     [0.2247ms .. 0.2452ms]    Output: 1_054
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 0.3956ms (-82.86%)    Median: 0.3943ms (-82.90%)    [0.3815ms .. 0.4119ms]    Output: 463
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 0.4896ms (-91.16%)    Median: 0.4862ms (-90.81%)    [0.4797ms .. 0.5084ms]    Output: 1_054
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.7108ms (-50.81%)    Median: 2.6925ms (-49.51%)    [2.6688ms .. 2.7868ms]    Output: 968
dense and 10% a
a_AND_num_rand:[0_TO_9]_all_results                                 Avg: 0.9869ms (-3.71%)     Median: 0.9833ms (-3.83%)     [0.9518ms .. 1.1218ms]    Output: 4_914
a_AND_num_asc:[0_TO_9]_all_results                                  Avg: 0.6352ms (-3.74%)     Median: 0.6363ms (-3.32%)     [0.6158ms .. 0.6488ms]    Output: 10_152
a_AND_num_rand_fast:[0_TO_9]_all_results                            Avg: 3.1264ms (+0.39%)     Median: 3.1466ms (+1.34%)     [3.0261ms .. 3.2051ms]    Output: 4_914
a_AND_num_asc_fast:[0_TO_9]_all_results                             Avg: 4.1547ms (-31.12%)    Median: 4.0933ms (-28.55%)    [3.7648ms .. 4.7600ms]    Output: 10_152
num_rand_fast:[0_TO_9]_AND_num_asc_fast:[0_TO_9]_all_results        Avg: 2.6973ms (-52.30%)    Median: 2.6901ms (-49.86%)    [2.6689ms .. 2.7677ms]    Output: 968

Gains are largest when the range query is the non-leading docset of a low-cardinality intersection.
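
To relate the percentages above to the "~50x" in the title, a small helper can convert a relative change in runtime into a speedup factor. This assumes the parenthesized percentages are changes in runtime against the baseline; `speedup_from_percent_change` is a hypothetical name, not part of the benchmark tooling.

```rust
/// Convert a relative runtime change in percent (e.g. -98.05) into a speedup factor.
/// Assumes new_time = old_time * (1 + pct / 100).
fn speedup_from_percent_change(pct: f64) -> f64 {
    1.0 / (1.0 + pct / 100.0)
}
```

Under that assumption, -98.05% (`a_AND_num_asc_fast`, 0.1% a) corresponds to roughly 51x, -92.67% to roughly 13.6x, and -51.42% to roughly 2.1x.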

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 882ea344ed

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-17T10:13:27Z

+            // `target` is not in the docset. The next match is strictly greater than `target`, so
+            // `target + 1` is a valid lower bound. We may leave the docset in an invalid state.
+            SeekDangerResult::SeekLowerBound(target + 1)


Avoid returning only target+1 on sparse misses

When a sparse fast-field range is the non-leading side of an intersection whose lead scorer is moderately dense, Intersection::advance uses the returned lower bound as the next candidate. Returning only target + 1 therefore forces a point lookup for nearly every lead posting until the next range hit. Before this override, the default seek_danger called seek, which let RangeDocSet use its batched range scan to jump to the next matching range doc. This change can therefore turn sparse-range intersections, such as a ~50%-matching term AND a range matching only a handful of docs, into millions of per-doc lookups.
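
The concern can be illustrated by counting calls in a toy model. This is a sketch, not tantivy's `Intersection::advance`: `probe_each_candidate` and `seek_to_next_hit` are hypothetical names, the range docset is represented by a sorted slice of its matching docs, and a binary search stands in for the batched scan. It counts calls only. A single seek costs more than a single point lookup, so the call counts alone do not decide which strategy is faster.

```rust
type DocId = u32;

/// Strategy with a `target + 1` lower bound: the lead's next posting is always
/// at or beyond the bound, so every lead posting is probed once.
/// Returns the intersection and the number of point lookups.
fn probe_each_candidate(lead: &[DocId], range_hits: &[DocId]) -> (Vec<DocId>, usize) {
    let mut hits = Vec::new();
    let mut probes = 0;
    for &candidate in lead {
        probes += 1;
        if range_hits.binary_search(&candidate).is_ok() {
            hits.push(candidate);
        }
    }
    (hits, probes)
}

/// Strategy with a full seek: the range docset reports its next match and the
/// lead skips straight to it. Returns the intersection and the number of seeks.
fn seek_to_next_hit(lead: &[DocId], range_hits: &[DocId]) -> (Vec<DocId>, usize) {
    let mut hits = Vec::new();
    let mut seeks = 0;
    let mut i = 0;
    while i < lead.len() {
        let candidate = lead[i];
        seeks += 1;
        // Index of the first range match at or after `candidate`.
        let pos = range_hits.partition_point(|&d| d < candidate);
        match range_hits.get(pos) {
            Some(&next) if next == candidate => {
                hits.push(candidate);
                i += 1;
            }
            Some(&next) => {
                // Skip all lead postings below the next range match.
                i += lead[i..].partition_point(|&d| d < next);
            }
            None => break, // the range docset is exhausted
        }
    }
    (hits, seeks)
}
```

With a lead containing every even doc below 1000 (500 postings) and range matches `[100, 501, 900]`, both strategies return `[100, 900]`. The first makes 500 point lookups and the second makes 6 seeks.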

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca0ed87d79

PSeitz-dd force-pushed the seek_danger_range_doc_set branch from 882ea34 to 7577a0b Compare June 17, 2026 10:03

chatgpt-codex-connector Bot reviewed Jun 17, 2026

PSeitz requested review from fulmicoton, fulmicoton-dd and trinity-1686a June 18, 2026 07:43

PSeitz force-pushed the seek_danger_range_doc_set branch from 7577a0b to a398adc Compare June 22, 2026 10:47

PSeitz force-pushed the seek_danger_range_doc_set branch from a398adc to ca0ed87 Compare June 22, 2026 10:47

chatgpt-codex-connector Bot reviewed Jun 22, 2026

Comment thread benches/bool_queries_with_range.rs

PSeitz-dd wants to merge 1 commit into main from seek_danger_range_doc_set

2 participants
