-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The comparison kernels in arrow-ord (eq, neq, lt, lt_eq, gt, gt_eq) do not support RunEndEncoded arrays. Passing a REE array to any of these functions hits an unreachable!() panic, forcing callers to manually decode REE to flat arrays before comparing. This defeats the compression benefit of REE — particularly for filter operations (scalar comparison) which are the dominant use case.
This is part of the ongoing effort to support REE in compute kernels (#3520).
Describe the solution you'd like
Add REE support to compare_op in arrow-ord/src/cmp.rs, following the existing dictionary unwrapping pattern:
-
REE vs scalar (the filter case): compare only the physical values (one per run), then bulk-expand the boolean result to logical length. This should be O(m) comparisons + O(n) expansion.
-
REE vs REE: walk both run_ends arrays simultaneously, compare once per aligned segment, and bulk-fill the result. O(m1+m2) comparisons instead of O(n) per-element.
-
Mixed cases (REE vs plain, REE wrapping dictionary): fall back to materializing a logical-length index vector and using the existing vectored comparison path.
Describe alternatives you've considered
-
Decode REE to flat before comparing: simpler (zero changes to
cmp.rs) but materializes the full logical array, losing all compression benefit. -
Treat REE as dictionary (implement
AnyDictionaryArrayforRunArray): would reuse the existing dict path but would just hide the same logical-length allocation insidenormalized_keys(). The access patterns differ fundamentally — dictionary keys are stored per-element, while REE run-ends are compressed.
Additional context
None.