Skip to content

Support RunEndEncoded arrays in comparison kernels (eq, lt, etc.) #9620

@asubiotto

Description

@asubiotto

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The comparison kernels in arrow-ord (eq, neq, lt, lt_eq, gt, gt_eq) do not support RunEndEncoded arrays. Passing a REE array to any of these functions hits an unreachable!() panic, forcing callers to manually decode REE to flat arrays before comparing. This defeats the compression benefit of REE — particularly for filter operations (scalar comparison) which are the dominant use case.

This is part of the ongoing effort to support REE in compute kernels (#3520).

Describe the solution you'd like

Add REE support to compare_op in arrow-ord/src/cmp.rs, following the existing dictionary unwrapping pattern:

  1. REE vs scalar (the filter case): compare only the physical values (one per run), then bulk-expand the boolean result to logical length. This should be O(m) comparisons + O(n) expansion.

  2. REE vs REE: walk both run_ends arrays simultaneously, compare once per aligned segment, and bulk-fill the result. O(m1+m2) comparisons instead of O(n) per-element.

  3. Mixed cases (REE vs plain, REE wrapping dictionary): fall back to materializing a logical-length index vector and using the existing vectored comparison path.

Describe alternatives you've considered

  • Decode REE to flat before comparing: simpler (zero changes to cmp.rs) but materializes the full logical array, losing all compression benefit.

  • Treat REE as dictionary (implement AnyDictionaryArray for RunArray): would reuse the existing dict path but would just hide the same logical-length allocation inside normalized_keys(). The access patterns differ fundamentally — dictionary keys are stored per-element, while REE run-ends are compressed.

Additional context

None.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions