Skip to content

perf: Support LEFT/RIGHT OUTER Spatial Joins for RangeJoinExec and DistanceJoinExec #2570

@petern48

Description

@petern48

We currently only support other join types for BroadcastIndexJoinExec. In #2561, a contributor added docs for a workaround for how they achieved using Sedona's RangeJoinExec for a LEFT JOIN. It involved rewriting the query with a CTE. While it works, we should support this behavior natively, so users don't have to resort to this workaround or, worse, move forward with the unoptimized behavior. Currently, we're falling back to BroadcastNestedLoopJoin from Spark if BroadcastIndexJoinExec isn't triggered.

Here's the workaround in the docs. We should also remove this from the docs once this is achieved.

WITH inner_join AS (
    SELECT
        dfA.a_id
        , dfB.b_id
    FROM dfA, dfB
    WHERE ST_INTERSECTS(dfA.geometry, dfB.geometry)
)

SELECT
    dfA.*,
    inner_join.b_id
FROM dfA
LEFT JOIN inner_join
   ON dfA.a_id = inner_join.a_id;

More context:

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions