Skip to content

refactor(rust/sedona-query-planner): Move query planner and utilities to dedicated crate#735

Merged
paleolimbot merged 26 commits intoapache:mainfrom
paleolimbot:join-common
Apr 7, 2026
Merged

refactor(rust/sedona-query-planner): Move query planner and utilities to dedicated crate#735
paleolimbot merged 26 commits intoapache:mainfrom
paleolimbot:join-common

Conversation

@paleolimbot
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot commented Mar 24, 2026

This PR separates the planning component from the sedona-spatial-join crate into sedona-query-planner. There are a few things related to planning (e.g., optimizer rules, statistics propagation) that are in our future and a few things regarding joins in our future (GPU joins, Geography joins) and I would like to separate out the execution to make it easier to experiment without touching the internals of sedona-spatial-join (which has been carefully optimized!).

This is inspired by #722 (review) but is a standalone improvement regardless, particularly so that there's an obvious place to put other optimizer rules and analyzer code. I'll stack a PR on top of this to show how it will help more cleanly separate the GPU join, too.

Comment on lines +44 to +66
/// Arguments passed to a [`SpatialJoinFactory`] when planning a spatial join.
pub struct PlanSpatialJoinArgs<'a> {
pub physical_left: &'a Arc<dyn ExecutionPlan>,
pub physical_right: &'a Arc<dyn ExecutionPlan>,
pub spatial_predicate: &'a SpatialPredicate,
pub remainder: Option<&'a JoinFilter>,
pub join_type: &'a JoinType,
pub join_options: &'a SpatialJoinOptions,
pub options: &'a Arc<ConfigOptions>,
}

/// Factory trait for creating spatial join physical plans.
///
/// Implementations decide whether they can handle a given spatial predicate and,
/// if so, produce an appropriate [`ExecutionPlan`].
pub trait SpatialJoinFactory: std::fmt::Debug + Send + Sync {
/// Given the provided arguments, produce an [ExecutionPlan] implementing
/// the join operation or `None` if this implementation cannot execute the join
fn plan_spatial_join(
&self,
args: &PlanSpatialJoinArgs<'_>,
) -> Result<Option<Arc<dyn ExecutionPlan>>>;
}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new piece of code that allows the query planner to live here (with the default join injected in the sedona crate when we construct the session state).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming it to SpatialJoinPhysicalPlanner or SpatialJoinPlanner since the main function of this trait is producing physical plans. SpatialJoinFactory is a bit vague since there are lots of components related to spatial join.

Comment on lines +54 to +71
impl SpatialJoinFactory for DefaultSpatialJoinFactory {
fn plan_spatial_join(
&self,
args: &PlanSpatialJoinArgs<'_>,
) -> Result<Option<Arc<dyn ExecutionPlan>>> {
if !is_spatial_predicate_supported(
args.spatial_predicate,
&args.physical_left.schema(),
&args.physical_right.schema(),
)? {
return Ok(None);
}

let should_swap = !matches!(
args.spatial_predicate,
SpatialPredicate::KNearestNeighbors(_)
) && args.join_type.supports_swap()
&& should_swap_join_order(args.physical_left.as_ref(), args.physical_right.as_ref())?;
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the implementation of the factory for the default join (the only modification here is using args...previously this was plan_extension()).

Comment on lines +149 to +158
let mut planner = SedonaQueryPlanner::new();
#[cfg(feature = "spatial-join")]
{
state_builder = sedona_spatial_join::register_planner(state_builder)?;
use sedona_spatial_join::factory::DefaultSpatialJoinFactory;

planner = planner.with_spatial_join_factory(Arc::new(DefaultSpatialJoinFactory::new()));
}

state_builder = register_spatial_join_logical_optimizer(state_builder)?;
state_builder = state_builder.with_query_planner(Arc::new(planner));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it is used when creating the session

Comment on lines +44 to +66
/// Arguments passed to a [`SpatialJoinFactory`] when planning a spatial join.
pub struct PlanSpatialJoinArgs<'a> {
pub physical_left: &'a Arc<dyn ExecutionPlan>,
pub physical_right: &'a Arc<dyn ExecutionPlan>,
pub spatial_predicate: &'a SpatialPredicate,
pub remainder: Option<&'a JoinFilter>,
pub join_type: &'a JoinType,
pub join_options: &'a SpatialJoinOptions,
pub options: &'a Arc<ConfigOptions>,
}

/// Factory trait for creating spatial join physical plans.
///
/// Implementations decide whether they can handle a given spatial predicate and,
/// if so, produce an appropriate [`ExecutionPlan`].
pub trait SpatialJoinFactory: std::fmt::Debug + Send + Sync {
/// Given the provided arguments, produce an [ExecutionPlan] implementing
/// the join operation or `None` if this implementation cannot execute the join
fn plan_spatial_join(
&self,
args: &PlanSpatialJoinArgs<'_>,
) -> Result<Option<Arc<dyn ExecutionPlan>>>;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest renaming it to SpatialJoinPhysicalPlanner or SpatialJoinPlanner since the main function of this trait is producing physical plans. SpatialJoinFactory is a bit vague since there are lots of components related to spatial join.

@paleolimbot
Copy link
Copy Markdown
Member Author

@pwrliang Can you give this a review? Are there any other pieces it would be helpful if I exposed here? (Also you can do this in the follow up that implements the GPU join as you run across sharp edges)

Copy link
Copy Markdown
Contributor

@pwrliang pwrliang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. I have some issues with visibility. I have annotated them, not sure whether there're more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[cfg(test)] does not work cross-crate. We may resort to #[allow(dead_code)]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just removed it (now that it's pub there's no warning!)

Comment on lines 24 to 25
pub(crate) mod spatial_index;
pub(crate) mod spatial_index_builder;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also change the visibility to pub for spatial_index and spatial_index_builder as SpatialIndex trait has been declared as pub in this PR?

@@ -18,25 +18,25 @@
pub mod evaluated_batch;
pub mod exec;
mod index;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may expose index to pub

Comment on lines 44 to 46
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may change the visiblity to pub or add some methods to update these variables, otherwise we cannot use SpatialJoinBuildMetrics out of the crate.

@paleolimbot
Copy link
Copy Markdown
Member Author

Thanks @pwrliang! I think I got all the visibility issues you noted but if you run into more when doing the GPU join feel free to update.

If there are no objections I'll merge tomorrow!

@paleolimbot paleolimbot merged commit 40e1460 into apache:main Apr 7, 2026
17 checks passed
@paleolimbot paleolimbot deleted the join-common branch April 7, 2026 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants