refactor(rust/sedona-query-planner): Move query planner and utilities to dedicated crate#735
Conversation
| /// Arguments passed to a [`SpatialJoinFactory`] when planning a spatial join. | ||
| pub struct PlanSpatialJoinArgs<'a> { | ||
| pub physical_left: &'a Arc<dyn ExecutionPlan>, | ||
| pub physical_right: &'a Arc<dyn ExecutionPlan>, | ||
| pub spatial_predicate: &'a SpatialPredicate, | ||
| pub remainder: Option<&'a JoinFilter>, | ||
| pub join_type: &'a JoinType, | ||
| pub join_options: &'a SpatialJoinOptions, | ||
| pub options: &'a Arc<ConfigOptions>, | ||
| } | ||
|
|
||
| /// Factory trait for creating spatial join physical plans. | ||
| /// | ||
| /// Implementations decide whether they can handle a given spatial predicate and, | ||
| /// if so, produce an appropriate [`ExecutionPlan`]. | ||
| pub trait SpatialJoinFactory: std::fmt::Debug + Send + Sync { | ||
| /// Given the provided arguments, produce an [ExecutionPlan] implementing | ||
| /// the join operation or `None` if this implementation cannot execute the join | ||
| fn plan_spatial_join( | ||
| &self, | ||
| args: &PlanSpatialJoinArgs<'_>, | ||
| ) -> Result<Option<Arc<dyn ExecutionPlan>>>; | ||
| } |
There was a problem hiding this comment.
This is the new piece of code that allows the query planner to live here (with the default join injected in the sedona crate when we construct the session state).
There was a problem hiding this comment.
I suggest renaming it to SpatialJoinPhysicalPlanner or SpatialJoinPlanner since the main function of this trait is producing physical plans. SpatialJoinFactory is a bit vague since there are lots of components related to spatial join.
| impl SpatialJoinFactory for DefaultSpatialJoinFactory { | ||
| fn plan_spatial_join( | ||
| &self, | ||
| args: &PlanSpatialJoinArgs<'_>, | ||
| ) -> Result<Option<Arc<dyn ExecutionPlan>>> { | ||
| if !is_spatial_predicate_supported( | ||
| args.spatial_predicate, | ||
| &args.physical_left.schema(), | ||
| &args.physical_right.schema(), | ||
| )? { | ||
| return Ok(None); | ||
| } | ||
|
|
||
| let should_swap = !matches!( | ||
| args.spatial_predicate, | ||
| SpatialPredicate::KNearestNeighbors(_) | ||
| ) && args.join_type.supports_swap() | ||
| && should_swap_join_order(args.physical_left.as_ref(), args.physical_right.as_ref())?; |
There was a problem hiding this comment.
This is the implementation of the factory for the default join (the only modification here is using args...previously this was plan_extension()).
| let mut planner = SedonaQueryPlanner::new(); | ||
| #[cfg(feature = "spatial-join")] | ||
| { | ||
| state_builder = sedona_spatial_join::register_planner(state_builder)?; | ||
| use sedona_spatial_join::factory::DefaultSpatialJoinFactory; | ||
|
|
||
| planner = planner.with_spatial_join_factory(Arc::new(DefaultSpatialJoinFactory::new())); | ||
| } | ||
|
|
||
| state_builder = register_spatial_join_logical_optimizer(state_builder)?; | ||
| state_builder = state_builder.with_query_planner(Arc::new(planner)); |
There was a problem hiding this comment.
This is how it is used when creating the session
| /// Arguments passed to a [`SpatialJoinFactory`] when planning a spatial join. | ||
| pub struct PlanSpatialJoinArgs<'a> { | ||
| pub physical_left: &'a Arc<dyn ExecutionPlan>, | ||
| pub physical_right: &'a Arc<dyn ExecutionPlan>, | ||
| pub spatial_predicate: &'a SpatialPredicate, | ||
| pub remainder: Option<&'a JoinFilter>, | ||
| pub join_type: &'a JoinType, | ||
| pub join_options: &'a SpatialJoinOptions, | ||
| pub options: &'a Arc<ConfigOptions>, | ||
| } | ||
|
|
||
| /// Factory trait for creating spatial join physical plans. | ||
| /// | ||
| /// Implementations decide whether they can handle a given spatial predicate and, | ||
| /// if so, produce an appropriate [`ExecutionPlan`]. | ||
| pub trait SpatialJoinFactory: std::fmt::Debug + Send + Sync { | ||
| /// Given the provided arguments, produce an [ExecutionPlan] implementing | ||
| /// the join operation or `None` if this implementation cannot execute the join | ||
| fn plan_spatial_join( | ||
| &self, | ||
| args: &PlanSpatialJoinArgs<'_>, | ||
| ) -> Result<Option<Arc<dyn ExecutionPlan>>>; | ||
| } |
There was a problem hiding this comment.
I suggest renaming it to SpatialJoinPhysicalPlanner or SpatialJoinPlanner since the main function of this trait is producing physical plans. SpatialJoinFactory is a bit vague since there are lots of components related to spatial join.
|
@pwrliang Can you give this a review? Are there any other pieces it would be helpful if I exposed here? (Also you can do this in the follow up that implements the GPU join as you run across sharp edges) |
pwrliang
left a comment
There was a problem hiding this comment.
Thanks for this PR. I have some issues with visibility. I have annotated them, not sure whether there're more.
There was a problem hiding this comment.
#[cfg(test)] does not work cross-crate. We may resort to #[allow(dead_code)]
There was a problem hiding this comment.
I just removed it (now that it's pub there's no warning!)
| pub(crate) mod spatial_index; | ||
| pub(crate) mod spatial_index_builder; |
There was a problem hiding this comment.
Shall we also change the visibility to pub for spatial_index and spatial_index_builder as SpatialIndex trait has been declared as pub in this PR?
rust/sedona-spatial-join/src/lib.rs
Outdated
| @@ -18,25 +18,25 @@ | |||
| pub mod evaluated_batch; | |||
| pub mod exec; | |||
| mod index; | |||
There was a problem hiding this comment.
We may expose index to pub
There was a problem hiding this comment.
We may change the visiblity to pub or add some methods to update these variables, otherwise we cannot use SpatialJoinBuildMetrics out of the crate.
|
Thanks @pwrliang! I think I got all the visibility issues you noted but if you run into more when doing the GPU join feel free to update. If there are no objections I'll merge tomorrow! |
This PR separates the planning component from the sedona-spatial-join crate into sedona-query-planner. There are a few things related to planning (e.g., optimizer rules, statistics propagation) that are in our future and a few things regarding joins in our future (GPU joins, Geography joins) and I would like to separate out the execution to make it easier to experiment without touching the internals of sedona-spatial-join (which has been carefully optimized!).
This is inspired by #722 (review) but is a standalone improvement regardless, particularly so that there's an obvious place to put other optimizer rules and analyzer code. I'll stack a PR on top of this to show how it will help more cleanly separate the GPU join, too.