test: add test that validate partial reduce with different number of state fields#21175
Conversation
| /// This simulates a tree-reduce pattern: | ||
| /// Partial -> PartialReduce -> Final | ||
| #[tokio::test] | ||
| async fn test_partial_reduce_mode_on_aggregate_that_have_more_than_state_fields_than_input_arguments() |
There was a problem hiding this comment.
This new approx_percentile_cont case is helpful, but it is still only an indirect proxy for the regression mentioned in the PR description.
Right now it shows that one concrete aggregate can round-trip through Partial -> PartialReduce -> Final, but it would not fail if PartialReduce accidentally passed state-field types into AccumulatorArgs::expr_fields.
It would be great to add a small test-only aggregate stub that asserts its accumulator receives the original input field types. That would directly pin the intended contract and avoid giving false confidence in cases where the first state field happens to remain compatible.
| Arc::new(Float64Array::from(vec![40.0, 50.0, 60.0])), | ||
| ], | ||
| )?; | ||
| async fn evaluate_partial_reduce( |
There was a problem hiding this comment.
evaluate_partial_reduce is a nice extraction 👍
One thing I noticed is that the three callers still repeat the same schema and batch setup. You might consider table-driving the UDAF, alias, and expected snapshot so the test focuses more on which aggregate shape is being exercised rather than re-encoding the same fixture each time.
There was a problem hiding this comment.
This helper currently hardcodes vec![None] for the filter list, which quietly assumes there is only a single aggregate expression.
Using vec![None; aggregates.len()] would make it more reusable if we add more aggregate cases later and also makes the helper's contract a bit clearer.
| } | ||
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
| if arg_types.len() > 3 { |
There was a problem hiding this comment.
Since the public signature already restricts approx_percentile_cont to 2 or 3 arguments, this extra arity check reads more like a defensive internal invariant than user-facing validation.
A short comment explaining that it protects aggregate planning from accidentally feeding state-field types back into return_type would help clarify the intent for future readers.
|
@kosiew sorry for the late reply, I missed that update updated based on your review, can you please re-review |
Which issue does this PR close?
N/A
Rationale for this change
making sure that data_type on accumulator does not get called with the state fields data types in partial reduce
What changes are included in this PR?
added tests and validation for data_type input
Are these changes tested?
just tests
Are there any user-facing changes?
not really