Skip to content

test: add test that validate partial reduce with different number of state fields#21175

Open
rluvaton wants to merge 7 commits into
apache:mainfrom
rluvaton:add-tests-for-partial-reduce-with-different-state-fields
Open

test: add test that validate partial reduce with different number of state fields#21175
rluvaton wants to merge 7 commits into
apache:mainfrom
rluvaton:add-tests-for-partial-reduce-with-different-state-fields

Conversation

@rluvaton
Copy link
Copy Markdown
Member

Which issue does this PR close?

N/A

Rationale for this change

making sure that data_type on accumulator does not get called with the state fields data types in partial reduce

What changes are included in this PR?

added tests and validation for data_type input

Are these changes tested?

just tests

Are there any user-facing changes?

not really

@github-actions github-actions Bot added functions Changes to functions implementation physical-plan Changes to the physical-plan crate labels Mar 26, 2026
@rluvaton rluvaton marked this pull request as ready for review March 26, 2026 20:01
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rluvaton

This is moving in the right direction, but I think we can tighten the regression coverage.

/// This simulates a tree-reduce pattern:
/// Partial -> PartialReduce -> Final
#[tokio::test]
async fn test_partial_reduce_mode_on_aggregate_that_have_more_than_state_fields_than_input_arguments()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new approx_percentile_cont case is helpful, but it is still only an indirect proxy for the regression mentioned in the PR description.

Right now it shows that one concrete aggregate can round-trip through Partial -> PartialReduce -> Final, but it would not fail if PartialReduce accidentally passed state-field types into AccumulatorArgs::expr_fields.

It would be great to add a small test-only aggregate stub that asserts its accumulator receives the original input field types. That would directly pin the intended contract and avoid giving false confidence in cases where the first state field happens to remain compatible.

Arc::new(Float64Array::from(vec![40.0, 50.0, 60.0])),
],
)?;
async fn evaluate_partial_reduce(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluate_partial_reduce is a nice extraction 👍

One thing I noticed is that the three callers still repeat the same schema and batch setup. You might consider table-driving the UDAF, alias, and expected snapshot so the test focuses more on which aggregate shape is being exercised rather than re-encoding the same fixture each time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This helper currently hardcodes vec![None] for the filter list, which quietly assumes there is only a single aggregate expression.

Using vec![None; aggregates.len()] would make it more reusable if we add more aggregate cases later and also makes the helper's contract a bit clearer.

}

fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
if arg_types.len() > 3 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the public signature already restricts approx_percentile_cont to 2 or 3 arguments, this extra arity check reads more like a defensive internal invariant than user-facing validation.

A short comment explaining that it protects aggregate planning from accidentally feeding state-field types back into return_type would help clarify the intent for future readers.

@rluvaton
Copy link
Copy Markdown
Member Author

@kosiew sorry for the late reply, I missed that update

updated based on your review, can you please re-review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants