Skip to content

Comments

Fix incorrect handling of missing values in ToParentBlockJoinSortField (#15548)#15727

Open
gmarouli wants to merge 11 commits intoapache:mainfrom
gmarouli:fix-bug-15548-incorrect-handling-of-missing-value
Open

Fix incorrect handling of missing values in ToParentBlockJoinSortField (#15548)#15727
gmarouli wants to merge 11 commits intoapache:mainfrom
gmarouli:fix-bug-15548-incorrect-handling-of-missing-value

Conversation

@gmarouli
Copy link

Description

Problem

ToParentBlockJoinSortField did not handle missing values for child documents. When a parent had a mix of children with a value and children without, the children without values were silently ignored from the min/max selection. This led to incorrect sort ordering -- the missing children should participate in the comparison using a configurable missing value, consistent with how the missing value in SortField works.

Additionally, the parent-level missingValue which was not configurable after #15483, is reinstated and configurable via the constructor.

Plan

  • Add support for missingValue configuration for the parent-level in ToParentBlockJoinSortField making it again configurable.
  • Rename the ambiguous reverse and order flags in the constructor of ToParentBlockJoinSortField.
  • Add support for missingValue configuration for the child-level in ToParentBlockJoinSortField and extend ToParentDocValues and BlockJoinSelector to detect children with missing values and incorporate the configurable missing value into the min/max aggregation, with proper handling for all supported types.

Fixes #15548

@gmarouli gmarouli marked this pull request as draft February 19, 2026 09:49
@github-actions github-actions bot added this to the 11.0.0 milestone Feb 19, 2026
@gmarouli gmarouli marked this pull request as ready for review February 19, 2026 11:07
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @gmarouli, thank you! I left some comments, but this all heading in the right direction.

}

private static void validate(Type type, Object childMissingValue) {
switch (type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we can use switch expressions here (ie, case STRING -> validateMissingValue())

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I was a bit reluctant to add the comment // $CASES-OMITTED$ because I thought it might be recommended, but I did find other similar places in the code that it was used so it might be ok, right?

int prevParentDocID = parents.prevSetBit(docID - 1);
hasChildWithMissingValue = childrenWithValuesCount < getTotalChildrenCount(prevParentDocID);
if (hasChildWithMissingValue && missingWithChildWithoutValue) {
docID = nextDoc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid recursion here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I used a loop with the seen flag in the condition. I changed the code, so seen is set to true only when we are certain that the parent doc id will be returned.

It used to be set as long as we found a child doc, so I added a dedicated local flag for that and that allowed us to use the seen flag in the new loop.

@gmarouli gmarouli requested a review from romseygeek February 20, 2026 13:06
@gmarouli
Copy link
Author

This looks great @gmarouli, thank you! I left some comments, but this all heading in the right direction.

Thank you for all the guidance @romseygeek . I followed up to the comments and I am open to further changes if you see fit.

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gmarouli! I left another comment.

int nextParentDocID = parents.nextSetBit(childWithValues.docID());
collector.reset();
seen = true;
int nextParentDocID = parents.nextSetBit(childWithValues.docID());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to check for NO_MORE_DOCS here now, given that it's possible that we iterate through parent docs until there are none left?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a test that exercises this scenario.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting it! It should have been in the loop. Fixed and test added, I added a new one dedicated to nextDoc because of its complexity, but if you think it would be better to add it to the existing one I can do it too.

@gmarouli gmarouli requested a review from romseygeek February 22, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ToParentBlockJoinSortField does not handle missing values correctly

2 participants