Fix incorrect handling of missing values in ToParentBlockJoinSortField (#15548)#15727
Fix incorrect handling of missing values in ToParentBlockJoinSortField (#15548)#15727gmarouli wants to merge 11 commits intoapache:mainfrom
Conversation
romseygeek
left a comment
There was a problem hiding this comment.
This looks great @gmarouli, thank you! I left some comments, but this all heading in the right direction.
lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| private static void validate(Type type, Object childMissingValue) { | ||
| switch (type) { |
There was a problem hiding this comment.
Nit: we can use switch expressions here (ie, case STRING -> validateMissingValue())
There was a problem hiding this comment.
Done, I was a bit reluctant to add the comment // $CASES-OMITTED$ because I thought it might be recommended, but I did find other similar places in the code that it was used so it might be ok, right?
lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java
Outdated
Show resolved
Hide resolved
| int prevParentDocID = parents.prevSetBit(docID - 1); | ||
| hasChildWithMissingValue = childrenWithValuesCount < getTotalChildrenCount(prevParentDocID); | ||
| if (hasChildWithMissingValue && missingWithChildWithoutValue) { | ||
| docID = nextDoc(); |
There was a problem hiding this comment.
Can we avoid recursion here?
There was a problem hiding this comment.
Yes, I used a loop with the seen flag in the condition. I changed the code, so seen is set to true only when we are certain that the parent doc id will be returned.
It used to be set as long as we found a child doc, so I added a dedicated local flag for that and that allowed us to use the seen flag in the new loop.
lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java
Outdated
Show resolved
Hide resolved
c3f4c8a to
b03918f
Compare
Thank you for all the guidance @romseygeek . I followed up to the comments and I am open to further changes if you see fit. |
romseygeek
left a comment
There was a problem hiding this comment.
Thanks @gmarouli! I left another comment.
| int nextParentDocID = parents.nextSetBit(childWithValues.docID()); | ||
| collector.reset(); | ||
| seen = true; | ||
| int nextParentDocID = parents.nextSetBit(childWithValues.docID()); |
There was a problem hiding this comment.
I think we need to check for NO_MORE_DOCS here now, given that it's possible that we iterate through parent docs until there are none left?
There was a problem hiding this comment.
Worth adding a test that exercises this scenario.
There was a problem hiding this comment.
Thank you for spotting it! It should have been in the loop. Fixed and test added, I added a new one dedicated to nextDoc because of its complexity, but if you think it would be better to add it to the existing one I can do it too.
Description
Problem
ToParentBlockJoinSortFielddid not handle missing values for child documents. When a parent had a mix of children with a value and children without, the children without values were silently ignored from the min/max selection. This led to incorrect sort ordering -- the missing children should participate in the comparison using a configurable missing value, consistent with how the missing value inSortFieldworks.Additionally, the parent-level
missingValuewhich was not configurable after #15483, is reinstated and configurable via the constructor.Plan
missingValueconfiguration for the parent-level inToParentBlockJoinSortFieldmaking it again configurable.reverseandorderflags in the constructor ofToParentBlockJoinSortField.missingValueconfiguration for the child-level inToParentBlockJoinSortFieldand extendToParentDocValuesandBlockJoinSelectorto detect children with missing values and incorporate the configurable missing value into the min/max aggregation, with proper handling for all supported types.Fixes #15548