AllInvalid arrays have non empty distinct info#7369
Conversation
Signed-off-by: Robert Kruszewski <github@robertk.io>
Merging this PR will improve performance by 29.94%
Performance Changes
Comparing Footnotes
|
| erased: TypedStats { | ||
| distinct: Some(DistinctInfo { | ||
| distinct_values: HashSet::with_capacity_and_hasher(0, FxBuildHasher), | ||
| distinct_count: 0, |
There was a problem hiding this comment.
distinct count has to be 0, this is an invariant that other code relies on.
Let me check where
There was a problem hiding this comment.
we use it in the constant schemes where if the distinct count is not greater than 1, it must be exactly 1.
I think it is actually fine though since that exact code path wont happen, but I would prefer if we just set distinct count to 1 and then add a big TODO and red flag that this is a hack. Once I finish the vector search stuff I can revisit this and come up with a more robust solution while adding all the things that would be good to have in the compressor (#7216)
There was a problem hiding this comment.
not sure why setting any value is meaningful here, the idea is that they will not look at it and we just need to have a present value. This is also internally consistent where all null arrays aren't constant
There was a problem hiding this comment.
its just that I rely on that fact at other callsites, so it might panic there instead. I guess it doesn't really matter either way since both of these fixes are technically incorrect
connortsui20
left a comment
There was a problem hiding this comment.
i will fix this next week
This is required to handle cases where we are attempting to dict compress and
a sample ends up being all null
Signed-off-by: Robert Kruszewski github@robertk.io