Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion vortex-compressor/src/stats/float.rs
Original file line number Diff line number Diff line change
Expand Up @@ -179,7 +179,13 @@ where
null_count: u32::try_from(array.len())?,
value_count: 0,
average_run_length: 0,
erased: TypedStats { distinct: None }.into(),
erased: TypedStats {
distinct: Some(DistinctInfo {
distinct_values: HashSet::with_capacity_and_hasher(0, FxBuildHasher),
distinct_count: 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distinct count has to be 0, this is an invariant that other code relies on.

Let me check where

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use it in the constant schemes where if the distinct count is not greater than 1, it must be exactly 1.

I think it is actually fine though since that exact code path wont happen, but I would prefer if we just set distinct count to 1 and then add a big TODO and red flag that this is a hack. Once I finish the vector search stuff I can revisit this and come up with a more robust solution while adding all the things that would be good to have in the compressor (#7216)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why setting any value is meaningful here, the idea is that they will not look at it and we just need to have a present value. This is also internally consistent where all null arrays aren't constant

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its just that I rely on that fact at other callsites, so it might panic there instead. I guess it doesn't really matter either way since both of these fixes are technically incorrect

}),
}
.into(),
});
}

Expand Down
7 changes: 6 additions & 1 deletion vortex-compressor/src/stats/integer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -338,7 +338,12 @@ where
erased: TypedStats {
min: T::max_value(),
max: T::min_value(),
distinct: None,
distinct: Some(DistinctInfo {
distinct_values: HashMap::with_capacity_and_hasher(0, FxBuildHasher),
distinct_count: 0,
most_frequent_value: T::zero(),
top_frequency: 0,
}),
}
.into(),
});
Expand Down
Loading