fix: avoid ListView take_reduce rebuild for dense selections#7339
fix: avoid ListView take_reduce rebuild for dense selections#7339dimitarvdimitrov wants to merge 8 commits intodevelopfrom
Conversation
Merging this PR will not alter performance
Comparing Footnotes
|
|
Am I missing something? Why would we ever rebuild? The whole point of ListView is to have cheap take/filter. It should be explicit from the user if they want to garbage collect |
The rebuild is still worthwhile for sparse takes where we'd otherwise drag around a lot of unused elements when exporting e.g. duckdb's ListViewExporter vortex/vortex-duckdb/src/exporter/list_view.rs Lines 80 to 85 in 11d607e ideally we do this as heuristics in the exporter as opposed to in each operation like take. I was thinking of doing that next, but it might take a bit longer |
Yeah, 100%. Agree priority might not be to do it now, but eager GC impacts all Vortex users |
When the selection density is above the threshold, skip take_reduce so we don't rebuild the ListView for dense takes. Signed-off-by: Dimitar Dimitrov <mitko@spiraldb.com> Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
1074601 to
0cd4f36
Compare
|
looking into why the benchmarks regressed... |
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
| // TODO(connor)[ListView]: Ideally, we would only rebuild after all `take`s and `filter` | ||
| // compute functions have run, at the "top" of the operator tree. However, we cannot do this | ||
| // right now, so we will just rebuild every time (similar to [`ListArray`]). | ||
| pub const REBUILD_DENSITY_THRESHOLD: f32 = 0.1; |
There was a problem hiding this comment.
i think this should move? and also be pub crate?
There was a problem hiding this comment.
any suggestion for where to move it? it's now used in arrays/listview/compute and arrays/filter/execute
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
…iew-take-reduce-rebuild
The
take_reducepath rebuilds the ListView, which is expensive.For dense selections (density above the rebuild threshold) there's little garbage to reclaim, so skip the rebuild and let the regular take path handle it. The rebuild is still worthwhile for sparse takes where we'd otherwise drag around a lot of unused elements when exporting (e.g. to DuckDB).
This PR does a very crude estimation of the number of elements that stay in the ListViewArray.
Potentially we should do this estimation at export time in the
ListViewExporter, but that's less trivial.