
fix: parquet limit pruning for row group selections #22942

Merged
xudong963 merged 5 commits into apache:main from haohuaijin:row-group-limit-selection-fix Jun 18, 2026

haohuaijin (Contributor) commented Jun 13, 2026

Which issue does this PR close?

Rationale for this change

Limit pruning handled row groups with a RowSelection incorrectly: it counted the full row group size toward the limit instead of only the selected rows, and it could replace the selection with a full scan of the row group.

What changes are included in this PR?

  • Preserve existing row selections.
  • Count only selected rows when checking the limit.
  • Add regression tests for both cases.
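A minimal Rust sketch of the corrected behavior, under stated assumptions: the `Access`, `Run`, `rows_read`, and `prune_by_limit` names are hypothetical and only loosely mirror the idea of a per-row-group access plan (skip, full scan, or scan a selection). They are not DataFusion's actual types or functions. The sketch also assumes every row a plan reads is known to satisfy the query's filter, so each read row counts toward the limit.

```rust
// Hypothetical, simplified model; not DataFusion's or the parquet crate's API.
// Assumption: every row that is read is known to pass the filter.

/// A run of consecutive rows that is either read or skipped.
#[derive(Debug, Clone, PartialEq)]
struct Run {
    skip: bool,
    row_count: usize,
}

/// How a single row group will be read.
#[derive(Debug, Clone, PartialEq)]
enum Access {
    Skip,
    Scan,
    Selection(Vec<Run>),
}

/// Rows this access actually reads. For a selection, only the non-skipped
/// runs count, never the full row group size.
fn rows_read(access: &Access, row_group_rows: usize) -> usize {
    match access {
        Access::Skip => 0,
        Access::Scan => row_group_rows,
        Access::Selection(runs) => runs
            .iter()
            .filter(|r| !r.skip)
            .map(|r| r.row_count)
            .sum(),
    }
}

/// Visit row groups in file order and skip every group once `limit` rows are
/// guaranteed. Groups that are still read keep their access unchanged, so a
/// `Selection` is never widened into a `Scan`.
fn prune_by_limit(plan: &mut [Access], row_group_rows: &[usize], limit: usize) {
    let mut guaranteed = 0;
    for (access, &rows) in plan.iter_mut().zip(row_group_rows) {
        if guaranteed >= limit {
            *access = Access::Skip;
        } else {
            guaranteed += rows_read(access, rows);
        }
    }
}
```

In this sketch, with three 1,000-row groups, a first group whose selection keeps only 10 rows, and a limit of 100, the selection is kept, the second group is scanned, and the third is skipped. Counting the first group as 1,000 rows instead would skip both later groups and yield only 10 rows.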

Are these changes tested?

Yes. New unit tests cover preserving RowSelection and counting selected rows during limit pruning.

Are there any user-facing changes?

No API changes.

github-actions (bot) added the datasource label (Changes to the datasource crate) Jun 13, 2026
xudong963 self-requested a review June 16, 2026 03:26

xudong963 (Member) left a comment

thanks for the fix

xudong963 added this pull request to the merge queue Jun 18, 2026
Merged via the queue into apache:main with commit 1f45d83 Jun 18, 2026
35 checks passed
haohuaijin (Contributor, Author) commented

Thanks @xudong963

haohuaijin deleted the row-group-limit-selection-fix branch June 18, 2026 06:55
haohuaijin added a commit to openobserve/datafusion that referenced this pull request Jun 18, 2026
## Which issue does this PR close?

- Closes apache#22941


Labels: datasource (Changes to the datasource crate)


Development: successfully merging this pull request may close this issue: limit pruning ignores RowSelection

2 participants