Track spill read-back memory in SMJ by SubhamSinghal · Pull Request #22103 · apache/datafusion

SubhamSinghal · 2026-05-11T06:57:50Z

Which issue does this PR close?

Follow-up to #21962.

Rationale for this change

After #21962, the memory pool accurately tracks residual join_arrays memory that remains after a BufferedBatch is
spilled to disk. However, when spilled batches are read back from disk during output materialization in
materialize_right_columns, the deserialized data temporarily exists in memory without any pool reservation.

Single-source path: one full batch loaded without reservation
Multi-source interleave path: ALL referenced spilled batches loaded simultaneously — N × batch_size untracked

The pool thinks these batches cost 0 bytes during read-back. Under memory pressure (the reason they were spilled), other
operators see stale headroom and may over-allocate, risking OOM.

What changes are included in this PR?

Changed materialize_right_columns from &self to &mut self and added grow/shrink at the exact points where spilled data is read from disk:

Path A (single source spilled):

grow(size_estimation) immediately before fetch_right_columns_by_idxs
shrink(size_estimation) immediately after

Path B (multi-source interleave):

Sum size_estimation for all spilled sources
grow(total) before source_data loading
shrink(total) after interleave completes

Uses unconditional grow() because the data must be read to produce output — there is no fallback. Same rationale as
#21962: if memory physically exists, the pool must reflect it.

Are these changes tested?

Yes — two new tests:

spill_read_back_memory_accounting: multiple buffered batches for same key (multi-source Path B) — verifies
peak_mem_used >= size_estimation and pool.reserved() == 0 at end
spill_read_back_single_source: distinct keys with one batch per group (single-source Path A) — same assertions

Are there any user-facing changes?

No.

kosiew

@SubhamSinghal
Thanks for the follow up.
I found one blocking issue around memory reservation accounting on error paths in the spill read-back flow.

kosiew · 2026-05-14T07:28:36Z

+                        .set_max(self.reservation.size());
+                    spill_read_mem += batch_mem;
+
+                    let file = BufReader::new(File::open(spill_file.path())?);


It looks like spill_read_mem is only shrunk after all file reads and all interleave calls succeed. Once self.reservation.grow(batch_mem) succeeds, any later ? such as from File::open, StreamReader::try_new, next().transpose(), or interleave can return early before the shrink happens at line 1651.

That leaves the reservation inflated until the stream is dropped, which breaks the grow/shrink accounting invariant on error paths and can leave the memory pool reporting stale reserved memory after a failed poll.

Could we make this temporary read-back reservation scoped/RAII-based, or otherwise guarantee that shrink runs on every return path after a successful grow?

@kosiew Thanks for highlighting this. Switched from manual grow/shrink on self.reservation to a scoped MemoryReservation via self.reservation.new_empty().

kosiew

@SubhamSinghal
Looks 👍 to me

Track spill read-back memory in SMJ

432f5c7

github-actions Bot added the physical-plan Changes to the physical-plan crate label May 11, 2026

kosiew requested changes May 14, 2026

View reviewed changes

SubhamSinghal added 2 commits May 14, 2026 16:17

Merge branch 'main' into smj-spill-read-back-memory-accounting

bf54239

Use MemoryReservation for memorary tracking

81be3f9

kosiew approved these changes May 14, 2026

View reviewed changes

Merge branch 'main' into smj-spill-read-back-memory-accounting

5614862

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track spill read-back memory in SMJ#22103

Track spill read-back memory in SMJ#22103
SubhamSinghal wants to merge 4 commits into
apache:mainfrom
SubhamSinghal:smj-spill-read-back-memory-accounting

SubhamSinghal commented May 11, 2026

Uh oh!

kosiew left a comment •

edited

Loading

Uh oh!

kosiew May 14, 2026

Uh oh!

SubhamSinghal May 14, 2026

Uh oh!

kosiew left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SubhamSinghal commented May 11, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Uh oh!

kosiew left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kosiew May 14, 2026

Choose a reason for hiding this comment

Uh oh!

SubhamSinghal May 14, 2026

Choose a reason for hiding this comment

Uh oh!

kosiew left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kosiew left a comment •

edited

Loading