
Track spill read-back memory in SMJ #22103

Open

SubhamSinghal wants to merge 4 commits into apache:main from SubhamSinghal:smj-spill-read-back-memory-accounting

Conversation

@SubhamSinghal
Contributor

Which issue does this PR close?

Follow-up to #21962.

Rationale for this change

After #21962, the memory pool accurately tracks residual join_arrays memory that remains after a BufferedBatch is
spilled to disk. However, when spilled batches are read back from disk during output materialization in
materialize_right_columns, the deserialized data temporarily exists in memory without any pool reservation.

  • Single-source path: one full batch loaded without reservation
  • Multi-source interleave path: ALL referenced spilled batches loaded simultaneously — N × batch_size untracked

The pool thinks these batches cost 0 bytes during read-back. Under memory pressure (the reason they were spilled), other
operators see stale headroom and may over-allocate, risking OOM.
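The stale-headroom gap can be sketched with a small calculation. All numbers and names below are illustrative assumptions, not values from the PR:

```rust
// Illustrative sketch of the stale-headroom problem described above.
// Pool limit, batch size, and batch count are hypothetical.

/// Bytes physically resident during the interleave read-back but
/// invisible to the memory pool (N spilled batches loaded at once).
fn untracked_bytes(n_spilled: usize, batch_size: usize) -> usize {
    n_spilled * batch_size
}

fn main() {
    let pool_limit: usize = 100 * 1024 * 1024; // 100 MiB pool
    let reserved: usize = 0; // the pool's view during read-back
    let untracked = untracked_bytes(6, 8 * 1024 * 1024); // 48 MiB resident

    // Another operator computes headroom from the pool's (stale) view:
    let apparent_headroom = pool_limit - reserved; // looks like 100 MiB free
    let real_headroom = pool_limit - reserved - untracked; // only 52 MiB free

    assert_eq!(apparent_headroom - real_headroom, untracked);
    println!("apparent {apparent_headroom} vs real {real_headroom} bytes");
}
```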

What changes are included in this PR?

Changed materialize_right_columns from &self to &mut self and added grow/shrink at the exact points where spilled data is read from disk:

Path A (single source spilled):

  • grow(size_estimation) immediately before fetch_right_columns_by_idxs
  • shrink(size_estimation) immediately after

Path B (multi-source interleave):

  • Sum size_estimation for all spilled sources
  • grow(total) before source_data loading
  • shrink(total) after interleave completes

Uses unconditional grow() because the data must be read to produce output — there is no fallback. Same rationale as
#21962: if memory physically exists, the pool must reflect it.
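The bracketing pattern can be sketched as follows. The types and function names here are hypothetical stand-ins; the real change operates on self.reservation, a DataFusion MemoryReservation, inside materialize_right_columns:

```rust
// Minimal sketch of the grow/shrink bracketing this PR adds around the
// spilled-batch read-back. Not the actual SortMergeJoin code.
struct Reservation {
    reserved: usize,
}

impl Reservation {
    // Unconditional grow: the read-back must happen to produce output,
    // so there is no fallback path to take under memory pressure.
    fn grow(&mut self, bytes: usize) {
        self.reserved += bytes;
    }
    fn shrink(&mut self, bytes: usize) {
        self.reserved -= bytes;
    }
}

// Path A shape: account for one spilled batch while it is resident.
fn materialize_from_spill(res: &mut Reservation, size_estimation: usize) -> Vec<u8> {
    res.grow(size_estimation); // before the disk read
    let batch = vec![0u8; size_estimation]; // stands in for fetch_right_columns_by_idxs
    res.shrink(size_estimation); // after the output columns are built
    batch
}
```

Path B follows the same shape, except the grow covers the summed size_estimation of every spilled source and the shrink runs after the interleave completes.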

Are these changes tested?

Yes — two new tests:

  • spill_read_back_memory_accounting: multiple buffered batches for same key (multi-source Path B) — verifies
    peak_mem_used >= size_estimation and pool.reserved() == 0 at end
  • spill_read_back_single_source: distinct keys with one batch per group (single-source Path A) — same assertions

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label May 11, 2026
Contributor

@kosiew kosiew left a comment


@SubhamSinghal
Thanks for the follow-up.
I found one blocking issue around memory reservation accounting on error paths in the spill read-back flow.

    .set_max(self.reservation.size());
    spill_read_mem += batch_mem;

    let file = BufReader::new(File::open(spill_file.path())?);
Contributor


It looks like spill_read_mem is only shrunk after all file reads and all interleave calls succeed. Once self.reservation.grow(batch_mem) succeeds, any later ? such as from File::open, StreamReader::try_new, next().transpose(), or interleave can return early before the shrink happens at line 1651.

That leaves the reservation inflated until the stream is dropped, which breaks the grow/shrink accounting invariant on error paths and can leave the memory pool reporting stale reserved memory after a failed poll.

Could we make this temporary read-back reservation scoped/RAII-based, or otherwise guarantee that shrink runs on every return path after a successful grow?
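One possible shape for such a guard is sketched below. This is a hypothetical illustration, not the PR's code; DataFusion's MemoryReservation already frees its bytes on Drop, which is the property an RAII approach would lean on:

```rust
use std::cell::Cell;

// Hypothetical pool: just a counter of reserved bytes.
struct Pool {
    reserved: Cell<usize>,
}

// RAII guard: grows the pool on creation, shrinks it on Drop, so every
// early `?` return after a successful grow still releases the bytes.
struct ReadBackGuard<'a> {
    pool: &'a Pool,
    bytes: usize,
}

impl<'a> ReadBackGuard<'a> {
    fn grow(pool: &'a Pool, bytes: usize) -> Self {
        pool.reserved.set(pool.reserved.get() + bytes);
        ReadBackGuard { pool, bytes }
    }
}

impl Drop for ReadBackGuard<'_> {
    fn drop(&mut self) {
        self.pool.reserved.set(self.pool.reserved.get() - self.bytes);
    }
}

// Stand-in for the read-back loop: an error mid-way no longer leaves
// the reservation inflated, because the guard drops on every return path.
fn read_back(pool: &Pool, fail: bool) -> Result<(), &'static str> {
    let _guard = ReadBackGuard::grow(pool, 1024);
    if fail {
        return Err("simulated File::open failure"); // guard drops here
    }
    Ok(())
}
```

After a failed read_back the counter is back to zero, which is exactly the invariant requested above.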

Contributor Author


@kosiew Thanks for highlighting this. Switched from manual grow/shrink on self.reservation to a scoped MemoryReservation via self.reservation.new_empty().

Contributor

@kosiew kosiew left a comment


@SubhamSinghal
Looks 👍 to me
