Stream tar extraction to disk and add file-based unpack #167
Merged
When extracting tar archives to disk, stream file entries in chunks instead of reading them fully into memory. Also replace hand-rolled tar parsing in hex_tarball with hex_erl_tar:extract.
Backport the streamed_extract test from erlang/otp#10818 to verify that files of various sizes (empty, small, chunk-boundary, and large) are correctly extracted when streaming to disk.
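The four size classes in that test sit at the edges of chunked reading. The sketch below uses a hypothetical chunk_sizes/2 helper (not part of hex_erl_tar) and assumes a 64 KB chunk means 65536 bytes. It lists the read lengths a streaming extractor would issue for an entry: an empty file needs no reads, a file of exactly one chunk needs one full read, and one extra byte adds a trailing 1-byte read.

```erlang
-module(chunk_plan).
-export([chunk_sizes/2]).

%% Hypothetical helper: returns the lengths of the reads a streaming
%% extractor would perform for an entry of Size bytes when it reads at
%% most ChunkSize bytes at a time.
-spec chunk_sizes(non_neg_integer(), pos_integer()) -> [pos_integer()].
chunk_sizes(0, _ChunkSize) ->
    [];
chunk_sizes(Size, ChunkSize) when Size =< ChunkSize ->
    [Size];
chunk_sizes(Size, ChunkSize) ->
    [ChunkSize | chunk_sizes(Size - ChunkSize, ChunkSize)].
```

For example, chunk_sizes(65537, 65536) returns [65536, 1], which is the case where an off-by-one in a chunk loop would typically show up.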
The compressed_one option for file:open/2 is not available on OTP 24, so file-based extraction of compressed tar entries would silently open files without decompressing them. Use compressed instead; it behaves the same for single-member gzip files such as contents.tar.gz.
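A minimal sketch of the OTP 24-compatible approach: open a single-member gzip file with the compressed option so file:read/2 returns decompressed bytes. The module gzip_read and its read_gzip/1 function are illustrative only and are not part of hex_core. The sketch reads the whole file for brevity; streaming code would process each chunk instead of accumulating them.

```erlang
-module(gzip_read).
-export([read_gzip/1]).

%% Opens a gzip file with the `compressed` option (usable on OTP 24,
%% unlike `compressed_one`) and returns its decompressed contents.
-spec read_gzip(file:filename_all()) -> {ok, binary()} | {error, term()}.
read_gzip(Path) ->
    case file:open(Path, [read, binary, compressed]) of
        {ok, Fd} ->
            try
                read_all(Fd, [])
            after
                file:close(Fd)
            end;
        {error, _} = Err ->
            Err
    end.

%% Reads until eof in 64 KB pieces; each read yields decompressed data.
read_all(Fd, Acc) ->
    case file:read(Fd, 65536) of
        {ok, Bin} -> read_all(Fd, [Bin | Acc]);
        eof -> {ok, iolist_to_binary(lists:reverse(Acc))};
        {error, _} = Err -> Err
    end.
```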
Also see erlang/otp#10818 for the backport of streaming file extraction.
…dd {file, Path} support to unpack_docs
Previously the outer tarball extraction mode was tied to the input format.
Now Output drives the strategy: memory extracts to memory, path/none extracts
to a temp dir. Also adds {file, Path} input support to unpack_docs/2,3.
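The new rule can be sketched as a dispatch on Output alone, with the input form only deciding where the outer tarball's bytes come from. This is a hypothetical illustration: the module unpack_plan and the returned atoms are not hex_tarball's internals.

```erlang
-module(unpack_plan).
-export([plan/2]).

%% Hypothetical sketch of the dispatch described above. The input form
%% only decides where the outer tarball is read from; Output alone
%% decides how its contents are extracted.
-spec plan({file, file:filename_all()} | binary(),
           memory | none | file:filename_all()) ->
    {{disk, file:filename_all()} | in_memory, memory | temp_dir}.
plan(Input, Output) ->
    {source(Input), outer_strategy(Output)}.

source({file, Path}) -> {disk, Path};
source(Tarball) when is_binary(Tarball) -> in_memory.

outer_strategy(memory) -> memory;
outer_strategy(none) -> temp_dir;
outer_strategy(Path) when is_list(Path); is_binary(Path) -> temp_dir.
```

Under this rule, plan({file, "pkg.tar"}, memory) yields {{disk, "pkg.tar"}, memory}: a tarball read from disk can still be extracted to memory, which is the decoupling the commit describes.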
When extracting tar entries to disk, hex_erl_tar previously read each file entry fully into memory before writing it to disk. This change makes the disk extraction path stream file entries in chunks (64 KB by default) directly to the output file.

- hex_tarball:unpack/2,3 - Added {file, Path} as the first argument to read tarballs from disk without loading them into memory
- hex_tarball:unpack/2,3 - Added none as an output mode that extracts only metadata and checksums, skipping contents
- hex_tarball:unpack/2,3 - Refactored so Output drives the outer extraction strategy: memory extracts to memory, path/none extracts to a temp dir
- hex_tarball:unpack_docs/2,3 - Added {file, Path} as the first argument to read doc tarballs from disk without loading them into memory
- hex_erl_tar:extract/2 - Added a {chunks, N} option to control the chunk size for streaming extraction to disk
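The core of streaming an entry to disk is a bounded copy loop. The sketch below uses a hypothetical copy_n/4 helper rather than hex_erl_tar's actual code. It copies exactly Size bytes from an open input device to an open output device, reading at most ChunkSize bytes at a time, so memory use is bounded by the chunk size rather than the entry size. Both devices are assumed to be opened in binary mode.

```erlang
-module(stream_copy).
-export([copy_n/4]).

%% Hypothetical helper: copies exactly Size bytes from In to Out,
%% reading at most ChunkSize bytes per call. Devices must be opened in
%% binary mode so reads return binaries.
-spec copy_n(file:io_device(), file:io_device(), non_neg_integer(), pos_integer()) ->
    ok | {error, term()}.
copy_n(_In, _Out, 0, _ChunkSize) ->
    ok;
copy_n(In, Out, Size, ChunkSize) ->
    case file:read(In, min(Size, ChunkSize)) of
        {ok, Bin} ->
            case file:write(Out, Bin) of
                %% A read may return fewer bytes than requested, so
                %% subtract what was actually received.
                ok -> copy_n(In, Out, Size - byte_size(Bin), ChunkSize);
                {error, _} = Err -> Err
            end;
        eof ->
            %% The input ended before Size bytes were copied.
            {error, eof};
        {error, _} = Err ->
            Err
    end.
```

Here ChunkSize plays the role the description assigns to the {chunks, N} option. A tar reader would additionally skip the padding that rounds each entry's data up to a 512-byte block before reading the next header.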