Fix EOF retry bug that could cause data duplication when Range header is ignored #609
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
N/A - no existing issue. Happy to create one if preferred.
Rationale for this change
If a reqwest timeout occurs after consuming all bytes from an S3 response body but before receiving the EOF signal,
retry_streamattempts to retry with aRangeheader likebytes=5-4(where 5 is the file size). This is an invalid/backwards range.Per RFC 7233, servers MUST ignore syntactically invalid Range headers. S3 follows this spec and returns
200 OKwith the full file instead of206 Partial Contentor416 Range Not Satisfiable.Without validation,
retry_streamwould read the full file again, causing data duplication (e.g.,"hellohello"instead of"hello").What changes are included in this PR?
Early EOF check: If
range.start >= range.endbefore retrying, return EOF immediately rather than sending an invalid range request.206 status validation: Verify the retry response is
206 Partial Content. If the server returns200 OK(indicating it ignored our Range header), fail the retry and return the original error.Content-Range validation: Verify the
Content-Rangeheader matches the requested range. This catches cases where the server returns a different range than requested.Warning logs: Added
warn!logs for the new validation failures to aid debugging.Are there any user-facing changes?
No breaking changes. Users may see different error behavior in edge cases where retries were previously succeeding incorrectly (by silently duplicating data).
Test Coverage
Automated test coverage for this specific bug is limited for two reasons:
Requires real S3: LocalStack returns
416 Range Not Satisfiablefor invalid ranges, which is actually non-compliant with RFC 7233. Real S3 correctly ignores the invalid range and returns200 OKwith the full file, which is what triggers the bug.Hard to trigger: The bug requires a timeout (or other error) from reqwest after all data is read but before EOF is signaled. This is timing-dependent and would require integration tests with long sleeps to force a timeout.
A manual reproduction script is available in this gist: https://gist.github.com/sarum90/cb95633750390db690ef701f08aa20ed
The script demonstrates the bug against real S3 and shows the data duplication.