Consistent handling of empty files in CAS#266
Conversation
mostynb
left a comment
There was a problem hiding this comment.
Hi, thanks for the PR.
Do you know of existing clients that are broken with the current setup? I was planning on making a new release soon, and if there is known breakage then I might add a smaller fix for that, and then we can refactor this code afterwards. Until recently this behaviour was invalid according to the spec, and I implemented this code just well enough to un-break bazel.
One downside to moving these checks into disk.Cache is that we don't always have the blob size to do a quick check for zero before doing a string comparison. IIRC the HTTP API doesn't guarantee that you always have the size info like gRPC does. Maybe we could add a size argument to disk.Get and disk.Contains and have the HTTP code specify -1 if the size is unkown? Then we can use a simple less than or equal to zero check before doing a string comparison with the empty hash.
Re the bytestream API, it is only intended to be used for large files (which are necessarily non-empty). It wouldn't hurt us to have a sanity check here too, but if you have clients doing this it might be worth reporting issues against them to switch to the ContentAddressableStorage APIs for small files, which I think are more efficient.
Re GetValidatedActionResult, I thought I covered all the corner cases where empty CAS blobs are valid, but it would not surprise me if I missed something.
I don't know (see my comment in #267)
Yes, good idea!
Good point! I don't know if there are such clients. Do you prefer grpc_bytstream.go to reject empty file requests with a clear error message? Or accept them and let disk.go handle also empty ones transparently?
Could the following be empty? bazel-remote/cache/disk/disk.go Line 663 in 69bce62 bazel-remote/cache/disk/disk.go Line 674 in 69bce62 |
Thanks to Ulrik Falklöf <ulrik.falklof@ericsson.com> for catching these. Background + discussion of followup work: buchgr#266
|
I think you're right about the two extra cases in GetValidatedActionResult. Here's a quickfix for those, and for the bytestream API (let's just handle this corner case, even though clients shouldn't use it): #270 Would you like to redo this PR, adding a size parameter to Get and Contains as described above (to be completed after I tag the next release)? |
Thanks to Ulrik Falklöf <ulrik.falklof@ericsson.com> for catching these. Background + discussion of followup work: #266
|
Yes, I would like to redo this PR, but I wait until after your release. I will then add a size parameter to Get and Contains. |
9fc4e57 to
cb19c95
Compare
|
The PR is now updated. Is it as you suggested @mostynb, with the added size parameter to Get and Contains? |
mostynb
left a comment
There was a problem hiding this comment.
Thanks for following up. Looks prety good, with some minor issues.
cb19c95 to
313ab7e
Compare
|
Thanks for the review! Something more? |
mostynb
left a comment
There was a problem hiding this comment.
LGTM with one comment fixup required- please make that change and push a squashed result, and I will land the change.
Thanks!
There have been inconsistencies where some functions expected empty blobs to be available on disk, but other functions avoided writing them. Those inconsistencies were probably corrected in commit 9945963. This commit moves all logic for empty blobs into disk.go in order to prevent future inconsistencies. A size parameter is added to disk.Get and disk.Contains, mainly for the empty blob handling. As a side effect, the expected size is checked against the found size.
313ab7e to
b1bde89
Compare
|
Thanks! I have now pushed a squashed commit, including the comment. |
|
Thanks- landed. |
Hi,
There were special handling of empty files in server/grpc_cas.go and
server/http.go. It assumed that empty files are not stored in CAS.
The reason was to handle clients (like bazel?) which in some cases
don't upload the empty blob, but do reference it. It is kind of OK
not to upload the empty blob, since it can be recongnised by hash
and trivally regenerated.
Problem was that server/grpc_bytestream.go,
GetValidatedActionResult in disk.go and perhaps more places, still
expected the empty blobs to be available on disk.
This commit moves the logic into the disk.go to assure the same
logic is used consistenly everywhere.
Would you accept this pull request, if I add and update unit tests?
BR
Ulrik Falklöf