Skip to content

Consistent handling of empty files in CAS#266

Merged
mostynb merged 1 commit into
buchgr:masterfrom
ulrfa:consistent_empty_files
Jun 1, 2020
Merged

Consistent handling of empty files in CAS#266
mostynb merged 1 commit into
buchgr:masterfrom
ulrfa:consistent_empty_files

Conversation

@ulrfa
Copy link
Copy Markdown
Contributor

@ulrfa ulrfa commented May 7, 2020

Hi,

There were special handling of empty files in server/grpc_cas.go and
server/http.go. It assumed that empty files are not stored in CAS.
The reason was to handle clients (like bazel?) which in some cases
don't upload the empty blob, but do reference it. It is kind of OK
not to upload the empty blob, since it can be recongnised by hash
and trivally regenerated.

Problem was that server/grpc_bytestream.go,
GetValidatedActionResult in disk.go and perhaps more places, still
expected the empty blobs to be available on disk.

This commit moves the logic into the disk.go to assure the same
logic is used consistenly everywhere.

Would you accept this pull request, if I add and update unit tests?

BR
Ulrik Falklöf

Copy link
Copy Markdown
Collaborator

@mostynb mostynb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the PR.

Do you know of existing clients that are broken with the current setup? I was planning on making a new release soon, and if there is known breakage then I might add a smaller fix for that, and then we can refactor this code afterwards. Until recently this behaviour was invalid according to the spec, and I implemented this code just well enough to un-break bazel.

One downside to moving these checks into disk.Cache is that we don't always have the blob size to do a quick check for zero before doing a string comparison. IIRC the HTTP API doesn't guarantee that you always have the size info like gRPC does. Maybe we could add a size argument to disk.Get and disk.Contains and have the HTTP code specify -1 if the size is unkown? Then we can use a simple less than or equal to zero check before doing a string comparison with the empty hash.

Re the bytestream API, it is only intended to be used for large files (which are necessarily non-empty). It wouldn't hurt us to have a sanity check here too, but if you have clients doing this it might be worth reporting issues against them to switch to the ContentAddressableStorage APIs for small files, which I think are more efficient.

Re GetValidatedActionResult, I thought I covered all the corner cases where empty CAS blobs are valid, but it would not surprise me if I missed something.

@ulrfa ulrfa mentioned this pull request May 8, 2020
@ulrfa
Copy link
Copy Markdown
Contributor Author

ulrfa commented May 8, 2020

Do you know of existing clients that are broken with the current setup?

I don't know (see my comment in #267)

Maybe we could add a size argument to disk.Get and disk.Contains and have the HTTP code specify -1 if the size is unkown? ... Then we can use a simple less than or equal to zero check before doing a string comparison with the empty hash.

Yes, good idea!

Re the bytestream API, it is only intended to be used for large files (which are necessarily non-empty). It wouldn't hurt us to have a sanity check here too, but if you have clients doing this it might be worth reporting issues against them to switch to the ContentAddressableStorage APIs for small files, which I think are more efficient.

Good point! I don't know if there are such clients. Do you prefer grpc_bytstream.go to reject empty file requests with a clear error message? Or accept them and let disk.go handle also empty ones transparently?

Re GetValidatedActionResult, I thought I covered all the corner cases where empty CAS blobs are valid, but it would not surprise me if I missed something.

Could the following be empty?

found, _ := c.Contains(cache.CAS, f.Digest.Hash)

found, _ := c.Contains(cache.CAS, f.Digest.Hash)

mostynb added a commit to mostynb/bazel-remote that referenced this pull request May 9, 2020
Thanks to Ulrik Falklöf <ulrik.falklof@ericsson.com> for catching these.

Background + discussion of followup work:
buchgr#266
@mostynb
Copy link
Copy Markdown
Collaborator

mostynb commented May 9, 2020

I think you're right about the two extra cases in GetValidatedActionResult. Here's a quickfix for those, and for the bytestream API (let's just handle this corner case, even though clients shouldn't use it): #270

Would you like to redo this PR, adding a size parameter to Get and Contains as described above (to be completed after I tag the next release)?

mostynb added a commit that referenced this pull request May 11, 2020
Thanks to Ulrik Falklöf <ulrik.falklof@ericsson.com> for catching these.

Background + discussion of followup work:
#266
@ulrfa
Copy link
Copy Markdown
Contributor Author

ulrfa commented May 11, 2020

Yes, I would like to redo this PR, but I wait until after your release. I will then add a size parameter to Get and Contains.

@ulrfa ulrfa force-pushed the consistent_empty_files branch from 9fc4e57 to cb19c95 Compare May 25, 2020 10:56
@ulrfa
Copy link
Copy Markdown
Contributor Author

ulrfa commented May 25, 2020

The PR is now updated. Is it as you suggested @mostynb, with the added size parameter to Get and Contains?

Copy link
Copy Markdown
Collaborator

@mostynb mostynb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up. Looks prety good, with some minor issues.

Comment thread cache/disk/disk.go
Comment thread cache/disk/disk.go Outdated
Comment thread cache/disk/disk.go Outdated
Comment thread cache/disk/disk.go Outdated
Comment thread cache/disk/disk.go
Comment thread cache/disk/disk_test.go Outdated
Comment thread cache/s3proxy/s3proxy.go
Comment thread server/grpc_ac.go Outdated
@ulrfa ulrfa force-pushed the consistent_empty_files branch from cb19c95 to 313ab7e Compare May 28, 2020 21:02
@ulrfa
Copy link
Copy Markdown
Contributor Author

ulrfa commented May 28, 2020

Thanks for the review! Something more?

Copy link
Copy Markdown
Collaborator

@mostynb mostynb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one comment fixup required- please make that change and push a squashed result, and I will land the change.

Thanks!

Comment thread cache/disk/disk_test.go
There have been inconsistencies where some functions expected empty blobs
to be available on disk, but other functions avoided writing them.
Those inconsistencies were probably corrected in commit 9945963.

This commit moves all logic for empty blobs into disk.go in order to prevent
future inconsistencies.

A size parameter is added to disk.Get and disk.Contains, mainly for
the empty blob handling. As a side effect, the expected size is
checked against the found size.
@ulrfa ulrfa force-pushed the consistent_empty_files branch from 313ab7e to b1bde89 Compare May 31, 2020 20:37
@ulrfa
Copy link
Copy Markdown
Contributor Author

ulrfa commented May 31, 2020

Thanks! I have now pushed a squashed commit, including the comment.

@mostynb mostynb merged commit db90811 into buchgr:master Jun 1, 2020
@mostynb
Copy link
Copy Markdown
Collaborator

mostynb commented Jun 1, 2020

Thanks- landed.

@ulrfa ulrfa deleted the consistent_empty_files branch June 1, 2020 07:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants