Add support for the compressed-blobs REAPI proposal by mostynb · Pull Request #386 · buchgr/bazel-remote

mostynb · 2021-01-03T17:08:21Z

This PR adds support for the compressed-blobs REAPI proposal.

By default, uploaded CAS blobs are stored in a file with a header and zstandard-compressed chunks (corresponding to 1M of raw data, initially). When clients request one of these items in zstandard compressed form, bazel-remote can stream data without recompression (or if the client requests data from an offset, at most one chunk will be recompressed).

Preexisting CAS blobs are not migrated to the compressed format, because this can take a lot of time for large caches. Instead, these blobs are moved into the new cache directory structure with a .v1 filename extension. If the storage mode is set to "uncompressed" then CAS blobs are always stored in this form. (I plan to investigate on-the-fly recompression for preexisting blobs later.)

The HTTP interface also supports zstandard compressed blobs via Accept-Encoding: zstd for GET and Content-Encoding: zstd and a custom X-Digest-SizeBytes header for PUT.

Implements #12.

mostynb · 2021-01-03T17:19:47Z

@glukasiknuro, @ulrfa, @bdittmer, @buchgr: FYI- this is a significant change.

Once this lands and stabilises, I'll make a 2.0.0 release.

mostynb · 2021-01-04T05:50:01Z

+		hash := ks[len(ks)-sha256.Size*2:]
+		var kind cache.EntryKind = cache.AC
+		if strings.HasPrefix(ks, "cas") {
+			kind = cache.CAS


This needs to be updated to handle uncompressed .v1 files.

I should also add back some eviction tests with uncompressed storage mode, since the size is predictable again.

The proxy interface also needs fixing.

ulrfa

Interesting @mostynb! I think the network bandwidth saving can be beneficial for my use cases! 🎉

Do you know if the bazel client, and others such as buildbarn, have plans to implement or make use of this feature?

Do you believe clients will typically use zstd for all their CAS uploads/downloads (including small files)? Do you expect transcoding to be common?

However, I wonder about reading from offset in compressed files. What use cases do you have in mind for that? It seems to me that it comes with drawbacks:

A) Need for header in each file.
B) Re-compression when receiving already compressed files via HTTP PUT.
C) Complexity.
D) Performance overhead (due to A and B).

For me it would be attractive to have the files stored without the extra header, and instead with a .zst file name extension (RFC 8478). Primarily for performance reasons, but also to allow digging into the cache directories with standard zstd command line tools. The logical size could also be embedded in the file name if needed.

Have you run performance benchmarks?

ulrfa · 2021-01-07T23:58:51Z

-			if err != nil {
-				return err
+			if !legacy {
+				size, err = casblob.GetLogicalSize(f.name)


I wonder how long time it takes to restart bazel-remote, when data have to be read from each file?

An alternative could be to embed the logical size as part of the file name.

In my largest bazel-remote instance that's running something close to this branch, restarting is fast enough. It's still growing though, so I can re-check once it has filled the space.

The CAS in one of our cache instances is 1500 GB and contains 8 000 000 files.

I’m emulating restarting the cache by reading 8 bytes from them. Running “xargs head -c 8 > /dev/null” over them took 20 minutes. That is too long for us.

Re-running again with warm file system caches took 45 seconds.

(A bazel-remote instance was using the CAS at the same time)

Some of our caches are even larger. All of them are on SSD, but other people might have rotating disks where these 8 byte random-accesses takes even more time.

I think a 45 second restart for a large cache is reasonable, though not ideal.

A 20 minute cold start is pretty bad though. There is one other potential startup optimization that could help- loading file details concurrently (I know that @bdittmer has a local patch for this, which helps even at small scale). I wonder if you could try using xargs -P 72 in your benchmark (and maybe a couple of lower values too)?

(I don't think we should bother considering HDD-speeds, it's the wrong tool for the job and I would rather just recommend consumer SSD-like speed storage as a minimum.)

45 second restart time is longer than I would like in production. Because the longer time the cache is down, while restarting to upgrade version or change configuration, the higher risk for failing builds-without-the-bytes assumption about cache always available.

A first try with -P 72 took 4 minutes, but when re-trying with -P 72 more times, it took 1m45s, despite clearing filesystem cache with "sync; echo 3 > /proc/sys/vm/drop_caches" in between. Due to the difference, I don't fully trust the numbers. I wonder if there are some other caching layer involved, or if the first try coincided with peak load from the bazel-remote instance also running towards the disk. The fastest time I got was with -P 64, which was 1m 17s. If you are interested, I can list values for all -P values I tried.

But it come to my mind that there is one more large issue: Reading headers from files during startup, would touch file system access time, and that affects the ordering in LRU.

I believe all these issues can be avoided by simply storing the logical size embedded in the file name, like: cas/ef95d34495b36c0a6108c4e48679ebebfb1cee01364e3dceead1f95ae4276d2b_450.zst, if the logical size is 450 bytes and compressed with zstandard.

Or even better, I would prefer having both the logical size and the on-disk-size embedded in the file name. Since if there is a guaranteed one-to-one association between disk.lruItem entries and file names, then we can consider releasing the mutex during some syscalls in disk.go, as discussed in #323. Any changes to the mutexes could be considered in later separate work, but embedding both sizes, as preparation, would be easier in this PR.

Example zstandard compressed with logical size 450 bytes and on-disk-size 290 bytes:

cas/ef95d34495b36c0a6108c4e48679ebebfb1cee01364e3dceead1f95ae4276d2b_450_290.zst

Example non-compressed with both logical and on-disk size as 450 bytes:

cas/ef95d34495b36c0a6108c4e48679ebebfb1cee01364e3dceead1f95ae4276d2b_450

Those numbers look bad enough to give up on reading all the headers during loading. I will start refactoring.

I don't think the combination of logical size and on-disk size would be unique enough for the early mutex release idea- that would probably require storing a pseudo-random string or number in each lruItem.

Aside: using atimes for the initial LRU order is only approximate, because Contains updates the LRU but for efficiency reasons does not touch the filesystem (builds-without-the-bytes makes this worse). Reading data from file headers didn't make things much worse because the files were sorted by atime in one pass, then imported in that order in a second pass (but it added bias to the filesystem atime order for compressed CAS items). If the cache was running long enough then the atime order would become more similar to the LRU index order.

In the future, I might consider writing out the index order to disk prior to clean shutdown (and maybe also periodically) and then only use atimes as a fallback.

I don't think the combination of logical size and on-disk size would be unique enough for the early mutex release

I believe it would be enough, since then it would not matter how concurrent PUT requests of same key are interleaved. Because all their LRU entry commits would be compatible with any of their files, and it would not matter if we end up with LRU commit from one request and file from another. PUT requests interleaved with evictions of same key, is more complex, but I have hope that is also possible without holding mutexes during file system syscalls.

Reading data from file headers didn't make things much worse because the files were sorted by atime in one pass, then imported in that order in a second pass

Our CAS file systems are mounted with the relatime option, so each access time is not updated more than once a day. That is normally not an issue, since files remains in our caches much longer than a day. But I suspect that same order in first and second pass would not have been sufficient with relatime.

Thanks for the refactoring!

mostynb · 2021-01-08T09:26:59Z

Interesting @mostynb! I think the network bandwidth saving can be beneficial for my use cases! 🎉

Do you know if the bazel client, and others such as buildbarn, have plans to implement or make use of this feature?

Bazel plans to implement this: bazelbuild/bazel#12670
It looks like buildfarm might too: buildfarm/buildfarm#504

I don't think buildbarn is interested in implementing this for now (but I will offer to contribute patches later).

Do you believe clients will typically use zstd for all their CAS uploads/downloads (including small files)? Do you expect transcoding to be common?

It's entirely up to the client, so hard to predict (this is why I added a storage mode config option). However zstandard is fast enough that it might be OK to do a lot of transcoding, and there is still some benefit to be had from the increased effective cache size.

I suspect that bazel will add a simple on/off setting for using compressed-blobs (ie use it for all blobs, or for none), and maybe stop there.

However, I wonder about reading from offset in compressed files. What use cases do you have in mind for that? It seems to me that it comes with drawbacks:

A) Need for header in each file.
B) Re-compression when receiving already compressed files via HTTP PUT.
C) Complexity.
D) Performance overhead (due to A and B).

The use case is for resuming interrupted downloads of large blobs, without resorting to downloading in uncompressed form or decompressing the entire blob + recompressing the tail. With the chunked format, at most one chunk needs to be recompressed.

The server needs to decompress the blob after receiving it in order to verify the hash anyway, so it's just one extra compression, which should be OK if blobs are likely to be read multiple times.

For me it would be attractive to have the files stored without the extra header, and instead with a .zst file name extension (RFC 8478). Primarily for performance reasons, but also to allow digging into the cache directories with standard zstd command line tools. The logical size could also be embedded in the file name if needed.

There is a seekable zstandard format, but as far as I know nothing supports it yet. I would consider migrating to that later if it takes off.

In the meantime if needed we could provide a small utility which strips the header from a blob (the concatenated chunks can be processed by standard tools).

Have you run performance benchmarks?

Transferring large compressible blobs is much faster (of course), and I see cache sizes about 30% of the uncompressed size in my test cases (this will vary for different workloads).

ulrfa · 2021-01-09T12:51:18Z

However, I wonder about reading from offset in compressed files. What use cases do you have in mind for that? It seems to me that it comes with drawbacks:
A) Need for header in each file.
B) Re-compression when receiving already compressed files via HTTP PUT.
C) Complexity.
D) Performance overhead (due to A and B).

The use case is for resuming interrupted downloads of large blobs, without resorting to downloading in uncompressed form or decompressing the entire blob + recompressing the tail. With the chunked format, at most one chunk needs to be recompressed.

Are people often experiencing interrupted downloads of huge files from CAS? (Personally, I’m not)

I wonder how the performance is affected with huge amount of concurrent HTTP requests? I would be happy to run benchmarks with 72 core machines and 20 Gbps network. But how to best do that now without zstd support in bazel client?

Would it make sense to benchmark the following scenarios?

A. Non-compressed HTTP GET & PUT requests from bazel clients, to bazel-remote --storage_mode zstd
- In order to detect overhead by transcoding
- In order to detect overhead by io.Copy not being able to optimize with sendfile(2) since source is not an *os.File regular file.
B. Non-compressed HTTP GET & PUT requests from bazel clients, to bazel-remote --storage_mode uncompressed
- In order to detect overhead by all the small file syscalls to read/write the header

mostynb · 2021-01-11T23:04:25Z

Are people often experiencing interrupted downloads of huge files from CAS? (Personally, I’m not)

I don't have an answer to this, because IIRC it's possible to download uncompressed blobs from an offset and therefore I'm unlikely to get complaints about it.

I wonder how the performance is affected with huge amount of concurrent HTTP requests? I would be happy to run benchmarks with 72 core machines and 20 Gbps network. But how to best do that now without zstd support in bazel client?

I think the best we can do in the meantime is try to put together an artificial stress-test setup using something like curl (recent versions support zstd downloads) or a custom client (which is fairly easy to write). Then we can ramp up the load with xargs or make or similar.

Would it make sense to benchmark the following scenarios?

A. Non-compressed HTTP GET & PUT requests from bazel clients, to bazel-remote --storage_mode zstd

In order to detect overhead by transcoding

In order to detect overhead by io.Copy not being able to optimize with sendfile(2) since source is not an *os.File regular file.

That is measurable today, with bazel configured to use remote cache via gRPC or HTTP. We should expect higher CPU and memory usage on the server, but hopefully a larger effective cache size.

B. Non-compressed HTTP GET & PUT requests from bazel clients, to bazel-remote --storage_mode uncompressed

In order to detect overhead by all the small file syscalls to read/write the header

There should be no extra load if --storage_mode uncompressed stores raw files with a ".v1" extension to distinguish them from the new header + compressed blob format. I added this for the "migrating existing files from the old directory structure" step, but I can't remember if I have added if for the Put code path yet. I'll follow up tomorrow.

Update: I need to fix the Put code path.

mostynb · 2021-01-12T12:46:24Z

BTW, I am also thinking through some --storage_mode hybrid ideas, but it's not something I want to attempt in the first PR. For example:

Store blobs in the form they were uploaded (uncompressed or in our compressed format).
Compress and replace uncompressed blobs if/when a client requests them in compressed form.

mostynb · 2021-01-12T13:18:56Z

Interesting, maybe we could implement the header as a skippable frame (or multiple skippable frames), in which case the entire compressed file would be decodable by any compliant zstandard decoder: https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#skippable-frames

The cost compared to the current header would only be 8 bytes per file for most files.

mostynb · 2021-01-15T08:27:50Z

Pushed some new commits:

Make the compressed CAS blob file format regular .zst (the header is now a zstd skippable frame).
Include the logical size in the filename of compressed blobs, so we can import them from the directory listing alone (ie without reading the header of each blob).
Fix the uploaded CAS blob filenames for --storage_mode uncompressed.

The first two changes break compatibility with cache directories created with earlier versions of this branch/PR.

(The proxy backends are still broken, that's next on my TODO list.)

ulrfa · 2021-01-17T00:17:01Z

BTW, I am also thinking through some --storage_mode hybrid ideas, but it's not something I want to attempt in the first PR. For example:

Store blobs in the form they were uploaded (uncompressed or in our compressed format).

Interesting idea!

The client might have additional information to make decision if certain files are worth compressing or not, then it could make sense to also store them in CAS as they were uploaded.

Or bazel-remote could decide to not compress files smaller than 4 KB? Since those would still consume a whole file system block anyway.

Below follows an example of the file size distribution in one of our caches.

Note that 60% of the files in that CAS have size of 128 bytes or less. Those should not be compressed.

Also note that 98% of the files would fit in a single 1M chunk. For them the seekable zstd table is unnecessary.

Size  Number of files in CAS
   8:       6
  16:   47690
  32:     100
  64:     537
 128: 6471070
 256:    5044
 512:   64129
  1k:  183429
  2k:  320129
  4k:  998385
  8k: 1611520
 16k:  264389
 32k:  269138
 64k:  177826
128k:  109303
256k:   96537
512k:   61308
  1M:   22527
  2M:   70088
  4M:   62483
  8M:   55450
 16M:    5043
 32M:    1657
 64M:      34
256M:       2

mostynb · 2021-01-18T09:12:49Z

BTW, I am also thinking through some --storage_mode hybrid ideas, but it's not something I want to attempt in the first PR. For example:

Store blobs in the form they were uploaded (uncompressed or in our compressed format).

Interesting idea!

The client might have additional information to make decision if certain files are worth compressing or not, then it could make sense to also store them in CAS as they were uploaded.

Or bazel-remote could decide to not compress files smaller than 4 KB? Since those would still consume a whole file system block anyway.

Note that there are two other costs to consider: network transfer time (even if we didn't save disk space, we might save transfer time) and re-compression time (eg if new clients request all blobs in compressed form). This is something that I think we could experiment with in the "hybrid" storage mode.

Below follows an example of the file size distribution in one of our caches.

Interesting- can you share the language(s) in use here?

Note that 60% of the files in that CAS have size of 128 bytes or less. Those should not be compressed.

Also note that 98% of the files would fit in a single 1M chunk. For them the seekable zstd table is unnecessary.

Right, it's an optimization for large blobs.

ulrfa · 2021-01-18T15:57:38Z

Note that there are two other costs to consider: network transfer time (even if we didn't save disk space, we might save transfer time) and re-compression time (eg if new clients request all blobs in compressed form). This is something that I think we could experiment with in the "hybrid" storage mode.

Agree about experiments. For example I’m curious about how much the network transfer is affected by compressing small files that would have fit in a single 1500 byte ethernet MTU even as non-compressed.

Regarding re-compression: If bazel-remote decides to store some files uncompressed, then I assume bazel-remote would have the choice of transferring them uncompressed, even if client also accepts zstd encoding?

Below follows an example of the file size distribution in one of our caches.

Interesting- can you share the language(s) in use here?

I think it is mostly building C code and running tests with a proprietary test framework.

Many of the tiny files seems to be REv2 messages, such as Action messages that mainly contains only two Digests. That particular cache instance is not used for remote execution, but those messages are always uploaded by bazel client to CAS also for pure caching setups. However, they are only downloaded if using remote execution.

(Note that I mean the Action message in CAS, not the ActionResult message in AC)

One-liners for calculating size distribution: https://superuser.com/questions/565443/generate-distribution-of-file-sizes-from-the-command-prompt

mostynb · 2021-01-18T22:25:53Z

Note that there are two other costs to consider: network transfer time (even if we didn't save disk space, we might save transfer time) and re-compression time (eg if new clients request all blobs in compressed form). This is something that I think we could experiment with in the "hybrid" storage mode.

Agree about experiments. For example I’m curious about how much the network transfer is affected by compressing small files that would have fit in a single 1500 byte ethernet MTU even as non-compressed.

Regarding re-compression: If bazel-remote decides to store some files uncompressed, then I assume bazel-remote would have the choice of transferring them uncompressed, even if client also accepts zstd encoding?

That is possible for HTTP (with some refactoring) because the server has a way to specify the encoding, but not for gRPC bytestream where the server only returns data (not metadata). It might be something to reconsider for REAPIv3.

Below follows an example of the file size distribution in one of our caches.

Interesting- can you share the language(s) in use here?

I think it is mostly building C code and running tests with a proprietary test framework.

Many of the tiny files seems to be REv2 messages, such as Action messages that mainly contains only two Digests. That particular cache instance is not used for remote execution, but those messages are always uploaded by bazel client to CAS also for pure caching setups. However, they are only downloaded if using remote execution.

We could try using heuristics in a future hybrid storage mode. For example, any CAS blob under a certain size is stored uncompressed, and then only compressed for storage if it is downloaded a certain number of times (between bazel-remote restarts).

(Note that I mean the Action message in CAS, not the ActionResult message in AC)

I can't think of a reason why the client would upload Action messages for cache-only mode. My guess is that it either simplifies the code on the client side, or it's a bug.

ulrfa · 2021-01-19T11:20:55Z

I can't think of a reason why the client would upload Action messages for cache-only mode. My guess is that it either simplifies the code on the client side, or it's a bug.

I'm not sure, but one reason could be the following, which is valuable for me:

Sometimes I analyze content in ac & cas, by starting from the ActionResult and following references from Action to Command and further, to get the associations between output files in CAS and their originating commands. In order answer questions like:

How many alternative compiled object files are there in the CAS for a particular source file?
What changes typically cause cache misses?
Are there many similar commands resulting in same output file?
...

I have only done it in small caches, but I have ideas about a tool for creating index mapping original action output file names to Action digests. Then it hope being able to see patterns also in large caches. I though about generating such index from within bazel-remote, but I would like to avoid that overhead and instead only generating it on demand when needed.

mostynb · 2021-02-08T21:03:27Z

The proxy backends should work now, and I'm not aware of any other blockers for landing this. There is one limitation that might be annoying- it's not possible to use storage_mode zstd and proxy to another bazel-remote instance via http, but I think that can be implemented later.

tschuett · 2021-02-09T08:28:37Z

Thanks for doing this.
How about a second small lru cache. When somebody reads from an offset of a Zstd file, uncompress the file, put the uncompressed file into the second cache, and serve the data to the client. If somebody else reads from an offset from the same file, you can serve the data from the second cache.

mostynb · 2021-02-09T11:21:06Z

@tschuett: I have some ideas for "hybrid" storage modes that we can experiment with later, which might cover that scenario if it occurs often enough. Hopefully reads from offsets are rare though.

tschuett · 2021-02-09T11:27:04Z

Thanks. It will be interesting to see the actual Bazel implementation.

mostynb · 2021-02-14T22:26:32Z

@ulrfa: I decided to try to squeeze in one last change while changing the on-disk format. The latest commit uses psuedo-random strings in the blob filenames, so there's no canonical file for each hash that needs to be replaced while holding the LRU index lock.

To avoid incomplete files being used when restarting bazel-remote, I use the setgid permission bit- it's set when creating the file and un-set once the file has been written successfully. Then on startup we discard any files with the setgid bit set.

ulrfa · 2021-02-15T12:47:36Z

@ulrfa: I decided to try to squeeze in one last change while changing the on-disk format. The latest commit uses psuedo-random strings in the blob filenames, so there's no canonical file for each hash that needs to be replaced while holding the LRU index lock.

@mostynb, thanks for considering the on-disk format for the locks!

I wonder what happens in the following scenario, with two PUT and one GET request, all with the same key:

Request1-PUT: Writes file and commits to LRU.
Request2-GET: lru.Get returns item.random. But Request3 arrives before Request2 calls os.Open.
Request3-PUT: Writes file and commits to LRU. Will it replace the previous entry in LRU? Is that triggering an eviction of the file from Request1?
Request2-GET: Continues with os.Open. Will it fail to open the file from request1?

mostynb · 2021-02-15T14:04:13Z

availableOrTryProxy opens the file while holding the LRU lock and returns the *os.File, and the lock is also held whenever evicting items (somewhere inside Put or Get with a proxy download).

So request 2 gets the first version of the file and has a valid/usable file handle, then that file is unlinked due to request 3, but request 2 progresses as usual.

ulrfa · 2021-02-15T15:07:47Z

availableOrTryProxy opens the file while holding the LRU lock

@mostynb, to me it seems like the mutex is released before opening the file. Right?

bazel-remote/cache/disk/disk.go

Lines 689 to 696 in 33dfce6

    
           c.mu.Unlock() // We expect a cache hit below. 
        
           locked = false 
        
           blobPath := path.Join(c.dir, c.FileLocation(kind, item.legacy, hash, item.size, item.random)) 
        
           if !isSizeMismatch(size, item.size) { 
        
           	var f *os.File 
        
           	f, err = os.Open(blobPath)

I think we decided to release the lock before os.Open on purpose, to avoid holding the mutex while doing frequent filesystem syscalls. And we considered it OK that os.Open could fail if an old entry was evicted. But with the latest change also recently written files are deleted when replacing lru item with same key.

mostynb · 2021-02-15T17:35:14Z

@mostynb, to me it seems like the mutex is released before opening the file. Right?

bazel-remote/cache/disk/disk.go

Lines 689 to 696 in 33dfce6

c.mu.Unlock() // We expect a cache hit below.

locked = false

blobPath := path.Join(c.dir, c.FileLocation(kind, item.legacy, hash, item.size, item.random))

if !isSizeMismatch(size, item.size) {

var f *os.File

f, err = os.Open(blobPath)

I think we decided to release the lock before os.Open on purpose, to avoid holding the mutex while doing frequent filesystem syscalls. And we considered it OK that os.Open could fail if an old entry was evicted. But with the latest change also recently written files are deleted when replacing lru item with same key.

Right- good catch. There's also another bug which masks this one: I forgot to make SizedLRU.Add remove files when replacing items.

One solution to the first bug might be to try to open the file without holding the lock, and if the file does not exist anymore, try again while holding the lock.

For the second bug, we can either delete the old file while holding the lock, or spawn a goroutine to perform the deletion so we can continue and release the lock (possibly with a delay to increase the probability of hitting the fast path inside Get).

I'll think this through some more, maybe there's a better solution.

mostynb · 2021-02-17T23:18:37Z

Here's another potential solution, reference counting Gets that haven't opened the file yet:

When adding an item to the LRU, initialise a uint32 counter in the lruItem value to 1 (while holding the lock).
When the value is evicted from the LRU index, perform an atomic decrement and remove the file if the resulting uint32 is 0 (this would require some refactoring to perform the file removals without holding the lock - perhaps by spawning a new goroutine per file removal).
When a Get for that item occurs, do an atomic increment, and defer a function which performs an atomic decrement and then removes the file if the resulting uint32 is 0 (the lock is released after deferring).

This will overflow if there are 4,294 million concurrent Gets (that haven't opened the file yet) for the same value, but I don't think we'd scale that high anyway.

However, I think it would be better to start with the fallback slow path fix I mentioned the other day. ie if we fail to open the file, perform another LRU lookup and open the file while holding the lock (and remove files when replacing items in the index- also after refactoring, so it can be done without holding the lock). Maybe that will be fast enough.

… calls

The default is to use zstd, but some installs might choose to use uncompressed data (for example if clients mostly upload blobs which are already compressed).

Migrating old data files to the new format is too slow, we will need to add back support for raw uncompressed files.

The on-disk format seems to be settled now, and migration should be as fast as is reasonably possible. I no longer consider it a risky move to migrate old caches.

…alue This could put significant load on the backend server. Instead, just return "unknown size" and rely on the SHA256 key by itself to avoid collisions.

By using a zstd skippable frame for our header, the file becomes a valid .zst file on disk. This makes it possible to inspect cache files with standard tools. The cost is only 8 bytes per (compressed) file. Compressed CAS blobs are now limited to ~511TB (with 1M chunks), which should not be an issue in practical use.

By doing this we can build the LRU index on startup using only the directory listings (ie without reading the header of compressed blobs). This should keep startup times low.

While preexisting legacy/.v1 CAS blobs were being imported and used correctly, newly uploaded blobs weren't being named correctly with --storage_mode uncompressed.

Unlike the other settings, these were previously only settable from a config file.

Since we have two different on-disk CAS blob formats, and we don't want to perform transcoding in the proxy backends, we need to use a different path for v2 CAS blobs. This means that for now, it's not possible to use storage_mode: zstd and use the httpproxy backend to proxy blobs to another bazel-remote instance. Also, when switching storage modes, stored blobs for the other storage mode on the proxy backends will not be used.

Rely on the sha256 hash alone, like we do in httpproxy.Contains. This avoids the need to update this code since we added the zstandard skippable framing, which I don't have time to debug right now.

…he LRU lock By adding a psuedo-random string to the filename for blobs and storing that string in the index, we no longer need to rename files while holding the LRU index lock. To avoid incomplete files from being reused after restarting bazel-remote, we use a bit of a hack: create the blob file with the setgid bit set, and un-set it once the blob is complete. On startup, remove any files with the setgid bit set.

…mments more explicit

If there is an error while reading the input file, we should pass that error on to readers so they know not to use incomplete files.

…e call

We can peek at the first part of the data to figure out the logical size, then let the disk cache layer decide if it wants to use the blob or not.

mostynb commented Jan 4, 2021

View reviewed changes

ulrfa reviewed Jan 8, 2021

View reviewed changes

mostynb force-pushed the compressed_blobs_support branch from 1d664a1 to ad8de4d Compare January 10, 2021 14:14

mostynb force-pushed the compressed_blobs_support branch from b8b1f23 to ee44820 Compare January 15, 2021 08:25

mostynb mentioned this pull request Jan 23, 2021

Configure http request duration buckets for lower latency caches #389

Closed

mostynb force-pushed the compressed_blobs_support branch 3 times, most recently from 75094b0 to c199b08 Compare February 7, 2021 14:09

mostynb added 28 commits March 1, 2021 07:55

[zstd] limit concurrency to 1 for single-use {en,de}coders

997695a

[zstd] use decoder and encoder pools

fe70a24

[bytestream] be more careful about calling Close on the result of Get…

0accc93

… calls

[disk] make the CAS storage mode (zstd/uncompressed) configurable

6cc58ac

The default is to use zstd, but some installs might choose to use uncompressed data (for example if clients mostly upload blobs which are already compressed).

[disk] fan out when migrating old data files

2176041

[disk] use raw uncompressed CAS blob files

038e067

Migrating old data files to the new format is too slow, we will need to add back support for raw uncompressed files.

[disk] validate cache file hash values when loading

f8eb86c

remove --destructive_compression_prototype flag

512ad74

The on-disk format seems to be settled now, and migration should be as fast as is reasonably possible. I no longer consider it a risky move to migrate old caches.

doc: mention compressed-blobs in the gRPC section of the readme file

d5ad6e7

[disk] fix space accounting

e873612

[httpproxy] don't read remote file headers for CAS.v2 Contains size v…

eaffa82

…alue This could put significant load on the backend server. Instead, just return "unknown size" and rely on the SHA256 key by itself to avoid collisions.

[disk] encode logical CAS blob sizes in the filename

9154319

By doing this we can build the LRU index on startup using only the directory listings (ie without reading the header of compressed blobs). This should keep startup times low.

[disk] fix --storage_mode uncompressed uploads

3832c9c

While preexisting legacy/.v1 CAS blobs were being imported and used correctly, newly uploaded blobs weren't being named correctly with --storage_mode uncompressed.

[proxy] add cmdline flags for http and gcs backend proxies

f907889

Unlike the other settings, these were previously only settable from a config file.

[s3proxy] don't read remote file headers for CAS.v2 Contains size value

163eb40

Rely on the sha256 hash alone, like we do in httpproxy.Contains. This avoids the need to update this code since we added the zstandard skippable framing, which I don't have time to debug right now.

[http] don't interact with files in tests, use disk.Cache public API

9af561e

[casblob] sanity check file size

63ba6fc

[casblob] add note that we never store empty files

b12351e

[casblob] add missing Close on error conditions

605d444

[casblob] make the GetUncompressedReadCloser and GetZstdReadCloser co…

c24f861

…mments more explicit

[casblob] GetLegacyZstdReadCloser should pass errors to readers

ed9901a

If there is an error while reading the input file, we should pass that error on to readers so they know not to use incomplete files.

[casblob] read and write chunk offset table with one binary.Read/Writ…

311fb8e

…e call

[casblob] proxy backends should return the logical size for Get

ff4c853

We can peek at the first part of the data to figure out the logical size, then let the disk cache layer decide if it wants to use the blob or not.

[casblob] test a basic assumption about the sizeof int

564d20e

[disk] accept ascii characters in random string of blob filenames

48e8649

mostynb force-pushed the compressed_blobs_support branch from 51a10d3 to 48e8649 Compare March 1, 2021 07:17

mostynb merged commit b193c83 into buchgr:master Mar 2, 2021

Conversation

mostynb commented Jan 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mostynb commented Jan 3, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ulrfa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mostynb commented Jan 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ulrfa commented Jan 9, 2021

Uh oh!

mostynb commented Jan 11, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mostynb commented Jan 12, 2021

Uh oh!

mostynb commented Jan 12, 2021

Uh oh!

mostynb commented Jan 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ulrfa commented Jan 17, 2021

Uh oh!

mostynb commented Jan 18, 2021

Uh oh!

ulrfa commented Jan 18, 2021

Uh oh!

mostynb commented Jan 18, 2021

Uh oh!

ulrfa commented Jan 19, 2021

Uh oh!

mostynb commented Feb 8, 2021

Uh oh!

tschuett commented Feb 9, 2021

Uh oh!

mostynb commented Feb 9, 2021

Uh oh!

tschuett commented Feb 9, 2021

Uh oh!

mostynb commented Feb 14, 2021

Uh oh!

ulrfa commented Feb 15, 2021

Uh oh!

mostynb commented Feb 15, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ulrfa commented Feb 15, 2021

Uh oh!

mostynb commented Feb 15, 2021

Uh oh!

mostynb commented Feb 17, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

mostynb commented Jan 3, 2021 •

edited

Loading

mostynb commented Jan 8, 2021 •

edited

Loading

mostynb commented Jan 11, 2021 •

edited

Loading

mostynb commented Jan 15, 2021 •

edited

Loading

mostynb commented Feb 15, 2021 •

edited

Loading

mostynb commented Feb 17, 2021 •

edited

Loading