What is the correct way to track hit rate?

Hey,

I am fixing the remote Bazel cache for our monorepos, and I inherited monitoring dashboards that compute the hit/miss ratio incorrectly:
`sum(bazel_remote_disk_cache_hits + bazel_remote_http_cache_hits) / sum(bazel_remote_disk_cache_hits + bazel_remote_http_cache_hits + bazel_remote_disk_cache_misses + bazel_remote_http_cache_misses)`. This divides the raw cumulative counters, so it shows only a lifetime average rather than the hit rate over time, and it resets whenever the server is restarted.
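For illustration, a windowed form of the same ratio might look like the sketch below. It assumes the four metrics from the dashboard query are Prometheus counters; the 1h window is an arbitrary choice, not a recommendation. `rate()` compensates for counter resets, so a server restart should not distort the result.

```promql
# Hits per second over the last hour, divided by all lookups per second
# over the same window. Metric names are taken from the dashboard query above.
(
  sum(rate(bazel_remote_disk_cache_hits[1h]))
  + sum(rate(bazel_remote_http_cache_hits[1h]))
)
/
(
  sum(rate(bazel_remote_disk_cache_hits[1h]))
  + sum(rate(bazel_remote_http_cache_hits[1h]))
  + sum(rate(bazel_remote_disk_cache_misses[1h]))
  + sum(rate(bazel_remote_http_cache_misses[1h]))
)
```

One caveat: if any of these metrics has no series at all (for example, when no HTTP backend is configured), the addition returns an empty result. Wrapping that term as `(sum(rate(...[1h])) or vector(0))` is one way to substitute zero for it.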

I am now on the latest release and am looking for a correct way to track this metric. We are not using any second-layer solutions. Perhaps something like `sum(rate(grpc_server_handled_total{service="cache", grpc_code="OK"}[1h])) / sum(rate(grpc_server_handled_total{service="cache"}[1h]))` would work? I have no experience writing Prometheus queries that show the hit/miss percentage over time, so I don't know whether this is correct, and it is not easy to verify the resulting numbers.

Our goal is a hit rate of at least 75%, with a warning if it drops below that.
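As a sketch of how that threshold could be expressed, the query below returns a result only while the windowed hit ratio is under 0.75, which is the shape an alert condition needs. It assumes the same four counters as the dashboard query; the 15m window is a placeholder, and a shorter window reacts faster but is noisier.

```promql
# Returns a sample only while the hit ratio over the last 15 minutes
# is below the 75% target; returns nothing otherwise.
(
  (
    sum(rate(bazel_remote_disk_cache_hits[15m]))
    + sum(rate(bazel_remote_http_cache_hits[15m]))
  )
  /
  (
    sum(rate(bazel_remote_disk_cache_hits[15m]))
    + sum(rate(bazel_remote_http_cache_hits[15m]))
    + sum(rate(bazel_remote_disk_cache_misses[15m]))
    + sum(rate(bazel_remote_http_cache_misses[15m]))
  )
) < 0.75
```

In a Prometheus alerting rule, an expression like this is usually paired with a `for` duration so that a brief dip does not fire the alert immediately.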

https://github.com/buchgr/bazel-remote/pull/472#issuecomment-919758689 shows what I want to see, but from what I read, I assume custom code was added to the Docker image to produce those metrics.

Any help would be appreciated!

EDIT: What I am looking for is a stable metric that shows a near-real-time value I can alert on. I assume the rate window has to be chosen to match. I am also not sure which metric is best to use.

What is the correct way to track hit rate? #478