Bazel remote caching API

Welcome to our series on Bazel remote caching: understanding how Bitrise enables you to fully harness the power of Bazel’s build and test automation capabilities.

Whether you are new to Bazel or already use it and want to learn more about how it works under the hood, you are in the right place. You can use Bazel without knowing most of what we cover here, but we think it is pretty cool and worth a deep dive.

What to expect in this series:

  • Bazel remote caching API technical deep dive
  • Server-side implementation of the remote cache API at Bitrise
  • Remote execution

In this first post, we introduce Bazel’s remote caching, specifically the API specification and client-side implementation, to enhance build times through cache reuse across different environments.

What is Bazel?

Bazel is an open-source build and test automation tool designed to support multiple languages and platforms, often used for large, multi-language projects with complex dependencies. Its remote caching capabilities allow it to store and retrieve build outputs from a remote server, enabling faster build times and incremental builds across different machines and environments.

If you already use Bazel, Bitrise remote build cache for Bazel lets you set up remote caching in just a few minutes without being familiar with its internal details, regardless of which CI provider you use. If you want to enjoy the additional performance and cost benefits of colocated build machines and build cache, you can use it with Bitrise CI, but that is not a requirement. Bitrise remote build cache for Bazel works with any other CI vendor or self-hosted CI.

Let’s begin the deep dive into the internal details of Bazel with the API specification and the client-side implementation.

Specification v2.3.0

Google released a Protobuf-based specification, the Remote Execution API, that outlines how caching clients and remote servers interact. Bazel is one such client, but the API could also work with other build tools. For instance, we’re considering updating our Bitrise remote build cache for the Gradle plugin to align with this specification, given its similarities to our current approach. There are various server-side implementations; we’ll focus on Bitrise’s design and implementation in our next post.

The specification may initially appear daunting, so I’ve summarized the key concepts into simple diagrams below, concentrating solely on caching and omitting aspects related to remote execution.

The core entity is Action, which is a Command (with arguments, environment variables, etc.) that will be executed on a given platform (e.g. Linux, macOS, etc.).

All entities but one are referenced by their digest (hash). For example, each Action has a command_digest field with the value of the hash of the related Command entity.
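To make the digest references concrete, here is a minimal sketch in Python. It assumes SHA-256 as the digest function and uses canonical JSON as a stand-in for the Protobuf serialization a real client would hash; the environment variable and the platform property are purely illustrative.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    """Content address: hex hash of a blob plus its size in bytes."""
    hash: str
    size_bytes: int


def digest_of(blob: bytes) -> Digest:
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


def serialize(entity: dict) -> bytes:
    # Assumption: canonical JSON stands in for the Protobuf wire format.
    return json.dumps(entity, sort_keys=True, separators=(",", ":")).encode("utf-8")


# Illustrative Command; the environment variable is made up for this example.
command = {
    "arguments": ["dotnet", "build"],
    "environment_variables": {"CONFIGURATION": "Release"},
}
command_digest = digest_of(serialize(command))

# The Action does not embed the Command, it only points at it by digest.
action = {
    "command_digest": {
        "hash": command_digest.hash,
        "size_bytes": command_digest.size_bytes,
    },
    "platform": {"OSFamily": "Windows"},
}
action_digest = digest_of(serialize(action))
```

Because the Action contains the Command's digest, any change to the arguments or environment variables changes the Action's digest too, which is what makes it usable as a cache key.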

The only exception is ActionResult which represents the results (stdout, stderr, exit code, produced files and directories, etc) of an already executed Action and is referenced by the digest of the Action. These enable a client flow such as:

  • Execute Action (let’s say it has a digest of a1), and upload its ActionResult under key a1. The ActionResult itself will probably reference other blobs produced as part of the Command execution (let’s say they are uploaded under and referenced by digests b1, b2, and b3).
  • Whenever the same exact Action (the same exact Command on the same exact platform) is executed, the client can skip it by hashing the Action and looking for the already existing ActionResult for digest a1. Related blobs (b1, b2, and b3) can also be read from the cache using their digest as a key.
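The flow above can be sketched with an in-memory stand-in for the remote cache. Everything here (InMemoryCache, run_cached, the shape of ActionResult) is a hypothetical simplification for illustration, not the actual API types.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


def sha256_hex(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


@dataclass
class ActionResult:
    exit_code: int
    stdout_digest: str
    output_files: Dict[str, str]  # path -> digest of the file content


@dataclass
class InMemoryCache:
    action_results: Dict[str, ActionResult] = field(default_factory=dict)
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def put_blob(self, content: bytes) -> str:
        # Blobs are stored under the hash of their own content.
        key = sha256_hex(content)
        self.blobs[key] = content
        return key


# execute() returns (exit_code, stdout, {path: file content}).
Executor = Callable[[], Tuple[int, bytes, Dict[str, bytes]]]


def run_cached(cache: InMemoryCache, action_digest: str,
               execute: Executor) -> Tuple[ActionResult, bool]:
    """Return (result, cache_hit); execute only when no result is cached."""
    hit = cache.action_results.get(action_digest)
    if hit is not None:
        return hit, True
    exit_code, stdout, files = execute()
    result = ActionResult(
        exit_code=exit_code,
        stdout_digest=cache.put_blob(stdout),
        output_files={path: cache.put_blob(data) for path, data in files.items()},
    )
    # The ActionResult is keyed by the Action's digest, not by its own hash.
    cache.action_results[action_digest] = result
    return result, False
```

Calling run_cached twice with the same action digest runs the executor once; the second call is served from the cache, and the output files can be fetched from blobs by the digests stored in the result.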

To showcase what these entities might correspond to, let’s take a look at an example. An Action was executed, running the dotnet build Command on the Windows platform. It resulted in three files (some.dll, other.dll, and my.exe), printed the Build succeeded. message to its stdout, and terminated with a 0 exit code.

The specification also documents the usual client flow. It starts with a call to the GetCapabilities endpoint to check which compression algorithms, digest functions, API versions, and so on the server supports. The client should conform to them, or not use the remote cache API at all in case of incompatibility. The client then builds an Action graph, which determines what Actions to execute and in which order when they depend on each other.
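The capability check can be pictured roughly as follows. This is a sketch under assumptions: the dataclasses only loosely mirror the capability fields of the specification, and negotiate and ClientSettings are hypothetical names, not part of Bazel or the API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Version = Tuple[int, int, int]  # (major, minor, patch)


@dataclass(frozen=True)
class ServerCapabilities:
    low_api_version: Version
    high_api_version: Version
    digest_functions: FrozenSet[str]
    supported_compressors: FrozenSet[str]


@dataclass(frozen=True)
class ClientSettings:
    api_version: Version
    digest_function: str
    preferred_compressors: Tuple[str, ...]  # most preferred first


def negotiate(client: ClientSettings, server: ServerCapabilities) -> Optional[str]:
    """Return the compressor to use, or None if the cache must not be used."""
    if not (server.low_api_version <= client.api_version <= server.high_api_version):
        return None  # API version outside the range the server supports
    if client.digest_function not in server.digest_functions:
        return None  # keys would be computed with a different hash
    for compressor in client.preferred_compressors:
        if compressor in server.supported_compressors:
            return compressor
    # Assumption: uncompressed transfer is always acceptable as a fallback.
    return "IDENTITY"
```

Returning None models the "do not use the remote cache at all" branch: the build still works, it just runs every Action locally.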

For each Action, the client checks whether there is already a cached result by calling the GetActionResult endpoint of the remote cache API, passing the digest of the given Action (which is the key of the desired ActionResult). Remote caches have finite space, so they usually evict old entries based on some policy (e.g. LRU or LFU) to make room for new ones. The server should therefore ensure that all blobs referenced by a returned ActionResult (stdout, stderr, produced files) are also available in the remote cache, extending their lifetimes so that they are not evicted before the client reads them.

If an ActionResult is found in the remote cache, it also downloads the related blobs by their digests using the Content Addressable Storage (CAS) with one of the related endpoints. As per the specification, the BatchReadBlobs endpoint (supports reading multiple blobs at once) should be used for small blobs, and the ByteStream API’s Read endpoint (supports streaming read for a single blob) should be used for large blobs.
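A client that follows this recommendation has to decide, per blob, which endpoint to use. The sketch below is one way to do it, assuming the server advertised a positive batch size limit in its capabilities and ignoring per-message overhead. plan_downloads is a hypothetical helper, and the resource name layout is our reading of the specification.

```python
from typing import List, Tuple

BlobDigest = Tuple[str, int]  # (hash, size in bytes)


def read_resource_name(instance_name: str, digest: BlobDigest) -> str:
    """ByteStream resource name for reading an uncompressed blob."""
    blob_hash, size = digest
    parts = [instance_name, "blobs", blob_hash, str(size)]
    # An empty instance name is dropped so the name has no leading slash.
    return "/".join(part for part in parts if part)


def plan_downloads(digests: List[BlobDigest], instance_name: str,
                   max_batch_total_size_bytes: int
                   ) -> Tuple[List[List[BlobDigest]], List[str]]:
    """Split blobs into BatchReadBlobs requests and ByteStream Read names."""
    batches: List[List[BlobDigest]] = []
    streamed: List[str] = []
    current: List[BlobDigest] = []
    current_size = 0
    for digest in digests:
        size = digest[1]
        if size > max_batch_total_size_bytes:
            # Too large for any batch: stream it on its own.
            streamed.append(read_resource_name(instance_name, digest))
            continue
        if current and current_size + size > max_batch_total_size_bytes:
            # The current batch is full, start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(digest)
        current_size += size
    if current:
        batches.append(current)
    return batches, streamed
```

Small blobs are grouped greedily in input order, so the number of round trips drops without any single request exceeding the advertised limit.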

If no ActionResult is found, the Action is executed locally, and its results are uploaded to the remote cache as part of a three-step process.

  1. It checks what blobs are missing from the remote cache using the FindMissingBlobs API endpoint (as it might be the case that two different actions produced some overlapping blobs).
  2. Missing blobs are uploaded using the CAS with either the BatchUpdateBlobs or the ByteStream API’s Write endpoint (similarly to the read path).
  3. The last step is calling UpdateActionResult, referencing the corresponding Action and the related blobs that were uploaded in the previous step.
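The three steps can be sketched against a fake in-memory server. FakeRemoteCache and upload_result are hypothetical names, the method names only mirror the endpoints, and rejecting an ActionResult with missing blobs is an assumption about server behavior made for the example.

```python
import hashlib
from typing import Dict, Iterable, List, Set


def sha256_hex(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


class FakeRemoteCache:
    """In-memory stand-in for the server side of the three endpoints."""

    def __init__(self) -> None:
        self.cas: Dict[str, bytes] = {}
        self.action_results: Dict[str, List[str]] = {}
        self.uploaded_count = 0

    def find_missing_blobs(self, digests: Iterable[str]) -> Set[str]:
        return {d for d in digests if d not in self.cas}

    def batch_update_blobs(self, blobs: Dict[str, bytes]) -> None:
        for digest, content in blobs.items():
            if sha256_hex(content) != digest:
                raise ValueError("digest does not match content")
            self.cas[digest] = content
            self.uploaded_count += 1

    def update_action_result(self, action_digest: str,
                             output_digests: List[str]) -> None:
        # Assumption: this fake refuses results that reference absent blobs.
        if self.find_missing_blobs(output_digests):
            raise ValueError("ActionResult references blobs not in the CAS")
        self.action_results[action_digest] = list(output_digests)


def upload_result(cache: FakeRemoteCache, action_digest: str,
                  outputs: List[bytes]) -> int:
    """Upload outputs and the result; return how many blobs were sent."""
    by_digest = {sha256_hex(content): content for content in outputs}
    missing = cache.find_missing_blobs(by_digest)                  # step 1
    cache.batch_update_blobs({d: by_digest[d] for d in missing})   # step 2
    cache.update_action_result(action_digest, list(by_digest))     # step 3
    return len(missing)
```

If two actions both produce other.dll, the second upload_result call sends only the blobs the first one did not, which is the point of asking FindMissingBlobs first.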

A client-side implementation: Bazel v7.0.2

Bazel is open source with a large and active community behind it. We looked into its codebase to better understand what to expect on the API side and prepare for deep-dive debugging if users report issues with their workloads.

I will walk you through it very briefly with diagrams that use the same notation as above (simplifying many layers of abstraction in the Java code).

Everything starts with BAZEL_MODULES, which is a long list of loosely coupled modules (classes extending BlazeModule) that are loaded by the Bazel runtime. Each module is responsible for a specific feature of Bazel. We are interested in remote caching, which is implemented by the RemoteModule. It calls the GetCapabilities endpoint on the configured remote caching API, checks whether the server is compatible with the Bazel client, and configures the client for use throughout the rest of the execution.

Soon after bootstrapping modules, Bazel starts parallel building and executing of the action graph using a component called Skyframe. For each action, it checks whether it was already executed and cached remotely (in which case local execution can be skipped by reading the existing ActionResult and related blobs) or if it should be executed locally and cached remotely.

The read path starts by calling the GetActionResult endpoint on the remote cache API.

  • If no result is present in the cache for the given Action, Bazel signals this to the caller by returning an empty CacheHandle that can be used to upload the ActionResult later on.
  • If a result is found, Bazel downloads all of the blobs referenced by the ActionResult (all of which should also be present in the cache according to the spec) using the Read endpoint of the ByteStream API (never using BatchReadBlobs): stdout and stderr, all corresponding files, etc.

Writing the results of locally executed actions starts by calling the FindMissingBlobs endpoint of the API. For all missing blobs, the Write endpoint of the ByteStream API (never BatchUpdateBlobs) is used, either streaming a file from the filesystem or passing it from memory (depending on the blob type). Interrupted writes are resumed using the QueryWriteStatus endpoint of the API, which should return the number of bytes already successfully written to the cache, enabling Bazel to continue writing only the missing parts.
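The resume logic is easier to follow in code. This is a simplified sketch, not Bazel's implementation: FlakyUploadServer is a made-up stand-in that drops the stream once, and upload shows how a client could use the committed size reported by a QueryWriteStatus-like call to continue from the right offset.

```python
from typing import Optional


class FlakyUploadServer:
    """Accepts chunks in order and fails once when a byte limit is crossed."""

    def __init__(self, fail_after_bytes: Optional[int]) -> None:
        self.committed = bytearray()
        self._fail_after = fail_after_bytes

    def write(self, offset: int, chunk: bytes) -> None:
        if offset != len(self.committed):
            raise ValueError("write offset must equal the committed size")
        if (self._fail_after is not None
                and len(self.committed) + len(chunk) > self._fail_after):
            self._fail_after = None  # simulate a single interruption
            raise ConnectionError("stream interrupted")
        self.committed.extend(chunk)

    def query_write_status(self) -> int:
        # Number of bytes the server has durably received so far.
        return len(self.committed)


def upload(server: FlakyUploadServer, blob: bytes,
           chunk_size: int = 4, max_resumes: int = 3) -> int:
    """Stream blob in chunks; return how many times the write was resumed."""
    resumes = 0
    offset = 0
    while True:
        try:
            while offset < len(blob):
                chunk = blob[offset:offset + chunk_size]
                server.write(offset, chunk)
                offset += len(chunk)
            return resumes
        except ConnectionError:
            if resumes >= max_resumes:
                raise
            resumes += 1
            # Ask the server where to continue instead of starting over.
            offset = server.query_write_status()
```

With a 20-byte blob and an interruption after 10 bytes, the client resends only from byte 8 (the last fully committed chunk boundary) rather than from the beginning.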

Once all referenced blobs are written, Bazel persists the result for the given Action by calling the UpdateActionResult endpoint.

Conclusion

At Bitrise, we continuously enhance our build caching capabilities to provide the best support possible. This article explored the Bazel remote caching API specification and its client-side implementation. Next, we will delve into the server-side implementation of the remote cache API at Bitrise.

Interested in optimizing your Bazel builds? For a comprehensive guide, visit our DevCenter.

Not using Bitrise Build Cache yet? Start streamlining your builds with Bazel or Gradle by signing up for a 30-day free trial at Bitrise—no strings attached. Alternatively, feel free to talk to a mobile expert.

Join our team! Explore career opportunities on our careers page.
