Continuous integration and deployment with Bazel

Dropbox server-side software lives in a large monorepo. One lesson we’ve learned from scaling the monorepo is to minimize the number of global operations that touch the repository as a whole. Years ago, it was reasonable to run our entire test corpus on every commit to the repository. This scheme became untenable as we added more tests. One obvious inefficiency is the wasteful execution of tests that can’t possibly be affected by a particular change.

We addressed this problem with the help of our build system. Code in our monorepo is built and tested exclusively with Bazel. Abstractly, Bazel views the repository as a set of targets (files, binaries, libraries, tests, etc.) and the dependencies between them. In particular, Bazel knows the dependency graph between all source files and tests in the repository. We modified our continuous integration system to extract this dependency information from Bazel (via bazel query) and use it to compute the set of tests affected by a particular commit. This allows us to greatly reduce the number of tests executed on most commits while still being correct.
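The core of this computation can be sketched as a reverse-dependency walk. The graph below is a hypothetical illustration of the kind of data extractable from bazel query, not our actual tooling: given the files a commit touches, we follow reverse edges until we reach test targets.

```python
# Sketch: find tests affected by a commit, given a reverse-dependency
# graph of the kind `bazel query` can produce. The graph and target
# names here are hypothetical illustrations.
from collections import deque

# target -> targets that directly depend on it (reverse edges)
RDEPS = {
    "//ping:ping.cc": ["//ping:ping_lib"],
    "//ping:ping_lib": ["//ping:ping_server", "//ping:ping_test"],
    "//ping:ping_server": [],
    "//ping:ping_test": [],
}
TESTS = {"//ping:ping_test"}

def affected_tests(changed_files):
    """BFS over reverse dependencies; collect every test reached."""
    seen, queue = set(changed_files), deque(changed_files)
    while queue:
        target = queue.popleft()
        for rdep in RDEPS.get(target, ()):
            if rdep not in seen:
                seen.add(rdep)
                queue.append(rdep)
    return sorted(seen & TESTS)

print(affected_tests(["//ping:ping.cc"]))  # ['//ping:ping_test']
```

A commit touching only ping.cc thus triggers ping_test but nothing outside the ping package’s reverse dependencies.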

Since a particular test no longer runs on every commit, we use its previous history to determine its status on commits where it didn’t run. If a test runs on commit N but isn’t affected by commit N+1, we can consider the test to have the same status on both commits. In this way, we propagate a status for every test in the repository for every commit that it exists in.
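The propagation rule amounts to carrying the last observed result forward over commits where the test wasn’t affected. A minimal sketch, with hypothetical commit indices and statuses:

```python
# Sketch: propagate a test's status across commits where it didn't run.
# Commit indices and statuses are hypothetical.
def propagate(statuses, num_commits):
    """statuses maps commit index -> 'pass' or 'fail' for commits where
    the test actually ran. Returns a status for every commit by carrying
    the last known result forward to unaffected commits."""
    result, last = [], None
    for i in range(num_commits):
        if i in statuses:
            last = statuses[i]
        result.append(last)
    return result

# The test ran on commits 0 and 3; commits 1-2 inherit commit 0's result.
print(propagate({0: "pass", 3: "fail"}, 5))
# ['pass', 'pass', 'pass', 'fail', 'fail']
```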

To conserve extra resources, we don’t run all affected tests on all commits. We roll up the set of affected tests for the commits in a fixed time period by computing the union of the set of affected tests over each commit in the period. Then, we execute the rolled-up set of affected tests for that period on the last commit in the period. The time between rollup builds is a tunable parameter trading resource usage against test result timeliness. This batching could hamper breakage detection because the first commit to observe a test failure may not actually be culpable. However, our automated breakage detection system, Athena, is able to bisect failing tests over the entire rollup period to find the precise broken change.
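The rollup itself is a set union over the period, executed once on the period’s final commit. A sketch with hypothetical commit IDs and test labels:

```python
# Sketch: batch per-commit affected-test sets into one rollup run.
# Commit IDs and test labels are hypothetical.
def rollup(affected_by_commit):
    """Union the affected tests over every commit in the period; the
    combined set is executed once, on the period's last commit."""
    tests = set()
    for commit, affected in affected_by_commit:
        tests |= set(affected)
    last_commit = affected_by_commit[-1][0]
    return last_commit, sorted(tests)

period = [
    ("c1", ["//a:test"]),
    ("c2", ["//a:test", "//b:test"]),
    ("c3", ["//c:test"]),
]
print(rollup(period))  # ('c3', ['//a:test', '//b:test', '//c:test'])
```

Lengthening the period shrinks the number of rollup builds but widens the window Athena must bisect when a failure first appears.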

Our production deployment system distributes software to hosts in the form of SquashFS images. We have a custom Bazel rule that builds SquashFS images for the deployment system to consume.

We generally require that software pass its tests before being pushed into our production environment. This requirement is enforced by our deployment system. Historically, we allowed pushing software from a particular commit only if all tests in the repository passed on it, either by being run directly on the commit or through propagation from previous commits. While simple, this model doesn’t scale well. It’s frustrating to have deployment blocked because a completely unrelated test is failing. Even if all tests pass, repository growth over time means it takes longer and longer to prove that a particular commit is completely green, despite running only affected tests on each commit. To break down the old “monolithic green” model, we allow every deployable package to specify “release tests”: the set of tests required to pass before pushing it. The test set is described using a subset of Bazel’s target pattern syntax.

For example, deploying a simple C++ ping server at Dropbox might have
a Bazel BUILD file like this:

cc_library(
    name = "ping_lib",
    srcs = ["ping.cc"],
)

cc_binary(
    name = "ping_server",
    srcs = ["ping_server.cc"],
    deps = [":ping_lib"],
)

cc_test(
    name = "ping_test",
    srcs = ["ping_test.cc"],
    deps = [":ping_lib"],
)

dbx_pkg_sqfs(
    name = "ping_server.sqfs",
    data = [
        ":ping_server",
    ],
    release_tests = [
        "//ping_server/...",
    ],
)

This file declares a C++ library, binary, and test using the standard built-in Bazel C++ rules. The ping_server.sqfs target produces a SquashFS image containing the ping_server binary. ping_server.sqfs would be deployable on commits where every test in //ping_server/ and its subpackages had passed.
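The “package and all subpackages” semantics of a release_tests entry like //ping_server/... can be sketched with a simplified matcher. This is an illustration of the pattern subset described above, not Bazel’s real target-pattern parser:

```python
# Sketch: match test labels against a simplified subset of Bazel
# target-pattern syntax ("//pkg/..." = the package and all subpackages).
# Not Bazel's actual parser; labels here are hypothetical.
def matches(pattern, label):
    """Return True if the label's package falls under the pattern."""
    if pattern.endswith("/..."):
        prefix = pattern[: -len("/...")]
        pkg = label.split(":", 1)[0]
        return pkg == prefix or pkg.startswith(prefix + "/")
    return pattern == label

print(matches("//ping_server/...", "//ping_server:ping_test"))       # True
print(matches("//ping_server/...", "//ping_server/impl:unit_test"))  # True
print(matches("//ping_server/...", "//other:test"))                  # False
```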

As mentioned previously, we conserve resources by aggregating test runs across several commit builds. This potentially introduces extra latency between when a commit lands and when it’s deployable. If an engineer makes a change to the ping server and wants to deploy it
immediately, they can request that our continuous integration system run ping_server.sqfs’s release tests as soon as their commit lands. This happens regardless of where the commit falls in the rollup period.

We leave the decision of what to put in release_tests up to individual teams. It’s common to include a package’s own tests as well as the tests of critical dependency libraries. More conservative projects might include some of their reverse dependencies’ tests. When we were developing the release tests feature, we experimented with automatically generating the test set by inspecting the packaged code’s dependencies. However, we were unable to develop an intuitive heuristic that simultaneously included the tests people expected and meaningfully cut down the number of tests required to release a package.

Interested in building great developer tools at scale? We’re hiring!