Many moons ago, I was working at the New York Times and created a library called Store, which was “a Java library for effortless, reactive data loading.” We built Store using RxJava and patterns adopted from Guava’s Cache implementation. Today’s app users expect data updates to flow in and out of the UI without having to do things like pulling to refresh or navigating back and forth between screens. Reactive front ends led me to think of how we can have declarative data stores with simple APIs that abstract complex features like multi-request throttling and disk caching that are needed in modern mobile applications.
The Dropbox Traffic team is charged with innovating our application networking stack to improve the experience for every one of our users—over half a billion of them. This article describes our work with NS1 to optimize our intelligent DNS-based global load balancing for corner cases that we uncovered while improving our point of presence (PoP) selection automation for our edge network. By co-developing the platform capabilities with NS1 to handle these outliers, we deliver positive Dropbox experiences to more users, more consistently.
Spoiler alert: BBRv2 is slower than BBRv1 but that’s a good thing.
BBRv1 Congestion Control
Three years have passed since “Bottleneck Bandwidth and Round-trip” (BBR) congestion control was released. Nowadays, it is considered production-ready and added to Linux, FreeBSD, and Chrome (as part of QUIC.) In our blogpost from 2017, “Optimizing web servers for high throughput and low latency,” we evaluated BBRv1 congestion control on our edge network and it showed awesome results:
Since then, BBRv1 has been deployed to Dropbox Edge Network and we got accustomed to some of its downsides.
Dropbox server-side software lives in a large monorepo. One lesson we’ve learned scaling the monorepo is to minimize the number of global operations that operate on the repository as a whole. Years ago, it was reasonable to run our entire test corpus on every commit to the repository. This scheme became untenable as we added more tests. One obvious inefficiency is the pointless and wasteful execution of tests that can’t possibly be affected by a particular change.
We addressed this problem with the help of our build system. Code in our monorepo is built and tested exclusively with Bazel.
Dropbox is a big user of Python. It’s our most widely used language both for backend services and the desktop client app (we are also heavy users of Go, TypeScript, and Rust). At our scale—millions of lines of Python—the dynamic typing in Python made code needlessly hard to understand and started to seriously impact productivity. To mitigate this, we have been gradually migrating our code to static type checking using mypy, likely the most popular standalone type checker for Python. (Mypy is an open source project, and the core team is employed by Dropbox.)
Dropbox has been one of the first companies to adopt Python static type checking at this scale.