Dropbox runs hundreds of services, written in different languages, which exchange millions of requests per second. At the core of our Service Oriented Architecture is Courier, our gRPC-based Remote Procedure Call (RPC) framework. While developing Courier, we learned a lot about extending gRPC, optimizing performance for scale, and providing a bridge from our legacy RPC system.
Note: this post shows code generation examples in Python and Go. We also support Rust and Java.
The road to gRPC
Courier is not Dropbox’s first RPC framework. Even before we started to break our Python monolith into services in earnest,
Dropbox invests heavily in our security program. We have lots of teams dedicated to securing Dropbox, each working on exciting things. Some recent examples covered on our tech blog include:
- Our Product Security team rolled out support for WebAuthn to boost user adoption of two-step verification and upleveled our industry-leading public bug bounty program.
- Because security is everyone’s responsibility, our Security Culture team helps our employees make consistently secure and informed decisions that protect Dropbox, our users, and our employees.
- Our Detection and Response Team (DART) implements extensive instrumentation throughout our infrastructure to catch any indications of compromise.
Dropbox stores petabytes of metadata to support user-facing features and to power our production infrastructure. The primary system we use to store this metadata is named Edgestore and is described in a previous blog post, (Re)Introducing Edgestore. In simple terms, Edgestore is a service and abstraction over thousands of MySQL nodes that provides users with strongly consistent, transactional reads and writes at low latency.
Edgestore hides details of physical sharding from the application layer to allow developers to scale out their metadata storage needs without thinking about complexities of data placement and distribution.
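To make the idea of hiding shard placement concrete, here is a minimal sketch in Python. It is not Edgestore's actual design: the `ShardedStore` class, its hash-based routing, and the in-memory dicts standing in for MySQL nodes are all assumptions made for illustration. The point is only that callers use `put` and `get` and never see which shard holds a key.

```python
import hashlib
from typing import Optional


class ShardedStore:
    """Toy key-value store that hides shard placement from callers.

    Hypothetical illustration only; not how Edgestore is implemented.
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards <= 0:
            raise ValueError("num_shards must be positive")
        # Each dict stands in for one physical database node (assumption).
        self._shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key: str) -> int:
        # Use a stable hash so the same key always routes to the same shard,
        # regardless of process or interpreter hash randomization.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value: str) -> None:
        # The caller never chooses or sees the shard.
        self._shards[self._shard_for(key)][key] = value

    def get(self, key: str) -> Optional[str]:
        # Returns None when the key has never been written.
        return self._shards[self._shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", "alice")
print(store.get("user:1"))
```

A real system would also need transactions, replication, and a way to move data between shards; simple modulo routing like this reshuffles most keys whenever the shard count changes.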
One of the greatest challenges associated with maintaining a complex desktop application like Dropbox is that with hundreds of millions of installs, even the smallest bugs can end up affecting a very large number of users. Bugs will inevitably strike, and while most of them allow the application to recover, some cause the application to terminate. These terminations, or “crashes,” are highly disruptive events: when Dropbox stops, synchronization stops. To ensure uninterrupted sync for our users, we automatically detect and report all crashes and take steps to restart our application when they occur.
In 2016, faced with our impending transition to Python 3,
In this post we will describe the Edge network part of Dropbox traffic infrastructure. This is an extended transcript of our NginxConf 2018 presentation. Around the same time last year we described low-level aspects of our infrastructure in the post “Optimizing web servers for high throughput and low latency.” This time we’ll cover higher-level things like our points of presence around the world, GSLB, RUM DNS, L4 load balancers, nginx setup and its dynamic configuration, and a bit of gRPC proxying.
Dropbox has more than half a billion registered users who trust us with over an exabyte of data and petabytes of corresponding metadata.