This is the first in a series of posts about technical leadership that Dropbox Principal Engineer James Cowling has published on his personal Medium blog. Being a strong tech lead is very different from being a strong engineer, and we thought the readers of our tech blog would find his experiences relevant and interesting.
Back when I was a first-time tech lead at Dropbox, I had the misfortune of juggling two intimidating responsibilities at the same time:
- Build a multi-exabyte distributed storage system and migrate our data off Amazon S3, and
- Learn how to be a tech lead for the first time.
In our previous post, we provided an overview of the global edge network that we deployed to improve performance for our users around the world. We built this edge network over the last two years as part of a strategy to deliver the benefits of Magic Pocket.
Alongside our edge network, we launched a global backbone network that connects our data centers in North America not only to each other, but also to the edge nodes around the world. In this blog, we’ll first review how we went about building out this backbone network and then discuss the benefits that it’s delivering for us and for our users.
Update (November 14, 2017): Miami, Sydney, Paris, Milan and Madrid have been added to the Dropbox Edge Network.
Since launching Magic Pocket last year, we’ve been storing and serving more than 90 percent of our users’ data on our own custom-built infrastructure, which has made us more efficient and improved performance for our users globally.
But with about 75 percent of our users located outside of the United States, moving onto our own custom-built data centers was just the first step in realizing these benefits. As our data centers grew, so did the need for a network that could deliver those benefits to users everywhere.
Dropbox has hundreds of millions of registered users, and we’re always hard at work to ensure our customers have a speedy, reliable experience, wherever they are. Today, I am excited to announce an expansion to our global infrastructure that will deliver faster transfer speeds and improved performance for our customers around the world.
To give all of our users fast, reliable network performance, we’ve launched new Points of Presence (PoPs) across Europe, Asia, and parts of the US. We’ve coupled these PoPs with an open-peering policy, and as a result have seen consistent speed improvements.
Large-scale networks are complex, dynamic systems with many parts, managed by many different teams. Each team has tools to monitor its part of the system, but those tools measure very different things. Before we built our own infrastructure, Magic Pocket, we didn’t have a global view of our production network, and we didn’t have a way to look at the interactions between its different parts in real time. Most of the logs from our production network are semi-structured or unstructured, which makes it very difficult to track large volumes of log data in real time.
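To make that difficulty concrete, here is a minimal sketch in Python of what normalizing a single semi-structured line might look like. The log format, field names, and sample line are invented for illustration; in practice every device family and team emits a different shape, so no single pattern like this covers the whole fleet.

```python
import re
from datetime import datetime, timezone

# Invented syslog-style format for illustration only; real network logs
# come in many vendor-specific shapes, each needing its own parser.
LINE_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s[\d:]+)\s+"  # e.g. "Mar  3 14:02:11"
    r"(?P<host>\S+)\s+"                # e.g. "edge-router-1"
    r"(?P<process>[\w-]+):\s+"         # e.g. "bgpd"
    r"(?P<message>.*)"                 # free-form remainder
)


def parse_line(line: str) -> dict | None:
    """Turn one semi-structured log line into a structured record."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # fully unstructured lines fall through unparsed
    record = match.groupdict()
    record["ingested_at"] = datetime.now(timezone.utc).isoformat()
    return record


print(parse_line("Mar  3 14:02:11 edge-router-1 bgpd: peer 10.0.0.2 went down"))
# {'ts': 'Mar  3 14:02:11', 'host': 'edge-router-1', 'process': 'bgpd',
#  'message': 'peer 10.0.0.2 went down', 'ingested_at': '...'}
```

Multiply this by every format in the fleet, at production message rates, and the need for a unified real-time view becomes clear.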
Dropbox runs hundreds of services, written in different languages, which exchange millions of requests per second. At the core of our Service Oriented Architecture is Courier, our gRPC-based Remote Procedure Call (RPC) framework. While developing Courier, we learned a lot about extending gRPC, optimizing performance for scale, and providing a bridge from our legacy RPC system.
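One of the main extension points a framework built on gRPC can use is its interceptor API. The sketch below is not Courier’s actual code; it is a minimal, hypothetical Python client interceptor (the token scheme and names are invented) showing how cross-cutting behavior, such as a default deadline and auth metadata, can be layered onto every unary call in one place:

```python
import collections

import grpc


class _ClientCallDetails(
        collections.namedtuple(
            "_ClientCallDetails",
            ("method", "timeout", "metadata", "credentials",
             "wait_for_ready", "compression")),
        grpc.ClientCallDetails):
    """Concrete grpc.ClientCallDetails we can rebuild with new values."""


class DefaultsInterceptor(grpc.UnaryUnaryClientInterceptor):
    """Attach a default deadline and auth metadata to every unary call."""

    def __init__(self, default_timeout: float, token: str):
        self._default_timeout = default_timeout
        self._token = token  # hypothetical auth scheme, for illustration

    def intercept_unary_unary(self, continuation, call_details, request):
        metadata = list(call_details.metadata or [])
        metadata.append(("authorization", f"Bearer {self._token}"))
        timeout = call_details.timeout or self._default_timeout
        new_details = _ClientCallDetails(
            call_details.method, timeout, metadata,
            call_details.credentials, call_details.wait_for_ready,
            call_details.compression)
        return continuation(new_details, request)


# Wrap any channel; every stub created from it inherits the behavior.
channel = grpc.intercept_channel(
    grpc.insecure_channel("localhost:50051"),
    DefaultsInterceptor(default_timeout=1.0, token="dev-token"))
```

The design benefit is that concerns like deadlines, auth, tracing, and stats live in one shared layer rather than being repeated at every call site across hundreds of services.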
Note: this post shows code generation examples in Python and Go. We also support Rust and Java.
The road to gRPC
Courier is not Dropbox’s first RPC framework. Even before we started to break our Python monolith into services in earnest,