We introduced Cape in a previous post. In a nutshell, Cape is a framework for enabling real-time asynchronous event processing at a large scale with strong guarantees. It has been over a year since the system was launched. Today Cape is a critical component for Dropbox infrastructure. It operates with both high performance and reliability at a very large scale. Here are a few key metrics, Cape is:
- running on thousands of servers across the continent
- subscribing to over 30 different event domains at a rate of 30K/s
- processing jobs of various sizes at rate of 150K/s
- delivering 95% of events under 1 second after they are created.
Dropbox invests heavily in our security program. We have lots of teams dedicated to securing Dropbox, each working on exciting things. Some recent examples covered on our tech blog include:
- Our Product Security team rolled out support for WebAuthn to boost user adoption of two-step verification and upleveled our industry-leading public bug bounty program
- Because security is everyone’s responsibility, our Security Culture team helps our employees make consistently secure and informed decisions that protect Dropbox, our users, and our employees
- Our Detection and Response Team (DART) implementation of extensive instrumentation throughout our infrastructure to catch any indications of compromise.
Dropbox stores petabytes of metadata to support user-facing features and to power our production infrastructure. The primary system we use to store this metadata is named Edgestore and is described in a previous blog post, (Re)Introducing Edgestore. In simple terms, Edgestore is a service and abstraction over thousands of MySQL nodes that provides users with strongly consistent, transactional reads and writes at low latency.
Edgestore hides details of physical sharding from the application layer to allow developers to scale out their metadata storage needs without thinking about complexities of data placement and distribution.
One of the greatest challenges associated with maintaining a complex desktop application like Dropbox is that with hundreds of millions of installs, even the smallest bugs can end up affecting a very large number of users. Bugs inevitably will strike, and while most of them allow the application to recover, some cause the application to terminate. These terminations, or “crashes,” are highly disruptive events: when Dropbox stops, synchronization stops. To ensure uninterrupted sync for our users we automatically detect and report all crashes and take steps to restart our application when they occur.
In 2016, faced with our impending transition to Python 3,
What is the JS Guild?
The JS Guild is a grassroots initiative at Dropbox to improve our frontend engineering by fostering community, culture, and code quality. The group strives to teach frontend best practices to generalists and to help strong frontend engineers leverage and grow their domain knowledge.
In this post we will describe the Edge network part of Dropbox traffic infrastructure. This is an extended transcript of our NginxConf 2018 presentation. Around the same time last year we described low-level aspects of our infra in the Optimizing web servers for high throughput and low latency post. This time we’ll cover higher-level things like our points of presence around the world, GSLB, RUM DNS, L4 loadbalancers, nginx setup and its dynamic configuration, and a bit of gRPC proxying.
Dropbox has more than half a billion registered users who trust us with over an exabyte of data and petabytes of corresponding metadata.