Cross shard transactions at 10 million requests per second

Dropbox stores petabytes of metadata to support user-facing features and to power our production infrastructure. The primary system we use to store this metadata is named Edgestore and is described in a previous blog post, (Re)Introducing Edgestore. In simple terms, Edgestore is a service and abstraction over thousands of MySQL nodes that provides users with strongly consistent, transactional reads and writes at low latency.

Edgestore hides details of physical sharding from the application layer to allow developers to scale out their metadata storage needs without thinking about complexities of data placement and distribution.

Read more

Crash reporting in desktop Python applications

One of the greatest challenges associated with maintaining a complex desktop application like Dropbox is that with hundreds of millions of installs, even the smallest bugs can end up affecting a very large number of users. Bugs inevitably will strike, and while most of them allow the application to recover, some cause the application to terminate. These terminations, or “crashes,” are highly disruptive events: when Dropbox stops, synchronization stops. To ensure uninterrupted sync for our users we automatically detect and report all crashes and take steps to restart our application when they occur.

In 2016, faced with our impending transition to Python 3,

Read more

Dropbox traffic infrastructure: Edge network

In this post we will describe the Edge network part of Dropbox traffic infrastructure. This is an extended transcript of our NginxConf 2018 presentation. Around the same time last year we described low-level aspects of our infra in the Optimizing web servers for high throughput and low latency post. This time we’ll cover higher-level things like our points of presence around the world, GSLB, RUM DNS, L4 loadbalancers, nginx setup and its dynamic configuration, and a bit of gRPC proxying.

Dropbox scale

Dropbox has more than half a billion registered users who trust us with over an exabyte of data and petabytes of corresponding metadata.

Read more

Using machine learning to index text from billions of images

In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits that users will see from these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs using a system we’re describing as automatic image text recognition.

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous.

Read more

Architecture of Nautilus, the new Dropbox search engine

Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing a new full-text search engine called Nautilus, as a replacement for our previous search engine.

Search presents a unique challenge when it comes to Dropbox due to our massive scale—with hundreds of billions of pieces of content—and also due to the need for providing a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.

Read more