Dropbox has hundreds of millions of registered users, and we’re always hard at work to ensure our customers have a speedy, reliable experience, wherever they are. Today, I am excited to announce an expansion to our global infrastructure that will deliver faster transfer speeds and improved performance for our customers around the world.
To give all of our users fast, reliable network performance, we’ve launched new Points of Presence (PoPs) across Europe, Asia, and parts of the US. We’ve coupled these PoPs with an open-peering policy, and as a result have seen consistent speed improvements.
In our previous blog posts (Part 1, Part 2), we presented an overview of various parts of Dropbox’s document scanner, which helps users digitize their physical documents by automatically detecting them from photos and enhancing them. In this post, we will delve into the problem of maintaining a real-time frame rate in the document scanner even in the presence of camera movement, and share some lessons learned.
Document scanning as augmented reality
Dropbox’s document scanner shows an overlay of the detected document over the incoming image stream from the camera.
Large-scale networks are complex, dynamic systems with many parts, managed by many different teams. Each team has tools they use to monitor their part of the system, but they measure very different things. Before we built our own infrastructure, Magic Pocket, we didn’t have a global view of our production network, and we didn’t have a way to look at the interactions between different parts in real time. Most of the logs from our production network have semi-structured or unstructured data formats, which makes it very difficult to track a large amount of log data in real-time.
It’s universally acknowledged that it’s a bad idea to store plain-text passwords. If a database containing plain-text passwords is compromised, user accounts are in immediate danger. For this reason, as early as 1976, the industry standardized on storing passwords using secure, one-way hashing mechanisms (starting with Unix Crypt). Unfortunately, while this prevents the direct reading of passwords in case of a compromise, all hashing mechanisms necessarily allow attackers to brute force the hash offline, by going through lists of possible passwords, hashing them, and comparing the result. In this context, secure hashing functions like SHA have a critical flaw for password hashing: they are designed to be fast.
For Firefly, Dropbox’s full-text search engine, speed has always been a priority. (For more background on Firefly, check out our blog post). When our team saw search latency deteriorate from 250 ms to 1000 ms (95th percentile), we knew what to do—we measured, we analyzed, we fixed.
In order to create a good user experience for Firefly, we strive to keep our query latency under 250 ms (at 95th percentile). We noticed that our latency had deteriorated quite a bit since we started adding users to the system.
Edgestore is the metadata store that powers many internal and external Dropbox services and products. We first talked about Edgestore in late 2013 and needless to say, much has happened since.
In this post, we give a high-level overview of the motivation behind Edgestore, its architecture, salient features and how it’s being used at Dropbox. We’ll be doing a deep-dive on various aspects of Edgestore in subsequent posts.
A Brief History
Like so many startups, Dropbox started with vanilla MySQL databases for our metadata needs. As we rapidly added both users and features,