Large-scale networks are complex, dynamic systems with many parts, managed by many different teams. Each team has tools they use to monitor their part of the system, but they measure very different things. Before we built our own infrastructure, Magic Pocket, we didn’t have a global view of our production network, and we didn’t have a way to look at the interactions between different parts in real time. Most of the logs from our production network have semi-structured or unstructured data formats, which makes it very difficult to track a large amount of log data in real-time.
It’s universally acknowledged that it’s a bad idea to store plain-text passwords. If a database containing plain-text passwords is compromised, user accounts are in immediate danger. For this reason, as early as 1976, the industry standardized on storing passwords using secure, one-way hashing mechanisms (starting with Unix Crypt). Unfortunately, while this prevents the direct reading of passwords in case of a compromise, all hashing mechanisms necessarily allow attackers to brute force the hash offline, by going through lists of possible passwords, hashing them, and comparing the result. In this context, secure hashing functions like SHA have a critical flaw for password hashing: they are designed to be fast.
For Firefly, Dropbox’s full-text search engine, speed has always been a priority. (For more background on Firefly, check out our blog post). When our team saw search latency deteriorate from 250 ms to 1000 ms (95th percentile), we knew what to do—we measured, we analyzed, we fixed.
In order to create a good user experience for Firefly, we strive to keep our query latency under 250 ms (at 95th percentile). We noticed that our latency had deteriorated quite a bit since we started adding users to the system.
Edgestore is the metadata store that powers many internal and external Dropbox services and products. We first talked about Edgestore in late 2013 and needless to say, much has happened since.
In this post, we give a high-level overview of the motivation behind Edgestore, its architecture, salient features and how it’s being used at Dropbox. We’ll be doing a deep-dive on various aspects of Edgestore in subsequent posts.
A Brief History
Like so many startups, Dropbox started with vanilla MySQL databases for our metadata needs. As we rapidly added both users and features,
Dropbox’s document scanner lets users capture a photo of a document with their phone and convert it into a clean, rectangular PDF. It works even if the input is rotated, slightly crumpled, or partially in shadow—but how?
In our previous blog post, we explained how we detect the boundaries of the document. In this post, we cover the next parts of the pipeline: rectifying the document (turning it from a general quadrilateral to a rectangle) and enhancing it to make it evenly illuminated with high contrast. In a traditional flatbed scanner,
A few weeks ago, Dropbox launched a set of new productivity tools including document scanning on iOS. This new feature allows users to scan documents with their smartphone camera and store those scans directly in their Dropbox. The feature automatically detects the document in the frame, extracts it from the background, fits it to a rectangular shape, removes shadows and adjusts the contrast, and finally saves it to a PDF file. For Dropbox Business users, we also run Optical Character Recognition (OCR) to recognize the text in the document for search and copy-pasting.