More than a billion files are saved to Dropbox every day, and we need to run many asynchronous jobs in response to these events to power various Dropbox features. Examples of these asynchronous jobs include indexing a file to enable search over its contents, generating previews of files to be displayed when the files are viewed on the Dropbox website, and delivering notifications of file changes to third-party apps using the Dropbox developer API. This is where Cape comes in — it’s a framework that enables real-time asynchronous processing of billions of events a day,
At Dropbox, we strive to make products that are easy for everyone to use. As part of that mission, we’ve been improving product accessibility for users with disabilities, and building a collaborative culture in which our engineers understand and value accessibility best practices as part of their process.
To create accessible products, you need to find opportunities to spread accessibility knowledge and enthusiasm in a sustainable way throughout your company. But awareness is one of the largest barriers to implementing these best practices into a product. Most computer science curriculums at colleges and universities don’t include in-depth coverage of accessibility (though organizations like Teach Access are working on changing that!).
In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bi-directional Long Short Term Memory (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. In addition, we will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale.
Most representations of data contain a lot of redundancy, which provides an opportunity for greater communication efficiency by compressing the content. Compression is either built-in into the data format — like in the case of images, fonts, and videos — or provided by the transportation medium, e.g. the HTTP protocol has the
Content-Encoding header pair that allows clients and servers to agree on a preferred compression method. In practice though, most servers today only support
It’s universally acknowledged that it’s a bad idea to store plain-text passwords. If a database containing plain-text passwords is compromised, user accounts are in immediate danger. For this reason, as early as 1976, the industry standardized on storing passwords using secure, one-way hashing mechanisms (starting with Unix Crypt). Unfortunately, while this prevents the direct reading of passwords in case of a compromise, all hashing mechanisms necessarily allow attackers to brute force the hash offline, by going through lists of possible passwords, hashing them, and comparing the result. In this context, secure hashing functions like SHA have a critical flaw for password hashing: they are designed to be fast.