In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits of these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can now search for English text within images and PDFs, using a system we’re describing as automatic image text recognition.
The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous.
In our previous post, we discussed the architecture of our new search engine, named Nautilus, and its use of machine intelligence to scale our search-ranking and content-understanding models. Along with best-in-class performance, scalability, and reliability, we also provided a foundation for implementing intelligent document ranking and retrieval features. This flexible system allows our engineers to easily customize the document-indexing and query-processing pipelines while maintaining strong safeguards to preserve the privacy of our users’ data.
In this post, we will discuss the process that we undertook to ensure optimal performance and reliability.
Each of our hundreds of search leaves runs our retrieval engine,
Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing a new full-text search engine called Nautilus, as a replacement for our previous search engine.
Search at Dropbox presents a unique challenge due to our massive scale (hundreds of billions of pieces of content) and the need to provide a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.
Our workdays are getting noisier. Never-ending emails, text messages, constant notifications from more apps and more platforms—it’s disruptive and distracting. And then there’s content. All kinds of documents, spreadsheets, presentations, videos, and photos. Industry research shows that employees at larger organizations use an average of 36 cloud services at work, including tools for productivity, project management, communication, and storage. This information overload is a key source of pain for people at work—and a prime opportunity to leverage the help of machine intelligence.
How do we define machine intelligence?
When we talk about machine intelligence at Dropbox,
With Dropbox’s document scanner, a user can take a photo of a document with their phone and convert it into a clean, rectangular PDF. In our previous blog posts (Part 1, Part 2), we presented an overview of the document scanner’s machine learning backend, along with its iOS implementation. This post will describe some of the technical challenges associated with implementing the document scanner on Android.
We will specifically focus on all steps required to generate an augmented camera preview in order to achieve the following effect:
In this post we will take you behind the scenes of how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bidirectional Long Short-Term Memory networks (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural networks (CNNs), and more. We will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale.
In previous posts we have described how Dropbox’s mobile document scanner works. The document scanner makes it possible to use your mobile phone to take photos and “