At Dropbox, we are building smart features that use machine intelligence to help reduce people’s busywork. Since introducing content suggestions, which we described in our previous blog post, we have been improving the underlying infrastructure and machine learning algorithms that power content suggestions.
One new challenge we faced during this iteration of content suggestions was the disparate types of content we wanted to support. In Dropbox, we have various kinds of content—files, folders, Google Docs, Microsoft Office documents, and our own Dropbox Paper.
As we laid out in our blog post introducing DBXi, Dropbox is building features to help users stay focused on what matters. Searching through your content can be tedious, so we built content suggestions to make it easier to find the files you need, when you need them.
We’ve built this feature using modern machine learning (ML) techniques, but the process to get here started with a simple question: how do people find their files? What kinds of behavior patterns are most common? We hypothesized the following two categories would be most prevalent:
- Recent files: The files you need are often the ones you’ve been using most recently.
In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bi-directional Long Short Term Memory (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. In addition, we will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale.
In previous posts we have described how Dropbox’s mobile document scanner works. The document scanner makes it possible to use your mobile phone to take photos and “
In our previous blog posts on Dropbox’s document scanner (Part 1, Part 2 and Part 3), we focused on the algorithms that powered the scanner and on the optimizations that made them speedy. However, speed is not the only thing that matters in a mobile environment: what about memory? Bounding both peak memory usage and memory spikes is important, since the operating system may terminate the app outright when under memory pressure. In this blog post, we will discuss some tweaks we made to lower the memory usage of our iOS document scanner.
In our previous blog posts (Part 1, Part 2), we presented an overview of various parts of Dropbox’s document scanner, which helps users digitize their physical documents by automatically detecting them from photos and enhancing them. In this post, we will delve into the problem of maintaining a real-time frame rate in the document scanner even in the presence of camera movement, and share some lessons learned.
Document scanning as augmented reality
Dropbox’s document scanner shows an overlay of the detected document over the incoming image stream from the camera.
Dropbox’s document scanner lets users capture a photo of a document with their phone and convert it into a clean, rectangular PDF. It works even if the input is rotated, slightly crumpled, or partially in shadow—but how?
In our previous blog post, we explained how we detect the boundaries of the document. In this post, we cover the next parts of the pipeline: rectifying the document (turning it from a general quadrilateral to a rectangle) and enhancing it to make it evenly illuminated with high contrast. In a traditional flatbed scanner,