Until very recently, Dropbox had a technical strategy on mobile of sharing code between iOS and Android via C++. The idea behind this strategy was simple—write the code once in C++ instead of twice in Java and Objective C. We adopted this C++ strategy back in 2013, when our mobile engineering team was relatively small and needed to support a fast growing mobile roadmap. We needed to find a way to leverage this small team to quickly ship lots of code on both Android and iOS.
We have now completely backed off from this strategy in favor of using each platforms’ native languages (primarily Swift and Kotlin,
With Dropbox’s document scanner, a user can take a photo of a document with their phone and convert it into a clean, rectangular PDF. In our previous blog posts (Part 1, Part 2), we presented an overview of document scanner’s machine learning backend, along with its iOS implementation. This post will describe some of technical challenges associated with implementing the document scanner on Android.
We will specifically focus on all steps required to generate an augmented camera preview in order to achieve the following effect:
In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bi-directional Long Short Term Memory (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. In addition, we will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale.
In previous posts we have described how Dropbox’s mobile document scanner works. The document scanner makes it possible to use your mobile phone to take photos and “
In our previous blog posts on Dropbox’s document scanner (Part 1, Part 2 and Part 3), we focused on the algorithms that powered the scanner and on the optimizations that made them speedy. However, speed is not the only thing that matters in a mobile environment: what about memory? Bounding both peak memory usage and memory spikes is important, since the operating system may terminate the app outright when under memory pressure. In this blog post, we will discuss some tweaks we made to lower the memory usage of our iOS document scanner.
In our previous blog posts (Part 1, Part 2), we presented an overview of various parts of Dropbox’s document scanner, which helps users digitize their physical documents by automatically detecting them from photos and enhancing them. In this post, we will delve into the problem of maintaining a real-time frame rate in the document scanner even in the presence of camera movement, and share some lessons learned.
Document scanning as augmented reality
Dropbox’s document scanner shows an overlay of the detected document over the incoming image stream from the camera.
Dropbox’s document scanner lets users capture a photo of a document with their phone and convert it into a clean, rectangular PDF. It works even if the input is rotated, slightly crumpled, or partially in shadow—but how?
In our previous blog post, we explained how we detect the boundaries of the document. In this post, we cover the next parts of the pipeline: rectifying the document (turning it from a general quadrilateral to a rectangle) and enhancing it to make it evenly illuminated with high contrast. In a traditional flatbed scanner,