In our previous blog posts (Part 1, Part 2), we presented an overview of various parts of Dropbox’s document scanner, which helps users digitize their physical documents by automatically detecting them from photos and enhancing them. In this post, we will delve into the problem of maintaining a real-time frame rate in the document scanner even in the presence of camera movement, and share some lessons learned.
Document scanning as augmented reality
Dropbox’s document scanner shows an overlay of the detected document over the incoming image stream from the camera.
We are pleased to announce the open source release of Lepton, our new streaming image compression format, under the Apache license.
Lepton achieves a 22% savings reduction for existing JPEG images, by predicting coefficients in JPEG blocks and feeding those predictions as context into an arithmetic coder. Lepton preserves the original file bit-for-bit perfectly. It compresses JPEG files at a rate of 5 megabytes per second and decodes them back to the original bits at 15 megabytes per second, securely, deterministically, and in under 24 megabytes of memory.
We have used Lepton to encode 16 billion images saved to Dropbox,
Written by Daniel Reiter Horn and Mehant Baid, Serving Infrastructure team at Dropbox.
In HBO’s Silicon Valley, lossless video compression plays a pivotal role for Pied Piper as they struggle to stream HD content at high speed.
John P. Johnson/HBO
Inspired by Pied Piper, we created our own version of their algorithm Pied Piper at Hack Week. In fact, we’ve extended that work and have a bit-exact, lossless media compression algorithm that achieves extremely good results on a wide array of images. (Stay tuned for more on that!)
Dropbox LAN Sync is a feature that allows you to download files from other computers on your network, saving time and bandwidth compared to downloading them from Dropbox servers. As the number of companies and offices using Dropbox has increased, the use cases for LAN Sync have grown, and the feature was recently rewritten and improved. Here’s a look inside how it works.
With hundreds of billions of files, Dropbox has become one of the world’s largest stores of private documents, and it’s still growing strong! As users add more and more files it becomes harder for them to stay organized. Eventually, search replaces browsing as the primary way users find their content. In other words, the more content our users store with us the more important it is for them to have a powerful search tool available. With this motivation in mind, we set out to deploy instant, full-text search for Dropbox.
Like any serious practitioners of large distributed systems our first order of business was clear: come up with a name for the project!
Our users love Dropbox for many reasons, sync performance being chief among them. We’re going to look at a recent performance improvement called Streaming Sync which can improve sync latency by up to 2x.
Prior to Streaming Sync, file synchronization was partitioned into two phases: upload and download. The entire file must be uploaded to our servers and committed to our databases before any other clients could learn of its existence. Streaming sync allows file contents to “stream” through our servers between your clients.
The Dropbox File System
First we’ll discuss the way Dropbox stores and syncs files.