Using machine learning to predict what file you need next

As we laid out in our blog post introducing DBXi, Dropbox is building features to help users stay focused on what matters. Searching through your content can be tedious, so we built content suggestions to make it easier to find the files you need, when you need them.

We’ve built this feature using modern machine learning (ML) techniques, but the process to get here started with a simple question: how do people find their files? What kinds of behavior patterns are most common? We hypothesized the following two categories would be most prevalent:

  • Recent files: The files you need are often the ones you’ve been using most recently.
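
The recency hypothesis suggests a simple baseline for suggestions. The sketch below is not Dropbox's implementation. It assumes a hypothetical access log of `FileEvent` records, each holding a file path and the time the user last opened or edited it, and ranks files by their most recent use. The names `FileEvent` and `recent_file_suggestions` are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
import heapq


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical access-log entry: one open or edit of a file."""
    path: str
    timestamp: datetime


def recent_file_suggestions(events, k=3):
    """Return up to k distinct file paths, most recently used first."""
    # Keep only the latest access time seen for each file.
    last_used = {}
    for event in events:
        previous = last_used.get(event.path)
        if previous is None or event.timestamp > previous:
            last_used[event.path] = event.timestamp
    # Select the k files with the newest last-access times.
    top = heapq.nlargest(k, last_used.items(), key=lambda item: item[1])
    return [path for path, _ in top]


if __name__ == "__main__":
    log = [
        FileEvent("a.docx", datetime(2020, 1, 1, 9, 0)),
        FileEvent("b.xlsx", datetime(2020, 1, 2, 10, 0)),
        FileEvent("a.docx", datetime(2020, 1, 3, 8, 0)),
        FileEvent("c.pdf", datetime(2020, 1, 2, 12, 0)),
    ]
    print(recent_file_suggestions(log, k=2))  # prints ['a.docx', 'c.pdf']
```

A heuristic like this captures only the first behavior pattern; it ignores frequency and context, which is where a learned model could add value.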

Using machine learning to index text from billions of images

In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence to our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits of these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs, using a system we call automatic image text recognition.

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous.

Validating performance and reliability of the new Dropbox search engine

In our previous post, we discussed the architecture of our new search engine, named Nautilus, and its use of machine intelligence to scale our search-ranking and content-understanding models. Along with best-in-class performance, scalability, and reliability, we also provided a foundation for implementing intelligent document ranking and retrieval features. This flexible system allows our engineers to easily customize the document-indexing and query-processing pipelines while maintaining strong safeguards to preserve the privacy of our users’ data.

In this post, we will discuss the process that we undertook to ensure optimal performance and reliability.

Index format

Each of our hundreds of search leaves runs our retrieval engine,

Architecture of Nautilus, the new Dropbox search engine

Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing Nautilus, a new full-text search engine that replaces our previous one.

Search presents a unique challenge at Dropbox because of our massive scale (hundreds of billions of pieces of content) and because we need to provide a personalized search experience to each of our 500M+ registered users. Search is personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.

Machine intelligence at Dropbox: An update from our DBXi team

Our workdays are getting noisier. Never-ending emails, text messages, and constant notifications from more apps and more platforms are disruptive and distracting. And then there’s content: all kinds of documents, spreadsheets, presentations, videos, and photos. Industry research shows that employees at larger organizations use an average of 36 cloud services at work, including tools for productivity, project management, communication, and storage. This information overload is a key source of pain for people at work, and a prime opportunity for machine intelligence to help.

How do we define machine intelligence?

When we talk about machine intelligence at Dropbox,

Augmented camera previews for the Dropbox Android document scanner

With Dropbox’s document scanner, a user can take a photo of a document with their phone and convert it into a clean, rectangular PDF. In our previous blog posts (Part 1, Part 2), we presented an overview of the document scanner’s machine learning backend, along with its iOS implementation. This post describes some of the technical challenges of implementing the document scanner on Android.

We will focus specifically on the steps required to generate an augmented camera preview that achieves the following effect: