Using machine learning to index text from billions of images

In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits that users will see from these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs using a system we’re describing as automatic image text recognition.

The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous.

Read more

Validating performance and reliability of the new Dropbox search engine

In our previous post, we discussed the architecture of our new search engine, named Nautilus, and its use of machine intelligence to scale our search–ranking and content–understanding models. Along with best–in–class performance, scalability, and reliability, we also provided a foundation for implementing intelligent document ranking and retrieval features. This flexible system allows our engineers to easily customize the document–indexing and query–processing pipelines while maintaining strong safeguards to preserve the privacy of our users’ data. 

In this post, we will discuss the process that we undertook to ensure optimal performance and reliability.

Performance

Index format

Each of the hundreds of our search leaves runs our retrieval engine,

Read more

Architecture of Nautilus, the new Dropbox search engine

Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing a new full-text search engine called Nautilus, as a replacement for our previous search engine.

Search presents a unique challenge when it comes to Dropbox due to our massive scale—with hundreds of billions of pieces of content—and also due to the need for providing a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.

Read more

How we rolled out one of the largest Python 3 migrations ever

Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as pywin32.

Though we’ve relied on Python 2 for many years (most recently, we used Python 2.7), we began moving to Python 3 back in 2015. This transition is now complete: If you’re using Dropbox today,

Read more

Machine intelligence at Dropbox: An update from our DBXi team

Our workdays are getting noisier. Never-ending emails, text messages, constant notifications from more apps and more platforms—it’s disruptive and distracting. And then there’s content. All kinds of documents, spreadsheets, presentations, videos, and photos. Industry research shows that employees at larger organizations use an average of 36 cloud services at work, including tools for productivity, project management, communication, and storage. This information overload is a key source of pain for people at work—and a prime opportunity to leverage the help of machine intelligence.

How do we define machine intelligence?

When we talk about machine intelligence at Dropbox,

Read more

Live-hacking Dropbox @ H1-3120

In 2018, Dropbox has focused on improving our world-class bug bounty program. From increasing bounties to protecting our researchers, we’re always looking for more creative and meaningful ways to stay ahead of the game when it comes to running this program.

As an example, we recently partnered with HackerOne to host their H1-3120 live-hacking event in Amsterdam. Live-hacking events let participants hack on a target—often in person—submit vulnerabilities, and receive bounties quickly, all during the course of the event. Live-hacking comes with a number of benefits over traditional bug bounty programs, such as real-time communication and relationship building,

Read more