In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits that users will see from these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs using a system we’re describing as automatic image text recognition.
The potential benefit of automatically recognizing text in images (including PDFs containing images) is tremendous.
Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing a new full-text search engine called Nautilus, as a replacement for our previous search engine.
Search presents a unique challenge when it comes to Dropbox due to our massive scale—with hundreds of billions of pieces of content—and also due to the need for providing a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.
Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as
Though we’ve relied on Python 2 for many years (most recently, we used Python 2.7), we began moving to Python 3 back in 2015. This transition is now complete: If you’re using Dropbox today,
Our workdays are getting noisier. Never-ending emails, text messages, constant notifications from more apps and more platforms—it’s disruptive and distracting. And then there’s content. All kinds of documents, spreadsheets, presentations, videos, and photos. Industry research shows that employees at larger organizations use an average of 36 cloud services at work, including tools for productivity, project management, communication, and storage. This information overload is a key source of pain for people at work—and a prime opportunity to leverage the help of machine intelligence.
How do we define machine intelligence?
When we talk about machine intelligence at Dropbox,
Modernizing the front-end stack
The core Dropbox web application is 10 years old and used by millions of users per day. Hundreds of front-end engineers across multiple cities actively work on it. Unsurprisingly, our codebase is very large and somewhat irregular. Recently written parts have thorough test coverage, other parts haven’t been updated in years.
Over the past two years we’ve worked to modernize our front-end stack. We’ve successfully moved from CoffeeScript to TypeScript, from jQuery to React, and from a custom Flux implementation to Redux. Having completed these migrations we identified our utility library, Underscore, as one more candidate for migration.