Over the last few months, the Search Infrastructure engineering team at Dropbox has been busy releasing a new full-text search engine called Nautilus, as a replacement for our previous search engine.
Search presents a unique challenge at Dropbox for two reasons: our massive scale, with hundreds of billions of pieces of content, and the need to provide a personalized search experience to each of our 500M+ registered users. It’s personalized in multiple ways: not only does each user have access to a different set of documents, but users also have different preferences and behaviors in how they search.
Dropbox is one of the most popular desktop applications in the world: You can install it today on Windows, macOS, and some flavors of Linux. What you may not know is that much of the application is written using Python. In fact, Drew’s very first lines of code for Dropbox were written in Python for Windows using venerable libraries such as
Though we’ve relied on Python 2 for many years (most recently, we used Python 2.7), we began moving to Python 3 back in 2015. This transition is now complete: If you’re using Dropbox today,
Our workdays are getting noisier. Never-ending emails, text messages, constant notifications from more apps and more platforms—it’s disruptive and distracting. And then there’s content. All kinds of documents, spreadsheets, presentations, videos, and photos. Industry research shows that employees at larger organizations use an average of 36 cloud services at work, including tools for productivity, project management, communication, and storage. This information overload is a key source of pain for people at work—and a prime opportunity to leverage the help of machine intelligence.
How do we define machine intelligence?
When we talk about machine intelligence at Dropbox,
In 2018, Dropbox has focused on improving our world-class bug bounty program. From increasing bounties to protecting our researchers, we’re always looking for more creative and meaningful ways to stay ahead of the game when it comes to running this program.
As an example, we recently partnered with HackerOne to host their H1-3120 live-hacking event in Amsterdam. Live-hacking events let participants hack on a target—often in person—submit vulnerabilities, and receive bounties quickly, all during the course of the event. Live-hacking comes with a number of benefits over traditional bug bounty programs, such as real-time communication and relationship building,
Modernizing the front-end stack
The core Dropbox web application is 10 years old and used by millions of users per day. Hundreds of front-end engineers across multiple cities actively work on it. Unsurprisingly, our codebase is very large and somewhat irregular. Recently written parts have thorough test coverage; other parts haven’t been updated in years.
Over the past two years, we’ve worked to modernize our front-end stack. We’ve successfully moved from CoffeeScript to TypeScript, from jQuery to React, and from a custom Flux implementation to Redux. Having completed these migrations, we identified our utility library, Underscore, as one more candidate for migration.
Compressing your files is a good way to save space on your hard drive. At Dropbox’s scale, it’s not just a good idea; it is essential. Even a 1% improvement in compression efficiency can make a huge difference. That’s why we conduct research into lossless compression algorithms that are highly tuned for certain classes of files and storage, like Lepton for JPEG images, and Pied-Piper-esque lossless video encoding. For other file types, Dropbox currently uses the zlib compression format, which saves almost 8% of disk storage.
We introduce DivANS,