Building Carousel, Part I: How we made our networked mobile app feel fast and local

By Stephen Poletto • Apr 14, 2014

When we began the journey of building a mobile app for Dropbox a few years ago, we started simple — our Android and iOS apps allowed our users to view their files on the go, and cache them for offline access. As smartphones became more popular, we realized we could provide another great service on mobile: automatic backup of all the photos and videos taken on these devices, so they’d be safe forever in Dropbox.

Last Wednesday, on April 9, we took another giant leap forward with the introduction of Carousel. Carousel is a single home for all your photos and videos, independent of whether they’re local to the device you’re using, or already backed up to Dropbox.

While Carousel seems pretty simple on the surface, there were a number of technical challenges we faced in building it. We needed to ship both an Android app and an iOS app on day one, which required us to think critically about how to share code between the two platforms. In order to support collections of over 100,000 photos, we needed to prioritize performance and find a way to beat the garbage collector on Android. In the coming weeks and months, we want to share the story of how we went about building Carousel and provide some insight into the hard engineering problems we solved along the way. Today, we’re going to focus on how we built Carousel to feel fast, responsive, and local, even though the data on which users operate is ultimately backed by the Dropbox servers.

Make it Faster!

As we thought about what we wanted in the next iteration of a mobile photos product, we kept coming back to this guiding principle:

A Dropbox-powered gallery app can and should be just as fast as a local gallery app and should never force the user to wait to complete an action. Users should be able to view, curate, and share their photos regardless of network or sync state; they should never have to wait or worry about which photos are local and which are not.

As long as our app was slower than a local gallery app, we knew it would never become the central place where our users go to view and interact with their photos. With this guiding principle in mind, we took a critical look at the Dropbox app’s photos tab, and identified two key problems that made the app feel way too slow:

1. The photos tab makes blocking HTTPS requests in order to sync user actions to the server. For instance, when the user tries to share a photo, they're stuck staring at a blocking spinner until the request completes.

The same is true when the user tries to delete a photo from the photos tab. In the event of no connectivity, these requests outright fail and require the user to try again later.

2. There’s no way to view and interact with photos that are local only to the device (i.e. not yet uploaded to Dropbox).

These two problems, when combined, made the app especially difficult to use in the context of sharing photos with others. In order to share using the Dropbox app, users first had to wait for their photos to back up, then wait on a blocking network request to complete the share. The app also couldn't be used as a replacement for a traditional gallery, since photos captured offline couldn't be viewed at all.

Client Architecture

To solve these problems, we first need to build a unified data model, in which local photos and remote photos are treated as equivalent objects, with all the same capabilities and properties. Second, since humans can perceive application response delays at around the 100 ms mark, we simply can't afford to make user-visible blocking network calls. Instead, we need to build an eventually consistent system, where the user can perform some action, immediately see the effect of that action locally, then eventually see the effect of that action globally on other devices. In the academic world, this is known as optimistic replication.

To build a merged view of both local and server content, we first need the client to stay up to date with changes that are happening remotely on the server. To achieve that, we use HTTP long polling to get notified of changes, and use a variant of our delta API to pull those changes down. Delta returns the changes that have occurred to a user’s Dropbox since the last time the client called up to the server. That is, it provides the additions, deletions and modifications to photo metadata that have occurred since the prior cursor. When we fetch these changes, we write the most up-to-date server metadata into a server_photos table in SQLite. The server_photos table is purely a cache of the “truth,” which lives on the server.
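To make this concrete, here's a rough sketch of what that background sync loop could look like. All of the type and method names here (DbxServerApi, ServerPhotoCache, and friends) are hypothetical stand-ins, not our real API:

#include <string>
#include <vector>

// Hypothetical types; the real delta payload carries richer metadata.
struct ServerPhoto {
    std::string server_id;
    std::string hash;
    bool hidden_on_server;
};

struct PhotoDelta {
    std::vector<ServerPhoto> upserts;    // photos added or modified since the cursor
    std::vector<std::string> deletions;  // server IDs removed since the cursor
    std::string new_cursor;              // where to resume next time
};

struct DbxServerApi {
    virtual void long_poll(const std::string& cursor) = 0;  // blocks until changes exist
    virtual PhotoDelta get_delta(const std::string& cursor) = 0;
    virtual ~DbxServerApi() {}
};

struct ServerPhotoCache {  // wraps the server_photos table in SQLite
    virtual std::string load_cursor() = 0;
    virtual void apply(const PhotoDelta& delta) = 0;  // upserts, deletions, and the new
                                                      // cursor, in a single transaction
    virtual ~ServerPhotoCache() {}
};

// Runs on a background thread: keeps the local cache converging on server truth.
void delta_sync_loop(DbxServerApi& api, ServerPhotoCache& cache) {
    std::string cursor = cache.load_cursor();
    for (;;) {
        api.long_poll(cursor);             // wake only when something has changed
        PhotoDelta delta = api.get_delta(cursor);
        cache.apply(delta);                // cache is purely derived from server state
        cursor = delta.new_cursor;
    }
}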

Meanwhile, our client-side camera roll scanner computes a fast hash of each photo to determine which photos have not yet been backed up to Dropbox. We turn a photo that needs to be uploaded into a photo_upload_operation, and likewise serialize it into SQLite.
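Sketched out, that scanning pass could look something like the following; fast_hash here is just a placeholder (this post doesn't go into the actual hash function), and enqueue_upload stands in for the persistence machinery:

#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

struct CameraRollPhoto {
    std::string native_id;     // the platform's camera roll identifier
    std::string bytes_prefix;  // first chunk of the file, enough to fingerprint it
};

// Placeholder for the fast hash; the real function isn't described in this post.
std::string fast_hash(const CameraRollPhoto& photo) {
    return std::to_string(std::hash<std::string>{}(photo.bytes_prefix));
}

// Enqueue a photo_upload_operation for anything Dropbox hasn't seen yet.
void scan_camera_roll(const std::vector<CameraRollPhoto>& roll,
                      const std::unordered_set<std::string>& known_hashes,
                      const std::function<void(const CameraRollPhoto&)>& enqueue_upload) {
    for (const CameraRollPhoto& photo : roll) {
        if (known_hashes.count(fast_hash(photo)) == 0)
            enqueue_upload(photo);  // serialized into SQLite as a photo_upload_operation
    }
}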

Finally, before we can render the view, we have a third input source in the form of client-side user actions. Whenever the user hides or deletes a photo in Carousel, we want the action to take effect instantly. We can then asynchronously write that change back to the server. To do so, we construct a HideOperation, or DeleteOperation, which also gets persisted to SQLite.

Every user action in Carousel thus becomes an operation, which will eventually be synced to the server. These operations are placed into in-memory operation queues and persisted to SQLite so they survive across app launches. For each queue, there’s a dedicated operation sync thread, which waits until an operation is ready to execute, then makes the HTTPS call necessary to submit the change to the server. Whenever we need to render a view to the user, we consult these pending operations to make sure we’re reflecting the user’s latest actions. It’s only safe to remove these operations once we’re certain we’ve seen their effect come down in the delta. We thus end up with an architecture in which every view is rendered from the cached server state, plus the local photos pending upload, plus any operations still in flight.
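As a rough illustration of the queue side of that architecture, here's a simplified operation queue and its sync thread. DbxOperation is a hypothetical stand-in for operations like HideOperation, and persistence, retries, and error handling are all omitted:

#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>

// Hypothetical base class; HideOperation, DeleteOperation, etc. would derive from it.
struct DbxOperation {
    virtual bool ready() const = 0;  // e.g. all referenced photos have server IDs
    virtual void execute() = 0;      // the blocking HTTPS call, off the main thread
    virtual ~DbxOperation() {}
};

class OperationQueue {
public:
    void push(std::unique_ptr<DbxOperation> op) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ops_.push_back(std::move(op));
        }
        cv_.notify_one();
    }

    // Call when an operation's dependency resolves (e.g. an upload finishes).
    void poke() { cv_.notify_one(); }

    // Body of the queue's dedicated sync thread. This sketch runs operations
    // strictly in order; in reality an operation is only retired once its
    // effect is seen coming back down in delta.
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [&] { return !ops_.empty() && ops_.front()->ready(); });
            std::unique_ptr<DbxOperation> op = std::move(ops_.front());
            ops_.pop_front();
            lock.unlock();
            op->execute();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::unique_ptr<DbxOperation>> ops_;
};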

Let’s walk through an example of rendering the primary grid view to the user.

#include <list>

class DbxPhotoClient {
public:
    std::list<DbxPhotoItem> list_photos();  // the merged local + server photo list
};

Inside the implementation of list_photos, we:

1. Read all server photo metadata out of the server_photos table.
2. Add in all the local photos pending upload.
3. Remove photos which have been hidden or deleted.

For example, suppose our server_photos table contains the following data:

Server ID | Hash   | Hidden On Server
A         | hash_a | No

Our photo_upload_operations table contains the following data:

Camera Roll ID | Hash
B              | hash_b

And our photo_modification_operations table contains the following data:

Operation Type | Photo ID(s)
Hide           | [Server ID = A]

Our call to list_photos() will produce as final output the result of unioning local and server content, then applying the pending hide:

ID | Hash   | Is Hidden
A  | hash_a | Yes
B  | hash_b | No
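Putting the three steps together, a simplified list_photos might look like the following, with in-memory stand-ins for the three SQLite tables:

#include <list>
#include <set>
#include <string>

struct DbxPhotoItem {
    std::string id;    // LUID in practice; server ID / camera roll ID here for brevity
    std::string hash;
    bool hidden;
};

std::list<DbxPhotoItem> list_photos(
        const std::list<DbxPhotoItem>& server_photos,    // server_photos table
        const std::list<DbxPhotoItem>& pending_uploads,  // photo_upload_operations
        const std::set<std::string>& pending_hides,      // photo_modification_operations
        const std::set<std::string>& pending_deletes) {
    std::list<DbxPhotoItem> result;
    // Step 1: start from the cached server metadata.
    for (DbxPhotoItem photo : server_photos) {
        if (pending_deletes.count(photo.id)) continue;           // step 3: pending delete
        if (pending_hides.count(photo.id)) photo.hidden = true;  // step 3: pending hide
        result.push_back(photo);
    }
    // Step 2: union in local photos that haven't finished uploading yet.
    for (const DbxPhotoItem& photo : pending_uploads)
        if (pending_deletes.count(photo.id) == 0)
            result.push_back(photo);
    return result;
}

Fed the example tables above, this produces exactly the two rows shown: A comes back hidden, and B is unioned in from the pending upload.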

Note that in practice, forcing the UI to call list_photos() to perform a read from SQLite every time there’s a change to the photo model would be prohibitively expensive. Instead, we keep the photo model loaded in memory, and modify it as changes come in (either via user actions in the app, or remote changes on the server). This is not all that different from the delta API we use to sync down changes from the server. To keep things fast, we essentially introduce another level of delta between disk and memory. In the next blog post, we’ll take a look at exactly how this works, and how it enabled us to build an app that can handle over 100,000 photos.

The key idea in the example we walked through above is that applying a client-side photo addition and hide on top of cached server state should provide the same result as eventually uploading the photo and applying the hide on the server. Whenever we render data in Carousel, we first consult the cached server state, then “re-play” pending operations on top of it. In the case of hide & delete, we then rely on last-write-wins semantics on the server to resolve any conflicts.

This works really well for photos that are already in Dropbox, since the photos already have server IDs. Each pending operation can store the server ID(s) on which it should be applied. But what happens when we want to allow modifications to photos that haven’t finished uploading yet? As an additional constraint, keep in mind that due to the multi-platform nature of Dropbox, the photo might be uploaded from a source other than the Carousel client. Even when that happens, we still need to resolve any pending actions that were taken on that photo.

Identifying a Photo

There are a few different ways to ensure an action taken on a local-only photo gets synced to the server properly. We wanted something simple and relatively stateless to keep the client-side logic easy to reason about. To achieve this, we introduced the concept of a device-specific ID, henceforth referred to as a LUID (locally unique ID), as the canonical way to refer to each photo. A LUID is a stable identifier, meaning it can be used to refer to a photo both before and after it has been uploaded. A LUID is simply an autoincrement integer, and it works like this:

When we scan the device for new photos and find a photo that needs to be uploaded to Dropbox, we create a LUID for that local photo. We then add an entry in the local_photo_luids table, which maps the LUID to its native camera roll ID.

When a new server photo S comes down in delta, we check if S.hash matches any local photo hashes. If not, we create a new LUID, and add an entry to the server_photo_luids table, which maps the LUID to its server ID.

In the event the hash does match some local photo L, it means L has finished uploading and we now have its server metadata available. We assign S.photo_luid = L.photo_luid. At the same time, we also mark the relevant photo_upload_operation as completed. To prevent conflicts (for instance if the same photo gets added to the user’s Dropbox multiple times), the first server photo with the same hash is the one that will “complete” the upload operation and claim the LUID.
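In code, the delta-processing side of this scheme might look roughly like this; the maps are in-memory stand-ins for the SQLite tables named above, and marking the upload operation completed is left out:

#include <cstdint>
#include <map>
#include <string>

using Luid = int64_t;  // an autoincrement integer, per the scheme above

// In-memory stand-ins for the SQLite tables described in this section.
struct LuidTables {
    Luid next_luid = 1;
    std::map<std::string, Luid> server_photo_luids;  // server ID -> LUID
    std::map<std::string, Luid> local_hash_to_luid;  // hash -> LUID of a pending local photo
};

// Called for each server photo S that arrives in delta.
Luid assign_luid(LuidTables& t, const std::string& server_id, const std::string& hash) {
    Luid luid;
    auto local = t.local_hash_to_luid.find(hash);
    if (local != t.local_hash_to_luid.end()) {
        // S is the uploaded copy of a local photo L: S claims L's LUID, and the
        // matching photo_upload_operation would be marked completed (not shown).
        luid = local->second;
        t.local_hash_to_luid.erase(local);  // first match wins; a later duplicate
                                            // of the same photo gets a fresh LUID
    } else {
        luid = t.next_luid++;               // brand-new server photo
    }
    t.server_photo_luids[server_id] = luid;
    return luid;
}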

You’ll notice that, with this logic, we always have a stable way to refer to a particular photo without worrying about whether it is on the server or not. This removes a lot of complexity from the app, since the UI can simply treat LUIDs as the basis of equality between photo objects. When a local photo finishes uploading, we don’t need to worry about tracking down each reference to it and “upgrading” the reference to use the new server ID. The LUID abstracts that away.

Faster Sharing

With LUIDs in hand, let’s take a look at what happens when a user shares in Carousel.

Suppose the user selects a batch of photos, some of which are still local only to the device, and some of which are already in Dropbox. Even if one of these photos finishes uploading while the user is still selecting photos, their selection will be preserved, since the selection set is based on LUIDs.


After the user selects the recipients with whom they’d like to share, we can construct the corresponding share operation.

DbxShareOp op(photo_luids, recipients);
op.persist(); // Save the operation to SQLite

When we render the resulting conversation view, we read the cached server state for the conversation uniquely identified by the recipients. We then re-play this pending share on top of it, just like all the operations we’ve seen before. We could spend a whole blog post going into more depth here, but for now we’ll defer that discussion.

If any of the LUIDs within the share operation are still local only (i.e. they do not have entries in the server_photo_luids table), then we know the share is not yet ready to be submitted to the server. The share operation queue can therefore sleep, and wait until the local-only photos in question are uploaded. We consider this a dependency of the share operation, which must be resolved before the operation is ready for remote execution. As part of constructing the share operation, we also mark the relevant photo_upload_operations as “blocking a share”, so that they get re-prioritized to the front of the upload queue.

When the dependent photo uploads complete, the share operation is ready to execute on the server. We look up the server IDs (via the server_photo_luids lookup table) and send a request to the server to perform the share.
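Here's a sketch of that readiness check and lookup; the types are again hypothetical stand-ins, and the real share operation carries more state than this:

#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Luid = int64_t;

struct DbxShareOp {
    std::vector<Luid> photo_luids;
    std::vector<std::string> recipients;
};

// The server_photo_luids table, read here in the LUID -> server ID direction.
using LuidToServerId = std::map<Luid, std::string>;

// The share queue's readiness check: every photo needs a server ID first.
bool share_ready(const DbxShareOp& op, const LuidToServerId& index) {
    for (Luid luid : op.photo_luids)
        if (index.count(luid) == 0)
            return false;  // still uploading; the queue sleeps until poked
    return true;
}

// Once ready, resolve LUIDs to server IDs and submit the share to the server.
std::vector<std::string> resolve_server_ids(const DbxShareOp& op,
                                            const LuidToServerId& index) {
    std::vector<std::string> server_ids;
    for (Luid luid : op.photo_luids)
        server_ids.push_back(index.at(luid));
    return server_ids;
}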

The best part is that all of this happens asynchronously, so the user is free to continue using the app, or go about their day. No spinners, no waiting.

Lessons Learned

The big lesson we learned from building the Dropbox app photos tab was: don’t block the user! Instead of requiring changes to be propagated to the server synchronously, we built Carousel from day one as an eventually consistent system. With mobile networks still slow and unreliable, we knew this would be the only way to deliver a Dropbox-backed gallery that felt fast and local.

The asynchronous, delta-based design of our mobile library empowered us to build an app that was much faster than the Dropbox photos tab. This design enabled us to hide the latency between client and server from the user. In the next installment of this series, we’ll go into more depth on the latency between disk and memory, and how optimizing that was also critical to making the app feel fast.

