Building Carousel, Part II: Speeding Up the Data Model

In the last post, Stephen explored the asynchronous client-server communication model we use in Carousel to provide a fast user experience, where interactions aren’t blocked on network calls. But network latency was not our only hurdle. In order to make the app feel fast and responsive, we also needed to minimize user-visible disk latency. One of Carousel’s primary goals was to make all of the user’s photos and videos accessible in one continuous view. We have users with over 100,000 photos in Dropbox, and we needed to build a metadata-loading pipeline that could accommodate them. This meant building a metadata-loading system that can show photos on-screen within a second of opening the app and provides smooth scrolling even as the user navigates back in time through their entire history of photos. Simply reading the metadata off of disk is too slow to do on demand while scrolling, so we keep a model of the user’s photo metadata in memory that reflects the metadata we have in our cache on disk.

Keeping the in-memory model correct with respect to the on-disk model is tricky because there are many threads that want to modify the state of the model concurrently: the thread receiving file changes from the server, the main thread with interactions from the user, and various background tasks like uploading pictures. Furthermore, we can’t read all of a user’s metadata off of disk immediately on app startup because it would block the user from seeing their photos for far too long. For example, on a modern Android smartphone (Nexus 5) with about 300 bytes of metadata per photo, it takes a full second to read metadata for a mere 5,000 photos out of SQLite.

Data Loading in Smaller Data Sets

Before we look at Carousel’s in-memory model, let us take a moment to consider common practices of loading and caching data to display. On Android, this is typically done with Loaders and Cursors. For the photos tab on our Dropbox mobile app, we store photo metadata in SQLite and then retrieve it using a SQLiteCursor which wraps a query to SQLite. This has a few problems because the Cursor interface doesn’t mesh well with the interface SQLite exposes. SQLite is a single-threaded library, so to read the results of a SQLite query in C, one needs to execute a query and step through the result rows all while holding a lock on the database connection. But the Cursor interface allows for later access to the query result, so what SQLiteCursor does is that the Java object runs the SQL query lazily, caching a fixed amount of query results at a time (2MB by default). There are quite a few drawbacks here, especially if you have to deal with more than 2MB of data. First, the Cursor grabs the next page of data by rerunning the query with an OFFSET clause. If the data set changed between the first time the query was run and the second, it might miss returning some rows. Second, it’s easy to miss the fact that the query is run the first time the Cursor needs data (generally during the first call to moveToFirst or getCount), which might be on the main thread. Even if you work around this while preparing the query by forcing the query to run on a background thread, a second run of the query will be whenever you advance past the first 2MB, which is also likely to be on the main thread. A disk read of 2MB on the main thread can cause a several-second hang. On iOS, although we get to interface directly with SQLite, we have similar problems because the naive implementation is to load data with a single SQL query, generally with a limit to the number of rows returned. When we built our original iOS and Android apps, we chose to work around these issues by paginating the photos tab, choosing a page size of approximately 5,000 photos where the latency to do a blocking disk read of a page of metadata is “tolerable”.

Accumulator Model

In Carousel, we wanted to fix the experience for users with more than a single page of photos by using a different model for loading photo metadata off of disk. When it comes time to render the view to the user, it doesn’t matter how we store the user’s data on disk as long as we can quickly bind data to the view. We prepare an in-memory “view model”, a data structure that provides exactly this: an index on our metadata that is fast enough to query on the UI thread at render time. We use the view model to implement UI callbacks like iOS’s cellForRowAtIndexPath: and Android’s Adapter.getView(). Because the way we access this data structure is similar between iOS and Android, we can actually share this view model between the two platforms, implementing the logic only once in C++. Having a large data model in C++ rather than in Java also helps us avoid stuttering app behavior on Android due to large garbage collections. While we won’t go into depth on cross-platform mobile development with C++ here, we plan to write about this more in a future blog post.

There were several performance characteristics we needed to be careful about in implementing this view model. First, it needed to be fast to update. Whenever a new photo gets added to the user’s Dropbox remotely on the server, perhaps because they added it from their computer, we need to be able to reflect that update quickly, without reloading the entire data set. Second, we also need to be able to load the initial view of Carousel without blocking on a full read of the metadata on disk. Both of these requirements drove us toward an interface that would allow for incremental updates, which is why we call our in-memory data model an “accumulator”. The accumulator prepares the view model to pass off to the UI whenever changes happen with the user’s data. The model that our timeline view uses has a fairly simple transactional interface:

class EventsAccumulator {
  virtual ~EventsAccumulator() {}

  virtual void add_photo(const string & photo_id,
                         const PhotoMetadata & photo_metadata) = 0;
  virtual void remove_photo(const string & photo_id) = 0;
  virtual void add_event(const string & event_id,
                         const EventMetadata & event_metadata,
                         const vector & photo_ids) = 0;
  virtual void remove_event(const string & event_id) = 0;
  virtual void commit() = 0;

In one transaction, we add a batch of photos and their corresponding events, then finish by calling commit. At the end of commit, the accumulator might not have all the photos in a user’s collection because we’re still streaming metadata from the server or out of our on-disk SQLite cache, but we’re guaranteed to have a consistent data model where the photos that do appear are in the events we intend for them to be in. With this interface in place, we’ve been able to develop and optimize our database transactions without adding much complexity to the higher application layer. We make use of this interface by adding photos in a few places:

  • when the app is first launched, to load cached metadata out of SQLite and into memory
  • when we download new metadata from the server
  • when our camera roll detector finds new photos that need to be backed up

These three code paths happen on separate threads, so we need to avoid having two threads with open EventsAccumulator transactions at the same time. Each thread locks the EventsAccumulator during a transaction, but it only has to spend that time pushing in-memory metadata into the accumulator. For example, the sync thread looks roughly like this, using calls to a server API like our /delta:

while (/*delta has more*/) {
  Json response = call_server_delta();
  vector events;
  vector photos;
  std::tie(events, photos) = parse_delta(response);
  write_delta_to_disk(events, photos);
  /*with accumulator lock*/ {
    for (const ParsedEvent & event : events) {
      accumulator->add_event(, event.metadata, event.photo_ids);
    for (const ParsedPhoto & photo : photos) {
      accumulator->add_photo(, photo.metadata);

This thread can perform the time-intensive tasks of making the network call and parsing the returned json before it tries to grab the accumulator lock, allowing the thread that’s reading metadata off of disk proceed unblocked. Of course, we still have to coordinate on accesses to disk between these two threads to give the app a consistent view of the data. The thread that’s reading data out of SQLite takes a lock on the database while it reads a batch of metadata that prevents us from executing write_delta_to_disk on the sync thread at the same time.

Metadata Snapshots

With these threads running to populate the accumulator, the data that the UI needs to display can be changing very frequently. To give the UI code control over when the data can change during its calculations, we present the UI with an immutable snapshot of the data model every time we call commit.

One reason why we care about using snapshots is so that the UI can perform an atomic swap of models to refresh its data. There are generally two major ways UI frameworks will support updating the data backing the UI:

  1. swap the entire view model for a new model
  2. apply incremental updates to the current view model

Android primarily uses the first pattern, while iOS generally uses the second. Incremental updates have an advantage in that they allow for animating changes to the data model (i.e. animating insertions and deletions). But they’re not strictly necessary, and one can derive them by taking diffs between snapshots. Our shared code prepares snapshots of the view model that support the first pattern. The snapshots allow looking up photos by absolute position in the ordering of all photos and by (event_index, index_within_event) pairs (iOS developers will recognize this pair as NSIndexPath in disguise). When we get a new snapshot, on iOS we call [UITableView reloadData:] to swap the new view model for the old. On Android, we wrap the snapshot with a thin layer that calculates the row index of each event and use it as the data source for our ListAdapter.

Speedy Snapshots

An immutable data structure gives good logical properties to someone using it, but often worse performance than its mutable counterpart depending on how it’s implemented. In the simplest implementation, when we’d like to change an item, we would have to make a copy of the entire immutable data structure with that single item changed. Our first, naive implementation of a snapshot of all the photo metadata was a sorted array of all the photos in the timeline. Say a user with 50,000 photos hides a photo in their timeline; making a copy of the entire photo metadata array and re-sorting it (a guard for the more general case of photos changing events) takes us almost a second! That’s a terrible user experience. To remove this latency, we improved the performance of these immutable snapshots dramatically by changing up the structure so we could take fast, shallow copies.

Taking advantage of the fact that we group our photos into events and that changes tend to happen to photos in a handful of events at once, we changed the structure of metadata snapshots from a single array of photos to an array of events with pointers to arrays of photos in each of them.

Stacked List redux

To make a copy of this snapshot with a single event changed, we only have to make a deep copy of the single changed event and then copy pointers to other events that keep their own arrays of photos.


This scheme gets a bit more complicated because our snapshots also cache information to make various lookups faster, but having the ability to make fast, shallow copies remains the core idea. We expose an interface for looking up a photo by absolute position in the entire snapshot; this is implemented by keeping the offsets of the beginnings of the events into the full list, then doing a binary search over these to find in which event a photo index lands. Here are the offsets for the earlier example:

Event Offsets redux

And here they are after a photo gets added:

Event Offsets 2 redux

Coming Next in “Building Carousel”

This accumulator and snapshot design works best because we can hold a user’s view model with metadata for all of their photos in memory. We can’t hold all of the user’s photos in memory at the same time, though, so we have a different solution designed to keep a window of the thumbnails in memory at any given time, fetching them around where the user is looking. There are also more parts of the UI layer that we’ve optimized between the snapshot model and the end result users see. For example, the layout for showing users’ metadata and photos in conversations is also significantly different from the events view. We’ll dive into more details on these topics in future blog posts.