Annotations on Document Previews

// By Laura Harris Neal • Nov 30, 2016

Introduction

Location-specific feedback has always been fundamental to collaboration. At Dropbox, we’ve recognized this need and implemented annotations on document previews. Our goal was to allow users to provide focused and clear feedback by drawing rectangles and highlighting text on their documents. We ran into a few main challenges along the way: How do we ensure annotations can be drawn and rendered accurately on any kind of document, with any viewport size, and using any platform? How can we maintain isolation of user documents for security? How can we keep performance smooth and snappy? Below, I’m going to answer these questions and dive a bit deeper into how annotations work at Dropbox.

FileViewer architecture

Before jumping into the annotations library, let’s take a look at our existing file preview architecture. On the web, files are previewed by our FileViewer module, which behaves differently depending on file type. Images and text files are relatively simple and can be inserted directly into the DOM. Previewing more complicated files (e.g. PDFs, Microsoft Office files, and Adobe Illustrator files) requires first generating a PDF preview on the back-end and then displaying that preview in an iframe within the FileViewer.

User documents are incredibly variable and could potentially contain malicious content. Therefore, complicated filetypes with generated previews are shown in an iframe. Since the iframe’s source comes from a different domain, its context doesn’t have direct access to the main site’s DOM, CSS styles, JavaScript functions, cookies, or local storage. Thus, the user-generated content is effectively isolated. Within that iframe, Dropbox uses PDF.js to display the generated previews. To maintain isolation, PDF.js knows nothing about the user and has the sole purpose of rendering a PDF at a given URL. Using PDF.js in an iframe has some additional benefits besides increased security: it keeps the code simple and allows us to benefit from an existing technology with a large user base and continuous upgrades.

While this structure worked very well for vanilla read-only document previews, it posed some substantial challenges when it came time to enhance previews with inline annotations. We now needed to do more than simply view a document, so we had to establish communication between PDF.js and our FileViewer. This communication happens through FrameMessenger, a proprietary Dropbox message-passing module that sends information as JSON. Although annotations must work for arbitrary file types and platforms, the following discussion will use PDF previews on the web as an illustrative example.
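
FrameMessenger itself isn’t shown in this post, so the following is only a rough sketch of what JSON message passing across an iframe boundary can look like in general. The names FrameMessage, encodeMessage, and decodeMessage are hypothetical and are not FrameMessenger’s API; the sketch assumes the receiver knows the one origin it is willing to trust.

```typescript
// Hypothetical message envelope; FrameMessenger's real format is not public.
interface FrameMessage {
  action: string;
  parameters: Record<string, unknown>;
}

// Serialize a message to a JSON string before sending it across the boundary.
function encodeMessage(message: FrameMessage): string {
  return JSON.stringify(message);
}

// Parse an incoming message. Returns null if it comes from an unexpected
// origin, is not valid JSON, or does not have the expected shape.
function decodeMessage(
  raw: string,
  senderOrigin: string,
  trustedOrigin: string
): FrameMessage | null {
  if (senderOrigin !== trustedOrigin) {
    return null; // ignore messages from any other origin
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (_error) {
    return null; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return null;
  }
  const candidate = parsed as { action?: unknown; parameters?: unknown };
  if (
    typeof candidate.action !== "string" ||
    typeof candidate.parameters !== "object" ||
    candidate.parameters === null
  ) {
    return null;
  }
  return {
    action: candidate.action,
    parameters: candidate.parameters as Record<string, unknown>,
  };
}
```

In a browser, the sender would pass the encoded string to the target window’s postMessage along with the target origin, and the receiver would feed event.data and event.origin from its message event into decodeMessage. Keeping the payload to plain JSON means nothing but data crosses the boundary.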

Annotations

At Dropbox we use the React JavaScript library for our front end. Annotations have two main React components, which can easily be reused on arbitrary document types: the inline markup itself (the Annotation) and the corresponding comment bubble (the AnnotationBubble). The Annotation is a yellow overlay positioned within the document itself that refers to part of its content. Currently, this could be a text highlight or a rectangle. In the future we may add other types, such as freehand shapes or pointers. The Annotation is placed and sized based on user mouse events and must react smoothly while being created. Annotations must also move with the document when it’s scrolled or resized.

The second component is the corresponding AnnotationBubble, which has comment text contained in a popup “bubble” which floats near the Annotation. The AnnotationBubble will contain the original comment, along with any replies and a list of users relevant to the conversation. When a user @mentions someone, our CommentComposer React component brings the feedback directly to the attention of a recipient via an email and popup notification. This AnnotationBubble must be visually attached to its Annotation, but must also connect with other Dropbox components, such as the User object, the contact list popup, and the comments list side panel.

Integrating annotations with the existing framework

One of the most challenging aspects of designing the architecture for annotations was deciding how to bridge the divide between the disjointed ecosystems of the Dropbox FileViewer and the PDF.js iframe. Where should we get mouse events from? Where should we draw the Annotation and AnnotationBubble? Ideally, the Annotation and AnnotationBubble should move smoothly while the mouse interacts with them or the document scrolls or resizes. Also, AnnotationBubble should be able to float over the edge of the document, but the Annotation should be clipped at the edge. For implementation simplicity, we wanted to limit modifications to the third-party PDF.js and do most of our development in Dropbox’s FileViewer. Finally, we needed to be careful about what data we’re sending to and from the iframe. If we relied on too much data flow, performance could be adversely affected. More importantly, we didn’t want to compromise the security provided by the iframe’s encapsulation by sending sensitive user data across to the document.

Option 1: All components in the PDF.js iframe

One option would have been to customize PDF.js and implement everything within the iframe. The annotation components could be, in every sense, “inside” the document. This means that resizes and scrolls could immediately and seamlessly update the Annotation’s position, no calculations needed. Also, the Annotation would never overflow the bounds of the document, since it would be automatically clipped by the iframe. Although this has huge performance and simplicity benefits, it also has some serious drawbacks:

  1. The AnnotationBubble would also be clipped by the iframe, even though it should be free to overlay the document bounds to maximize valuable viewport space.
  2. There would be a large cost to implementation simplicity. Since all the annotation components would be inside PDF.js, they would have to be compiled into this block of “vanilla” JavaScript, and would be very hard to maintain as PDF.js develops.
  3. These components would also be hard or impossible to generalize for non-iframe preview types.
  4. Finally, we would have to send all of the necessary information for the comment bubble from the FileViewer ecosystem into the iframe, including a user’s information and their contact list. Sending this sensitive user data across to the iframe would break the FileViewer’s security encapsulation. Although we could’ve overcome the other difficulties mentioned above, preserving security was the main reason an all-iframe implementation wasn’t chosen.

Option 2: All components in the parent FileViewer

The opposite approach would have been to implement everything in FileViewer, in an overlay on “top” of the iframe. Advantages would include aligning the development process more with the rest of the Dropbox website and allowing for easier code reuse between other Dropbox systems and between document types. Also, information passing between the AnnotationBubble and FileViewer would be trivial and would have no security implications. However, with this approach it becomes very hard to make the Annotation look like a part of the document. Instead of having to transmit bulky user information in JSON via the FrameMessenger as before, we’d have to send streams of fast-moving mouse, scroll, and resize events. The time required for this cross-document communication, along with translation between coordinate systems and manual repaints of the Annotation would cause the Annotation to perceptibly lag behind a user’s mouse or the document’s scroll. Annotations could also flow outside the document’s edges, and the illusion that the Annotation was attached to the document would be impossible to maintain.

Option 3: Hybrid solution

We found that a compromise between these two options was the best solution, both for code quality and performance. The code for the Annotation is integrated into PDF.js so that relevant mouse events are captured and used right away. Since the Annotation is inside the iframe and attached to the document as a child div, it moves smoothly along with the document when it’s scrolled or resized. The Annotation is also automatically clipped when it overflows the iframe. The AnnotationBubble, however, is in the parent FileViewer, and benefits greatly from direct access to other Dropbox components and data. It also can easily overflow the iframe window, allowing for a better use of viewport space. However, since its position needs to follow the Annotation in the iframe, any movements of the Annotation are sent up through the FrameMessenger and then translated to the viewport’s coordinates. This does introduce a delay in the AnnotationBubble’s movements, which we mitigate by hiding it when its Annotation is moving. There is also some necessary algorithmic complexity involved in translating positions between the iframe and FileViewer, which we describe in the appendix at the bottom of the post. (In fact, every different type of preview has its own interface for accepting and translating movement events sent from the Annotation to the AnnotationBubble.)
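
The hide-while-moving mitigation described above can be sketched as a small state transition. This is a minimal illustration rather than our actual component code: BubbleState, AnnotationEvent, and reduceBubble are hypothetical names, and the sketch assumes the iframe’s movement messages have already been reduced to “moving” and “settled” events with translated viewport coordinates.

```typescript
// Hypothetical state for one AnnotationBubble in the parent FileViewer.
interface BubbleState {
  visible: boolean;
  left: number; // viewport pixels
  top: number; // viewport pixels
}

// Hypothetical events derived from messages sent up from the iframe.
type AnnotationEvent =
  | { kind: "moving" }
  | { kind: "settled"; left: number; top: number };

// Compute the next bubble state for an incoming annotation event.
function reduceBubble(state: BubbleState, event: AnnotationEvent): BubbleState {
  switch (event.kind) {
    case "moving":
      // Hide the bubble instead of letting it visibly trail the annotation.
      return { visible: false, left: state.left, top: state.top };
    case "settled":
      // Movement finished: show the bubble again at the new position.
      return { visible: true, left: event.left, top: event.top };
  }
}
```

Because the bubble is only repositioned once movement has stopped, the delay introduced by cross-frame messaging never shows up as a bubble dragging behind its Annotation.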

This table summarizes the three options above:

  1. Annotation & AnnotationBubble in PDF.js iframe
     Pros:
     • Annotation smoothly follows document when it scrolls/resizes.
     • Annotation is automatically clipped by the document edge.
     Cons:
     • The AnnotationBubble would be undesirably clipped.
     • Modifying PDF.js is more complex than staying in Dropbox ecosystem.
     • Can’t generalize to non-iframe preview types.
     • Sending lots of user data to iframe breaks security encapsulation and hurts performance.
  2. Annotation & AnnotationBubble in FileViewer
     Pros:
     • AnnotationBubble floats over document edge.
     • Implementation is simpler in FileViewer.
     • Can be generalized for non-iframe preview types.
     • Can use existing Dropbox components.
     • Only events would have to be sent across iframe.
     Cons:
     • Annotation would perceivably lag as document scrolls/resizes.
     • Annotation would undesirably flow over the document edge.
  3. Compromise: Annotation in PDF.js iframe & AnnotationBubble in FileViewer
     Pros:
     • Annotation smoothly follows document when it scrolls/resizes.
     • Annotation is automatically clipped by the document edge.
     • AnnotationBubble floats over document edge.
     • Most of the implementation can be generalized for non-iframe preview types.
     • Can use existing Dropbox components for AnnotationBubble.
     • Only events would have to be sent across iframe.
     Cons:
     • AnnotationBubble would perceivably lag as document scrolls/resizes (we hide it during movement to mitigate this).

Example action flowing through whole system

The following example shows how we isolate the preview and how we deal with communication across the iframe. In this scenario, the user has decided to place an annotation on a PDF and has already begun drawing a rectangle by clicking and dragging her mouse across a part of the screen. Now, the user releases the mouse, starting a flurry of events, summarized in the diagram below the animation.

[Diagram: annotations data flow]

iframe events

  1. PDF.js contains the actual document preview and has event listeners set up on the browser’s window object. It receives the mouseup and informs PdfJsAnnotationInterface.
  2. PdfJsAnnotationInterface does all of the document type-specific communication between the preview, the more general AnnotationController, and the FileViewer.
  3. From here, the event gets passed to the AnnotationController, which determines which Annotation the event gets passed to next (this could be either a new Annotation or one that’s currently being drawn/edited). In this case, AnnotationController knows we’ve previously been dragging the mouse to create a region, so it calls the AnnotationRegion’s onMouseUp callback.

The following is a simplified version of the CoffeeScript code in AnnotationRegion’s onMouseUp callback (the code path specific to this example is the isInitialCreation branch, which calls onAnnotationPlaced):

AnnotationRegion = React.createClass(
  ...
  # If dragging/resizing, we can stop now.
  # Otherwise, the click happened elsewhere and we just hide the rectangle.
  onMouseUp: (event) ->
    if @_isModifying()  # the mouse was just interacting with the region
      # Update the annotation
      @_updateAnnotationFromState()  # updates @annotation dict based on state
      # Call "Annotation Placed" or "End Drag" depending on whether or not
      # we were just creating the region
      if @state.isInitialCreation
        @props.onAnnotationPlaced?(@annotation)  # back to AnnotationController
      else
        @props.onAnnotationEndDrag?(@annotation)

      # Disable creation mode
      @setState {
        isInitialCreation: false
      }
    else
      # The mouse was not interacting with the region,
      # so a click outside it tells it to hide.
      @hideAnnotation(event)
  ...
)

4. The AnnotationRegion’s state contains coordinates set from previous mousemove events. Now it updates its @annotation object based on this state and passes this back to the AnnotationController via the onAnnotationPlaced callback.
5. The AnnotationController takes note that the current AnnotationRegion is done and passes the @annotation back to the PdfJsAnnotationInterface.
6. Now we’re finally ready to send information out of the iframe to the FileViewer! PdfJsAnnotationInterface translates the coordinates in the annotation from PDF points to viewport pixels, packages it up in a JSON message, and sends it over the iframe boundary via FrameMessenger.

This is an example of the JSON that gets sent across the iframe boundary. pdf_coordinates holds the original PDF location information, viewport_coordinates holds the translated viewport pixels, a type of 2 denotes a region, and text_highlight would contain the selected text for a highlight:

{
  "payload": {
    "action": "annotation-placed",
    "parameters": {
      "pdf_coordinates": [
        {
          "page": 1,
          "page_size": {
            "height": 790,
            "width": 610
          },
          "coordinates": [
            { "x": 250.0, "y": 540.0 },
            { "x": 250.0, "y": 360.0 },
            { "x": 410.0, "y": 360.0 },
            { "x": 410.0, "y": 540.0 }
          ]
        }
      ],
      "type": 2,
      "text_highlight": null,
      "viewport_coordinates": [
        { "x": 530, "y": 510 },
        { "x": 530, "y": 870 },
        { "x": 830, "y": 870 },
        { "x": 830, "y": 510 }
      ]
    }
  }
}
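
Once a message like this arrives, the viewport coordinates are what the parent needs in order to place the AnnotationBubble near the region. The following is a minimal sketch of one way to do that, not our actual placement logic: boundingBox and bubbleAnchor are hypothetical helpers, and anchoring the bubble to the right of the region with a fixed gap is an assumption made purely for illustration.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Box {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Compute the axis-aligned bounding box of a region's corner points.
function boundingBox(points: Point[]): Box {
  if (points.length === 0) {
    throw new Error("boundingBox needs at least one point");
  }
  const box: Box = {
    left: points[0].x,
    top: points[0].y,
    right: points[0].x,
    bottom: points[0].y,
  };
  points.forEach((p) => {
    box.left = Math.min(box.left, p.x);
    box.top = Math.min(box.top, p.y);
    box.right = Math.max(box.right, p.x);
    box.bottom = Math.max(box.bottom, p.y);
  });
  return box;
}

// Hypothetical placement rule: put the bubble just to the right of the
// region, aligned with the region's top edge (viewport y grows downward).
function bubbleAnchor(points: Point[], gapPx: number): Point {
  const box = boundingBox(points);
  return { x: box.right + gapPx, y: box.top };
}
```

With the viewport_coordinates from the example message, the bounding box spans x from 530 to 830 and y from 510 to 870, so a gap of 8 pixels would anchor the bubble at (838, 510).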

FileViewer events

7. Now in the FileViewer domain, the "annotation-placed" JSON message lands in FileViewerInterface, which handles all communication between the preview and FileViewer.
8. From here, the message is passed on up to FilePreviewAnnotations, which is responsible for all FileViewer-side annotation logic.
9. FilePreviewAnnotations updates the commenting Store and sets up the data necessary for a new AnnotationBubble. Specifically, the Store updates the createAnnotationBubble part of its state, as shown below. Note that since Commenting uses a Flux architecture, the Store only contains the shared state; it does not actually create the new AnnotationBubble.

return Reflux.createStore({
  ...
  onStartAnnotationCreation: ({annotation}) -> 
    @setState({
      # createAnnotationBubble contains information for creating 
      # the AnnotationBubble for a new Annotation 
      createAnnotationBubble: { 
        annotation: annotation 
        showBubble: true 
      } 
    }) 
  ... 
})

10. The new AnnotationBubble is created in FilePreviewOverlay, which listens for updates in the Store. When Store.createAnnotationBubble changes, FilePreviewOverlay receives this update. As a result, it creates and positions a new AnnotationBubble.

11. The user then types a comment into the AnnotationBubble and hits “Post”.
12. This triggers an event in ActionCreators.addAnnotation. While the Store contains global state in the Flux paradigm, it is ActionCreators that handles global actions, including side effects and I/O. ActionCreators.addAnnotation actually saves the annotation and comment to Dropbox’s back-end data centers.
13. If the save is successful, ActionCreators updates the Store, clearing Store.createAnnotationBubble.
14. Like before, FilePreviewOverlay hears this update and hides the AnnotationBubble.
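
Steps 9 through 14 follow the usual Flux loop: the Store holds shared state, listeners react to changes, and action creators perform side effects before updating the Store. The sketch below illustrates that loop with a hand-rolled store. It does not use the Reflux API, all names in it are hypothetical, and the save is modeled as a synchronous callback for brevity, whereas a real save to the back end would be asynchronous.

```typescript
interface Annotation {
  type: number;
}

interface StoreState {
  createAnnotationBubble: { annotation: Annotation; showBubble: boolean } | null;
}

type Listener = (state: StoreState) => void;

// Hypothetical stand-in for the commenting Store: shared state plus listeners.
class CommentingStore {
  private state: StoreState = { createAnnotationBubble: null };
  private listeners: Listener[] = [];

  listen(listener: Listener): void {
    this.listeners.push(listener);
  }

  getState(): StoreState {
    return this.state;
  }

  // Step 9: record what is needed to create a bubble for a new Annotation.
  startAnnotationCreation(annotation: Annotation): void {
    this.setState({ createAnnotationBubble: { annotation, showBubble: true } });
  }

  // Step 13: the save succeeded, so clear the pending bubble.
  annotationSaved(): void {
    this.setState({ createAnnotationBubble: null });
  }

  private setState(next: StoreState): void {
    this.state = next;
    this.listeners.forEach((listener) => listener(this.state));
  }
}

// Step 12: the action creator performs the side effect (the save) and only
// updates the Store when it succeeds.
function addAnnotation(store: CommentingStore, save: () => boolean): void {
  if (save()) {
    store.annotationSaved();
  }
}
```

An overlay component would register itself with listen and show or hide its bubble depending on whether createAnnotationBubble is set, which corresponds to steps 10 and 14.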

Conclusion

The information cascade in the above example was started by a single user mouse action, and multitudes of other events are fired continuously as the user interacts with the preview. Events also go in the reverse direction, as FileViewer needs to inform the iframe of higher-level actions such as the user turning commenting on or off. To make the annotations system react smoothly and sensibly to all of this input, we needed to bridge the gap between our intentionally isolated document preview and the broader Dropbox environment. As explained above, we kept the purely visual Annotation simple and attached it directly to the document to maximize its performance. The information-heavy AnnotationBubble was kept outside and a flexible interface was made to connect them. This separation of components and use of interfaces made it easy to gracefully extend this implementation for image files, and will make annotations possible on many more file types in the future.

Try out annotations on a sample file today!

Appendix: Coordinate translation between iframe and FileViewer

For PDFs, the decision outlined above to split annotations between the iframe and FileViewer meant that coordinates would have to be translated between two different systems: PDF points and viewport pixels.

On PDFs, positions are expressed in relation to a physical printed document. Each position is measured from the bottom left corner of a page and expressed in “points” (one of which equals 1/72 of an inch on a printed page). Conversely, positions in the viewport are measured from the top left of the viewer’s viewport and expressed in pixels. When translating from PDF points in the iframe to pixels in the viewport, the current page and scroll position of the document both need to be taken into account to calculate an offset. Also, the vertical component of the point needs to be reversed. Finally, the zoom level of the document is used to determine the multiplier required to complete the translation to viewport pixels.
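
The translation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our production code: PageLayout and both function names are hypothetical, and the sketch assumes the viewport position of the page’s top-left corner has already been computed from the current page and scroll position.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical description of where a page currently sits in the viewport.
interface PageLayout {
  pageHeightPoints: number; // page height in PDF points
  scale: number; // viewport pixels per PDF point at the current zoom level
  pageLeftPx: number; // viewport x of the page's left edge
  pageTopPx: number; // viewport y of the page's top edge (page + scroll offset)
}

// PDF points are measured from the bottom left of the page; viewport pixels
// are measured from the top left of the viewport, so the y axis is flipped.
function pdfPointToViewportPixel(point: Point, layout: PageLayout): Point {
  return {
    x: layout.pageLeftPx + point.x * layout.scale,
    y: layout.pageTopPx + (layout.pageHeightPoints - point.y) * layout.scale,
  };
}

// Inverse translation, for positions that travel in the other direction.
function viewportPixelToPdfPoint(pixel: Point, layout: PageLayout): Point {
  return {
    x: (pixel.x - layout.pageLeftPx) / layout.scale,
    y: layout.pageHeightPoints - (pixel.y - layout.pageTopPx) / layout.scale,
  };
}
```

For instance, on a 790-point-tall page drawn at a scale of 2 with its top-left corner at viewport pixel (30, 10), the PDF point (250, 540) lands at viewport pixel (530, 510). These layout values are made up for illustration.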

All of this translation is required every time the Annotation moves, whether the movement is caused by the user drawing the Annotation, scrolling/resizing the document, etc. This position information is sent as a stream of information from the iframe to the FileViewer. Information is also passed in the other direction, from the FileViewer to the iframe. For example, a message is passed down when a user changes the visibility of all comments on a document, or when the user interacts with the AnnotationBubble to post or delete a specific comment. Fortunately, all these simple messages are fast to transmit, resulting in no performance issues.

