This is an expanded version of my talk at NginxConf 2017 on September 6, 2017. As an SRE on the Dropbox Traffic Team, I’m responsible for our Edge network: its reliability, performance, and efficiency. The Dropbox edge network is an nginx-based proxy tier designed to handle both latency-sensitive metadata transactions and high-throughput data transfers. In a system that is handling tens of gigabits per second while simultaneously processing tens of thousands latency-sensitive transactions, there are efficiency/performance optimizations throughout the proxy stack, from drivers and interrupts, through TCP/IP and kernel, to library, and application level tunings.
Large-scale networks are complex, dynamic systems with many parts, managed by many different teams. Each team has tools they use to monitor their part of the system, but they measure very different things. Before we built our own infrastructure, Magic Pocket, we didn’t have a global view of our production network, and we didn’t have a way to look at the interactions between different parts in real time. Most of the logs from our production network have semi-structured or unstructured data formats, which makes it very difficult to track a large amount of log data in real-time.
At Dropbox, our traffic team recently upgraded the front-end Nginx servers to enable HTTP/2 for our web services. In this article, we would like to share our experiences and findings during the HTTP/2 transition. The overall upgrade was smooth for us, although there are also a couple of caveats that might be helpful to others.
Background: HTTP/2 and Dropbox web service infrastructure
HTTP/2 (RFC 7540) is the new major version of the HTTP protocol. It is based on SPDY and provides several performance optimizations compared to HTTP/1.1. These optimizations include more efficient header compression,