The Visible Effects and Hidden Sources of Internet Latency

Most Internet Service Providers advertise their performance in terms of downstream throughput.  The “speed” that you pay for reflects, effectively, the number of bits per second that can be delivered on the access link into your home network.  Although this metric makes sense for many applications, it is only one of the characteristics of network performance that ultimately affect a user’s experience.  In many cases, latency can be at least as important as downstream throughput.

For example, consider the figure below, which shows Web page load times as downstream throughput increases. The time to load many Web pages decreases as throughput increases, but downstream throughput above about 16 Mbps stops having any effect on Web page load time.

Figure (web-plt): Page load times decrease with downstream throughput, but only up to 8–16 Mbits/s.

The culprit is latency: For short, small transfers (as is the case with many Web objects), the time to initiate a TCP connection and open the initial congestion window is dominated by the round-trip time between the client and the Web server.  In other words, the size of the access link no longer matters because TCP cannot increase its sending rate to “fill the pipe” before the connection has completed.
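The effect can be illustrated with a crude model of slow start. The sketch below is not the measurement method behind the figure; it assumes a one-RTT handshake, an initial congestion window of 10 segments, a 1460-byte segment size, window doubling every round trip, and no loss. The function name and all parameter values are hypothetical.

```python
import math

def slow_start_fetch_time(size_bytes, rtt_s, link_bps,
                          init_cwnd_segments=10, mss=1460):
    """Estimate the time to fetch one object over a fresh TCP connection.

    Simplified model (assumptions): one RTT for the handshake, then the
    congestion window doubles each round trip; no loss, no TLS, no
    server think time.
    """
    segments_left = math.ceil(size_bytes / mss)
    cwnd = init_cwnd_segments
    total = rtt_s  # TCP handshake
    while segments_left > 0:
        burst = min(cwnd, segments_left)
        # A round lasts one RTT unless the link needs longer than that
        # just to serialize the burst.
        serialization = burst * mss * 8 / link_bps
        total += max(rtt_s, serialization)
        segments_left -= burst
        cwnd *= 2
    return total

if __name__ == "__main__":
    # Hypothetical 30 kB Web object, 50 ms round-trip time.
    for mbps in (1, 16, 100):
        t = slow_start_fetch_time(30_000, 0.050, mbps * 1e6)
        print(f"{mbps:>3} Mbit/s: {t * 1000:.0f} ms")
```

In this model the 16 Mbit/s and 100 Mbit/s links both finish in three round trips, so the extra capacity buys nothing; only a lower round-trip time shortens the fetch.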

The role of latency in Web performance is no secret to anyone who has spent time studying it, and many content providers, including Google and Facebook, have spent considerable effort to reduce latency (Google has a project called “Make the Web Faster” that encompasses many of these efforts).  Latency plays a role in the time it takes to complete a DNS lookup, the time to initiate a connection to the server, and the time to increase TCP’s congestion window (indeed, students of networking will remember that TCP throughput is inversely proportional to the round-trip time between the client and the server).  Thus, as throughput continues to increase, network latency plays an increasingly dominant role in the performance of applications such as the Web.  Of course, latency also determines user experience for many latency-sensitive applications, including streaming voice, audio, video, and gaming.
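One familiar form of the inverse relationship between throughput and round-trip time is the window bound: a sender can have at most one window of data in flight per round trip. The sketch below illustrates only that bound; the function name and the 64 kB window are assumptions chosen for illustration.

```python
def window_limited_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s

if __name__ == "__main__":
    WINDOW = 65_535  # hypothetical window size in bytes
    for rtt_ms in (10, 50, 100, 200):
        bps = window_limited_throughput_bps(WINDOW, rtt_ms / 1000)
        print(f"RTT {rtt_ms:>3} ms -> at most {bps / 1e6:.1f} Mbit/s")
```

Doubling the round-trip time halves the bound, regardless of how fast the access link is.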

The question, then, becomes how to reduce latency to the destinations that users commonly access.  Content providers such as Google and others have taken several approaches, including (1) placing Web caches closer to users and (2) adjusting TCP’s congestion control mechanism to start sending at a faster rate for the first few round trips.  These steps, however, are only part of the story, because the network performance between the Web cache and the user may still suffer, for a variety of reasons:

  • First, factors such as bufferbloat and DSL interleaving can introduce significant latency effects in the last mile.  Our study from SIGCOMM 2011 showed how both access link configuration and a user’s choice of equipment (e.g., DSL modem) can significantly affect the latency that a user sees.

  • Second, a poor wireless network in the home can introduce significant latency effects; sometimes we see that 20% of the latency for real user connections from homes is within the home itself.

  • Finally, if the Web cache is not close to users in the first place, the paths between the users and their destinations can still be subject to significant latency.  This problem can be particularly evident in developing countries, where poor peering and interconnection can result in long paths to content, and where the vast majority of users access the network through mobile and cellular networks.

In the Last Mile

In our SIGCOMM 2011 paper “Broadband Internet Performance: A View from the Gateway” (led by Srikanth Sundaresan and Walter de Donato), we pointed out several aspects of home networks that can contribute significantly to latency.  We define a metric called last-mile latency, which is the latency to the first hop inside the ISP’s network. This metric captures the latency of the access link.

We found in this study that last-mile latencies are often quite high, varying from about 10 ms to nearly 40 ms (ranging from 40–80% of the end-to-end path latency). Variance is also high. One might expect that variance would be lower for DSL, since it is not a shared medium like cable. Surprisingly, we found that the opposite was true: Most users of cable ISPs have last-mile latencies of 0–10 ms. On the other hand, a significant proportion of DSL users have baseline last-mile latencies of more than 20 ms, with some users seeing last-mile latencies as high as 50 to 60 ms.

Based on discussions with network operators, we believe DSL companies may be enabling an interleaved local loop for these users.  ISPs enable interleaving for three main reasons: (1) the user is far from the DSLAM; (2) the user has a poor quality link to the DSLAM; or (3) the user subscribes to “triple play” services. An interleaved last-mile data path increases robustness to line noise at the cost of higher latency, typically between two and four times the baseline latency. Thus, cable providers in general have lower last-mile latency and jitter, whereas latencies for DSL users may vary significantly based on physical factors such as distance to the DSLAM or line quality.

Figure (dsl-latencies): Most users see last-mile latencies of less than 10 ms, but a significant number of users have last-mile latencies greater than 10 ms.

Customer-provided equipment also plays a role.  Our study confirmed that excessive buffering is a widespread problem afflicting most ISPs (and the equipment they provide). We profiled different modems to study how the problem affects each of them. We also saw the possible effect of ISP policies, such as active queue and buffer management, on latency and loss.  For example, when measuring latency under load (the latency that a user experiences when the access link is saturated due to an upload or a download), we saw more than an order of magnitude of difference between modems. The 2Wire modem we tested had the lowest worst-case last-mile latency, 800 ms. Motorola’s was about 1.6 seconds, and the Westell modem we tested had a worst-case latency of more than 10 seconds.

Figure (modem-bufferbloat): Empirical measurements of modem buffering. Different modems have different buffer sizes, leading to wide disparities in observed latencies when the upstream link is busy.
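The link between buffer size and latency under load is simple arithmetic: a full FIFO buffer adds roughly its size divided by the drain rate of the link. The sketch below works through that relationship. The 1 Mbit/s upstream rate is a hypothetical value, not a figure from the study, so the resulting byte counts are illustrative rather than measured buffer sizes for these modems.

```python
def queueing_delay_s(buffer_bytes, drain_rate_bps):
    """Time for a full FIFO buffer to drain at the given link rate."""
    return buffer_bytes * 8 / drain_rate_bps

def implied_buffer_bytes(delay_s, drain_rate_bps):
    """Amount of queued data that would produce the given delay."""
    return delay_s * drain_rate_bps / 8

if __name__ == "__main__":
    UPLINK_BPS = 1_000_000  # hypothetical 1 Mbit/s upstream link
    # Worst-case latencies under load quoted in the text, in seconds
    # (the Westell figure was "more than 10 seconds"; 10.0 is a floor).
    observed = [("2Wire", 0.8), ("Motorola", 1.6), ("Westell", 10.0)]
    for name, delay in observed:
        kb = implied_buffer_bytes(delay, UPLINK_BPS) / 1000
        print(f"{name}: about {kb:.0f} kB queued at an assumed 1 Mbit/s")
```

The same calculation run in the other direction shows why slow uplinks suffer most: the same buffer takes proportionally longer to drain as the upstream rate drops.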

Last-mile latency can also be high for particular technologies such as mobile.  In a recent study of fixed and mobile broadband performance in South Africa, we found that, although the mobile providers consistently offer higher throughput, the latency of mobile connections is often 2–3x higher than that of fixed-line connectivity in the country.

In the Home Wireless Network

Our recent study of home network performance (led by Srikanth Sundaresan) found that a home wireless network can also be a significant source of latency.  We have recently instrumented home networks with a passive monitoring tool that determines whether the access link or the home wireless network (or both) are potential sources of performance problems.  One of the features that we explored in that work was the TCP round-trip time between wireless clients in the home network and the wireless access point in the home.  In many cases, due to wireless contention or other sources of wireless bottlenecks, the TCP round-trip latency in home wireless networks was a significant portion of the overall round-trip latency.

We analyzed the performance of the home network relative to the wide-area network performance for distributions of real user traffic for about 65 homes over the course of one month. We use these traces to compare the round-trip times between the devices and the access point to the round-trip times from the access point to the wide-area destination for each flow. We define the median latency ratio for a device as the median ratio of the LAN TCP round-trip time to the WAN TCP round-trip time across all flows for that device. The figure below shows the distribution of the median latency ratio across all devices. The result shows that for 30% of devices in those homes, at least half of the flows have end-to-end latencies where the home wireless network contributes more than 20% of the overall end-to-end latency.  This technical report provides more details concerning the significant role that home wireless networks can play in end-user performance; a future post will explore this topic at length.

Figure (lan-rtts): The distribution of the median ratio of the LAN TCP round-trip time to the WAN TCP round-trip time across all flows for that device, across all devices.
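The median latency ratio defined above is straightforward to compute from per-flow measurements. The sketch below is a minimal illustration, not the monitoring tool used in the study; the record layout, device names, and RTT values are all hypothetical. The helper `lan_share` additionally assumes that the end-to-end RTT is the sum of the LAN and WAN RTTs, under which a ratio of 0.25 corresponds to the home network contributing 20% of the total.

```python
from collections import defaultdict
from statistics import median

def median_latency_ratios(flows):
    """Map each device to the median of LAN RTT / WAN RTT over its flows.

    `flows` is an iterable of (device_id, lan_rtt_ms, wan_rtt_ms) tuples.
    """
    ratios = defaultdict(list)
    for device, lan_rtt, wan_rtt in flows:
        if wan_rtt > 0:  # skip flows without a usable WAN measurement
            ratios[device].append(lan_rtt / wan_rtt)
    return {device: median(values) for device, values in ratios.items()}

def lan_share(ratio):
    """Fraction of end-to-end RTT spent in the LAN, assuming total = LAN + WAN."""
    return ratio / (1 + ratio)

if __name__ == "__main__":
    # Hypothetical per-flow measurements in milliseconds.
    flows = [
        ("laptop", 5, 50), ("laptop", 20, 40), ("laptop", 10, 100),
        ("phone", 2, 100), ("phone", 4, 100),
    ]
    for device, ratio in median_latency_ratios(flows).items():
        print(f"{device}: median ratio {ratio:.2f}, "
              f"LAN share {lan_share(ratio):.0%}")
```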

Our findings on latency in home networks suggest that the RTT introduced by the wireless network may often be a significant fraction of the end-to-end RTT. This finding is particularly meaningful in light of the many recent efforts by service providers to reduce latency to end-to-end services with myriad optimizations and careful placement of content. We recommend that, in addition to the attention that is already being paid to optimizing wide-area performance and host TCP connection settings, operators should also spend effort to improve home wireless network performance.

In Developing Regions

Placing content in a Web cache has little effect if the users accessing the content still have high latency to those destinations.  A study of latency from fixed-line access networks in South Africa using BISmark data, led by Marshini Chetty, Srikanth Sundaresan, Sachit Muckaden, and Enrico Calandro in cooperation with Research ICT Africa, showed that peering and interconnectivity within the country still have a long way to go: in particular, the plot below shows the average latency from 16 users of fixed-line access networks in South Africa to various Internet destinations.  The bars are sorted in order of increasing distance from Johannesburg, South Africa.  Notably, geographic distance from South Africa does not correlate with latency: the latency to Nairobi, Kenya is almost twice as much as the latency to London.  In our study, we found that users in South Africa experienced average round-trip latencies exceeding 200 ms to five of the ten most popular websites in South Africa: Facebook (246 ms), Yahoo (265 ms), LinkedIn (305 ms), Wikipedia (265 ms), and Amazon (236 ms). Many of these sites only have data centers in Europe and North America.

Figure (jnb-latencies): The average latencies to Measurement Lab servers around the world from South Africa. The numbers below each location reflect the distance from Johannesburg in kilometers, and the bars are sorted in order of increasing distance from Johannesburg.  Notably, latency does not increase monotonically with distance.
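A back-of-the-envelope propagation bound helps show why distance alone cannot explain these numbers. The sketch below assumes that light travels through fiber at roughly 200,000 km/s (about two thirds of its speed in a vacuum). The distances are rough great-circle figures chosen for illustration rather than values read from the plot, and the detour through London is a hypothetical route, not a measured path.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in fiber, ~2/3 of c

def propagation_rtt_ms(path_km):
    """Lower bound on round-trip time for a path of the given length."""
    return 2 * path_km / FIBER_KM_PER_MS

if __name__ == "__main__":
    # Rough, illustrative great-circle distances in kilometers.
    JNB_TO_NAIROBI = 2_900
    JNB_TO_LONDON = 9_000
    LONDON_TO_NAIROBI = 6_800

    direct = propagation_rtt_ms(JNB_TO_NAIROBI)
    detour = propagation_rtt_ms(JNB_TO_LONDON + LONDON_TO_NAIROBI)
    print(f"Direct path to Nairobi: at least {direct:.0f} ms")
    print(f"Hypothetical detour via London: at least {detour:.0f} ms")
```

Under these assumptions, a path that detours through Europe has a propagation floor several times higher than the direct path, which is the kind of inflation that makes a nearby destination appear farther away than a distant one.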

People familiar with Internet connectivity may not find this result surprising: indeed, many ISPs in South Africa connect to one another via the London Internet Exchange (LINX) or the Amsterdam Internet Exchange (AMS-IX) because it is cheaper to backhaul connectivity to exchange points in Europe than it is to connect directly at an exchange point on the African continent.  The reasons for this behavior appear to be both regulatory and economic, but more work is needed, both in deploying caches and in improving Internet interconnectivity, to reduce the latency that users in developing regions see to popular Internet content.

About Nick Feamster
Nick Feamster is a professor in the Department of Computer Science at Princeton University. Before joining the faculty at Princeton, he was a professor in the School of Computer Science at Georgia Tech. He received his Ph.D. in Computer Science from MIT in 2005, and his S.B. and M.Eng. degrees in Electrical Engineering and Computer Science from MIT in 2000 and 2001, respectively. His research focuses on many aspects of computer networking and networked systems, including the design, measurement, and analysis of network routing protocols, network operations and security, and anonymous communication systems. In December 2008, he received the Presidential Early Career Award for Scientists and Engineers (PECASE) for his contributions to cybersecurity, notably spam filtering. His honors include the Technology Review 35 "Top Young Innovators Under 35" award, a Sloan Research Fellowship, the NSF CAREER award, the IBM Faculty Fellowship, and award papers at SIGCOMM 2006 (network-level behavior of spammers), the NSDI 2005 conference (fault detection in router configuration), Usenix Security 2002 (circumventing web censorship using Infranet), and Usenix Security 2001 (web cookie analysis).
