Making Sense of Data Caps and Tiered Pricing in Broadband and Mobile Networks

Last week, I had the pleasure of sitting on a panel at the Broadband Breakfast Club in downtown Washington, DC.  The panel was organized by BroadbandBreakfast.com, a policy and news organization that focuses on issues related to broadband service in the United States; the group meets about once a month.  I was originally asked last fall to sit on a panel on measuring broadband performance, due to our ongoing work on BISmark, but I was unable to make it then, so I instead found myself on a panel on data caps in wired and wireless networks.

I participated on the panel with the following other panelists: Serena Viswanathan, an attorney from the Federal Trade Commission; Patrick Lucey from the New America Foundation; and Roger Entner, the founder of Recon Analytics.  The panel discussed a variety of topics surrounding data caps in broadband networks, but the high-level question that the panel circled around was: Do data caps (and tiered pricing) yield positive outcomes for the consumer?

We had an interesting discussion.  Roger Entner espoused the opinion that data caps really only affect the worst offenders, and that applications on mobile devices now make it much easier for users to manage their data caps.  Therefore, data caps shouldn’t be regarded as oppressive, but rather are simply a way for Internet service providers and mobile carriers to recoup costs for the most aggressive users.  Patrick Lucey, who recently wrote an article for the New America Foundation on data caps, stated a counterpoint that echoed his recent article, suggesting that data caps were essentially a profit generator for ISPs, and consumers are effectively captured because they have no real choice for providers.

I spent some time explaining the tool that Marshini Chetty and my students have built on top of BISmark called uCap (longer paper here).  Briefly, uCap is a tool that allows home network users to determine the devices in their home that consume the most data.  It also allows users to see what domains they are visiting that consume the most bandwidth.  It does not, however, tell the user which applications or people are using the most bandwidth (more on that below).  Below is a screenshot of uCap that shows device usage over time. My students have also built a similar tool for mobile devices called MySpeedTest, which tells users which applications are consuming the most data on their phones.  A screenshot of the MySpeedTest panel that shows how different applications consume usage is shown below.

[Figure: uCap screenshot showing device usage over time]

[Figure: MySpeedTest showing mobile application usage]

I used these two example applications to argue that usage caps per se are not necessarily a bad thing if the user has ways to manage these usage caps.  In fact, we have repeatedly seen evidence that tiered pricing (or usage caps) can both improve ISP profit and make consumers better off, if the consumer understands how different applications consume their usage cap and has ways to manage the usage of those applications.  Indeed, our past research has shown how tiered pricing can improve market efficiency, because the price of connectivity more closely reflects the cost to the provider of carrying specific data.  Further, we’ve seen examples where consumers have actually been worse off when regulators have stepped in to prevent tiered pricing, such as the events in summer 2011 when KPN customers all experienced a price increase for connectivity because KPN was prevented from introducing two tiers of service.

The problem isn’t so much that tiered pricing is bad—it is that users don’t understand it, and they currently don’t have good tools to help them understand it.  In the panel, I informally polled the room—ostensibly filled with broadband experts—about whether they could tell me off the top of their heads how much data a 2-hour high-definition Netflix movie would consume against their usage cap.  Only two or three hands went up in a room of 50 people.  I also confessed that before installing uCap and watching my usage in conjunction with specific applications, I had no idea how much data different applications consumed, or whether I was a so-called “heavy user” (it turns out I am not).  My own experience—and Marshini Chetty’s ongoing work—has shown that people are really bad at estimating how much of their data cap applications consume.  One interesting observation in Marshini’s work is that people conflate the time that they spend on a site with the amount of data it must consume  (“I spend most of my time on Facebook, therefore, it must consume most of my data cap.”).
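
For readers who want to work through the movie question themselves, the arithmetic is just bitrate multiplied by duration. The sketch below assumes an HD stream that averages about 5 Mbps and a 250 GB monthly cap; both numbers are assumptions chosen for illustration (actual bitrates vary by service and quality setting, and caps vary by ISP), not figures from the panel.

```python
def data_used_gb(bitrate_mbps: float, hours: float) -> float:
    """Approximate data transferred, in gigabytes (using 1 GB = 1000 MB)."""
    seconds = hours * 3600
    megabits = bitrate_mbps * seconds
    megabytes = megabits / 8          # 8 bits per byte
    return megabytes / 1000


# Both constants are assumptions for illustration, not measured values.
ASSUMED_HD_BITRATE_MBPS = 5.0
ASSUMED_MONTHLY_CAP_GB = 250.0

movie_gb = data_used_gb(ASSUMED_HD_BITRATE_MBPS, hours=2)
share_of_cap = movie_gb / ASSUMED_MONTHLY_CAP_GB

print(f"2-hour movie: {movie_gb:.1f} GB ({share_of_cap:.1%} of the assumed cap)")
```

Under these assumptions, one movie is a few gigabytes and a small fraction of the cap, which is exactly the kind of estimate most people cannot make off the top of their heads.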

If we are going to move towards pervasive data caps or tiered pricing models, then users need better tools to understand how applications consume data caps and to manage how different applications consume those caps.  I see two possibilities for better applications going forward:

  • Better visibility.  We need applications like uCap and MySpeedTest to help users understand how different applications consume their data cap.  Helping users get a better handle on how different applications consume data is the first step towards making tiered pricing something that users can cope with.  In addition to the applications that show usage directly, we might also consider other forms of visibility, such as information that helps users estimate a total cost of ownership for running a mobile application (e.g., the free application might actually cost the user more in the long run, if downloading the advertisements to support the free application eats into the user’s data cap).  We also need better ways of fingerprinting devices; applications like uCap still force users to identify devices (note the obscure MAC addresses in the dashboard above for devices on my network that I didn’t bother to manually identify).  Solving these problems requires both deep domain knowledge about networking and intuition and expertise in human factors and interface design.
  • Better control.  This area deserves much more attention.  uCap offers some nice first steps towards giving users control because it helps users control how much data a particular device can send.  But, shouldn’t we be solving this problem in other ways, as well?  For example, we might imagine exposing an SDK to application developers that helps them write applications that are more cognizant of data constraints—for example, by deferring updates when a user is near his or her cap, or deferring downloads until “off peak” times or when a user is on a WiFi network.  There are interesting potential developments in both applications and operating systems that could make tiered pricing and demand-side management more palatable, much like appliances in our homes are now being engineered to adapt to variable electricity pricing.
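
To make the "better control" idea concrete, here is a minimal sketch of the kind of decision such an SDK might expose to applications. Everything in it is hypothetical: `NetworkState`, `should_defer`, and the 90% "near cap" threshold are invented for illustration and do not correspond to uCap or to any existing platform API. It also assumes that WiFi traffic does not count against the cap.

```python
from dataclasses import dataclass


@dataclass
class NetworkState:
    """Hypothetical snapshot of what a data-aware SDK might report."""
    on_wifi: bool      # assumed to be unmetered
    off_peak: bool     # whether the current time is an "off peak" period
    used_bytes: int    # usage so far in this billing cycle
    cap_bytes: int     # the user's data cap


def should_defer(transfer_bytes: int, state: NetworkState,
                 near_cap_fraction: float = 0.9) -> bool:
    """Return True if a non-urgent transfer (e.g., an app update) should wait."""
    if state.on_wifi:
        return False  # assumption: WiFi does not consume the cap
    projected = state.used_bytes + transfer_bytes
    if projected >= near_cap_fraction * state.cap_bytes:
        return True   # the transfer would push the user near or over the cap
    return not state.off_peak  # on a metered link, wait for off-peak times


GB = 10**9
state = NetworkState(on_wifi=False, off_peak=True,
                     used_bytes=1 * GB, cap_bytes=2 * GB)
print(should_defer(200 * 10**6, state))  # small update, well under the cap
print(should_defer(900 * 10**6, state))  # large download that nears the cap
```

A real design would also need a notion of urgency (a security patch should probably not wait for WiFi) and a way for the user to override the policy.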

Finally, Patrick made a point that even if users could understand and control usage caps, they often don’t have any reasonable alternatives if they decide they don’t like their current ISP’s policies.  So, while some of the technological developments we discussed may make a user’s life easier, these improvements are, in some sense, a red herring if a user cannot have some amount of choice over their Internet service provider.  This issue of consumer choice (or lack thereof) does appear to be the elephant in the room for many of the policy discussions surrounding data caps, tiered pricing, and network neutrality.  Yet, until the issues of choice are solved, improving both visibility and control in the technologies that we develop can allow both users and ISPs to be better off in a realm where tiered pricing and data caps exist—a realm which, I would argue, is not only inevitable but also potentially beneficial for both ISPs and consumers.

Internet Relativism and the Hunt for Elusive “Ground Truth”

Networking and security research often relies on a notion of ground truth to evaluate the effectiveness of a solution.  “Ground truth” refers to a true underlying phenomenon that we would like to characterize, detect, or measure.  We often evaluate the effectiveness of a classifier, detector, or measurement technique by how well it reflects ground truth.

For example, an Internet link might have a certain upstream or downstream throughput; the effectiveness of a tool that measures throughput could thus be quantified in terms of how close its estimates of upstream and downstream throughput are to the true throughput of the underlying link.  Since there is a physical link with actual upstream or downstream throughput characteristics (and the properties of that link are either explicitly known or can be independently measured), measuring error with respect to ground truth makes sense.  In the case of analyzing routing configuration to predict routing behavior (or detect errors), static configuration analysis can characterize where traffic in the network will flow and whether the configuration will give rise to erroneous behavior; either the predictions correctly characterize the behavior of the real network, or they don’t.  A spam filter might classify an email sender as a legitimate sender or a spammer; again, either the sender is a spammer or it is a legitimate mail server.  In this case, comparing against ground truth is more difficult, since if we had a perfect characterization of spammers and legitimate senders, we would already have the perfect spam filter.  The solution in these kinds of cases is to compare against an independent label (e.g., a blacklist) and somehow argue that the proposed detection mechanism is better than the existing approach to labeling or classification (e.g., faster, earlier, more lightweight, etc.).

Problem: Lack of ground truth.  For some Internet measurement problems, the underlying phenomenon simply cannot be known—even via an independent labeling mechanism—either because the perpetrator of an action won’t reveal his or her true intention, or sometimes because there actually is no “one true answer”. Sometimes we want to characterize scenarios or phenomena where the ground truth proves elusive.  

Consider the following two problems:

  • Network neutrality.  The network neutrality debate centers around the question of whether Internet service providers should carry all traffic according to the same class of service, regardless of various properties such as what type of traffic it is (e.g., voice, video) or who is sending or receiving that traffic.
  • Filter bubbles.  Eli Pariser introduced the notion of a filter bubble in his book The Filter Bubble.  A filter bubble is the phenomenon whereby each Internet user sees different Internet content based on factors ranging from our demographic to our past search history to our stated preferences.  Briefly, each of us sees a different version of the Internet, based on a wide range of factors.

These two detection problems do not have a notion of ground truth that can be easily measured.  In the latter case, there is effectively no ground truth at all.

In the case of network neutrality, detection boils down to determining whether an ISP is providing preferential treatment to a certain class of applications or customers.  While ground truth certainly exists (i.e., either the ISP is discriminating against a certain class of traffic or it isn’t), discovering ground truth is incredibly challenging: ISPs may not reveal their policies concerning preferential treatment of different traffic flows, for example.

Similarly, in the case of filter bubbles, we want to determine whether a content provider or intermediary (e.g., search engine, news aggregator, social network feed) is manipulating content for particular groups of users (e.g., showing only certain news articles to Americans).  Again, there is a notion of ground truth—either the content is being manipulated or it isn’t—but the interesting aspect here is not so much whether content is being manipulated (we all know that it is), but rather what the extent of that manipulation is.  Characterizing the extent of manipulation is difficult, however, because personalization is so pervasive on the Internet: everyone effectively sees content that is tailored to their circumstances, and there is no notion of a baseline that reflects what a set of search results or a page of recommended products might look like before the contents were tailored for a particular user.  In many cases, personalization has been so ingrained in data mining and search that even the algorithm designers are unable to characterize what “ground truth” content (i.e., without manipulation) might look like.

Relativism: measuring how different perspectives give rise to inconsistencies.  In cases where ground truth is difficult to measure or impossible to know, we can still ask questions about consistency.  For example, in the case of network neutrality, we can ask whether different groups of users experience comparable performance.  In the case of filter bubbles, we can ask whether different groups of users see similar content.  When inconsistencies arise, we can then attempt to attribute a cause to these inconsistencies by controlling for all factors except for the factor we believe might be the underlying cause for the inconsistency.  One might call this Internet relativism, in a way: We concede that either there is no absolute truth, or that the absolute truth is so difficult to obtain that we might as well not try to know it.  Instead, we can explore how differences in perspective or “input signals” (e.g., demographic, geography) give rise to different outcomes and try to determine which input differences triggered the inconsistency.  We have applied this technique to the design of two real-world systems that address these two respective problem areas.  In both of these problems, we would love to know the underlying intention of the ISP or information intermediary (i.e., “Is the performance problem I’m seeing a result of preferential treatment?”, “(How) is Google, Netflix, or Amazon manipulating my results based on my demographic?”).

  • NANO: Network Access Neutrality Observatory.  We developed NANO several years ago to characterize ISP discrimination for different classes of traffic flows.  In contrast to existing work in this area (e.g., Glasnost), which requires a hypothesis about the type of discrimination that is taking place, NANO operates without any a priori hypothesis about discrimination rules and simply looks for systematic deviation from “normal” behavior for a certain class of traffic (e.g., all traffic from a certain ISP, for a certain application, etc.).  The tricky aspect involved in this type of detection is that there is no notion of normal.  For example, ISP Y might also be performing a similar type of discrimination, so there is no firm ground truth against which to compare.  Ideally, what we’d like to ask is “What performance does this user see using ISP X vs. the performance they would see if they were not using ISP X?”  Unfortunately, there is no reasonable way to test the performance that a user would experience as a result of not using an ISP.  (This is in contrast to randomized treatment in clinical trials, where it makes sense to have a group of users who, say, are subject to a particular treatment or not.)  To address this problem, the best we could do to establish a baseline was to average the performance seen by all users from other ISPs and compare those statistics against the performance seen by a group of users for the ISP under test.
  • Bobble: Exposing inconsistent search results.  We recently developed Bobble to characterize the inconsistencies that exist in Web search results that users see, as a result of both personalization and geography.  Ideally, we would like to measure the extent of manipulation against some kind of baseline.  Unfortunately, however, the notion of a baseline is almost meaningless, since no Internet user is subject to such a baseline—even a user who has no search history may still see personalized results based on geography, time of day, device type, and other features, for example.  In this scenario, we established a baseline by comparing the search results of a signed-in user against a user with no search history, making our best attempt to hold all other factors constant.  We also performed the same experiment with users who were not signed in and had no search history, varying only geography.  Unlike NANO, in the case of Bobble, there is not even a notion of an “average” user; the best we can hope for are meaningful characterizations of inconsistencies.
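
One simple way to put a number on "inconsistency" between two users' search results is the Jaccard similarity of the two result sets: the size of their overlap divided by the size of their union. The sketch below is only an illustration of that idea and is not necessarily the metric Bobble uses; the function name and the example domains are made up.

```python
def jaccard(results_a, results_b) -> float:
    """Jaccard similarity of two result lists, ignoring rank order.

    1.0 means the same set of results; 0.0 means no results in common.
    """
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # two empty result pages are trivially consistent
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical top results for the same query from two vantage points.
signed_in = ["a.example", "b.example", "c.example", "d.example"]
no_history = ["a.example", "c.example", "e.example", "f.example"]

print(f"similarity: {jaccard(signed_in, no_history):.2f}")
```

Because this measure ignores ordering, two pages that contain the same links in a different order look identical; a rank-aware measure would be needed to capture reordering as a form of inconsistency.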

Takeaways and general principles.  These two problems both involve an attempt to characterize an underlying phenomenon without any hope of observing “ground truth”.  In these cases, it seems that our best hope is to approximate a baseline and compare against that (as we did in NANO); failing that, we can at least characterize inconsistencies.  In any case, when looking for these inconsistencies, it is important to (1) enumerate all factors that could possibly introduce inconsistencies; and (2) hold those factors fixed, to the extent possible.  For example, in NANO, one can only compare a user against average performance for a group of users that have identical (or at least similar) characteristics for anything that could affect the outcome.  If, for example, browser type (or other features) might affect performance, then the performance of a user for an ISP “under test” must be compared against users with the same browser (or other features), with the ISP being the only differing feature that could possibly affect performance.  Similarly, in the case of Bobble, we must hold other factors like browser type and device type fixed when attempting to isolate the effects of geography or search history.  Enumerating all of these features that could introduce inconsistencies is extremely challenging, and I am not aware of any good way to determine whether a list of such features is exhaustive.
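
The "hold other factors fixed" principle can be sketched as a stratified comparison: group measurements by the factors that might affect the outcome, and compare the ISP under test against everyone else only within each group. The code below is a simplified illustration under assumed record fields (`isp`, `browser`, `throughput_mbps`) and made-up numbers; it is not NANO's actual implementation.

```python
from collections import defaultdict
from statistics import mean


def stratified_gaps(records, isp_under_test, confounders):
    """Mean throughput gap (test ISP minus all other ISPs) within each stratum.

    A stratum is one combination of confounder values (e.g., one browser).
    Strata that lack data on either side are skipped, not guessed at.
    """
    strata = defaultdict(lambda: {"test": [], "baseline": []})
    for rec in records:
        key = tuple(rec[c] for c in confounders)
        side = "test" if rec["isp"] == isp_under_test else "baseline"
        strata[key][side].append(rec["throughput_mbps"])

    return {
        key: mean(groups["test"]) - mean(groups["baseline"])
        for key, groups in strata.items()
        if groups["test"] and groups["baseline"]
    }


# Made-up measurements for illustration only.
records = [
    {"isp": "X", "browser": "firefox", "throughput_mbps": 4.0},
    {"isp": "X", "browser": "firefox", "throughput_mbps": 6.0},
    {"isp": "Y", "browser": "firefox", "throughput_mbps": 10.0},
    {"isp": "Z", "browser": "firefox", "throughput_mbps": 10.0},
    {"isp": "X", "browser": "chrome", "throughput_mbps": 8.0},
    {"isp": "Y", "browser": "chrome", "throughput_mbps": 8.0},
    {"isp": "Y", "browser": "safari", "throughput_mbps": 9.0},
]

gaps = stratified_gaps(records, isp_under_test="X", confounders=["browser"])
for stratum, gap in gaps.items():
    print(stratum, f"{gap:+.1f} Mbps")
```

The comparison is only as good as the list of confounders passed in, which is the hard part noted above: a factor missing from that list silently ends up attributed to the ISP.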

I believe networking and security researchers will continue to encounter phenomena that they would like to measure, but where the nature of the underlying phenomenon cannot be known with certainty.  I am curious as to whether others have encountered problems that call for Internet relativism, and whether it may be time to develop sound experimental methods to characterize Internet relativism, rather than simply blindly clamoring for “ground truth” when none may even exist.