Internet Censorship: Then and Now

I began working on Internet censorship nearly ten years ago, when Professors Hari Balakrishnan and David Karger talked about users who were behind the “Great Firewall of China” and their need for readier access to information.  In this post, I’ll talk about the state of censorship and circumvention back then, how the landscape has changed, the lessons I have learned along the way, and my initial thoughts on future research in this area.

Censorship Then

Ten years ago, Internet use was exploding in the United States, so it was initially somewhat hard for me to comprehend that censorship and surveillance were taking place in other parts of the world, let alone what a pervasive problem censorship would become.  Intuition would suggest that the spread of Internet access would provide citizens with more access to information, not less.  In practice, however, the opposite can be true: the Internet gives a government a finite and fixed set of points from which it can monitor or restrict access.  The Berkman Center has a web site that reports on the complexity of the internal networks within a variety of countries.  Essentially, it compares the complexity of ISP interconnections within a number of countries: the richer these interconnections, the more difficult it is for a country to restrict, monitor, or block content.  Most remarkable are the ISP structures of countries like China, where most ISPs connect through a single backbone network (presumably where the blocking takes place); compared to Nigeria, for example, the Chinese network is much more like a hub and spoke, with all regional ISPs connecting through the ChinaNet Backbone (which is the parent of nearly two-thirds of the country’s IP address space).

Nearly ten years ago, the Berkman Center published a nice report on the state of Internet censorship in China, exposing both the extent of censorship there and the determination of the government to develop ever more refined and sophisticated censorship techniques.  In response, people have developed techniques to circumvent these censorship mechanisms.  Conceptually, every circumvention system works roughly as shown in this picture:

The helper has access to content outside the censorship firewall and can communicate with Alice, who is behind it.  The helper’s job is to let Alice exchange content with Bob, a publisher outside the firewall.  In practice, this helper might be a Web proxy (e.g., Anonymizer), a network of proxies (e.g., Tor), or, as we will see below, an intermediate drop site (e.g., Collage).

In response to the ongoing censorship efforts at the time, we developed Infranet, a system for circumventing censorship firewalls.  The state of the art in circumventing censorship at the time (e.g., Anonymizer) was essentially the glorified Web proxy: a user in a censored regime would connect to a cooperating proxy outside of the firewall, which would, in turn, fetch content for the user and return it over an encrypted channel.  However, censors could discover and block such proxies, and simply connecting to one could raise suspicion.  In other words, existing proxy-based systems lacked two important properties:

  • Robustness. The mechanism that citizens use to circumvent censorship should be robust to the censor’s attempts to disrupt, mangle, or block the communication entirely.  Most existing systems (even widely used anonymization tools like Tor) are not inherently robust, because censors can block entry and exit nodes.
  • Deniability. Users of an anti-censorship system could be subject to extreme sanctions or repercussions.  For example, last year, a Chinese blogger was stabbed; many believe the outspoken nature of his blog to have provoked the violence.  Due to the consequences of violating censorship regulations, users in countries such as China even practice what is known as “self-censorship”: pre-emptively avoiding the exchange of content that might be incriminating or otherwise subject to censorship.  Therefore, any circumvention system must also be deniable: that is, its users must be able to deny that they were even using the system in the first place.  Achieving this goal is more difficult on the Internet than with certain other communications media (e.g., radio, television), and most existing tools for circumventing censorship or providing anonymity do not achieve deniability, either.

Infranet relies on a covert channel between the user behind the firewall and a helper outside of it.  The main idea is to let a user “cloak” a request for a censored Web site in other, seemingly innocuous Web traffic.  In the case of Infranet, the proxy outside of the firewall hosted a Web server itself.  A user would issue requests for content on that Web site, but the proxy would interpret that sequence of requests as a coded message actually requesting some other, censored content.  Despite its improvements over existing technology, Infranet did not gain widespread adoption, for (I think) two reasons:

  1. Simple schemes worked. When we talked to Voice of America about the tool, they said that most people were happy with simple proxy-based schemes; the proxies had to keep moving, of course, but by the time the censors discovered a proxy’s new location and managed to block it, the proxy had moved again.  Infranet was the circumvention equivalent of pounding a thumbtack with a sledgehammer.
  2. It required too much effort. Most censorship or anonymization tools require “helpers” outside of the censorship firewall that the censored users can communicate with.  For example, someone might need to set up a machine that runs a secure Web proxy.  Running Infranet required philanthropic users to run an Apache Web server, patch it with special software, and then face the prospect that their legitimate content hosted on the site might be blocked as a result of trying to help.  All of this seems like too much to ask.

The lack of adoption was frustrating, and it seemed difficult to have real, measurable impact.  The research problems also seemed fuzzy, ill-defined, and unsolvable.

Censorship Now

Two important developments have occurred since that time, however, both of which give me more hope that this topic area both has interesting research questions and the potential for impact.

On the downside, censorship is becoming more pervasive. Many countries around the world have gone to remarkable lengths to restrict access to content on the Internet.  According to the Open Net Initiative, twelve countries around the world have implemented some pervasive form of censorship.  Internet censorship has also played a significant role in political events, such as the Iranian elections.  A Freedom House report last year found that censorship is now prevalent in nearly 60 countries around the world.  Internet censorship also matters more as ever more people use the Internet to communicate.

On the other hand, building circumvention tools is easier. One of the major problems with Infranet was that it required philanthropic individuals to host dedicated infrastructure.  In the past ten years, however, “Web 2.0” has made it much easier for the average user to publish content on the Internet.  Users no longer have to maintain their own Web servers to host photos, videos, and so on.  It occurred to me, then, that censorship circumvention technologies could also ride the Web 2.0 wave, using infrastructure in the cloud as the foundation for hiding information and building covert channels.  Sites such as Flickr that host “user-generated content” appeared to be the perfect place to create “drop sites” where users could hide and exchange censored content.
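As a toy illustration of the hiding side of this idea, consider embedding a payload in the least-significant bits of an image’s pixel values.  Real systems use far more sophisticated steganography that survives recompression; this sketch assumes, purely for illustration, that an image is just a list of 0–255 intensity values.

```python
# Toy least-significant-bit (LSB) steganography: the cover photo is
# visually unchanged (each pixel shifts by at most 1), but its low-order
# bits carry a hidden payload.  Real drop-site systems use much more
# robust embedding; this only illustrates the concept.

def embed(pixels: list[int], payload: bytes) -> list[int]:
    """Hide payload bits (LSB-first per byte) in the pixels' low bits."""
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    assert len(bits) <= len(pixels), "cover image too small for payload"
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite the least-significant bit
    return out

def extract(pixels: list[int], length: int) -> bytes:
    """Recover `length` hidden bytes from the pixels' low bits."""
    bits = [p & 1 for p in pixels[: length * 8]]
    return bytes(
        sum(bits[i * 8 + j] << j for j in range(8)) for i in range(length)
    )

cover = list(range(256)) * 4               # stand-in for real pixel data
stego = embed(cover, b"hidden message")
assert extract(stego, len(b"hidden message")) == b"hidden message"
assert max(abs(a - b) for a, b in zip(cover, stego)) <= 1  # imperceptible
```

The censor sees only an ordinary-looking photo on an ordinary-looking photo-sharing site, which is exactly what makes user-generated content attractive as a carrier.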

Collage. Based on these observations, we designed Collage, which allows users to hide messages in content that they post to user-generated content sites like Flickr and Twitter.  The tool lets message senders hide messages in photos and tweets and upload them to the respective sites.  Its design has several advantages.  First, it does not require users to set up fixed infrastructure (e.g., Web servers).  Second, it uses erasure coding to “spread” any single message across multiple drop sites, making the system more robust to blocking than a proxy-based system.  Collage appeared at the USENIX Security Symposium last Friday (paper here) and has appeared in the press recently.  Time will tell whether this tool sees more widespread adoption.
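Collage itself uses a proper erasure code; as a minimal sketch of the “spreading” idea, here is a scheme with a single XOR parity block, so that a message split across k+1 drop sites survives the loss of any one of them.  All names here are illustrative, not Collage’s actual implementation.

```python
# Minimal sketch of erasure-coded spreading (NOT Collage's actual code):
# split a message into k data blocks plus one XOR parity block, scatter
# the k+1 blocks across distinct drop sites, and recover the message from
# ANY k of them -- so blocking one drop site does not destroy the message.

def split_with_parity(message: bytes, k: int) -> list[bytes]:
    """Return k data blocks plus one parity block (k+1 shares total)."""
    size = -(-len(message) // k)                     # ceiling division
    padded = message.ljust(k * size, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for block in blocks:                             # parity = XOR of blocks
        for i, b in enumerate(block):
            parity[i] ^= b
    return blocks + [bytes(parity)]

def recover(shares: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the message from any k of the k+1 shares (indices 0..k)."""
    size = len(next(iter(shares.values())))
    missing = next((i for i in range(k) if i not in shares), None)
    if missing is not None:
        rebuilt = bytearray(size)                    # XOR of the k present
        for block in shares.values():                # shares yields the
            for i, b in enumerate(block):            # missing data block
                rebuilt[i] ^= b
        shares = dict(shares)
        shares[missing] = bytes(rebuilt)
    return b"".join(shares[i] for i in range(k))[:length]

msg = b"some censored article text"
blocks = split_with_parity(msg, k=3)
# Lose any one share (say, a drop site that was taken down):
partial = {i: b for i, b in enumerate(blocks) if i != 1}
assert recover(partial, k=3, length=len(msg)) == msg
```

Tolerating more than one lost drop site requires a real erasure code (e.g., Reed–Solomon) rather than a single parity block, but the recovery-from-a-subset property is the same.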

Lessons and Looking Forward

My experience with research in Internet censorship taught me an important lesson for research in general: continually reconsider old problems. An old problem that once seemed uninteresting or unsolvable might become tractable because of other, seemingly unrelated developments.  In the case of Collage, the advent of Web 2.0 allowed us to significantly advance the state of the art over a system like Infranet.  It is worth repeatedly asking what bearing a particular development might have on other problems, even if the two areas seem unrelated.  You might find the right-sized hammer for your nail in the most unlikely of places.

We still don’t understand very much about Internet censorship.  We are still trying to understand its extent.  We have even less understanding of how various circumvention technologies work in practice.  It’s even harder to try to “measure” the level of deniability or robustness that a censorship circumvention tool might provide.  Debugging is also difficult: When a certain circumvention technology fails, is the failure a bug, or a direct consequence of censorship?  Finally, getting the software into the hands of the people who need it and helping them get set up (“bootstrapping”) remains a challenging problem, particularly considering that any information that a normal user could get is also accessible to a censor.  Given this wide array of open questions—ranging from theory to practice, and from technology to policy—I believe we may be at the dawn of a new and exciting research area.