Internet Censorship: Then and Now

I began working on Internet censorship nearly ten years ago, when Professors Hari Balakrishnan and David Karger talked about users who were behind the “Great Firewall of China” and their need for easier access to information.  In this post, I’ll talk about the state of censorship and circumvention back then, how the landscape has changed since, the lessons I have learned along the way, and my initial thoughts on future research in this area.

Censorship Then

Ten years ago, Internet use was exploding in the United States, so it was initially somewhat hard for me to comprehend that censorship and surveillance were taking place in other parts of the world, let alone what a pervasive problem censorship would become.  Intuition would suggest that the spread of Internet access would provide citizens with more access to information, not less.  In practice, however, the opposite can be true: the Internet gives a government a finite and fixed set of points from which it can monitor or restrict access.  The Berkman Center has a web site that reports on the complexity of the internal networks within a variety of countries.  Essentially, they compare the complexity of ISP interconnections within a number of countries: the richer these interconnections, the more difficult it is for a country to restrict, monitor, or block content.  Most remarkable are the ISP structures of countries like China, where most ISPs connect through a single backbone network (presumably where the blocking takes place).  Compared to Nigeria, for example, the Chinese network is much more of a hub and spoke, with all regional ISPs connecting through the ChinaNet Backbone (which is the parent of nearly two-thirds of the country’s IP address space).
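
To make the topology point concrete, here is a small sketch using Python and the networkx library; the two toy topologies are invented for illustration, not measured data.  In the hub-and-spoke graph, the backbone is an articulation point: filtering at that single node separates every regional ISP from the outside world, whereas the richer topology has no such single point of control.

```python
# Toy illustration (invented topologies, not measured data): a hub-and-spoke
# national network has a single choke point; a richly interconnected one does not.
import networkx as nx

def choke_points(edges):
    """Return the articulation points of an undirected AS-level graph."""
    g = nx.Graph()
    g.add_edges_from(edges)
    return sorted(nx.articulation_points(g))

# Hub-and-spoke: every regional ISP reaches the outside world via one backbone.
hub_and_spoke = [
    ("regional-1", "backbone"), ("regional-2", "backbone"),
    ("regional-3", "backbone"), ("backbone", "outside-world"),
]

# Richer interconnection: regional ISPs peer with each other and have
# multiple, independent links to the outside world.
rich = [
    ("regional-1", "regional-2"), ("regional-2", "regional-3"),
    ("regional-1", "regional-3"), ("regional-1", "outside-world"),
    ("regional-2", "outside-world"), ("regional-3", "outside-world"),
]

print(choke_points(hub_and_spoke))  # ['backbone'] -- one place to monitor or block
print(choke_points(rich))           # []           -- no single point of control
```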

Nearly ten years ago, the Berkman Center published a nice report on Internet censorship in China, exposing the extent of the filtering and the government’s determination to develop ever more refined and sophisticated censorship techniques.  In response, people have developed systems to circumvent this censorship.  Conceptually, every circumvention system works roughly as shown in this picture:

The helper has access to content outside the censorship firewall and can communicate with Alice, who is behind the firewall.  The helper’s job is to allow Alice and Bob to exchange content.  In practice, this helper might be a Web proxy (e.g., Anonymizer), a network of proxies (e.g., Tor), or, as we will see below, an intermediate drop site (e.g., Collage).

In response to ongoing censorship efforts at the time, we developed Infranet, a system to circumvent censorship firewalls.  The state-of-the-art circumvention tools at the time (e.g., Anonymizer) were essentially glorified Web proxies: a user in a censored regime would connect to a cooperating proxy outside of the firewall, which would, in turn, fetch content for the user and return that content over an encrypted channel.  However, censors could discover and block such proxies, and simply connecting to a proxy like this could raise suspicion.  In other words, existing proxy-based systems lacked two important properties:

  • Robustness. The mechanism that citizens use to circumvent censorship should be robust to the censor’s attempt to disrupt, mangle, or block the communication entirely.  Most existing systems (even widely used anonymization tools like Tor) are not inherently robust, because censors can block entry and exit nodes.
  • Deniability. Users of an anti-censorship system could be subject to extreme sanctions or repercussions.  For example, last year, a Chinese blogger was stabbed; many believe the outspoken nature of his blog provoked the violence.  Due to the consequences of violating censorship regulations, users in countries such as China even practice what is known as “self-censorship”: pre-emptively avoiding the exchange of content that might be incriminating or otherwise subject to censorship.  Therefore, any circumvention system must also be deniable: that is, its users must be able to deny that they were even using the system in the first place.  Achieving this goal is more difficult on the Internet than with certain other communications media (e.g., radio, television), and most existing tools for circumventing censorship or providing anonymity do not achieve deniability, either.

Infranet relies on a covert channel between the user behind the firewall and a helper outside of it.  The main idea is to allow a user to “cloak” a request for a censored Web site in other, seemingly innocuous Web traffic.  In the case of Infranet, the proxy outside of the firewall hosted a Web server itself.  A user would issue requests for content on that Web site, but the proxy would interpret that sequence of requests as a coded message that was actually requesting some other, censored content (a toy sketch of this kind of request-sequence encoding appears after the list below).  Despite its improvements over existing technology, Infranet did not gain widespread adoption, for (I think) two reasons:

  1. Simple schemes worked. When we talked to Voice of America about the tool, they said that most people were happy with simple proxy-based schemes; of course, the proxies had to move continually, but by the time the censors discovered and blocked a proxy, it had already moved to a new location.  Infranet was the circumvention equivalent of pounding a thumbtack with a sledgehammer.
  2. It required too much effort. Most censorship or anonymization tools require “helpers” outside of the censorship firewall that the censored users can communicate with.  For example, someone might need to set up a machine that runs a secure Web proxy.  Running Infranet required philanthropic users to run an Apache Web server, patch it with special software, and then face the prospect that their legitimate content hosted on the site might be blocked as a result of trying to help.  All of this seems like too much to ask.
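
Returning to the covert channel itself: here is a heavily simplified, hypothetical sketch of the request-sequence encoding described above.  It is not Infranet’s actual protocol (which used a shared key and far more careful encodings to preserve deniability); the page list and the bits-per-request parameter are invented for illustration.

```python
# Toy sketch of a request-sequence covert channel (NOT Infranet's real encoding):
# the client encodes a hidden URL, a few bits at a time, by choosing which
# innocuous page on the helper's site to request next.

# Both sides share an ordered list of innocuous pages; each request carries
# log2(len(PAGES)) bits of the hidden message.
PAGES = ["/index.html", "/news.html", "/sports.html", "/weather.html",
         "/photos.html", "/about.html", "/contact.html", "/faq.html"]
BITS_PER_REQUEST = 3  # 8 pages -> 3 bits per request

def encode(hidden_request: str) -> list[str]:
    """Client side: turn a hidden request into a sequence of innocuous URLs."""
    bits = "".join(f"{byte:08b}" for byte in hidden_request.encode())
    bits += "0" * (-len(bits) % BITS_PER_REQUEST)       # pad to a multiple of 3
    return [PAGES[int(bits[i:i + BITS_PER_REQUEST], 2)]
            for i in range(0, len(bits), BITS_PER_REQUEST)]

def decode(requests: list[str]) -> str:
    """Helper side: recover the hidden request from the observed URL sequence."""
    bits = "".join(f"{PAGES.index(url):03b}" for url in requests)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - 7, 8))
    return data.decode()

urls = encode("http://blocked.example.com/")
print(len(urls), "innocuous-looking requests")
print(decode(urls))  # -> http://blocked.example.com/
```

To a censor watching the wire, this looks like an ordinary sequence of page fetches; only the helper, which knows the encoding, sees a request for blocked content.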

The lack of adoption was frustrating, and it seemed difficult to have real, measurable impact.  The research problems also seemed fuzzy, ill-defined, and unsolvable.

Censorship Now

Two important developments have occurred since that time, however, both of which give me more hope that this topic area both has interesting research questions and the potential for impact.

On the downside, censorship is becoming more pervasive. Many countries around the world have gone to remarkable lengths to restrict access to content on the Internet.  According to the OpenNet Initiative, twelve countries have implemented some pervasive form of censorship.  Internet censorship has also played a significant role in political events, such as the Iranian elections.  A Freedom House report last year likewise found that censorship is now prevalent in nearly 60 countries around the world.  Internet censorship also matters more as more and more people use the Internet to communicate.

On the other hand, building circumvention tools is easier. One of the major problems with Infranet was that it required philanthropic individuals to host dedicated infrastructure.  In the ten years since, however, “Web 2.0” has made it much easier for the average user to publish content on the Internet.  Users no longer have to maintain their own Web servers to host photos, videos, and the like.  It occurred to me, then, that censorship circumvention technologies could also ride the Web 2.0 wave, using infrastructure in the cloud as the foundation for hiding information and building covert channels.  Sites such as Flickr that host “user-generated content” seemed like the perfect place to create “drop sites” where users could hide and exchange censored content.

Collage. Based on these observations, we designed Collage, which allows users to hide messages in content that they post to user-generated content sites like Flickr and Twitter.  The tool allows message senders to hide messages in photos and tweets and upload them to the respective sites.  Its design has several advantages.  First, it does not require users to set up fixed infrastructure (e.g., Web servers).  Second, it uses erasure coding to “spread” any single message across multiple drop sites, making the system more robust to blocking than a proxy-based system.  Collage appeared at the USENIX Security Symposium last Friday (paper here) and has appeared in the press recently.  Time will tell whether this tool sees more widespread adoption.
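
As a rough illustration of the “spreading” idea, here is a toy sketch.  It uses a simple XOR-parity scheme rather than the erasure code Collage actually uses, and it omits the steganographic step of embedding each piece into a photo or tweet; the message and the number of drop sites are made up.

```python
# Toy illustration of spreading a message across several drop sites so that it
# survives the blocking of any single site. This is a simple XOR-parity scheme,
# NOT the erasure code Collage actually uses, and the steganographic step
# (hiding each piece inside a photo or tweet) is omitted entirely.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def spread(message, num_sites=4):
    """Split the message into num_sites - 1 equal chunks plus one parity chunk."""
    k = num_sites - 1
    message += b"\x00" * (-len(message) % k)            # pad to a multiple of k
    size = len(message) // k
    chunks = [message[i * size:(i + 1) * size] for i in range(k)]
    parity = chunks[0]
    for chunk in chunks[1:]:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]                            # one piece per drop site

def recover(pieces):
    """Recover the message even if any single piece is missing (marked None)."""
    missing = pieces.index(None) if None in pieces else None
    if missing is not None and missing < len(pieces) - 1:
        rebuilt = pieces[-1]                            # start from the parity piece
        for i, piece in enumerate(pieces[:-1]):
            if i != missing:
                rebuilt = xor_bytes(rebuilt, piece)
        pieces = pieces[:missing] + [rebuilt] + pieces[missing + 1:]
    return b"".join(pieces[:-1]).rstrip(b"\x00")

pieces = spread(b"meet at the usual place at noon")
pieces[1] = None                                        # one drop site gets blocked
print(recover(pieces))                                  # -> b'meet at the usual place at noon'
```

The receiver only needs to retrieve any three of the four pieces, so blocking a single drop site does not block the message.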

Lessons and Looking Forward

My experience with research in Internet censorship taught me an important lesson for research in general: continually reconsider old problems. An old problem that was once uninteresting or unsolvable might become tractable because of other, seemingly unrelated developments.  In the case of Collage, the advent of Web 2.0 allowed us to significantly advance the state of the art over a system like Infranet.  It is worth repeatedly asking what bearing a particular development might have on other problems, even if the two areas seem unrelated.  You might find the right-sized hammer for your nail in the most unlikely of places.

We still don’t understand very much about Internet censorship: we are only beginning to map its extent, and we understand even less about how various circumvention technologies work in practice.  It’s harder still to “measure” the level of deniability or robustness that a circumvention tool might provide.  Debugging is also difficult: when a certain circumvention technology fails, is the failure a bug, or a direct consequence of censorship?  Finally, getting the software into the hands of the people who need it and helping them get set up (“bootstrapping”) remains a challenging problem, particularly considering that any information a normal user can obtain is also accessible to a censor.  Given this wide array of open questions—ranging from theory to practice, and from technology to policy—I believe we may be at the dawn of a new and exciting research area.

Tell Me a Story

Commencement time brings commencement speeches; one of my favorites is Robert Krulwich’s 2008 speech at Caltech, in which he discusses the importance of storytelling in science.  The speech makes a case for talking about science to audiences who are not experts in the topic being presented.  It should be required listening for any graduate student or researcher in science.

Krulwich begins the speech by putting the students in a hypothetical scenario where a non-technical friend or family member asks, “What are you working on?”  What would you think: Is it worth the effort to try to explain your work to the general public?  Do you care to be understood by average folks?  His advice: when someone asks this question, even if your work is hard to explain, give it a try.  Talking about science to non-scientists is a non-trivial undertaking.  And it is an important one, because the scientific version of events competes with other, perhaps equally (or more) compelling stories.

As researchers, we are competing for human attention, and humans love to hear stories.  Storytelling is perhaps one of the most important—and one of the most under-taught—aspects of our discipline.  The narrative of a research writeup or talk can often determine whether the work is well-received—or even received, for that matter.  Some cynics may dismiss storytelling as “marketing”, “hype”, or “packaging”, but the fact of the matter is that packaging is important.  Certainly, research papers (or talks) cannot have style without substance, but people are busy, and many of them (reviewers, journalists, and even other people within your field) will not stick around for the punchline if the story is not compelling.  Of course, this advice applies well beyond the research community, but I will focus here on storytelling in research and on some things I have learned from my own experience so far.

When I began working on network-level spam filtering, I was initially pretty surprised at how much attention the work was receiving.  In particular, I viewed our first paper on the topic as somewhat light: there was no deep theory and no especially strong result, for example.  But the work was quickly picked up by the media, on multiple occasions.  I found myself talking to a lot of reporters, and, as I repeatedly explained the work to them, I got better at telling its story.  I began using analogies and metaphors to describe our techniques, and I got much better at setting the stage for the work.  I also realized what gave the work such broad appeal: everyone understands email spam, and the conceptual differences in our approach were very easy to explain.  Here is the story, in a nutshell:

“Approximately 95% of all email traffic is spam.  Conventional mail filters look at the contents of the message—words in the mail, for example, to distinguish spam from legitimate content.  Unfortunately, as spammers get more clever, they can evade these filtering techniques by changing the content of their messages.  In contrast, our approach looks at behavioral characteristics: rather than looking at the message itself or who sent it, look at how it was sent.  To understand this, think about telemarketer phone calls: you know when someone calls first thing in the morning or right during dinner that the call is most likely a telemarketer, simply because your friends or family are too considerate to call you at those times.  You know the call is unwanted and can dismiss it before you even answer the phone.  We take the same approach with email messages: we identify behavioral characteristics that allow a mail server to reject a message based on the initial contact attempt, before it even accepts or examines the message.  Our method filters spam with 99.9% accuracy, and network operators can deploy our techniques easily without modifications to existing protocols or infrastructure.”

It turns out that this message is relatively easy for the average person to understand: they can relate to the story because they can see what it has to do with their lives, and the approach is explained clearly, in terms of things they already understand.  Even after this initial work was published, it took me years to refine the story so that it could be expressed this crisply.  Introductions to papers and talks should always be treated with similar care.  One can think of the introduction to a paper as a synopsis of the entire story, with the paper itself being the “unabridged” version (i.e., it may include many details that only the most interested reader will pore over).
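
To make the “behavioral characteristics” idea in the story above concrete, here is a small, hypothetical sketch.  The features and thresholds are invented for illustration; the real work relied on measured network-level features and a trained classifier rather than hand-set rules.

```python
# Hypothetical sketch of rejecting mail at connection time based on *how* it is
# sent rather than *what* it says. The features and thresholds below are
# invented for illustration; the actual work used measured network-level
# features and a trained classifier.
from dataclasses import dataclass

@dataclass
class ConnectionAttempt:
    messages_in_last_minute: int   # burstiness of the sending IP
    greeted_before_banner: bool    # sender talked before the SMTP banner finished
    recipients_in_one_shot: int    # number of recipients attempted at once

def looks_like_spammer(conn: ConnectionAttempt) -> bool:
    """Score the *behavior* of the initial contact attempt, not the message body."""
    score = 0
    if conn.messages_in_last_minute > 100:   # legitimate servers rarely blast this fast
        score += 1
    if conn.greeted_before_banner:           # impatient bulk mailers often do this
        score += 1
    if conn.recipients_in_one_shot > 50:     # spraying many recipients at once
        score += 1
    return score >= 2                        # reject before accepting the message body

print(looks_like_spammer(ConnectionAttempt(300, True, 80)))   # True  -> reject early
print(looks_like_spammer(ConnectionAttempt(2, False, 1)))     # False -> accept and deliver
```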

How does one tell a story that readers or listeners actually want to hear?  Unfortunately, there really is not a single silver bullet, and storytelling is certainly an art.  However, there are definitely certain key ingredients that I find tend to work well; in general, I find that good stories (and, in particular, good research stories) have many common elements.  Based on those common elements, here is some advice:

  • Have a beginning, middle, and an end. At the beginning, a research paper or talk should set the context for the work.  A reader or listener immediately wants to know why they should devote their time or attention to what you have to say.  Why is the problem being solved important and interesting?  Why is the problem challenging?  Why is the solution useful or beautiful?  Who can use the results, and how can they use them?  For example, in the above story on spam filtering, there is a beginning (“users get spam; it’s annoying, and current approaches don’t work perfectly”), a middle (“here’s a new and interesting approach”), and an end (“it works; people can use it easily”).
  • Use analogies and metaphors. People have a much easier time understanding a new concept if you can relate it to something they already understand.  For example, the above story uses telemarketing as an analogy for email spam; nearly everyone has experienced a rude awakening or disruption from a telemarketer, which makes the analogy easy to understand.  In some cases, it may be that the analogy is not perfect; in these cases, I find that it helps to use an analogy anyway and explain subtle differences later.
  • Use concrete examples. People like to see concrete examples because they are exciting and much easier to relate to.  It’s even better if the example can be surprising, or otherwise engaging.  For example, the above story gives a statistic about spam that is concrete, and some may even find surprising.  In a talk, I often augment this concrete example with a news clipping, a graph, or an interactive question (e.g., one can have people guess what fraction of email traffic is spam).
  • Write in the active voice. Consider “It was observed.” (passive) vs. “We saw.” (active).  The first is boring, indirect, and unclear: the reader (or listener) cannot even figure out who observed.  I find this writing style immensely frustrating for this reason.  My frustration generally comes to a boil when someone describes a system using primarily verbs in the passive voice (“The message was sent.”).  Passive voice makes it nearly impossible for the reader to figure out what is happening because the subject of the verb is unspecified.  Often, when I press students to turn their verbs into active voice, we find out that even they were unclear on what the subject of the verb should be (e.g., what part of the system takes a certain action).
  • Be as concise as possible, but not too concise. We’ve all complained about movies that “drag on too long” or a speech that “does not get to the point”.  Humans can be quite impatient, and, in the context of research papers, people want to know the punchline quickly, as well.  Research papers are not mystery novels; they should be interesting, but they should also convey findings clearly and efficiently.  Most of my time editing writing involves removing words and otherwise shortening paragraphs to streamline the story as much as possible.

A final point is to consider the audience.  Someone you meet in an elevator or hallway might be much less interested in the details of your work than someone listening to a conference talk or thesis defense.  For this reason, it’s important to have multiple versions of your story ready.  I call this a “multi-resolution elevator pitch”, because it’s a pitch where I can start with a high-level story and dive into details as necessary.  Having a multi-resolution elevator pitch ready also makes it much easier to convey your point to very busy people who may not have the time to stick around for more than 30 seconds.  If, however, you can hook them in the first 30 seconds, you may find that they stick around to hear the longer version of your story.

Show Me the Data

One of my friends recently pointed me to this post about network data. The author states that one of the things he will miss the most about working at Google is the access to the tremendous amount of data that the company collects.

Although I have not worked at Google and can only imagine the treasure trove its employees must have, I have also spent time with lots of sensitive data, during my time at AT&T Research Labs.  At AT&T, we had—and researchers presumably still have—access to a fount of data, ranging from router configurations to routing table dumps to traffic statistics of all kinds.  I found having direct access to this kind of data tremendously valuable: it allowed me to “get my hands dirty” and play with data as I explored interesting questions that might be hiding in it.  During that time, I developed a taste for working on real, operational problems.

Unfortunately, when one retreats to the ivory tower, one cannot bring the data along for the ride.  Sitting back at my desk at MIT, I realized there were a lot of problems with network configuration management, and I wanted to build tools to help network operators run their networks better.  One of these tools was the “router configuration checker” (rcc), which has been downloaded and used by hundreds of ISPs to check their routing configurations for various kinds of errors.  The road to developing this tool was tricky: it required knowing a lot about how network operators configure their networks and, more importantly, direct access to network configurations on which to debug the tool.  I found myself in a catch-22: I wanted to develop a tool that was useful for operators, but I needed operators to give me data to develop the tool in the first place.
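
To give a flavor of the kinds of checks involved, here is a toy, hypothetical example rather than rcc itself (rcc performed far more extensive analyses across the configurations of an entire network): flag BGP neighbors in a Cisco-style configuration snippet that have no inbound route filter attached.  The configuration text below is made up.

```python
# Toy flavor of a configuration check (NOT rcc itself): flag BGP neighbors in a
# made-up Cisco-style configuration that have no inbound route filter attached.
import re

CONFIG = """
router bgp 65001
 neighbor 192.0.2.1 remote-as 65002
 neighbor 192.0.2.1 route-map CUSTOMER-IN in
 neighbor 198.51.100.7 remote-as 65003
"""

def unfiltered_neighbors(config):
    neighbors = set(re.findall(r"neighbor (\S+) remote-as", config))
    filtered = set(re.findall(r"neighbor (\S+) (?:route-map|prefix-list) \S+ in", config))
    return sorted(neighbors - filtered)

# 198.51.100.7 has a session but no inbound filter -- a common source of route leaks.
print(unfiltered_neighbors(CONFIG))   # ['198.51.100.7']
```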

My most useful mentor at this juncture was Randy Bush, a research-friendly operator who told me something along the following lines: Everyone wants data, but nobody knows what they’re going to do with it once they get it.  Help the operators solve a useful problem, and they will give you data.

This advice could not have been more sage.

I went to meetings of the North American Network Operators Group (NANOG) and talked about the basic checks I had managed to bootstrap into some scripts using data I had from MIT and a couple of other, smaller networks (basically, enough to test that the tool worked on Cisco and Juniper configurations).  At NANOG, I met a lot of operators who seemed interested in the tool and were willing to help; often they would not provide me with their configurations, but they would run the tool for me and tell me the output (and whether or not the output made sense).  Guy Tal was another person to whom I owe a lot of gratitude for his patience in this regard.  Sometimes, I got lucky and even got hold of some configurations to stare at.

Before I knew it, I had a tool that could run on large Internet Service Provider (ISP) configurations and give operators meaningful information about their networks, and hundreds of ISPs were using the tool.  And, I think that when I gave my job talk, people from other areas may not have understood the details of “BGP”, or “route oscillations”, or “route hijacks”, but they certainly understood that ISPs were actually using the tool.

We applied the same approach when we started working on spam filtering.  We wrote an initial paper that studied the network-level behavior of spammers with some data we were able to collect at a local “spam trap” on the MIT campus (more on that project in a later post).  The visibility of that work (and its unique approach, which spawned a lot of follow-on work) allowed us to connect with people in industry who were working on spam filtering, had real problems that needed solving, and had data (and, equally importantly, expertise) to help us think about the problems and solutions more clearly.

In these projects (as well as other more recent ones), I see a pattern in how one can get access to “real data”, even in academia.  Roughly, here is some advice:

  • Have a clear, practical problem or question in mind. Do not simply ask for data.  Everyone asks for data.  A much more select set is actually capable of doing something useful with it.  Demonstrate that you have given some thought to questions you want to answer, and think about whether anyone else might be interested in those questions.  Importantly, think about whether the person you are asking for data might be interested in what you have to offer.
  • Be prepared to work with imperfect data. You may not get exactly the data you would like.  For example, the router configurations or traffic traces might be partially anonymized.  You may only get metadata about email messages, as opposed to full payloads.  (And so on.)  Your initial reaction might be to think that all is lost without the “perfect dataset”.  This is rarely the case!  Think about how you can either adjust your model, or adapt your approach (or even the question itself) with imperfect data.
  • Be prepared to operate blindly. In many cases, operators (or other researchers) cannot give you the raw data they have access to; the data may be sensitive or protected by non-disclosure agreements.  However, these people can sometimes run analyses on the data for you, if you are nice to them and if you write the analysis code in a way that lets them easily run your scripts (a minimal example of such a script appears after this list).
  • Bring something to the table. This goes back to Randy Bush’s point. If you make yourself useful to operators (or others with data), they will want to work with you—if you are asking an interesting question or providing something useful, they might be just as interested in the answers as you are.
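
As an example of the last two points, here is a hypothetical sketch of an analysis script an operator could run “blindly” on their own data: it reads a local log that the researcher never sees and prints only aggregate, shareable results.  The file format and field names are invented for illustration.

```python
# Hypothetical sketch of analysis an operator can run "blindly" on private data:
# it reads a local log the researcher never sees and prints only aggregate,
# shareable results. The file format and field names are invented for illustration.
import csv
import sys
from collections import Counter

def summarize(path):
    rejects = Counter()
    total = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):        # expected columns: timestamp, sender_asn, action
            total += 1
            if row["action"] == "reject":
                rejects[row["sender_asn"]] += 1
    # Only aggregates leave the operator's machine -- no raw log lines.
    print(f"total connections: {total}")
    print(f"rejected: {sum(rejects.values())}")
    print("top 5 rejected sender ASNs:", rejects.most_common(5))

if __name__ == "__main__":
    summarize(sys.argv[1])   # e.g., python summarize.py smtp_log.csv
```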

There is much more to say about networking research and data.  Sometimes it is simply not possible to get the data one needs to solve interesting research problems (e.g., pricing data is very difficult to obtain).  Still, I think as networking researchers, we should be first looking for interesting problems and then looking for data that can help us solve those problems; too often, we operate in reverse, like the drunk who looks for his keys under the lamppost because it is brighter where the light is shining.  I’ll say more about this in a later post.

Networking Meets Cloud Computing (Or, “How I Learned to Stop Worrying and Love GENI”)

If you build it, will they come? In Field of Dreams, Ray Kinsella is confronted in his cornfield by a whisper that says, “If you build it, he will come,” which Ray believes refers to building a baseball field in the middle of a cornfield that will play host to Shoeless Joe and members of the 1919 Black Sox.  Only Ray can see the players initially, leading others to tell him that he should simply rip out the baseball field and replant his corn crop.  Eventually, other people see the players, too, and decide that keeping the baseball field might not be such a bad idea after all.

I can’t help but wonder whether this scenario is analogous to the Global Environment for Network Innovations (GENI) effort, sponsored by the National Science Foundation.  The GENI project seeks to build a worldwide network testbed to allow Internet researchers to design and test new network architectures and protocols.  The project has many moving parts, and I won’t survey all of them here.  A salient feature of GENI, though, is that it funds infrastructure prototyping and development, but does not directly fund research on that infrastructure.  One of the most interesting challenges for me has been—and still is—how to couple projects that build infrastructure with projects that directly use that infrastructure to develop interesting new technologies and perform cutting-edge research.

Can prototyping spawn new research? This is, in essence, the bet that I think GENI is placing: if we build a new experimental environment for networking innovation, the hope is that researchers will come and use it.  Can this work?  I think the answer is probably “yes”, but it is too soon to know in this context.  Instead, I would like to talk about how our GENI projects have spawned new research—and new educational material—here at Georgia Tech.

The Prototype: Connectivity for Virtual Networks. One of the GENI-funded projects is called the “BGP Multiplexer” or, simply, the “BGP Mux”.  If that sounds obscure, then perhaps you can already begin to understand the challenges we face.  Simply put, the BGP Mux is like a proxy for Internet connectivity for virtual networks (BGP is the protocol that connects Internet Service Providers to one another).  The basic idea is that a developer or network researcher might build a virtual network (e.g., on the GENI testbed) and want to connect that network to the rest of the Internet, so that his or her experiment can attract real users.  You can read more about it on the GENI project Web page.

Some people are probably familiar with the concept of virtualization, or creating “virtual” resources (memory, servers, hardware, etc.) based on some shared physical substrate.  Virtual machines are now commonplace; virtual networks, however, are less so.  We started building a Virtual Network Infrastructure (VINI) in 2006.  The main motivation for VINI was to allow experimenters to build virtual networks on a shared physical testbed.  One of the big challenges was connecting these virtual networks to the rest of the Internet.  This is the problem that the BGP Mux solves.

Providing Internet connectivity to virtual networks is perhaps an interesting problem within the context of building a research testbed, but, in my view, it lacked broader research impact.  Effectively, we had built a “hammer” that was useful for constructing a testbed, but I wanted to find a “nail”: a real problem whose solution could be published and also used in the classroom.  This was not easy.

The Research: Networking for Cloud Computing.  To broaden the applicability of what we had built, we essentially had to find a “nail” that needed a fast, flexible way to set up and tear down Internet connections.  Cloud computing applications seemed like a natural fit: services on Amazon’s EC2, for example, might want to control inbound and outbound traffic to and from their customers, for cost or performance reasons.  Today, this is difficult.  When you rent servers in EC2, you have no control over how traffic comes over the Internet to reach those servers—if you want paths with less delay or otherwise better performance, you are out of luck.  Using the hammer that we had built, the BGP Mux, this becomes much easier: instead of solving a problem in terms of “virtual networks for researchers” (something only a small community might care about), we were solving the same problem in terms of users of EC2.  Essentially, the BGP Mux offers EC2 “tenants” the ability to control their own network routing.  This capability is now deployed in five locations, and we are planning to expand its footprint.  A paper on this technology will appear at the USENIX Annual Technical Conference in June.  We welcome any other networks that would like to help us with this deployment (i.e., if you can offer us upstream connectivity at another location, we would like to talk to you!).
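
To give a flavor of the kind of control this enables, here is a conceptual toy, not the actual BGP Mux or Transit Portal interface: a tenant with its own prefix can shift inbound traffic among upstream providers by announcing that prefix with different amounts of AS-path prepending toward each upstream.  The ASN, prefix, and upstream names below are made up.

```python
# Conceptual toy (not the actual BGP Mux / Transit Portal interface): a tenant
# shifts inbound traffic by announcing its prefix with different amounts of
# AS-path prepending toward each upstream provider.
TENANT_ASN = 64512               # private ASN, for illustration only
TENANT_PREFIX = "203.0.113.0/24"

# Fewer prepends -> shorter AS path -> that upstream attracts more inbound traffic.
preferences = {
    "upstream-A (low latency)": 0,   # preferred inbound path
    "upstream-B (backup)":      3,   # de-preferred with prepending
}

def announcements(prefix, asn, prefs):
    """Build the AS path each upstream would see for this prefix."""
    return {upstream: {"prefix": prefix, "as_path": [asn] * (1 + prepends)}
            for upstream, prepends in prefs.items()}

for upstream, ann in announcements(TENANT_PREFIX, TENANT_ASN, preferences).items():
    print(f"{upstream}: announce {ann['prefix']} with AS path {ann['as_path']}")
```

Prepending makes a path look longer, so other networks tend to prefer routes through the upstream with the shorter announcement; that is the knob the tenant turns.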

Education: Transit Portal in the Classroom. I’ve been teaching “Next-Generation Networking”, a course on future Internet architectures that I plan to discuss at more length on this blog at some point.  Typical networking courses are not as “hands-on” as I would prefer: I, for one, graduated from college without ever even seeing a router in person, let alone configuring one.  I wanted networking students to have more “street cred”—they should be able to say, for example, that they have configured routers on a real, running network that is connected to the Internet and routing real traffic.  This sounds like lunacy.  Who would think that students could play “network operator for a day”?  It just sounds too dangerous to let students play around on live networks with real equipment.  But with virtual networking and the BGP Mux, it’s possible.  I recently assigned a project in this course that had students build virtual networks, connect them to the Internet, and control inbound and outbound traffic using real routing protocols.  Seeing students configure networks and “speak BGP with the rest of the Internet” was one of my proudest days in the classroom.  You can see the assignment and videos of these demos if you’d like to learn more.

Prototyping and research.  Will the researchers come? Our own GENI prototyping efforts have been an exercise in “working backwards” from solution to networking research problem.  I have found that exercise rewarding, if somewhat counter to my usual way of thinking about research (i.e., seek out the important problems first, then find the right hammer).  I think now the larger community will face this challenge, on a much broader scale: Once we have GENI, what will we do with it?  Some areas that seem promising include deployment of secure network protocols and services (our current protocols are known to be insecure), better support for mobility (the current Internet does not support mobility very well), new network configuration paradigms (networks of all kinds, from the transit backbone to the home, are much too hard to configure), and new ways of pricing and provisioning networks (today’s markets for Internet connectivity are far too rigid).  We have  just finished work on a large NSF proposal on Future Internet Architectures that I think will be able to make use of the infrastructure that we and others are building; in the coming months, I think we’ll have much more to say (and much more to see) on this topic.

A New Window for Networking

It’s an exciting time to be working in communications networks.  Opportunities abound for innovation and impact, in areas ranging from applications, to network operations and management, to network security, and even to the infrastructure and protocols themselves.

When I was interviewing for faculty jobs in networking about five years ago, one of the most common questions I heard was, “How do you hope to have any impact as a researcher when the major router vendors and standards bodies effectively hold the cards to innovation?”  I have always had a taste for solving practical problems with an eye towards fundamentals.  My dissertation work, for example, was on deriving correctness properties for Internet routing and developing a tool, the router configuration checker (rcc), to help network operators check that their routing configurations actually satisfied those properties.  The theoretical aspects of the work were fun, but the real impact was that people could actually use the tool; I still get regular requests for rcc today, from both operators and various networking companies who want to perform route prediction.

This question about impact cut right to the core of what I think was a crisis of confidence for the field.  Much of the research seemed to be focused on performance tuning and protocol tweaks.  Big architectural ideas were confined to paper design, because there was simply no way to evaluate them.  Short of interacting directly with operators and developing tools that they could use, it seemed to me that truly bringing about innovation was rather difficult.

Much has happened in five years, however.  There are now more exciting problems in networking than there is time to work on them, innovation is happening in many areas, and it is becoming feasible to effect fundamental change in the network’s architecture and protocols.  I think several trends are responsible for this wealth of new opportunities:

  • Network security has come to the forefront.  The rise of spam, botnets, phishing, and cybercrime over the past few years cannot be ignored.  By some estimates, as much as 95% of all email is spam.  In a Global Survey by Deloitte, nearly half of the companies surveyed reported an internal security breach, a third of which resulted from viruses or malware.
  • Enterprise, campus, and data-center networks are facing a wealth of new problems, ranging from access control to rate limiting and prioritization to performance troubleshooting.  I interact regularly with the Georgia Tech campus network operators, as a source of inspiration for problems to study.  One of my main takeaways from that interaction is that today’s network configuration is complex, baroque, and low-level—far too much so for the high-level tasks that they wish to perform.  This makes these networks difficult to evolve and debug.
  • Network infrastructure is becoming increasingly flexible, agile, and programmable.  It used to be the case that network devices were closed and difficult to modify beyond the configuration parameters they exposed.  Recent developments, however, are changing the game.  The OpenFlow project at Stanford University makes it much more tenable to write software programs that control the entire network at a higher level of abstraction, potentially giving operators more direct and easier ways to control and debug their networks (a toy sketch of this match/action style of control appears after this list).
  • Networking is increasingly coming to blows with policy.  The collision of networking and policy is certainly not new, but it is now front and center, with front-page items such as network neutrality and Internet censorship.  As the two areas continue on this crash course, it is certainly worth thinking about the respective roles that policy and technology play in each of these problems.
  • Networking increasingly entails direct interaction with people of varied technical backgrounds.  It used to be that a “home network” consisted of a computer and a modem.  Now, home networks comprise a wide range of devices, including media servers, game consoles, music streaming appliances, and so forth.  The increasing complexity of these networks makes each and every one of us a network operator, whether we like it or not.  The need to make networks simpler, more secure, and easier to manage has never been more acute.
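
Returning to the programmability trend above, here is a framework-agnostic toy, not written against OpenFlow or any real controller API: the essential shift is that network behavior becomes a centrally computed table of match/action rules that a program can generate and reason about directly.  The rule fields and policy below are invented for illustration.

```python
# Framework-agnostic toy (not OpenFlow or any real controller API): network
# behavior expressed as a centrally computed list of match/action rules.
from dataclasses import dataclass

@dataclass
class Rule:
    priority: int
    match: dict       # e.g., {"dst_port": 25} or {"src_subnet": "10.1.0.0/16"}
    action: str       # "drop", "forward", "rate-limit", ...

POLICY = [
    Rule(priority=100, match={"dst_port": 25}, action="drop"),        # block outbound spam
    Rule(priority=50,  match={"src_subnet": "10.1.0.0/16"}, action="rate-limit"),
    Rule(priority=0,   match={}, action="forward"),                   # default
]

def decide(packet):
    """Return the action of the highest-priority rule whose match fields all agree."""
    for rule in sorted(POLICY, key=lambda r: -r.priority):
        if all(packet.get(k) == v for k, v in rule.match.items()):
            return rule.action
    return "forward"

print(decide({"dst_port": 25, "src_subnet": "10.2.0.0/16"}))  # drop
print(decide({"dst_port": 80, "src_subnet": "10.1.0.0/16"}))  # rate-limit
```

The interesting part is that the policy is ordinary data in one program, rather than low-level configuration scattered across individual devices.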

The networking field continues to face new problems, which also opens the field to “hammers” from a variety of different areas, ranging from economics to machine learning to human-computer interaction.  One of my colleagues often says that networking is a domain that draws on many disciplines.  One of the fun things about the field is that it allows one to learn a little about a lot of other disciplines as well.  I have had a lot of fun—and learned a lot—working at many of these boundaries: machine learning, economics, architecture, security, and signal processing, to name a few.

The theme of my blog will be problems and topics that relate to network management, operations, security, and architecture.  I plan to write about my own (and my students’) research, current events as they relate to networking, and interesting problem areas and solutions that draw on multiple disciplines.  I will start in the next few posts by touching on each of the bullets above.