There are few things worse than being on a highway somewhere, cruising along at the posted speed limit, then seeing the tell-tale flashes of red brake lights: There’s traffic congestion ahead.
Suddenly, you’re trapped with hundreds of other cars and trucks, on a highway with no nearby exits, crawling along at walking speed (if you’re lucky), and probably with little idea of what’s causing the delay. In the city, this problem is mitigated somewhat by the fact that there are more exits enabling you to get off the road and onto city streets that can route you around the problem. Larger cities will have traffic reports on the radio that will alert you to potential problems, so you may be able to avoid the route altogether, if the radio report is timely.
Things are getting better. If I am driving to Chicago, for instance, a passenger can look up a Google Map of the road ahead and tell me if it’s green, yellow, or red in time for me to shift to an alternate route. That is, provided I have the presence of mind to have them look; usually I don’t start caring about the traffic ahead until there’s congestion. In the absence of a problem, I will happily toodle along, caring only about my immediate surroundings until the next problem.
This attitude about vehicular traffic is often mirrored in our approach to network traffic. When things are humming along, we often ignore the health of the network, reacting only to problems that, for whatever reason, we didn’t anticipate. In other words, we cruise along without complaint on our high-bandwidth Internet superhighways until there’s a traffic jam.
And, like concrete roads, our solution to the problem has been fairly singular in its approach: we add more lanes in the form of bandwidth. Got a slow network? Add more pipe, that’ll take care of the problem, right?
One network engineer says otherwise. Bandwidth is only part of the solution, according to Jim Gettys, one of the creators of the X Window System: reducing network latency is the other half. And, Gettys argues, because we have thrown so much technology at improving throughput and bandwidth to keep up with the explosive growth of consumer Internet traffic, we have undermined the basic congestion avoidance mechanisms that could reduce latency and prevent Internet traffic jams in the first place.
Gettys has coined a term for this problem: bufferbloat.
What is bufferbloat?
You may have heard quite a bit about bufferbloat in recent months, as Gettys’ writings about the problem have gotten a lot of people worked up about the continued health of the Internet.
The idea behind bufferbloat is this: our operating systems and routers have large, scaling TCP network buffers that, by design, “trap” large numbers of packets in order to maintain the maximum possible throughput (the amount of data that actually gets from point A to point B in a given time). Throughput, it has been reasoned, is the best value for a network to have: The fewer lost packets, the better the integrity of the data coming in. And, as files have gotten larger (and large files more numerous, and traffic far busier), more buffers have been added to hardware and software in order to handle the flow in a smoother manner.
But in the obsessive quest for reducing packet loss and smoothing out traffic, Gettys argues, another bad situation has been made worse.
Even though there are a lot of buffers on a given network, there is a chance that one (or more) of those buffers will become full. If that full buffer happens to be near a known (or unknown) network bottleneck and that bottleneck gets saturated, suddenly you have packets running smack dab into a queue, waiting for the buffer in question to empty out and deliver said packets to the next hop in the network. TCP’s congestion avoidance depends on packets being dropped promptly as the signal to slow down, but an oversized buffer holds onto packets instead of dropping them, so the sender keeps transmitting at full speed while the queue, and the delay, grows. And if that full buffer is on the last network hop or two before the destination (or just outside the source of the packets), there is no alternate route: the packets can do little but sit in the queue and wait to get passed on.
Buffers in situations like this actually defeat congestion avoidance protocols, because they’re impossible to get around.
Like running into a traffic jam on a highway with no exits around.
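To put rough numbers on that traffic jam, here is a minimal sketch of how long a packet waits behind a full first-in, first-out buffer. The link speed and buffer sizes are hypothetical figures picked for illustration, not measurements from Gettys or anyone else, and the model ignores everything except the time needed to drain the queue onto the bottleneck link.

```python
def queue_delay_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Time for the last packet in a full FIFO buffer to drain onto the link."""
    return buffer_bytes * 8 / link_bits_per_second


# Hypothetical bottleneck: a 1 Mbit/s uplink (an assumption for illustration).
UPLINK_BPS = 1_000_000

if __name__ == "__main__":
    for buffer_kib in (16, 64, 256, 1024):
        delay = queue_delay_seconds(buffer_kib * 1024, UPLINK_BPS)
        print(f"{buffer_kib:5d} KiB buffer -> {delay * 1000:8.1f} ms of added latency when full")
```

The delay grows in direct proportion to the buffer size and shrinks with link speed, which is why a buffer that goes unnoticed on a fast link can add seconds of latency when it sits in front of a slow one.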
It’s important to note that the problem of bufferbloat is not just because there are more buffers around. It’s that there are so many buffers on the network that the odds of one being near a network bottleneck have risen very significantly.
Recently, the situation has been exacerbated by the proliferation of wireless networks, which are almost automatically going to be a bottleneck, given that transmission speeds are usually slower than over the wire. Another factor, which Gettys has repeatedly pointed out, is the retirement of Windows XP, which capped buffer space at the OS level at 64 KB. Newer versions of Windows, as well as OS X and Linux, have TCP window scaling, which enables the operating system to increase the buffer size as needed.
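One way to see why the 64K cap mattered: a TCP sender can have at most one window of unacknowledged data in flight per round trip, so the window size puts a ceiling on both throughput and on how much a single connection can pile into buffers along the path. The sketch below works through that arithmetic. The 100 ms round-trip time and 100 Mbit/s link are assumed values for illustration only, and 65,535 bytes is the largest window TCP can advertise without window scaling.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one TCP connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Data that must be in flight to keep a link of this speed and RTT full."""
    return link_bps * rtt_seconds / 8


if __name__ == "__main__":
    unscaled_window = 65_535   # largest TCP window without window scaling
    rtt = 0.100                # assumed 100 ms round trip (illustrative)
    link = 100_000_000         # assumed 100 Mbit/s path (illustrative)

    ceiling = max_throughput_bps(unscaled_window, rtt)
    needed = bandwidth_delay_product_bytes(link, rtt)
    print(f"Unscaled window ceiling: {ceiling / 1e6:.2f} Mbit/s")
    print(f"Window needed to fill the link: {needed / 1024:.0f} KiB")
```

With scaling enabled, the window can grow far beyond the unscaled limit, which is good for throughput on long, fast paths but also lets a single connection fill much larger buffers.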
Now think about bufferbloat and picture a variable-sized buffer right next to a wireless home network. The problem is not hard to imagine.
The solution, at least in a broader sense, will be.
What makes this problem even more difficult to solve is Gettys’ assertion that buffers will only be noticeable in a network path where they are near a saturated bottleneck. So you may have a buffer causing problems one day and be perfectly fine (and invisible) the next.
There are, as Gettys himself points out, various ways to engineer around the problem. Gamer routers and other end-to-end congestion avoidance solutions can be applied, usually for short-term gains. But such solutions may only work for a while, because even as they benefit the few using them, such congestion-avoidance systems can disrupt “normal” traffic, further damaging the network ecosystem. For Gettys, it’s an application of game theory: short-term gain may not bring a long-term win.
Gettys’ blog and the new bufferbloat website have become a touchstone for network engineers working on this problem, but solving it will be tricky. Active queue management (AQM) is pointed to as a good solution, but even though AQM has been around a while, it is not widely deployed, and when it is deployed, it may not be properly configured. Wider deployment and education will help.
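For readers who have not met AQM before, the core idea is to start dropping (or marking) packets before a buffer is completely full, so that senders get the signal to slow down early. The sketch below is a simplified version of Random Early Detection (RED), one classic AQM scheme. It is offered only as an illustration of the idea, not as the approach Gettys recommends, and the thresholds, maximum probability, and averaging weight are placeholder values that would need tuning on a real device.

```python
def update_average(avg_queue: float, current_queue: float, weight: float = 0.002) -> float:
    """Exponentially weighted moving average of the queue length."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue: float, min_th: float, max_th: float, max_p: float) -> float:
    """Simplified RED: no drops below min_th, certain drop at or above max_th,
    and a linear ramp up to max_p in between."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


if __name__ == "__main__":
    # Placeholder settings, in packets (assumptions for illustration).
    MIN_TH, MAX_TH, MAX_P = 10, 30, 0.1
    for avg in (5, 10, 15, 20, 25, 30):
        p = red_drop_probability(avg, MIN_TH, MAX_TH, MAX_P)
        print(f"average queue {avg:2d} packets -> drop probability {p:.3f}")
```

Getting those thresholds wrong is one way an AQM deployment ends up improperly configured: set them too high and the queue still bloats, too low and throughput suffers.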
Traffic classification may also help in the short term. Gettys has plenty of advice on his sites for tweaking Quality of Service settings on routers to try to decrease latency.
In the end, it may take an entirely new kind of protocol for packet delivery to solve this problem. Hardware and software deployed now are abusing TCP congestion avoidance mightily, and if that can’t be put in check, new protocols may have to be the answer.
Build smarter cars, in other words, and maybe you’ll avoid more traffic jams.