When the Skype network suffered a massive outage on August 16th, it got me
thinking about chokepoints and lock-in, which you should also consider when
you’re rolling out VoIP for your shop.
VoIP is a peer-to-peer protocol, just like any IP traffic, so there shouldn’t
be chokepoints or lock-in. Of course in the real world it’s always less than
ideal because vendors want to lock us in by any means except, it seems, better
service and prices, and someone somewhere is always wanting to raise fences
and dig moats. I believe that this is a counter-productive strategy, and that
openness benefits everyone, including those who must keep eyes on their bottom
lines. Open standards and open networks make the pie bigger, which is better
than fighting over a tiny pie. (Sorry for the horrible Dilbert-quality metaphors,
but I couldn’t think of anything else.)
What happened with Skype, and why did it take so long to fix? A lot of people outside the United States might think this is a silly question, because many countries have never had the benefit of a first-rate reliable telephone network, so a downtime of 12-48 hours after four years of nearly stellar reliability doesn’t seem all that terrible. But this particular incident highlights a number of bad things that, in my nearly-humble opinion, you should not allow on your network.
Skype says the cause of the outage was this:
“The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.
“The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.”
So in effect, a chain of circumstances created what amounted to a self-inflicted
Distributed Denial of Service (DDoS) attack. There are a number of problems that
got us to this point.
First of all is the way Windows patches are applied. Patch Tuesday is silly.
Updates, especially security patches, should be released as soon as they are
ready. Then requiring a reboot to apply the patches is just plain 19th-century.
Real operating systems don’t require reboots for every little thing, but only
major events like kernel and hardware upgrades. But as fun as it is to blame
Windows, it’s really not fair to give it the sole blame in this case, because
the next question is why all those Windows machines were set to automatically
log in to Skype at bootup without any user intervention. Security 101 says don’t
allow unattended logins, and anyway what’s the point of logging in to remote
services like Skype when the user isn’t even present?
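For clients that do reconnect on their own, one common way to keep a mass restart from turning into a login flood is randomized exponential backoff. The sketch below is only an illustration of that general technique; it is not how Skype's client works, and the function name and the base and cap values are made up for the example.

```python
import random


def login_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Return the wait (in seconds) before each successive login attempt.

    Each wait is drawn uniformly from 0 up to min(cap, base * 2**attempt),
    so machines that reboot at the same instant do not all hit the login
    servers at the same instant. base and cap are illustrative values.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


# Two clients that restart together end up with different retry schedules.
client_a = login_delays(5, rng=random.Random(1))
client_b = login_delays(5, rng=random.Random(2))
```

The random spread is the point: the ceiling grows with each failed attempt, so a struggling login server sees fewer and fewer retries per second instead of a synchronized stampede.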
So now we come to chokepoints, which for Skype are their login servers. Skype
has always handled this particular issue very capably, so don’t think that a
single outage means their whole network is poo, because it isn’t.
I don’t know of any way to get around this: If a service provider requires
you to log in then you’re going to have a limited number of gateways into their
network. So then you need to ask yourself, what’s my backup plan when this service
goes down? For Skype users it was simply “do without”: pick up their plain old-fashioned
telephones and carry on as best they could. That isn’t so terrible, I suppose,
but if your business relies on your Skype account that’s going to hurt.
Fonality handles this elegantly: both PBXtra and trixbox Pro use the hybrid
hosting model that offloads the networking and systems management guff to Fonality’s
data center. If something happens and Fonality goes offline, your phone service
automatically falls back to the PSTN. So you’ll lose VoIP, but you won’t lose
all of your telephony and you won’t have to make the switchover yourself.
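The fallback idea itself is simple enough to sketch in a few lines. This is not Fonality's code, just a minimal illustration of "try the preferred route, then the next one"; the route names and the dial functions are hypothetical stand-ins for real trunks.

```python
def place_call(number, routes):
    """Try each route in order and return the name of the first that works.

    routes is a list of (name, dial) pairs. dial(number) returns True on
    success and raises ConnectionError when that route is down.
    """
    for name, dial in routes:
        try:
            if dial(number):
                return name
        except ConnectionError:
            continue  # this route is unavailable, so try the next one
    raise RuntimeError("no route available for " + number)


def voip_down(number):
    # Hypothetical trunk: simulates the hosted VoIP service being offline.
    raise ConnectionError("hosted VoIP service unreachable")


def pstn_ok(number):
    # Hypothetical trunk: simulates a working PSTN line.
    return True


# VoIP is preferred; the PSTN line is the automatic fallback.
route_used = place_call("5550100", [("voip", voip_down), ("pstn", pstn_ok)])
```

The ordering of the list is the whole policy: put the cheapest route first and the most dependable one last.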
Redundancy and failover
A perennial problem for the computer network administrator is planning for failures.
How many belts and suspenders do you need? How many routers, how many servers, how
many different Internet service providers? With telephony it’s getting absurd: the
legacy PSTN, cell phones, VoIP, and text messaging. Then there’s e-mail
and instant messaging, and I challenge you to count all the separate, incompatible
IM networks a person can collect. We need giant size business cards just to
hold all of our contact information, and personal hand trucks to hold all of
Which is all a bit of exaggeration, but not that much. These are all the things
the wise network administrator takes into consideration and tries to plan for.
My own approach is to invest in smaller quantities of quality hardware, rather
than throwing masses of leftover and inferior hardware into the mix. Even the
most die-hard do-it-yourselfer (like me) has to depend on outside service providers,
and since things always break, it’s smarter to plan for it. Which makes shopping
for good service providers very important, because a low-cost provider can cost
you more over the long run.
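A little arithmetic helps with the belts-and-suspenders question. Assuming your redundant paths fail independently of each other (a big assumption, since two ISPs can share the same trench), the chance that all of them are down at once is the product of their individual downtimes. The availability figures below are invented for illustration, not measurements of any real provider.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Availability of several redundant paths, assuming independent failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)  # every path must be down at the same time
    return 1.0 - p_all_down


def downtime_hours(availability):
    """Expected hours of outage per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Illustrative figures: one 99% link versus two independent 99% links.
single = downtime_hours(0.99)
paired = downtime_hours(combined_availability([0.99, 0.99]))
```

With these made-up numbers, one 99 percent link works out to roughly 88 hours of outage a year, and a second independent link of the same quality cuts that to under an hour, which is why one good backup usually beats a pile of junk.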
While we’re talking about costs, don’t forget to do your PSTN-VoIP comparisons. People get all excited about “free or dirt cheap long distance!” and don’t really compare the numbers. The popularity of VoIP has driven down the costs of most traditional phone network services, so don’t be shy about shopping around and getting some competitive bidding going.
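Here is one way to run that comparison honestly, as a rough sketch. Every number in it is made up for the example, so plug in the quotes you actually get, and remember that VoIP usually brings costs the phone bill never shows, such as extra bandwidth.

```python
def monthly_cost(fixed_fee, per_minute, minutes, lines=1, extras=0.0):
    """Total monthly cost: per-line fees, metered minutes, and any extras."""
    return lines * fixed_fee + per_minute * minutes + extras


# Hypothetical quotes for a four-line office using 2000 minutes a month.
pstn = monthly_cost(fixed_fee=25.0, per_minute=0.05, minutes=2000, lines=4)
voip = monthly_cost(fixed_fee=20.0, per_minute=0.02, minutes=2000, lines=4,
                    extras=30.0)  # extras: assumed added bandwidth cost

savings = pstn - voip
```

In this invented case VoIP still wins, but by a margin small enough that one renegotiated PSTN contract could erase it.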
The VoIP servers I’ve been covering here (Asterisk, trixbox, and SipX) allow
you a tremendous amount of flexibility in mixing and matching services and networks,
so you can knit together your best deals.
Next week we’ll return to our torture-testing of trixbox Pro, and see how it measures up in the real world.