Stopping Spam at the Gateway

By Steven J. Vaughan-Nichols | Oct 2, 2003
http://www.enterprisenetworkingplanet.com/netsysm/article.php/3086521/Stopping-Spam-at-the-Gateway.htm

I hate spam. You hate spam. We all hate spam. But none of us hate spam as much as ISPs and business network administrators do. Alexis Rosen, president and co-owner of Public Access Networks, which runs Panix, one of the oldest ISPs, concedes that while spam may "not be as bad as Adolf Hitler, it is morally evil."

Well, that's clear enough. Why such strong feelings? Rosen explains that spam "chews up a lot of bandwidth and disk space," and the non-stop disk I/O sucks down system resources and significantly stresses the mail server. And why exactly is this so annoying? Because it directly interferes with the ability to perform as an ISP, and that, in turn, directly – and negatively – affects the bottom line. This certainly isn't just Panix's problem; all ISPs and corporate networks face it.

So what can you do about it?


There are four basic ways you can try to block spam at the gateway: blacklists, whitelists, rules-based filtering, and Bayesian filters. None of them is perfect, and none of them ever will be perfect. All of them working together will never be perfect either, but they are much more effective than doing nothing at all.

The fundamental problem with anti-spam protection, says David Ferris, president of the leading e-mail research firm Ferris Research, is that "the ideal goal is 100% effectiveness with 0% false positives, an impossible ideal." Still, "most people will find high false positive rates of the order of one in 1,000 quite acceptable." Unfortunately, even the very best anti-spam programs, when set to stop as much spam as possible, average one false positive in a hundred.
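Here is a back-of-the-envelope sketch of why the gap between those two rates matters. The figure of 50,000 legitimate messages a day is a hypothetical volume chosen for illustration. It is not a number from Ferris Research, so substitute your own.

```python
# Back-of-the-envelope: how many good messages a filter wrongly blocks per day.
# GOOD_MESSAGES_PER_DAY is a hypothetical volume; substitute your own.
GOOD_MESSAGES_PER_DAY = 50_000


def expected_false_positives(good_messages: int, one_in: int) -> float:
    """Expected legitimate messages flagged as spam at a rate of 1 in `one_in`."""
    return good_messages / one_in


for one_in in (1_000, 100):
    blocked = expected_false_positives(GOOD_MESSAGES_PER_DAY, one_in)
    print(f"1 in {one_in:,}: about {blocked:.0f} good messages blocked per day")
```

At this volume, a rate of 1 in 1,000 misfiles about 50 good messages a day, while 1 in 100 misfiles about 500. That difference is the help-desk load you are choosing between.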

Still, for the sake of end users, not to mention the workload on your mail servers and network bandwidth, a network engineer must do the best he or she can, so let's take a look at each of the spam-blocking methods.

Blacklists

The idea is simple. Determine the domain names or IP addresses of known spammers and their ISPs, and then block them. Typically, you subscribe to a blacklist service and then use it at your gateway to refuse incoming SMTP connections from the listed addresses. Unfortunately, blacklists can also block perfectly legitimate users who have the bad luck of being at the same ISP as a known spammer, or simply of having an IP address in the same range.
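Many blacklists are published through DNS as DNS-based blacklists (DNSBLs). The gateway reverses the octets of the connecting client's IPv4 address, appends the list's zone, and does an ordinary address lookup. Any answer means the address is listed. The sketch below assumes that convention. `dnsbl.example.org` is a placeholder zone, not a real list, and the function names are hypothetical. In practice this check usually lives inside the mail server's own configuration rather than in standalone code.

```python
import ipaddress
import socket

# Placeholder zone (hypothetical); use the zone published by the list you subscribe to.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNS name a DNSBL expects: 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = DNSBL_ZONE) -> bool:
    """Return True if the list answers for this address (conventionally 127.0.0.x)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        # NXDOMAIN or any other lookup failure: treat the address as not listed.
        return False
```

A gateway would call `is_listed()` on the peer address at connect time and answer listed clients with a 5xx rejection. Treating lookup failures as "not listed" is a deliberate fail-open choice. It keeps a DNS outage from bouncing all inbound mail.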

Worse still, blacklists are as prone to human error as any other list, and many users and e-mail systems end up unfairly tarred by them. Adding insult to injury, getting off some blacklists can be almost impossible for ISPs or individual mail server operators.

SpamCop, for example, is infamous for being overaggressive in blocking possible spam sites. Another problem is that when a spammer can change his e-mail address faster than you can change your underpants, the overall effectiveness of blacklisting drops enormously. For example, Giga reported in "MAPS Realtime Blackhole List Under Fire" that even the well-respected Mail Abuse Prevention System Realtime Blackhole List (MAPS RBL) snags only 25% of spam and can block up to 34% of good mail.

That said, careful use of blacklists can still help keep spam from ever getting past your network perimeter. The Spamhaus Project, for example, has a reputation for maintaining accurate and up-to-date spammer lists, and the Open Relay Database remains useful for identifying unsecured mail servers that can easily be used for spamming.

Whitelists

Whitelists sound like a good idea. Users simply refuse to receive mail from anyone unless they've first approved the specific message or the sender. Whitelists come in two forms. In the first, the software simply blocks any message from a sender who isn't on the user's approved list. In the second, often called challenge-response, the software automatically replies to mail from unknown addresses with a verification message. The sender usually has to answer that message to confirm that there is in fact a real person on the other end of the Internet.
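Both forms can be modeled in a few lines: outright blocking of unknown senders, and an automatic verification reply. The hypothetical `Whitelist` class below is a minimal sketch. The random token and the idea of confirming by quoting that token back are illustrative assumptions, not how any particular product works.

```python
import secrets


class Whitelist:
    """Hypothetical sketch of both whitelist styles for a single user."""

    def __init__(self, approved: set, challenge_response: bool = False) -> None:
        self.approved = {a.lower() for a in approved}
        self.challenge_response = challenge_response
        self.pending: dict = {}  # token -> sender awaiting confirmation

    def check(self, sender: str) -> str:
        sender = sender.lower()
        if sender in self.approved:
            return "deliver"
        if not self.challenge_response:
            return "reject"  # first form: unknown senders are simply blocked
        # Second form: hold the mail and send the sender a verification request.
        token = secrets.token_hex(8)
        self.pending[token] = sender
        return f"challenge:{token}"

    def confirm(self, token: str) -> bool:
        """Called when a reply quoting the token arrives; approves that sender."""
        sender = self.pending.pop(token, None)
        if sender is None:
            return False
        self.approved.add(sender)
        return True
```

Every unknown sender in challenge-response mode triggers an outbound message. That outbound mail is where the extra load on the ISP comes from.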

Both forms of whitelists have two problems in common: they're cumbersome and they don't always work. For example, if a user likes getting mail from Amazon.com or an e-mail list, he or she must set up a specific rule to allow this. As another example, if a previously approved friend moves to a different e-mail address, the user must add the friend's new address to the whitelist or risk not receiving the friend's e-mail.

The list of negatives goes on and on. Whitelists sound like a good idea in theory, but they're too much of a pain for most users to be worth considering. Worse still, from an ISP's viewpoint, challenge-response systems are very cumbersome. They can generate tons of verification requests addressed to spammers, and to the people whose addresses spammers have forged, which only adds to the volume of unwanted mail.


Rule-based filters

The idea of rules-based filtering is very simple. These filters run many spam identification tests on the mail headers and body text. In the simplest form, the software looks for terms like "SEX" or "Hair Growth" and then deletes the matching messages at the mail server.
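Many rule-based filters, SpamAssassin among them, avoid deleting on a single match. Instead, they give each rule a score and compare the total against a threshold. The rules, scores, and 5.0 threshold below are hypothetical examples in that spirit, not SpamAssassin's actual rule set.

```python
import re

# Hypothetical rules: (name, compiled pattern, score).
RULES = [
    ("SUBJ_SEX", re.compile(r"^Subject:.*\bsex\b", re.IGNORECASE | re.MULTILINE), 2.5),
    ("HAIR_GROWTH", re.compile(r"hair\s+growth", re.IGNORECASE), 2.0),
    ("FREE_VIAGRA", re.compile(r"free\s*viagra", re.IGNORECASE), 3.0),
]
THRESHOLD = 5.0  # hypothetical cut-off: totals at or above it count as spam


def score(message: str) -> tuple:
    """Return (total score, names of the rules that matched)."""
    hits = [(name, points) for name, pattern, points in RULES if pattern.search(message)]
    return sum(points for _, points in hits), [name for name, _ in hits]


def is_spam(message: str) -> bool:
    total, _ = score(message)
    return total >= THRESHOLD
```

Scoring lets several weak signals add up while keeping any single keyword from condemning a message. It does nothing about spam that no rule anticipates, though.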

The problem with this approach is that it's always a step or two behind the spammers. For instance, we know that a message with the subject "F R E E V I A G A R A" is spam, but a rule-based program might miss it because of the spaces between the letters. The rules-based approach is a good one, but keeping the rules accurate and up-to-the-minute is a never-ending job. Another problem is that the more comprehensive a rules-based program gets, the slower it runs.
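One way to catch the spaced-out subject is to add a normalization pass that rejoins letter-by-letter runs before the rules run. The regular expression and the example rule below are hypothetical, a minimal sketch of the idea.

```python
import re

# Three or more single word characters separated by whitespace, e.g. "F R E E".
SPACED_OUT = re.compile(r"\b(?:\w\s+){2,}\w\b")

# Hypothetical rule that the raw, spaced-out text slips past.
FREE_OFFER = re.compile(r"\bfree", re.IGNORECASE)


def collapse_spaced_letters(text: str) -> str:
    """Rejoin letter-by-letter obfuscation so ordinary word rules can see it."""
    return SPACED_OUT.sub(lambda m: "".join(m.group(0).split()), text)
```

This fix also illustrates the arms race. It can mangle legitimate text that happens to contain runs of single letters, and the next trick (dots, look-alike characters) needs yet another rule. Each added pass is also more work per message.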

Make no mistake about it, trainable rule-based filters are an excellent technique. But they're condemned to always be at least one step behind, and they come with a built-in, eternally growing performance hit.

Bayesian filters

At first glance Bayesian filtering appears to be a lot like rules filtering. Instead of starting with preset rules, however, Bayesian filters learn, with a user's or administrator's help, to tell the difference between spam and good mail. Their verdict is expressed as a probability. After a few hundred messages, a good Bayesian filter will automatically recognize that a message with "sex" in its subject line and HTML coding for bright red text is almost certainly spam.
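Below is a minimal naive Bayes sketch of how those probabilities can be learned from labeled mail. Several choices are simplifications of my own, not any specific product's algorithm: the tokenizer, the Laplace smoothing, and scoring only tokens present in the message. Class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> set:
    """Lowercase word-like tokens; '#' is kept so HTML colors like #ff0000 survive."""
    return set(re.findall(r"[a-z0-9#$']+", text.lower()))


class NaiveBayesFilter:
    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}  # messages containing token
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one message labeled 'spam' or 'ham' by the user or admin."""
        self.messages[label] += 1
        self.counts[label].update(tokens(text))

    def spam_probability(self, text: str) -> float:
        spam_n, ham_n = self.messages["spam"], self.messages["ham"]
        log_odds = math.log((spam_n + 1) / (ham_n + 1))  # smoothed prior odds
        for tok in tokens(text):
            if tok not in self.counts["spam"] and tok not in self.counts["ham"]:
                continue  # never seen in training: carries no evidence
            # Laplace-smoothed P(token appears | class).
            p_spam = (self.counts["spam"][tok] + 1) / (spam_n + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically safe logistic function turning log-odds into a probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1 + z)
```

Working in log-odds keeps long messages from underflowing. Because every decision comes from counts, retraining on a user's corrections is just another call to `train()`.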

Because they're simple to program and highly accurate – success rates of 98% are not uncommon – Bayesian filters have become the hottest anti-spam technology.

At the Gateway

There are more than a dozen commercial anti-spam programs available, including Brightmail, Cloudmark Authority, CipherTrust IronMail, Trend Micro, and Tumbleweed. All these companies use several, if not all, of the anti-spam methods identified above to try to build the perfect anti-spam program.

Still, while they're all trying to get there, none of the tools is anywhere close to achieving perfection yet. As a result, it's important to obtain evaluation copies and test them with your users and on your network before making a choice.

Many ISPs and companies build their own solutions. Most of these are built on the procmail Unix mail-processing utility and SpamAssassin, a powerful Unix-based, open source mail filtering program.
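In a typical procmail setup, one recipe pipes each message through SpamAssassin (often via its `spamc` client). SpamAssassin adds an `X-Spam-Status` header, and a second recipe files the message according to that header. The Python sketch below mirrors only that filing decision. The folder names are hypothetical. The score field appears as `hits=` in older SpamAssassin releases and `score=` in newer ones, so the parser accepts either.

```python
import re
from email import message_from_string
from typing import Optional


def choose_folder(raw_message: str) -> str:
    """File to 'spam' when SpamAssassin's X-Spam-Status header starts with 'Yes'."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    return "spam" if status.lstrip().lower().startswith("yes") else "inbox"


def spam_score(raw_message: str) -> Optional[float]:
    """Extract the numeric score from X-Spam-Status, or None if it is absent."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    match = re.search(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)", status)
    return float(match.group(1)) if match else None
```

Keeping the filing decision separate from the scoring is what lets sites swap scoring engines or thresholds without touching delivery.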

SpamAssassin isn't just for Unix and Linux shops, though. There are many versions available, including Network Associates' McAfee System Protection SpamKiller for Microsoft Exchange Small Business for Exchange 2000. There are also a variety of other commercial and open source programs based on SpamAssassin that will work in concert with almost any mail server.

None of these anti-spam programs, however, is especially fast. Most network administrators find that these programs need dedicated servers to maintain acceptable mail throughput. Other administrators use outsourced anti-spam services such as those provided by Postini and MessageLabs.

If you do elect to use your own in-house server, it needs fast connections to both your Internet gateway and the e-mail server. I'd recommend Fast Ethernet at a minimum, and if you have more than 500 user mailboxes, gigabit Ethernet for inter-server connections should be seriously considered.

The machines themselves should have ample memory and storage capacity: at least 512MB of RAM and fast hard drives of 120GB or more. System speed, while important, isn't as critical as memory and disk space. That's because when you boil spam protection down to its basics, it comes down to lots and lots of string comparisons, and such procedures tend to be light on the processor but memory intensive. Finally, these machines should have no job other than spam-bashing.

If possible, as Ferris recommends, end users should have direct access to spam messages. You may be sure a given message is spam, and the anti-spam tool may be certain it's spam, but only the user can tell if it really is spam. If the user has to go through a help desk to get at the message, he's not going to be a happy user. Some server programs, like ActiveState's PureMessage, already enable users to get directly at their 'spam' mail.
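Direct access can be modeled as a per-user holding area, where users list their own flagged mail and release false positives themselves. This is a minimal sketch of that idea only. The `Quarantine` class and its methods are hypothetical and do not represent PureMessage's or any other product's interface.

```python
from itertools import count
from typing import Dict, Optional


class Quarantine:
    """Hypothetical per-user store for messages the filter flagged as spam."""

    def __init__(self) -> None:
        self._held: Dict[str, Dict[int, str]] = {}
        self._next_id = count(1)

    def hold(self, user: str, message: str) -> int:
        """Quarantine a message for `user` and return its id."""
        msg_id = next(self._next_id)
        self._held.setdefault(user, {})[msg_id] = message
        return msg_id

    def list_for(self, user: str) -> Dict[int, str]:
        """What the user sees in a self-service 'spam' view (a copy)."""
        return dict(self._held.get(user, {}))

    def release(self, user: str, msg_id: int) -> Optional[str]:
        """Remove and return a message for delivery; only its owner can release it."""
        return self._held.get(user, {}).pop(msg_id, None)
```

Scoping every lookup by user is the important design point. It lets users rescue their own false positives without a help desk ticket, and without seeing anyone else's mail.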

Does it sound as though building in-house anti-spam servers will be a lot of trouble, and outsourcing the job will be quite expensive? You're right: it will very likely be one or the other.

Is it worth it? You tell me. Are your users sick of spam? Are you tired of having large chunks of your Internet bandwidth taken up by spam? Are you tired of watching your mail servers' hard drives glow from constant use? If you answered yes to two or more of those questions, it's time to add anti-spam services to your network.

Story courtesy of EITPlanet
