Filter The Web With squidGuard

Putting up a fence to keep out the bad parts of the Internet is one of those
thankless, heroically crazy jobs required of network administrators. It’s not
like TV, with well-defined channels to manage. Context is everything, and no
one has yet written a filter that can differentiate between a site with health
and medical information and a red-hot triple-X porn site that uses similar
terminology. Think of how many words are double entendres; a person can hardly
say anything anymore. Just naming fried chicken parts can get a page blocked.

Even more difficult are Web filters that try to screen for ‘bad attitudes’:
hate, intolerance, racism. Lotsa luck. No regular expression can differentiate
between a news report of a hate crime and a site that promotes one.

Commercial Web filtering software is expensive and secretive. With the exception
of Net Nanny, blocked URLs and keywords are hidden underneath a layer of
encryption, and the user is not allowed to review them. You may never know what
you’re missing. Server-level licensing is generally per-seat, and requires a
separate proxy server, adding to the cost. squidGuard is free, completely open
right down to the source code, and runs on the free proxy server Squid, on
Linux or *BSD, also free.

squidGuard has no hidden agenda. The default blacklists, printed in plain text,
contain a disclaimer printed right at the top. For example, from
/var/squidGuard/blacklists/drugs/urls:

# !!! WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING !!!
# This list is entirely a product of a dumb robot (squidGuardRobot-2.3.6).
# We strongly recommend that you review the lists before using them!
# Don't blame us if there are mistakes, but please report errors with
# the online tool at http://www.squidguard.org/blacklist/
# This list was compiled from 48 link sources and 2876 links,
# of which 2473 tested successfully.
# This list was compiled in 0:11:02 on 2002.01.26 00:35:41.

If you are truly suspicious, which is an admirable trait, view the associated
.db file in a hex editor to see if the text file is telling the truth. See? No
secrets.

Installing
First you need Squid, the excellent Unix Web proxy cache. Squid comes in all
major Linux distributions, or get it from squid-cache.org. RPMs are on the
Uptime RPM Archive. Squid must be configured and running before squidGuard will
work.

The easy way to get squidGuard up and running is to install from RPM. Two good
RPMs exist: one by Oliver Pitzeier, on the Uptime RPM Archive, and one from the
excellent Eric Harrison of the K-12 Linux Terminal Server project. I’ve
installed and run both of them on Red Hat 7.2 and 7.3. Warning: they have the
same name, squidGuard-1.2.0-3.i386.rpm, but there are significant differences
between the two RPMs. Building from source, of course, is the most portable
option.

squidGuard’s installation page says version 2.x of the Berkeley DB library is
required. However, the changelog reports that support for version 3.2 was added
in March 2001, so it’s safe to say the installation page has not been updated in
a while. Both RPMs call for version 3.2, which appears on your system as
libdb-3.2.so. There is no problem with having both versions installed if you
want to cover all the bases; squidGuard will find the correct one at
installation.

Home User, Business User
squidGuard is equally useful at home or in the workplace. Rather than installing
standalone products on every machine on your home network, head ’em off at the
pass. Some of squidGuard’s nicer features:

  • blazingly fast
  • fine-grained controls: configure individual users and groups
  • redirects to the URLs of your choice
  • filter on URLs or domain names
  • block banners (redirect to empty .png)
  • define access rules by time of day and date
  • define access rules for different user groups

It does not filter on page content, or on embedded scripting languages like
JavaScript or VBScript.

Configuration
It is helpful to download both RPMs just to examine the configuration files. #
comments out a line, {} enclose a group declaration, and - defines a range.
Don’t use reserved words; see the documentation for a list. There are some
inconsistencies between where the documentation says files should be and where
they actually exist on your system. Both RPMs put documentation in
/usr/share/doc/squidGuard, and the main configuration file is in
/etc/squid/squidGuard.conf. Building from source puts files where the docs on
squidguard.org say they should be.

It is best to explicitly declare even the defaults, for the sake of us feeble
humans, as this example of squidGuard.conf shows:

logdir /var/log/squidGuard #defines where logfiles are
dbhome /var/squidGuard/blacklists/ #defines where blacklists are

One approach is to accept all of the default blacklists, and refine them. The
other is to start from a clean slate, and add restrictions as you think them
up. Here is the absolute minimum config file:

logdir /var/log/squidGuard  

acl {
   default {
   pass all
   }
}

This restricts nothing; it is like not using squidGuard at all. (acl stands for
access control list.)

To create a new database file of blocked URLs or domains, or a file of allowed
sites, first create a plain text file containing your list. Use the same format
as the default squidGuard text lists: one item per line, plain ASCII text. To
convert it to a .db file, run this command:

# squidGuard -C filename

That’s what the Berkeley DB library is for.
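
Hand-built lists tend to pick up stray capitals, pasted-in full URLs, and
duplicates. Here is a minimal Python sketch for tidying a domains list before
running squidGuard -C on it. The function name normalize_domains and the
cleanup rules are illustrative assumptions, not part of squidGuard; adjust them
to taste.

```python
def normalize_domains(lines):
    """Return a sorted, de-duplicated list of lowercase domain names.

    Hypothetical helper: the cleanup rules below are assumptions chosen
    for illustration, not rules that squidGuard itself imposes.
    """
    cleaned = set()
    for raw in lines:
        # Drop anything after a '#' so comment lines and trailing notes vanish.
        entry = raw.split("#", 1)[0].strip().lower()
        if not entry:
            continue
        # If a full URL was pasted in, keep only the host part.
        if "://" in entry:
            entry = entry.split("://", 1)[1]
        entry = entry.split("/", 1)[0]
        if entry:
            cleaned.add(entry)
    return sorted(cleaned)
```

Write the returned names back out one per line, then build the .db file from
that cleaned text file.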

Let’s call our blacklist files verboten/domains and verboten/urls.

Now edit squidGuard.conf:

logdir /var/log/squidGuard
dbhome /var/squidGuard/blacklists/

dest blockedstuff {
   log verboten
   domainlist verboten/domains
   urllist verboten/urls
   redirect http://www.bratgrrl.com
}

acl {
   default {
      pass !blockedstuff all
      redirect 302:http://www.bratgrrl.com
   }
}

dest defines a category. squidGuard wouldn’t care if you lumped everything
into one big file, but organizing into categories makes life easier for the
overworked admin. ! in front of a category name in a pass line means don’t pass
it. Redirect sends requests for blocked sites or URLs to the page of your
choice. bratgrrl.com is a fine choice, though a custom page may be more
suitable. Some businesses like to use scary warnings in large type:

WARNING!!! THIS SITE IS RESTRICTED! YOU ARE BAD AND WILL GO TO HECK!

Use your imagination. For home use, post a picture of your kids grounded for
life.
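
Going back to the pass line for a moment: it helps to picture it as a list read
from left to right, where the first item that matches the request decides the
outcome. The Python sketch below is only a toy model of that idea, not
squidGuard's actual code. The function name allowed, the category name
"blocked", and the choice to block a request that matches nothing are all
assumptions made for illustration.

```python
def allowed(url_categories, pass_rule):
    """Toy model of a pass rule: first matching token decides.

    url_categories is the set of category names the requested URL
    belongs to; pass_rule is the text after the word 'pass'.
    """
    for token in pass_rule.split():
        negated = token.startswith("!")
        name = token[1:] if negated else token
        if name in ("all", "any"):
            return True       # catch-all: everything still undecided passes
        if name == "none":
            return False      # catch-all: everything still undecided is blocked
        if name in url_categories:
            return not negated  # "!name" blocks, plain "name" passes
    # Assumption: a request that matches nothing in the rule is blocked.
    return False
```

With the rule "!blocked all", a URL in the blocked category is refused and
everything else gets through. Swap the order to "all !blocked" and, in this
model, nothing is ever refused, which is why the catch-all goes last.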

# killall -HUP squid

tells Squid to reload its configuration and restart its squidGuard helper
processes, which re-read /etc/squid/squidGuard.conf and put the changes into
effect.
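
To check a rule without opening a browser, squidGuard can be fed a request line
on standard input, in the same form Squid hands to a redirector. What follows
is a rough Python sketch under several assumptions: the helper names are
hypothetical, the request-line layout (URL, client address/hostname, ident,
method) is the classic redirector format as I understand it, and squidGuard is
assumed to be on the PATH and readable by the user running the script.

```python
import subprocess

def redirector_request(url, client_ip, ident="-", method="GET"):
    """Build one redirector-style request line (layout is an assumption)."""
    return f"{url} {client_ip}/- {ident} {method}\n"

def ask_squidguard(url, client_ip, conf="/etc/squid/squidGuard.conf"):
    """Feed one request to squidGuard and return its reply, stripped.

    Assumes the squidGuard binary is installed; -c names the config file.
    """
    result = subprocess.run(
        ["squidGuard", "-c", conf],
        input=redirector_request(url, client_ip),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()
```

Calling ask_squidguard with a URL from your blacklist and a client address on
your network should, if all is well, return the redirect target; an empty reply
would mean the request was passed through untouched.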

Eric Harrison’s version automatically updates its blacklists nightly; see the
MESD page for details.

As with all things Linux, the more you know about scripting, the more things
make sense, and the more power you have at your disposal. squidGuard also
supports regular expressions. Here’s a sample for blocking ads:

(/ads/|/ad/|/banner/|/sponsor/|/event.ng/|/Advertisement/|adverts/)
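
Before turning an expression like this loose on real traffic, it is worth
trying it against a few URLs. Below is a small Python sketch that does so. It
uses Python's re module as a stand-in for squidGuard's own regex matching, which
may differ in details such as case sensitivity, and the sample URLs are made up.

```python
import re

# The ad-blocking pattern from the article, copied unchanged.
AD_PATTERN = re.compile(
    r"(/ads/|/ad/|/banner/|/sponsor/|/event.ng/|/Advertisement/|adverts/)"
)

def looks_like_ad(url):
    """Return True if any alternative in the pattern occurs in the URL."""
    return AD_PATTERN.search(url) is not None

# Hypothetical sample URLs for a quick trial run.
SAMPLES = [
    "http://www.example.com/ads/top.gif",
    "http://www.example.com/news/story.html",
    "http://www.example.com/eventXng/img.gif",
]

for url in SAMPLES:
    print(url, "->", "blocked" if looks_like_ad(url) else "passed")
```

Note the third sample: the dot in /event.ng/ is unescaped, so it matches any
character, not just a literal dot. Write it as \. if only the literal is meant.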

The squidGuard documentation is quite thorough; hopefully this will get you past
the more common pitfalls. If you’d like to see a Squid tutorial, drop me a
line. It’s a great tool.

Carla Shroder
