Although native Linux networking allows you to set up your Linux server as an Internet firewall and gateway for a number of machines on a network, utilizing a caching proxy server can help reduce your bandwidth usage, as well as give you enhanced logging and filtering capabilities.
The Squid Web Proxy Cache is a popular, free implementation of such a server, and it runs on most Unix systems. The Squid homepage is at http://www.squid-cache.org/. Squid can cache HTTP and FTP objects as well as DNS lookups, so frequently accessed data is served from the local network and a shared Internet connection goes further.
Getting and Installing Squid
Downloads are available from the Squid homepage, either as binary files or source tarballs. The stable version as of this writing is version 2.3.
If you build from source, the compilation is quite easy. A basic installation with the default options would go something like this:
tar -xzf squid-2.3-200101270000-src.tar.gz
cd squid-2.3-200101270000
./configure
make
make install
If you want to explore the options available at compile time, type:
./configure --help
A number of switchable options are available to control where Squid installs itself, memory usage, and default language, among others.
If you’ve installed Squid using the defaults, the configuration file can be found at /usr/local/squid/etc/squid.conf.
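The shipped squid.conf is mostly comments, so it can help to list only the lines that actually set a directive. The helper below is a minimal sketch: show_directive is a hypothetical name, and the default path assumes the install prefix used in this article.

```shell
#!/bin/sh
# Print the uncommented lines that set a given directive.
# Usage: show_directive <directive> [config-file]
# The default path assumes a source install under /usr/local/squid.
show_directive() {
    directive=$1
    conf=${2:-/usr/local/squid/etc/squid.conf}
    # Match the directive only at the start of a line (leading blanks allowed),
    # so commented-out "#directive ..." examples are skipped.
    grep -E "^[[:space:]]*${directive}[[:space:]]" "$conf"
}

# Example calls (run these against your own configuration file):
#   show_directive http_port
#   show_directive cache_mem /usr/local/squid/etc/squid.conf
```

If nothing is printed, the directive is not set explicitly and Squid falls back to its built-in default.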
The first option you will see in this file is http_port, which sets the port Squid listens on for client requests. The default is 3128; to listen elsewhere, set http_port to the port you want.
Another important item is the amount of memory allocated to the cache, set with cache_mem. The value must be a multiple of 4 KB. The default is 8 MB:
cache_mem 8 MB
Squid also caches DNS lookups, which saves further time and bandwidth. The default cache size is 1024 entries, and it can be changed in squid.conf.
By default, Squid stores the cached data in /usr/local/squid/cache. The cache_dir directive controls the filesystem type, the directory used, the allowed size in MB, and the number of first- and second-level subdirectories:
cache_dir ufs /usr/local/squid/cache/ 100 16 256
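The last two numbers on the cache_dir line multiply: each first-level directory gets its own set of second-level directories. A small sketch of that arithmetic (cache_dir_count is an illustrative name, not a Squid tool):

```shell
#!/bin/sh
# Count the second-level directories implied by a cache_dir line.
# $1 = number of first-level subdirectories
# $2 = number of second-level subdirectories under EACH first-level one
cache_dir_count() {
    l1=$1
    l2=$2
    echo $(( l1 * l2 ))
}

# Values from the example line above: 16 first-level, 256 second-level.
cache_dir_count 16 256   # prints 4096
```

With the example values, the 100 MB cache is spread across 4096 directories, which keeps any single directory from holding too many files.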
With the default install prefix, logs are written to /usr/local/squid/logs/access.log and /usr/local/squid/logs/cache.log (packaged builds often use /var/log/squid instead). Other directives control where these logs are placed, and the level of logging:
cache_access_log
cache_log
debug_options
log_fqdn
If Squid dies, e-mail is sent to the address defined under cache_mgr. This address is also appended to error pages the users might see. The default is webmaster, but you can set cache_mgr to a more appropriate address.
You should either create a “squid” user and group ID for the Squid server process, or assign it to another account with few system rights, like “nobody”:
cache_effective_user nobody
cache_effective_group nobody
You will also need to create the cache directory and change the ownership of both the cache and log directories to the squid user:
cd /usr/local/squid
mkdir cache
chown nobody:nobody cache logs
Finally, we get to access control, which lets you limit which machines can reach which sites, and when. You can get really draconian here and severely restrict access, or drill down and address problem employees who would rather surf than work. A very basic set of control lines is the following:
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl all src 0.0.0.0/0.0.0.0
acl allowed_hosts src 192.168.192.0/255.255.255.0
http_access deny manager all
http_access allow allowed_hosts
http_access deny all
icp_access allow allowed_hosts
icp_access deny all
The allowed_hosts line should correspond to your internal network configuration.
Many things can be done with combinations of access control lists and access rules. For example, these lines, used in place of the plain http_access allow allowed_hosts rule above, would keep all internal IPs off the Web except during lunchtime:
acl lunchtime time MTWHF 12:00-13:00
http_access allow allowed_hosts lunchtime
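And the following, placed ahead of the allow rule because Squid acts on the first http_access line that matches, would bar a problem user from the ebay domain:
acl problem_user src 192.168.192.22/255.255.255.255
acl ebay dstdomain .ebay.com
http_access deny problem_user ebay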
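The netmask on a src ACL decides how many addresses it matches: a 255.255.255.0 mask compares only the first three octets, while 255.255.255.255 pins a single host. The sketch below works through that comparison in shell. The function names are illustrative and not part of Squid; the addresses are the ones used in this article, plus a made-up neighbor at .99.

```shell
#!/bin/sh
# Convert a dotted-quad address (a.b.c.d) to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if two addresses are equal once the mask is applied to both.
# Usage: same_network <addr1> <addr2> <netmask>
same_network() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# With a /24-style mask, a neighbor on the same subnet also matches:
same_network 192.168.192.22 192.168.192.99 255.255.255.0 \
    && echo "255.255.255.0: neighbor matches too"

# With an all-ones mask, only the exact host matches:
same_network 192.168.192.22 192.168.192.99 255.255.255.255 \
    || echo "255.255.255.255: neighbor does not match"
```

So an ACL meant to single out one machine wants the all-ones mask; a wider mask quietly sweeps in the rest of the subnet.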
To start Squid, you’ll need to run the following commands:
/usr/local/squid/bin/squid -z
/usr/local/squid/bin/squid
The first command creates the cache directories, and the second starts the daemon. The first command only needs to be run once, before the proxy is started for the first time.
For initial testing, you can use the squid client program:
/usr/local/squid/bin/client -h www.squid-cache.org -p 80 /
/usr/local/squid/bin/client -h moe -p 3128 http://www.squid-cache.org/
The first command gets data directly from the Squid Web site, and the second goes through the proxy server, moe. The client program also has a number of options, which can be viewed with the -? option.
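Another quick check, assuming curl is available on a client machine: Squid normally tags its responses with an X-Cache header reading HIT or MISS, so fetching the same URL twice through the proxy should show whether the object was cached. The cache_status helper is a hypothetical name, and moe is the example proxy host from above.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the X-Cache verdict
# (typically HIT or MISS), or "none" if the header is absent.
cache_status() {
    awk 'tolower($1) == "x-cache:" { print $2; found = 1; exit }
         END { if (!found) print "none" }'
}

# Example (needs curl and a running proxy; not executed here):
#   curl -s -I -x http://moe:3128 http://www.squid-cache.org/ | cache_status
# A first request would usually report MISS; a repeat may report HIT
# if the object is cacheable.
```

A result of "none" suggests the request did not pass through Squid at all, which is worth checking before blaming the cache settings.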
You’ll probably want to add a startup script to start Squid with the rest of the system daemons. The method will vary, depending on your OS and/or distribution.
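As one possible shape for such a script, the sketch below wraps the start command from this article together with Squid's -k shutdown and -k reconfigure options. The squid_ctl name and the SQUID variable are illustrative; where the script lives and how it is registered depends on your distribution.

```shell
#!/bin/sh
# Minimal start/stop wrapper for Squid (a sketch, not a full init script).
# SQUID can be overridden in the environment; the default assumes a
# source install under /usr/local/squid.
SQUID=${SQUID:-/usr/local/squid/bin/squid}

squid_ctl() {
    case "$1" in
        start)  "$SQUID" ;;                  # start the daemon
        stop)   "$SQUID" -k shutdown ;;      # ask the running daemon to exit
        reload) "$SQUID" -k reconfigure ;;   # re-read squid.conf
        *)
            echo "Usage: squid_ctl {start|stop|reload}" >&2
            return 1
            ;;
    esac
}

# When installed as a script, dispatch on the first argument.
if [ $# -gt 0 ]; then
    squid_ctl "$1"
fi
```

The reload action is also handy after editing access rules, since it applies the new squid.conf without a full restart.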
To set up your client browsers to use the proxy, set the HTTP and FTP proxy to point to the Squid proxy machine, port 3128. To force clients to use the proxy, you’ll need to modify your firewall/masquerading setup. Under Linux, you’ll need to enable an additional feature while compiling your kernel:
IP: transparent proxy support
This transparent proxy support will allow you to define a ruleset in ipchains to redirect all external HTTP requests to the proxy server’s port:
ipchains -A input -p TCP -d 127.0.0.1/32 www -j ACCEPT
ipchains -A input -p TCP -d 192.168.192.1/32 www -j ACCEPT
ipchains -A input -p TCP -d any/0 www -j REDIRECT 3128
These lines enable access to the local Web server, but redirect all other HTTP requests through the proxy. You could also add an additional rule for FTP requests. These ipchains commands should be added to the end of the rest of your firewall script, typically /etc/rc.d/rc.firewall.
By using this procedure, you don’t need to configure the client browsers to use the cache. Some additional squid.conf lines are needed to go with this setup:
httpd_accel_host virtual
httpd_accel_port 80
httpd_accel_with_proxy on
httpd_accel_uses_host_header on
Otherwise, the redirect will send the user to a Squid error page, noting the absence of the “http://” prefix on the request. You’re still blocking direct access, but not as transparently or elegantly.
The User Guide goes into much more detail on additional options and how to retrieve logging information, but this article should give you a pretty good start. Happy proxying!
Stew Benedict is a Systems Administrator for an automotive manufacturer in Cleveland, OH. He is also a freelance consultant, running AYS Enterprises, specializing in printed circuit design, MS Access solutions for the Windows platforms, and utilizing Linux as a low-cost alternative to commercial operating systems and software. He has been using and promoting Linux since about 1994. When not basking in the glow of a CRT, Stew enjoys time with his wife, daughter, and 2 dogs at his future (not too much longer!) retirement home overlooking Norris Lake in the foothills of the Smokies in Tennessee.