WAN Optimization the Open Source Way
WAN optimization is a complex and expensive, yet sometimes necessary, investment. Even if you aren't running a branch office in Africa over ISDN, the need for WAN optimization and acceleration exists within nearly every business. The problem is that commercial products are extremely expensive. Wouldn't it be great if the same functionality could be accomplished with commodity PC hardware and free open source tools?
Mostly, it can be done.
What Can't Be Done
Riverbed and other vendors implement Wide Area File Services (WAFS), which is a fancy way to say it caches CIFS and NFS data. If multiple people are working on the same file, or if the same file gets opened and closed more than once, that data does not really need to be shipped to the remote file server. It's even fancier than that; long-term caching of files also makes opening Word documents, which require a lot of bi-directional communication just to open, much faster. WAFS implements a (generally) safe mechanism for caching data when sending it over the WAN would be redundant. General caching proxies are not optimized for file sharing data, and will often have to send the whole thing, whereas the WAFS-style devices can be much more clever about it.
There are, unfortunately, no open source tools to create a system like this, but WAFS is only one (albeit powerful) method for optimizing the WAN. Most businesses can realize substantial performance and usability improvements by leveraging the QoS, caching proxies, and compression available in various open source tools. Jumping straight into a commercial WAFS solution is not recommended; the pricing is staggering, you may not even need WAFS-like features, and the architectural limitations of WAFS are sure to put a damper on your plans.
QoS and Queuing With FreeBSD or Linux
A big part of making a congested WAN link usable is prioritization, especially if VoIP traffic traverses the link. The good news is that Linux and the BSD family of operating systems can employ effective QoS and traffic shaping / queuing (shaping is accomplished by queuing packets). Prioritization comes into play when packets are queued and the kernel has to decide which traffic to let through first.
Things can get very complex, very quickly, when configuring QoS. There are different methods for classifying and queuing traffic, as well as for determining how to stall (or kill) non-priority traffic. Ultimately, it certainly is possible to deploy a firewall with traffic shaping such that VoIP always works, Web browsing to internal applications gets priority over Web browsing to the Internet, and whatever other critical traffic you may have is given the proper consideration. Understanding how to classify packets is required to configure even a SOHO-class router with these features, so in the end it's worth doing it in Linux or FreeBSD to get the full feature set.
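To make the classify-then-queue idea concrete, here is a minimal sketch of strict priority queuing in Python. It is a toy model, not how the Linux or FreeBSD schedulers are implemented: the packet fields, the 10.0.0.0/8 internal range, and the four traffic classes are all assumptions chosen for illustration. The only real-world constant is DSCP 46 (Expedited Forwarding), which is commonly used to mark voice traffic.

```python
import heapq
import ipaddress
from itertools import count

# Assumption: the internal network is 10.0.0.0/8.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
DSCP_EF = 46  # Expedited Forwarding, commonly used to mark voice traffic


def classify(packet):
    """Return a priority for a packet dict (lower number = sent first).

    The 'dscp', 'dst', and 'dst_port' keys are hypothetical field names.
    """
    if packet.get("dscp") == DSCP_EF:
        return 0  # VoIP
    if packet.get("dst_port") in (80, 443):
        if ipaddress.ip_address(packet["dst"]) in INTERNAL_NET:
            return 1  # Web traffic to internal applications
        return 2  # Web traffic to the Internet
    return 3  # everything else


class PriorityScheduler:
    """Strict priority queue: always dequeue the highest-priority packet."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, packet):
        heapq.heappush(self._heap, (classify(packet), next(self._seq), packet))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Strict priority is the simplest possible policy and it can starve the lowest class entirely when the link is saturated. Real queuing disciplines are more elaborate precisely because they also guarantee some bandwidth to lower-priority traffic.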
Web traffic, even to internal servers, can be drastically reduced by using a caching HTTP proxy. Images and other large items can be cached locally using the standard Squid or Varnish proxy servers. Instead of using the WAN link to fetch images on remote Web and application servers every time a user clicks, the content is served from the local cache. All HTTP traffic can be transparently redirected through a proxy without any client-side configuration. HTTPS traffic can be proxied as well, but the Web browsers will need to be configured to use the proxy.
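The bandwidth saving is easy to see in a toy model. The sketch below is not Squid or Varnish code; it is a small least-recently-used cache placed in front of a hypothetical origin_fetch function, with a counter for the bytes that actually cross the WAN. It deliberately ignores expiry and cache-control headers, which real proxies honor.

```python
from collections import OrderedDict


class CachingProxy:
    """Toy LRU cache in front of an origin server (illustration only)."""

    def __init__(self, origin_fetch, capacity_bytes):
        self._origin_fetch = origin_fetch  # hypothetical: url -> bytes
        self._capacity = capacity_bytes
        self._cache = OrderedDict()        # url -> body, oldest first
        self._size = 0
        self.wan_bytes = 0                 # bytes fetched over the WAN

    def get(self, url):
        if url in self._cache:
            # Cache hit: served locally, no WAN traffic.
            self._cache.move_to_end(url)
            return self._cache[url]
        # Cache miss: fetch over the WAN and remember the result.
        body = self._origin_fetch(url)
        self.wan_bytes += len(body)
        self._store(url, body)
        return body

    def _store(self, url, body):
        if len(body) > self._capacity:
            return  # too large to cache at all
        self._cache[url] = body
        self._size += len(body)
        # Evict least recently used entries until we fit again.
        while self._size > self._capacity:
            _, evicted = self._cache.popitem(last=False)
            self._size -= len(evicted)
```

Requesting the same image twice through this object increments wan_bytes only once, which is the whole point of putting a cache at the branch office side of the link.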
Squid can also proxy FTP traffic and the other protocols it supports, but it is not a generic proxy: a custom application whose communication protocol is not carried over HTTP or FTP will need a different tool, such as a generic TCP proxy. The vast majority of use cases, though, simply require HTTP caching to realize a huge decrease in the amount of WAN traffic.
Before we talk about the two solutions that implement caching, compression, and other tricks, take note that another partial solution exists. In addition to providing security, OpenVPN can also employ compression. It runs in user-space, so latency will increase a tad, but when you need to physically ship less data (or risk saturating the link), OpenVPN is a good option.
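How much compression helps depends heavily on what is being sent. The short sketch below uses Python's zlib module as a stand-in for whatever algorithm a tunnel might use (this is an assumption for illustration, not a description of OpenVPN internals) and compares a repetitive, text-like payload with random bytes, which behave much like data that is already compressed or encrypted.

```python
import os
import zlib


def compression_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Hypothetical payloads, chosen only to show the two extremes.
text_like = b"GET /index.html HTTP/1.1\r\nHost: intranet.example\r\n\r\n" * 200
random_like = os.urandom(10000)

if __name__ == "__main__":
    print("text-like payload ratio:  ", round(compression_ratio(text_like), 3))
    print("random-like payload ratio:", round(compression_ratio(random_like), 3))
```

The text-like payload shrinks to a small fraction of its size, while the random payload does not shrink at all. Links that mostly carry images, archives, or already-encrypted streams will see far less benefit from tunnel compression than links that carry plain text protocols.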
WANProxy is a generic (as in flexible) TCP proxy. It can be deployed transparently to compress all TCP traffic between two endpoints. WANProxy can be used to filter data through a Squid proxy instead of the standard method of redirecting traffic using iptables rules in Linux. The benefit of also filtering traffic through WANProxy is that you get compression on top of caching. Finally, WANProxy caches some data in RAM, so that duplicate data doesn't have to be re-sent over the WAN.
With the combination of WANProxy and Squid, a random remote file, say a Word document, will be transferred over the WAN (compressed) in full the first time someone in the office opens it. The second time, however, it gets served from local cache using no WAN bandwidth and providing immediate response to the end user.
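The "don't re-send duplicate data" idea can be sketched generically. The code below is not WANProxy's actual algorithm or wire format; it is a minimal illustration under stated assumptions: fixed 4096-byte chunks, SHA-256 digests as chunk identifiers, and a pair of hypothetical Sender and Receiver objects sitting at either end of the WAN link.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size, for illustration only


class Sender:
    """Replaces chunks the far end has already seen with short references."""

    def __init__(self):
        self._seen = set()

    def encode(self, data):
        messages = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest in self._seen:
                messages.append(("ref", digest))   # 32 bytes on the wire
            else:
                self._seen.add(digest)
                messages.append(("raw", chunk))    # full chunk on the wire
        return messages


class Receiver:
    """Rebuilds the original data from raw chunks and references."""

    def __init__(self):
        self._store = {}

    def decode(self, messages):
        out = bytearray()
        for kind, payload in messages:
            if kind == "raw":
                self._store[hashlib.sha256(payload).digest()] = payload
                out += payload
            else:
                out += self._store[payload]
        return bytes(out)


def wire_size(messages):
    """Payload bytes that would cross the WAN (framing overhead ignored)."""
    return sum(len(payload) for _, payload in messages)
```

Sending the same 20 KB document twice through this pair costs the full 20 KB the first time and only a handful of 32-byte references the second time. A real implementation also has to bound memory use and keep both ends in sync, which this sketch does not attempt.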
Traffic Squeezer does all that and more. In addition to compression and QoS, it also provides traffic coalescing and protocol-specific acceleration. Caching can be had by combining Traffic Squeezer with Squid, or even by throwing both WANProxy and Squid into the mix.
Traffic coalescing refers to the sending of multiple packets as one, to reduce the impact of protocol overhead and small packets. Traffic Squeezer coalesces every packet, and the benefits add up dramatically. See their documentation on coalescing for some great illustrations of how this works.
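Some back-of-the-envelope arithmetic shows why coalescing matters for small packets. The sketch below assumes the minimum 40 bytes of IPv4 and TCP headers per packet and a 1460-byte payload limit, and it ignores any framing a coalescer would add to separate the original packets again. It does not model Traffic Squeezer's actual behavior.

```python
HEADER_BYTES = 40  # minimum IPv4 (20) + TCP (20) headers, no options


def bytes_on_wire(payload_sizes, coalesce=False, max_payload=1460):
    """Total bytes sent for a list of payload sizes.

    Assumes every individual payload fits within max_payload.
    """
    if not coalesce:
        # Every payload pays for its own set of headers.
        return sum(HEADER_BYTES + size for size in payload_sizes)

    # Greedily pack payloads into as few packets as possible.
    packets = 0
    room = 0
    for size in payload_sizes:
        if size > room:
            packets += 1
            room = max_payload
        room -= size
    return packets * HEADER_BYTES + sum(payload_sizes)


small_packets = [50] * 10  # ten hypothetical 50-byte payloads
```

With these numbers, bytes_on_wire(small_packets) is 900 bytes, while bytes_on_wire(small_packets, coalesce=True) is 540 bytes, a 40 percent reduction. The smaller the payloads, the larger the share of the link that headers consume and the more coalescing helps.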
TCP and protocol-specific optimizations (for HTTP, for example) also go a long way toward making high-latency, low-bandwidth WAN links more usable. Other features that may be implemented include post-optimized encryption and VoIP optimization. Traffic Squeezer is an extremely interesting project, and we will be watching to see if the scheduled features get implemented.
As you have no doubt surmised, piecing together a WAN optimizing cache and firewall is extremely complex. This is often the case in the open source world. The freedom to construct whatever system works best for you means it cannot be cookie-cutter simple; the simple-to-use devices are likely based on Linux, and they are simple because someone chose how to construct the system for you.
With the right systems/network administrator at the helm, you can squeeze an amazing amount of performance out of 128 Kb/s WAN links using open source software. Assuming the system was configured well, ongoing maintenance is usually needed no more than once every few months. The choice to build or buy (or have a consultant build) is ultimately governed by your budget and time constraints. The "just make it work" solutions will work fairly well, but they definitely strain the budget. Customized solutions can perform as well, or even better, since they can be tailored to your specific data needs.
When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting works as the VP of Strategic Alliances at the US Division of LINBIT (of DRBD fame). He also operates Longitude Technologies, which offers world-wide Linux & Network support and consulting services. Charlie also wrote Network Ninja, a must-read for every network engineer.