“Cluster” is probably the most heavily abused term in the computing world. In this article we’ll talk about what a cluster really is, and give an overview of the Linux technologies that can help you implement various types of clusters. The main focus will of course be on building clusters for highly available services.
There are three basic types of clustering technology, and each clusters resources in a different way, at a different level.
High Performance Computing uses clusters to gain absurd computational capacity. Scyld is an example of HPC clustering, and so are LAM/MPI and MPICH. The MPI-based clusters require an application written to take advantage of the cluster: HPC is for computationally intensive jobs that can be broken into many smaller tasks for execution across the various nodes. Your standard single-threaded application won't run any faster on these clusters.
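To give a feel for what that means in practice, here is a minimal sketch of launching a job under MPICH; the machines file, the node names, and the myjob program are all made up for illustration:

    # hypothetical machines file listing the compute nodes, one per line
    $ cat machines
    node1
    node2
    node3
    node4

    # compile the MPI-aware program with the MPI compiler wrapper
    $ mpicc -o myjob myjob.c

    # launch eight copies of myjob, spread across the nodes listed above
    $ mpirun -np 8 -machinefile machines ./myjob

Each copy is assigned a rank and coordinates with the others through MPI calls; a program that never makes an MPI call will simply run as eight independent copies, which is exactly why ordinary applications gain nothing here.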
Scyld-like clusters are similar, in that they also require your application to spawn many compute tasks, but Scyld presents the cluster as a single machine. On the head node, or master, you can see every single process that's running on the various nodes, which is quite cool! It does mean that you'll be running a custom-built kernel, of course.
But the real question is how to set up a cluster to attain redundant, highly available services, especially services that are otherwise a single point of failure, like NFS servers, e-mail servers, and Web servers. There are two options.
High Availability clusters are focused on redundancy. If a critical server explodes, a standby can take over almost instantly. This is normally accomplished by having the standby server monitor the active one, with something as simple as a ping or as complex as a program that checks whether the specific service is responding properly.
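Here's a minimal sketch of that monitoring idea, as a script the standby might run from cron; the active server's address, the port, and the takeover script are all assumptions for illustration:

    #!/bin/sh
    # Hypothetical watchdog run periodically on the standby server.
    ACTIVE=192.168.1.10              # address of the active server (assumed)

    # The simple check: does the active server answer pings at all?
    if ! ping -c 3 $ACTIVE > /dev/null 2>&1; then
        logger "active server unreachable, starting takeover"
        /usr/local/sbin/takeover     # hypothetical failover script
        exit 1
    fi

    # The fancier check: is the actual service (here, HTTP) responding?
    if ! printf 'HEAD / HTTP/1.0\r\n\r\n' | nc -w 5 $ACTIVE 80 | grep -q '200'; then
        logger "web service unhealthy, starting takeover"
        /usr/local/sbin/takeover
    fi

In real life you'd also want to guard against flapping, so a single missed ping doesn't trigger a takeover; that is exactly the kind of logic Heartbeat handles for you.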
The Linux-HA project provides the Heartbeat program, which is used by standby servers to verify the health of the active server. It also provides failover functionality and IP address management.
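A two-node Heartbeat (version 1) setup boils down to a couple of small files; here's a minimal sketch, with made-up node names, virtual IP, and resource:

    # /etc/ha.d/ha.cf -- basic two-node configuration
    keepalive 2          # seconds between heartbeat packets
    deadtime 30          # declare the peer dead after 30 silent seconds
    bcast eth0           # broadcast heartbeats on eth0
    auto_failback on     # hand resources back when the primary returns
    node node1
    node node2

    # /etc/ha.d/haresources -- node1 normally owns the virtual IP and httpd
    node1 192.168.1.100 httpd

You also need a shared /etc/ha.d/authkeys file so the nodes can authenticate each other's heartbeats. When node1 dies, node2 claims 192.168.1.100 and starts httpd; with auto_failback on, the resources migrate back once node1 recovers.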
There are problems with HA configurations, however. In a perfect world, we could just throw up two NFS servers with access to the same storage and let them fight it out. That doesn't work with an ordinary file system, since two nodes mounting the same block device at once will quickly corrupt it, but there are cluster-aware file systems that allow multiple active nodes to use the storage simultaneously. Throwing databases at clusters is troublesome as well, since data can be left in an inconsistent state when one node fails.
Sun Cluster, Linux-HA, and Piranha are all examples of highly available cluster technologies.
Load Balancing is where a single point of contact doles out jobs to the other nodes. That master node can be a single point of failure itself, so HA clustering is normally used to provide redundant head nodes.
Load balancing doesn’t even have to be about “clustering.” It can be done with network equipment as well as software. Software-based solutions are normally a bit smarter, however; they can monitor the load and responsiveness of the backend nodes, and then send traffic to whichever server is best able to handle it.
The most widely known load balancing cluster product is the Linux Virtual Server project, or LVS. LVS operates differently from the process-oriented technologies above: instead of trying to provide a transparent set of nodes to run processes on, it balances at the network level. The advantage is that nearly anything can be run on an LVS cluster, as long as it communicates via TCP or UDP.
Piranha, the Red Hat cluster software, uses LVS as well. LVS is really just a glorified proxy server, when you think about it. It does operate at layer 4, though, so it doesn’t need to understand the layer 7 protocols you’re trying to balance, which is a great advantage.
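To make that concrete, here's a minimal sketch of configuring an LVS director with the ipvsadm tool; the virtual IP and the two real servers are invented for the example:

    # Create a virtual HTTP service on the public VIP, round-robin scheduling
    ipvsadm -A -t 192.168.1.100:80 -s rr

    # Register two real servers behind the VIP, forwarding via NAT
    ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.11:80 -m
    ipvsadm -a -t 192.168.1.100:80 -r 10.0.0.12:80 -m

    # Display the resulting virtual server table
    ipvsadm -L -n

Note that nothing here mentions HTTP itself; the director is just steering TCP connections, which is why the same recipe works for mail, IMAP, or almost any other TCP or UDP service.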
So the question remains: what method of clustering should one use? It really depends on the goals of the server. If you want to implement an NFS server cluster, you’re in for a world of hurt, for many reasons: the foremost are locking and the fact that NFS isn’t really stateless. To implement a cluster with a shared file system, you really must invest in something like GFS or Veritas.
Most often, people are looking for a way to provide a cluster of Web, mail, or other similar servers. The solution is usually going to be something LVS-related, if not LVS itself. TurboLinux Cluster, Red Hat High Availability Server, and many others all use LVS.
Kimberlite does not use LVS, and it comes highly recommended by numerous people. Kimberlite runs services in a highly available active-active configuration with shared storage. When one node becomes unavailable, the other starts up the missing services and keeps plugging along.
And wouldn’t you know, there are probably 50 other cluster options that we haven’t mentioned here. The most common are LVS and Linux-HA, and note that linux-ha.org is quite different from linuxha.net. Yes, it is quite confusing.
SUSE Linux ships with Linux-HA (Heartbeat) installed and ready to use. Red Hat Cluster is another option, and of course rolling your own solution from the myriad available technologies is yet another. Just remember to read up on and understand the drawbacks of whatever cluster solution you choose; they all have limitations.