In the first three installments of this series (“Is a Server Cluster Right for Your Organization?”, “Choosing the Cluster Type that’s Right For You”, and “Network Load Balancing Clusters”), I discussed the concepts involved in setting up a server cluster, including some of the differences between the Network Load Balancing (NLB) model and the server cluster model. In this final article in the series, I’ll discuss the server cluster model in greater detail.
A Quick Cluster Refresher
As a reminder, I’ll take a moment to describe what constitutes a server cluster. On a Windows network, a server cluster is a group of two or more machines running Windows 2000 Advanced Server that function as a single machine. Although the machines have separate CPUs and network cards, they are linked to a common storage unit, usually through a Fibre Channel or SCSI bus. If either machine were to fail, the other would keep running, thus providing continuous availability of the application the cluster is hosting.
Keep in mind that not all configurations keep the servers mirrored. Instead, the server cluster model relies on something called a fail-over policy, which dictates how the cluster behaves during a failure. For example, suppose that the first server in a cluster were to fail. The fail-over policy would dictate which applications from the failed server would temporarily run on the second server. The fail-over policy can also shut down non-critical services and applications on the functional server to make way for the extra load it must carry during the failure.
Configuring a Server Cluster
There are several different ways to configure a server cluster. Which method is right for you depends largely on what you’re trying to accomplish. For example, are you more worried about high availability, load balancing, or both?
If you’re the type who wants it all, you’ll be happy to know that you can have high availability with load balancing. To do this, you’ll have to set the cluster’s policies to run some applications or services on one server and the remaining applications and services on the other. You must then set the cluster’s fail-over policy so that if any of the applications or services fail, they will run on the other server. Obviously, during a failure, the functional server may become bogged down, because it may be carrying up to twice its usual workload. Therefore, you might set the fail-over policy so that if either machine has to take over for a failed server, the unnecessary services or applications are temporarily suspended until the failed unit comes back online. Although this method is tedious to configure, it provides a great mix of performance and availability.
If the idea of having a server bog down during a failure or the thought of shutting down unnecessary services bothers you, there are alternatives. One such alternative is to implement high availability without load balancing. In this implementation, one server basically runs everything. The other server in the cluster is on constant standby as a hot spare. If the first server fails, the fail-over policy shifts control of all applications and services to the second server. With this method, your end users will probably never even notice when a problem occurs. When the failed server is brought back online, it takes over control of all of the services and applications, and the second server goes back into standby mode.
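To make the hot-spare behavior concrete, here is a minimal Python sketch of fail-over and the later return to the repaired server. The Node class, the node names, and the application names are my own illustrative assumptions; this models the logic only and is not how the Windows Cluster service is actually configured.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One clustered server (hypothetical model for illustration)."""
    name: str
    online: bool = True
    apps: set = field(default_factory=set)


def fail_over(primary, standby):
    """Move every application from the failed primary to the standby."""
    primary.online = False
    standby.apps |= primary.apps
    primary.apps = set()


def fail_back(primary, standby):
    """Return all applications to the repaired primary; the standby idles again."""
    primary.online = True
    primary.apps |= standby.apps
    standby.apps = set()


primary = Node("node1", apps={"mail", "sales_db"})
standby = Node("node2")  # hot spare: runs nothing until a failure

fail_over(primary, standby)
print(standby.apps)  # both applications now run on node2

fail_back(primary, standby)
print(primary.apps)  # both applications are back on node1
```

The standby carries no workload of its own, which is why the surviving server never has to shed anything in this model.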
In the past, I’ve worked for several organizations in which management deemed one or two applications to be mission critical. In these environments, management never wanted to see a network failure of any kind; but if the network did fail, they really didn’t care what failed, as long as those essential applications were still running.
In such environments, load shedding is a great configuration. This configuration is especially effective because it not only guarantees that the application will be available under any circumstances, it also ensures that the application’s performance won’t suffer because of a bogged-down server.
In the load-shedding model, the clustered servers each run their own set of applications, just as you normally would on two separate servers (remember that the cluster is still seen as a single server by the rest of the network). The only difference is that the fail-over policy defines the critical applications. Now, suppose that one of the servers fails. The second server would detect the failure and consult the fail-over policy. The policy would then tell the surviving server to shut down all non-essential applications and to begin servicing any essential applications that were previously running on the failed server.
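The load-shedding decision described above can be sketched in a few lines of Python. The function name, the application names, and the idea of returning a list of ("stop", app) and ("start", app) steps are assumptions I’ve made for illustration; a real cluster expresses this through its fail-over policy rather than through code like this.

```python
def load_shed_actions(survivor_apps, failed_apps, critical):
    """List the steps the surviving server takes, in order.

    First it stops its own non-essential applications, then it starts the
    essential applications that were running on the failed server.
    Start order here is alphabetical and ignores dependencies.
    """
    actions = []
    for app in sorted(survivor_apps):
        if app not in critical:
            actions.append(("stop", app))
    for app in sorted(failed_apps):
        if app in critical:
            actions.append(("start", app))
    return actions


# Hypothetical workloads and critical list
critical = {"email", "order_entry", "billing"}
node1_apps = {"email", "order_entry", "intranet"}  # this server fails
node2_apps = {"billing", "file_share"}             # this server survives

for step in load_shed_actions(node2_apps, node1_apps, critical):
    print(step)
# ('stop', 'file_share')
# ('start', 'email')
# ('start', 'order_entry')
```

Note that the non-essential intranet application from the failed server is simply left offline until that server returns.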
Once you have an idea of which cluster model is right for your environment, you have a lot more planning to do. The first part of this process is to create an exhaustive list of your applications. This list should include things like the current location of each application, any dependencies related to the application, and just how critical the application is. For example, if you have a critical customer management program, you might list the place that the program currently resides and indicate that the program is dependent on the sales database running in the background. Therefore, you’d also want to document the location of the sales database and flag both the program and the underlying database as critical applications. If you’re questioning the critical status of the database, consider that the customer management program is critical and can’t run without the database; therefore, the database is also critical.
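If you keep the inventory in a structured form, the rule that anything a critical application depends on is itself critical can be checked mechanically. The following Python sketch assumes a simple dictionary layout and made-up application names and locations, and it assumes every dependency appears in the inventory as its own entry.

```python
# Hypothetical inventory: names and locations are made up for illustration.
inventory = {
    "customer_mgmt": {"location": "server1", "depends_on": ["sales_db"], "critical": True},
    "sales_db":      {"location": "server2", "depends_on": [], "critical": False},
    "intranet":      {"location": "server2", "depends_on": [], "critical": False},
}


def effective_critical(inventory):
    """Return every app that is critical itself or is needed by a critical app."""
    result = set()
    stack = [name for name, info in inventory.items() if info["critical"]]
    while stack:
        name = stack.pop()
        if name in result:
            continue
        result.add(name)
        # Follow dependencies so indirect ones are flagged too
        stack.extend(inventory[name]["depends_on"])
    return result


print(sorted(effective_critical(inventory)))  # ['customer_mgmt', 'sales_db']
```

Here the sales database was not flagged as critical in the inventory, but it ends up in the critical set because the customer management program can’t run without it.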
While determining dependencies, you must also look for applications that have common dependencies. For example, suppose that you have two applications that both depend on the same underlying database. Because of the dependency structure, these applications and their dependencies must always be grouped together.
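One way to find these groups, sketched below in Python, is to treat each dependency as a link and collect everything that is connected. The function name and the sample applications are hypothetical, and this is only one possible approach to the grouping.

```python
def build_groups(depends_on):
    """Group applications and dependencies linked by any shared dependency."""
    # Build an undirected map: each item points to everything it is linked to
    neighbors = {}
    for app, deps in depends_on.items():
        neighbors.setdefault(app, set())
        for dep in deps:
            neighbors[app].add(dep)
            neighbors.setdefault(dep, set()).add(app)

    groups, seen = [], set()
    for start in neighbors:
        if start in seen:
            continue
        # Walk outward from this item to collect its whole group
        group, stack = set(), [start]
        while stack:
            item = stack.pop()
            if item in group:
                continue
            group.add(item)
            stack.extend(neighbors[item] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical applications: two of them share the sales database
depends_on = {
    "customer_mgmt": ["sales_db"],
    "order_entry": ["sales_db"],
    "intranet": ["web_content"],
}

for group in build_groups(depends_on):
    print(sorted(group))
# ['customer_mgmt', 'order_entry', 'sales_db']
# ['intranet', 'web_content']
```

Each resulting group is a set of items that should be kept together, so that the shared database is always on the same server as the applications that need it.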
Finally, when designing your fail-over policy, you must consider the impact of that policy. For starters, if you make the second server take over running a critical application, will all the dependencies be in place for the application to run? You must also consider hardware-related issues, such as whether the servers have a fast enough processor and enough memory to handle the fail-over policy that you’ve designed without crashing or bogging down.
As you can see, setting up a cluster can be a great way to protect your data or to increase the speed of a Web site. In this article, I’ve explained the type of clustering environment that’s suitable for both situations.
Brien M. Posey is an MCSE who works as a freelance writer. His past experience includes working as the director of information systems for a national chain of health care facilities and as a network engineer for the Department of Defense. Because of the extremely high volume of e-mail that Brien receives, it’s impossible for him to respond to every message, although he does read them all.