It used to be that the mere mention of clustering was enough to send a shudder through anyone holding the purse strings on a company’s network infrastructure. Today, things are different. With operating systems sporting integrated clustering capabilities and the cost of hardware falling at an ever-increasing rate, the benefits of clustering can be had by organizations of almost any size and with any budget.
Although Windows server platforms have provided clustering capabilities since Windows NT 4.0, the clustering features in Windows Server 2003 are now much easier to implement. It’s just one more reason, if you have not already done so, to take a look at clustering for your network.
Windows Server 2003 supports two types of clustering: server clustering and network load balancing (NLB). Server clustering is a relatively complex topic, and one that generally involves the purchase of at least some additional hardware, such as shared storage. Although it is unarguably the more powerful and versatile of the two clustering methods, this need for additional hardware is often enough to discourage many organizations.
Network load balancing, however, is much less complex than server clustering, and offers similar benefits: increased fault tolerance and improved performance for server-based applications. Fault tolerance comes from the fact that more than one server is hosting an application. If a server in the cluster fails, users can continue to access the application via one of the other servers in the cluster. Performance improves because more than one physical computer shares the work of answering client requests.
In an NLB cluster, multiple systems appear to the network as a single entity. On Windows Server 2003, up to 32 systems can be included in a single NLB cluster, although a cluster can be created with just two machines and then scaled as needed. From an end-user perspective, the cluster is invisible. Users have no way of knowing that the application they are accessing is hosted on a cluster, but they will appreciate the additional performance.
How NLB Works
In its most basic form, NLB is a mechanism for distributing incoming requests among multiple servers. When a request arrives at the IP address assigned to the cluster, one of the servers in the cluster takes the request and processes it, and the other servers ignore it. Subsequent requests are spread across the other servers in the cluster, so that over time every server handles its share of the load.
Although the principle is straightforward, the mechanics behind such a system are quite complex. For example, TCP/IP communication on an Ethernet network relies on the fact that an IP address can be resolved to a Media Access Control (MAC) address, which identifies a specific network interface. In a typical configuration, each IP address resolves to a single MAC address on a single system.
In the case of NLB, a single IP address must reach more than one system: every server in the NLB cluster. This is achieved with a software driver on each server that sits between the network adapter's driver and the TCP/IP protocol stack. The cluster is given its own IP address and a MAC address associated with it, which together form the external presence of the cluster, and every server in the cluster accepts traffic sent to those addresses. Because each server sees this traffic, the NLB driver on each server can decide whether its own server should handle a given request or pass over it.
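To make the shared address concrete: Windows NLB is commonly documented as deriving the cluster's MAC address from the cluster's primary IP address, using the prefix 02-BF in unicast mode and 03-BF in multicast mode, followed by the four IP octets in hexadecimal. The Python sketch below reproduces that mapping under that assumption. `cluster_mac` is a hypothetical helper, not part of any Windows API, and NLB's IGMP multicast variant is not covered.

```python
import ipaddress

def cluster_mac(cluster_ip: str, mode: str = "unicast") -> str:
    """Derive an NLB-style cluster MAC address from a cluster IP address.

    Assumed format: 02-BF-WW-XX-YY-ZZ (unicast) or 03-BF-WW-XX-YY-ZZ
    (multicast), where WW-XX-YY-ZZ are the IP's octets in hex.
    """
    prefixes = {"unicast": "02-BF", "multicast": "03-BF"}
    if mode not in prefixes:
        raise ValueError(f"unknown mode: {mode!r}")
    # .packed gives the 4 address bytes in network (big-endian) order.
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return prefixes[mode] + "".join(f"-{b:02X}" for b in octets)

print(cluster_mac("192.168.1.100"))               # 02-BF-C0-A8-01-64
print(cluster_mac("192.168.1.100", "multicast"))  # 03-BF-C0-A8-01-64
```

Because every server computes the same address from the same cluster IP, all of them can answer to it without coordinating.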
Each system in an NLB cluster receives every request, but elects to process only certain requests based on an algorithm that all servers in the cluster apply in the same way. A simple example makes the process easier to understand. Imagine that there are four technical support operators staffing a help desk. The four agree that each will answer one in every four calls, in order. So, operator 1 will answer calls 1, 5, 9, 13 and so on, while operator 3 will answer calls 3, 7, 11, 15 and so on. Once the initial agreement is made, the only communication required between the operators is to make sure that the other operators are still answering calls. If one of the operators were to stop answering calls for some reason, the other operators would have to detect this and shift to each answering one call in every three. Otherwise, every fourth call would go unanswered.
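The help-desk arrangement can be sketched in Python. `operator_for_call` and `calls_for` model the rotation described above, including the shift to one call in three when an operator drops out. NLB itself does not rotate strictly like this: it decides ownership by hashing fields of each incoming request, such as the client's IP address and, depending on the affinity setting, its port. `host_for_request` illustrates that idea using `zlib.crc32` as a stand-in hash; it is a hypothetical helper, not NLB's actual function.

```python
import zlib

def operator_for_call(call_number: int, operators: list[int]) -> int:
    """Help-desk analogy: calls (numbered from 1) rotate among active operators."""
    active = sorted(operators)
    return active[(call_number - 1) % len(active)]

def calls_for(operator: int, operators: list[int], total_calls: int) -> list[int]:
    """Every call number this operator would answer out of total_calls."""
    return [n for n in range(1, total_calls + 1)
            if operator_for_call(n, operators) == operator]

def host_for_request(client_ip: str, client_port: int, hosts: list[str]) -> str:
    """Hypothetical hash-based ownership (crc32 is a stand-in, not NLB's hash).

    Every host runs the same computation on the same request, so exactly one
    host accepts it and the others silently drop it.
    """
    key = f"{client_ip}:{client_port}".encode()
    active = sorted(hosts)  # all hosts must agree on the member order
    return active[zlib.crc32(key) % len(active)]

ops = [1, 2, 3, 4]
print(calls_for(1, ops, 16))       # [1, 5, 9, 13]
print(calls_for(3, ops, 16))       # [3, 7, 11, 15]
# Operator 2 stops answering: the remaining three now take one call in three.
print(calls_for(1, [1, 3, 4], 9))  # [1, 4, 7]
```

Hashing on request fields rather than counting calls means no server has to know how many requests the others have seen; each only needs the same list of active members.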
To detect the presence of the other servers in the cluster, NLB uses heartbeat messages that the servers exchange. If a server stops receiving heartbeat messages from one of the other servers in the cluster, it recalculates the clustering algorithm to accommodate the change, a process known as convergence. When the missing server comes back online, the other servers detect its presence and once again recalculate the algorithm. All of this calculating and recalculating is done automatically and, in general, end users of the applications hosted on the NLB cluster will be oblivious to the change.
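A heartbeat-driven membership view might look like the Python sketch below. The one-second heartbeat period and the five-missed-heartbeats tolerance are illustrative assumptions, and the `Membership` class is hypothetical; the point is only that each server drops a peer it has not heard from recently and re-admits it when heartbeats resume, and that either change triggers a recalculation.

```python
from dataclasses import dataclass, field

@dataclass
class Membership:
    """One server's view of which cluster members are alive, built from heartbeats."""
    period: float = 1.0   # seconds between heartbeats (illustrative assumption)
    tolerance: int = 5    # missed heartbeats before a member is dropped (assumption)
    last_seen: dict[str, float] = field(default_factory=dict)
    members: list[str] = field(default_factory=list)

    def heartbeat(self, host: str, now: float) -> None:
        """Record that a heartbeat from `host` arrived at time `now`."""
        self.last_seen[host] = now

    def check(self, now: float) -> bool:
        """Recompute the member list; return True if it changed (a convergence)."""
        limit = self.period * self.tolerance
        alive = sorted(h for h, t in self.last_seen.items() if now - t <= limit)
        changed = alive != self.members
        self.members = alive
        return changed


m = Membership()
for host in ("web1", "web2", "web3"):
    m.heartbeat(host, 0.0)
m.check(0.0)                     # initial membership: all three servers

# web2 goes silent while web1 and web3 keep sending heartbeats.
for t in range(1, 7):
    m.heartbeat("web1", float(t))
    m.heartbeat("web3", float(t))
print(m.check(6.0), m.members)   # True ['web1', 'web3']

# web2 comes back online and is folded back into the cluster.
m.heartbeat("web2", 7.0)
print(m.check(7.0), m.members)   # True ['web1', 'web2', 'web3']
```

The recalculated `members` list is what a shared ownership rule, such as rotating requests over the sorted member list, would then be applied to.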
What Can You Use NLB For?
NLB is not suited to every application or environment. NLB is a form of clustering, but unlike server clustering it does not necessarily employ a shared data store. In other words, each server in an NLB cluster can be a fully self-contained server in its own right. This self-contained approach brings with it one major issue that dictates where NLB can be implemented: the “state” of the applications hosted on the cluster.
The optimal application for an NLB cluster without shared data storage is one that is almost entirely read-only and whose data changes infrequently. Such an application is said to be stateless, as opposed to stateful. A good example of a stateless application is a Web server that provides static information and has very little dynamic content.
A corporate intranet, perhaps with a searchable database of product information, is one application that comes to mind. A relational database such as SQL Server, on the other hand, is a good example of a stateful application, because the data on the server is likely to change continually.
NLB is not suited to stateful applications because, as mentioned earlier, the data store is not necessarily shared among the servers. If NLB is used on servers that each host a separate copy of a stateful application, it would likely not be long before the data stores on the servers fell out of sync with each other. This is because, although servers in an NLB cluster exchange heartbeat information, they do not exchange application data.
Running a stateful application on an NLB cluster would likely mean that a query run against one server yields a different result from the same query run against another server in the cluster at the same time. Whether or not an application is stateless is perhaps the biggest factor in deciding how you can use NLB on your network.
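The divergence problem is easy to reproduce in a toy simulation. In the Python sketch below, each hypothetical `Server` keeps a private store that is never shared or replicated, mirroring the point that NLB members exchange heartbeats but not data. Two updates to the same record land on different servers, and the same read then returns different answers.

```python
from typing import Optional

class Server:
    """A hypothetical self-contained cluster member with its own private data store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, int] = {}   # not shared or replicated with other servers

    def write(self, key: str, value: int) -> None:
        self.store[key] = value

    def read(self, key: str) -> Optional[int]:
        return self.store.get(key)


servers = [Server("web1"), Server("web2")]

# Two updates to the same record arrive and are spread across the servers in turn.
updates = [("stock:widget", 10), ("stock:widget", 7)]
for i, (key, value) in enumerate(updates):
    servers[i % len(servers)].write(key, value)

# The same query now gives different answers depending on which server takes it.
print(servers[0].read("stock:widget"))  # 10
print(servers[1].read("stock:widget"))  # 7
```

By contrast, if every server is loaded with the same static content and nothing is written afterwards, any server returns the same answer, which is why read-mostly workloads suit NLB.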
What Do You Need to Make NLB Work?
NLB is included with all editions of Windows Server 2003, including the Web Edition, and in all cases NLB clusters of up to 32 nodes can be created. From a hardware perspective, the only addition you will need to consider is an extra network card for each system in the cluster. The extra NIC lets servers exchange normal network traffic with each other without affecting the performance of the cluster's network links. If an additional network card is not available, you can still create a cluster with only one network card per system, but you will need to make sure that your network cards support multicast mode (which most do). You will also need to ensure that any routers on the network support multicast MAC addresses (which not all do).
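Whether a MAC address counts as multicast comes down to a single bit: the least significant bit of the first octet, known as the group bit. The Python sketch below checks that bit, plus the neighboring bit that marks a locally administered address. The sample addresses are assumptions that follow the 02-BF (unicast) and 03-BF (multicast) cluster-MAC format commonly documented for NLB, alongside an ordinary adapter address. The router requirement matters because, in multicast mode, the cluster's unicast IP address resolves to a multicast MAC address, and some routers reject ARP replies of that kind.

```python
def first_octet(mac: str) -> int:
    """Parse the first octet of a MAC written as AA-BB-... or AA:BB:..."""
    return int(mac.replace(":", "-").split("-")[0], 16)

def is_multicast_mac(mac: str) -> bool:
    """The group (I/G) bit, the least significant bit of the first octet."""
    return bool(first_octet(mac) & 0x01)

def is_locally_administered(mac: str) -> bool:
    """The U/L bit, the second-least significant bit of the first octet."""
    return bool(first_octet(mac) & 0x02)

for mac in ("02-BF-C0-A8-01-64", "03-BF-C0-A8-01-64", "00:1A:2B:3C:4D:5E"):
    print(mac, is_multicast_mac(mac), is_locally_administered(mac))
# 02-BF-C0-A8-01-64 False True
# 03-BF-C0-A8-01-64 True True
# 00:1A:2B:3C:4D:5E False False
```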
While the additional hardware requirements for NLB may be minimal, it is also worth mentioning some non-Windows considerations to think about before implementing NLB, specifically the network infrastructure surrounding the servers in the cluster. For example, there is little point in creating an NLB cluster of three servers to provide fault tolerance if all three servers are connected to the same network switch. Doing so would create a single point of failure at the switch, and although switches are generally resilient, eliminating single points of failure should still be a priority.
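The single-point-of-failure reasoning can be checked mechanically. The Python sketch below is a hypothetical helper that, given which switches each server is cabled to, reports any switch whose failure would cut off every server in the cluster. It deliberately models only the server-facing switches, not upstream routers, power, or whether the surviving servers could carry the load.

```python
def cluster_spofs(uplinks: dict[str, set[str]]) -> set[str]:
    """Return switches whose failure would leave no server in the cluster reachable.

    `uplinks` maps each server name to the set of switches it is connected to.
    """
    switches = set().union(*uplinks.values())
    return {
        s for s in switches
        # s is a single point of failure if no server has a path that avoids it.
        if all(not (links - {s}) for links in uplinks.values())
    }

one_switch = {"web1": {"sw1"}, "web2": {"sw1"}, "web3": {"sw1"}}
split = {"web1": {"sw1"}, "web2": {"sw2"}, "web3": {"sw1", "sw2"}}
print(cluster_spofs(one_switch))  # {'sw1'}
print(cluster_spofs(split))       # set()
```

In the first layout, losing `sw1` takes down all three servers at once, which defeats the purpose of the cluster; in the second, any single switch failure still leaves at least one server reachable.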
In part two of this article, we will look at the process of implementing NLB on a Windows Server 2003 system and at some of the tools used to create and monitor a cluster.