Virtualization lets you make better use of your servers, and raising that utilization effectively can be a big cost-saver. Now that duplicating server instances with virtual machines is so easy, it makes sense to talk about the best way to take advantage of that ability. Improved application availability and performance can now be realized without many of the complications that used to plague us. How you go about it, however, depends on the application in question.
Out vs. Up
I’d like to first spend some time on the philosophical difference between scaling “out” and scaling “up.” Some applications can be scaled up; that is, they can be run on faster hardware to support more transactions. We also call this “scaling vertically.” Say we have a database server that can handle one million requests per second but, due to new demands, needs to handle at least two million. Databases are well suited for scaling up, since the bottleneck is frequently RAM or CPU. If the bottleneck happens to be disk I/O, you can also scale up the storage system the database uses.
Scaling out (horizontally) means adding more servers and spreading the load across multiple machines. In the database example, this may be extremely difficult, since all the database servers need to work from the same data, which must be replicated between them. Scaling out application servers, however, is common practice.
Before deciding whether to scale up or out, realize that scaling out presents its own problems. Web applications keep session data, and every server in a load-balanced cluster must see the same state. A common example is authentication: if a user authenticates with one server, and the load balancer sends that user’s next request (page click) to another server, the second server may fail to recognize that the user is logged in.
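To make the authentication example concrete, here is a minimal sketch (the `AppServer` class and session IDs are hypothetical, not from any real framework) contrasting per-server session stores with a shared store such as memcached, redis or a database. A plain dict stands in for the shared store:

```python
# Hypothetical sketch of the session-state problem: two app servers
# behind a load balancer, first with local session stores, then with
# a single shared store.

class AppServer:
    def __init__(self, name, sessions):
        self.name = name
        self.sessions = sessions  # maps session_id -> username

    def login(self, session_id, username):
        self.sessions[session_id] = username

    def handle_request(self, session_id):
        # Returns the logged-in user, or None if this server
        # does not recognize the session.
        return self.sessions.get(session_id)

# Local stores: each server keeps its own dict.
a = AppServer("web1", {})
b = AppServer("web2", {})
a.login("sess-42", "alice")
print(b.handle_request("sess-42"))  # None -- web2 has no idea who this is

# Shared store: both servers read the same dict (a stand-in for
# memcached/redis/a database in a real deployment).
shared = {}
a = AppServer("web1", shared)
b = AppServer("web2", shared)
a.login("sess-42", "alice")
print(b.handle_request("sess-42"))  # alice -- any server recognizes the session
```

The fix is always some variation of the second half: move session state out of the individual server and into something every cluster member can reach.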
Here are a few questions to ask before deciding that scaling out is the right solution:
- Does the application operate properly in a load-balanced environment?
- Will the application scale up to serve enough users without load balancing?
- Can I run many instances of the same application in an automatable and manageable way?
If you find yourself having to scale up because of application limitations, you probably shouldn’t be using virtualization at all. An application that requires its own server is not a candidate for virtualization. The overhead of virtualization, as small as it may be these days, will still eat into the performance you need. Furthermore, you gain none of the benefits of virtualization, such as consolidation and migration between physical servers, because the application must run on its own dedicated server anyway. The migration argument, in the case of a hardware failure, is a weak reason to use virtualization, since failover setups can easily be configured between two physical servers.
That said, if you can scale horizontally (out), you’d probably benefit greatly from virtualization. It’s easier to manage virtual machines than physical hardware, you can take down hardware with zero downtime, and your utilization can be maximized.
Load Balancing or Separation of Duties
Assume we have a Web infrastructure that hosts a few thousand Web sites with the Apache Web server. Anyone with that many sites has probably already scaled out to some extent, by hosting, say, 500 sites on each of four servers. That arrangement has a few problems.
- Any machine failure means that 500 Web sites are down.
- Apache handles that many sites poorly, and restarting such a Web server can take a very long time.
- Utilization on each server is likely very high.
We have made some assumptions, but let’s suppose the above problems all hold in our fictional environment. There are two possible solutions: load balancing and separation of duties.
Load balancing isn’t such a great solution in this case. As previously mentioned, load-balanced setups require that the application keep state in a central, shared location, and it’s unlikely that all 2,000 sites on these servers can do that. A good load balancer can track incoming requests and always send a given client to the same back-end server, which helps a bit, but if that server goes offline, those clients must re-authenticate. Load balancing, therefore, is best suited to scaling specific applications that support horizontal scaling.
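The “always send them to the same back-end server” trick is usually done by hashing the client address, and the re-authentication problem falls out of the math. Here is an illustrative sketch (the function name and addresses are made up; real balancers use more robust schemes such as consistent hashing):

```python
# Minimal sketch of hash-based "sticky" routing: the same client always
# maps to the same back end -- until a back end drops out of the list
# and some clients get remapped, losing their server-local sessions.

import hashlib

def pick_backend(client_ip, backends):
    # Stable hash of the client address, modulo the list of live servers.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

backends = ["web1", "web2", "web3", "web4"]
client = "203.0.113.7"

first = pick_backend(client, backends)
# Same client, same server list -> same back end every time.
assert pick_backend(client, backends) == first

# Take the chosen back end offline: the client may land on a different
# server, which has no record of the session.
survivors = [b for b in backends if b != first]
print(first, "->", pick_backend(client, survivors))
```

With simple modulo hashing, removing one server can remap clients of *other* servers too, which is why this only “helps a bit” rather than solving the state problem.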
The separation-of-duties approach is what the Apache-hosting-too-many-sites scenario calls for. Ideally, we’d like to see no more than 100 Web sites per Apache instance. Apache won’t fall over from having too many configuration files open (one for each vhost, most likely), and you won’t have to deal with load-balancer issues. Depending on the hardware, it’s quite possible to run five VMs on each server, which gets you back to 500 sites per physical server. Wait a minute: that’s the same utilization level we had before, but with virtualization overhead and the extra CPU and RAM requirements for five guest OS instances! Indeed, but each set of 100 Web sites won’t utilize resources in the same way.
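The arithmetic behind that layout is worth spelling out. A back-of-the-envelope sketch (the numbers are the ones from this scenario, nothing more):

```python
# Separation-of-duties layout: 2,000 sites split into 100-site Apache
# instances, five VMs per physical host.

SITES = 2000
SITES_PER_VM = 100
VMS_PER_HOST = 5

vms_needed = SITES // SITES_PER_VM          # 20 Apache instances
hosts_needed = vms_needed // VMS_PER_HOST   # 4 physical servers
print(vms_needed, hosts_needed)             # 20 4

# A host failure still takes down one host's worth of sites:
print(VMS_PER_HOST * SITES_PER_VM)          # 500
# But any single Apache restart now affects only 100 sites, and
# individual VMs can be migrated off a struggling host first.
```

The win isn’t fewer sites per physical box; it’s that the failure and restart domains shrink from 500 sites to 100, and each 100-site unit becomes independently movable.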
When we scale out, we can realize better utilization by taking advantage of auto-migration functionality in VMware, which moves VMs based on server load. The more you scale out, the more opportunities there are to optimize. If you’re using Xen with some sort of cluster-management software, similar rules can be constructed. Disclaimer: in reality, you will probably want to throw another server into the mix to handle future growth.
In short, scaling horizontally with virtualization is much easier than doing it with physical servers, with the added benefit of shuffling load around at will. Services that can be separated into many smaller portions offer the most flexibility and benefit from a multi-VM scaling strategy. However, if a single, mission-critical application properly supports a load-balanced configuration, load balancing is probably your best bet.
In next week’s article, we will talk about VM migration and what’s required to do it. Be sure to come back and learn how to run all these VM instances without worrying about application load or being dependent on a single piece of hardware.