Virtualization is great; that much we can all agree
on. Now that virtual machines (VMs) are so easy to create, however, they tend to
grow out of control. This shouldn’t be all that surprising, but it seems that
many small to medium businesses are also dabbling in VMs, and they are suddenly
overwhelmed by VM growth.
Each VM is another server that an administrator must manage. Security updates
must be applied to every one of them, and global configuration changes now need
to be propagated to all of these new machines. While it’s easy to create three or
four (or more) servers on one physical piece of hardware, you’ll certainly
struggle to manage them if you aren’t already set up to scale.
Unfettered Growth
The number of physical machines in a small company may drop dramatically,
perhaps by 40 percent, when virtualization is implemented. Unfortunately, the
number of operating system (OS) instances will generally double or more at the
same time. The power and cooling savings promised by virtualization do
materialize, but going from 20 servers to 12, for example, may soon leave you
with 40 OS instances to manage.
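To make the trade-off concrete, the short Python calculation below runs the example numbers above. It assumes each of the original 20 servers hosted exactly one OS instance; under that assumption, hardware drops by 40 percent while the number of systems to patch and configure doubles.

```python
# Worked example using the article's illustrative figures. Assumption: before
# consolidation, each of the 20 physical servers ran exactly one OS instance.
physical_before = 20
physical_after = 12
os_instances_after = 40  # roughly double the original instance count

hardware_reduction = (physical_before - physical_after) / physical_before
instance_growth = os_instances_after / physical_before

print(f"Physical servers: {physical_before} -> {physical_after} "
      f"({hardware_reduction:.0%} fewer)")
print(f"OS instances to manage: {physical_before} -> {os_instances_after} "
      f"({instance_growth:.1f}x)")
```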
The reasons for VM proliferation depend on your organization’s culture, but the
most common reason is that delegating control of an entire OS is easier than managing an
application for customers. IT customers, be they engineers, application
developers, or smaller IT units within an organization, frequently need more
access than central IT is willing to give. The easy solution: Give them a server
of their own. Test environments, too, are well served by virtual machines.
To keep hardware (and power and cooling) costs down, many companies introduce
policies governing the implementation of new services: new applications and
servers must run on VMs first, unless they really require their own server.
Policies such as these are good, in that they limit waste, but they do tend to
exacerbate VM sprawl.
Sprawl aside, it’s worth noting that higher utilization levels on your
servers do not mean that they’ll use appreciably more power. In fact, the
power savings claims hold up, and the savings can be even greater if your
utilization is low and you use VirtualCenter’s power management features.
When utilization is low, VMware can migrate VMs onto fewer servers and
actually power off the servers that are no longer needed. This works best with
Dell hardware, but other large vendors are supported as well. Imagine all your
VMs migrating to a few blades in a blade server at night, and then, as
utilization increases during the day, blades quickly booting up to take the load
as needed. Granted, I don’t personally know of any enterprise environments
brave enough to try it yet, but in theory the concept is wonderful.
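Squeezing idle VMs onto as few hosts as possible is essentially a bin-packing problem. The Python sketch below uses the classic first-fit-decreasing heuristic to illustrate the idea; it is not VMware’s actual placement algorithm, and the VM names, the loads (integer percentages of one host’s CPU), and the 80 percent per-host ceiling are all made-up assumptions.

```python
from typing import Dict, List


def consolidate(vm_loads: Dict[str, int], host_capacity: int) -> List[List[str]]:
    """Pack VMs onto as few hosts as possible with first-fit decreasing.

    vm_loads maps a VM name to its load as an integer percentage of one
    host's CPU; host_capacity is the highest load a host may carry.
    Returns one list of VM names per host that must stay powered on.
    """
    hosts: List[List[str]] = []
    free: List[int] = []
    # Place the heaviest VMs first; this usually packs more tightly.
    for name, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > host_capacity:
            raise ValueError(f"{name} does not fit on any single host")
        for i, remaining in enumerate(free):
            if load <= remaining:  # first powered-on host with room wins
                hosts[i].append(name)
                free[i] -= load
                break
        else:
            # No running host has room, so one more must stay powered on.
            hosts.append([name])
            free.append(host_capacity - load)
    return hosts


# Hypothetical loads; hosts are capped at 80% to leave headroom.
night = {"web1": 10, "web2": 10, "db1": 30, "mail": 15, "build": 5, "test": 5}
day = {"web1": 50, "web2": 50, "db1": 60, "mail": 30, "build": 40, "test": 20}

print("night:", consolidate(night, 80))
print("day:  ", consolidate(day, 80))
```

With these invented numbers, the nighttime workload fits on a single host, so the others could be powered off, while the daytime workload needs four. A real scheduler also has to weigh memory, migration cost, and how long a powered-off host takes to boot back up.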
Dealing
Something magical happens when a company grows to around 50 operating system
instances. That’s too many to manage by simply logging in and running
commands, so people start to write scripts. In Windows land, you must implement
Active Directory if you haven’t already. For the Unix/Linux servers,
configuration management becomes even more important. Writing a script that
SSHes to each server and runs a command doesn’t scale, no matter how badly
people want it to. You need a real configuration management system (such as
Puppet or cfengine) to ensure that servers are configured exactly how you want.
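The key difference between a configuration management system and a loop of SSH commands is that the former describes a desired state and only acts when a machine has drifted from it, so it can safely be re-run everywhere. The Python sketch below illustrates that idempotent “ensure” idea for a single file. It is not how Puppet or cfengine are implemented, and the file name and NTP server line are hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` is present in the file at `path`.

    Returns True if the file had to be changed and False if it was already in
    the desired state, so running it again is always safe.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already converged: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Hypothetical config file and NTP server line, used only for illustration.
with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "ntp.conf"
    conf.write_text("driftfile /var/lib/ntp/drift\n")
    first = ensure_line(conf, "server ntp.example.com iburst")
    second = ensure_line(conf, "server ntp.example.com iburst")
    final_text = conf.read_text()

print(first, second)  # True False
```

A script that blindly appends the line would add a duplicate on every run; this version can be run against every server and only changes the ones that have drifted. Real tools handle much more (conflicting settings, service restarts, reporting), all of which this sketch ignores.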
If you already operate in a large environment with good automated
installation and configuration management systems, chances are that scaling
100-fold won’t be a problem, barring scaling issues with the management software
itself. A good network-booting deployment system is only half the battle,
because not every server is going to be configured identically. If you’re
“doing it right,” you should be able to reinstall any server at will, walk
away, and know that it’ll come back up patched and running all the services
it’s supposed to. Servers, or rather the OSes that run on them, should be truly
disposable.
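Because servers differ, one way to keep them disposable is to have the deployment system lay down only a generic OS and keep every per-host difference in data that the configuration management system applies after the install. The minimal Python sketch below shows that split; the role names, package and service names, and hostnames are hypothetical. A freshly reinstalled host looks itself up and derives everything it should be running.

```python
from typing import Dict, List, Set

# Hypothetical baseline that every host receives.
BASE: Dict[str, Set[str]] = {
    "packages": {"openssh-server", "ntp"},
    "services": {"sshd", "ntpd"},
}
# Hypothetical roles: what each role adds on top of the baseline.
ROLES: Dict[str, Dict[str, Set[str]]] = {
    "web": {"packages": {"httpd"}, "services": {"httpd"}},
    "db": {"packages": {"postgresql-server"}, "services": {"postgresql"}},
}
# Hypothetical node classification, kept in version control, not on the hosts.
NODES: Dict[str, List[str]] = {
    "web01": ["web"],
    "app01": ["web", "db"],
}


def desired_state(hostname: str) -> Dict[str, Set[str]]:
    """Merge the baseline with every role assigned to `hostname`."""
    state = {key: set(values) for key, values in BASE.items()}  # copy, don't mutate BASE
    for role in NODES.get(hostname, []):
        for key, values in ROLES[role].items():
            state[key] |= values
    return state


print(sorted(desired_state("web01")["services"]))
print(sorted(desired_state("app01")["packages"]))
```

Since the host’s identity lives in data rather than on the machine, reinstalling it loses nothing: the same lookup produces the same configuration every time.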
VMware promises management of a “golden image,” probably because
ITIL mentions it, but that doesn’t really help in practice. You still have to
create your images somehow, and there’s no mechanism to update a golden image
with security patches and apply them to existing systems; you’ll generally have
to reinstall the OS instances. Reinstalling periodically is a good idea anyway,
but without some kind of configuration management system, you’ll also be
manually installing and configuring the services those VMs used to provide in
order to restore their functionality.
VM growth, therefore, is no different from server growth. It may be easier
and cheaper, but from the OS management viewpoint, you’re doing the same
thing. The availability of your services is also at risk: running five VMs on
a single piece of hardware means that one hardware failure takes out five
servers instead of one. VMware and Xen can both be clustered and run from
shared storage, so that when hardware fails, the VMs are quickly brought back
up on other servers. The problem is that VMotion requires the most expensive
VMware license, and a VirtualCenter server. Shared storage isn’t as big an
issue these days thanks to iSCSI, but it’s still one more thing that must be
configured. We’ll cover this issue in depth in a future article, focusing on
Xen and RHEL Clustering Services.
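When a host dies, the cluster has to decide where its VMs come back up. The Python sketch below is a simplified restart planner, not the actual logic of VMware HA or RHEL Clustering Services. It assumes memory is the only constraint, and the host names, VM names, and sizes are made up.

```python
from typing import Dict, List, Tuple


def plan_restart(
    failed_vms: Dict[str, int], free_memory: Dict[str, int]
) -> Tuple[Dict[str, str], List[str]]:
    """Assign each VM from a failed host to a surviving host with room.

    failed_vms maps VM name -> memory needed (MB); free_memory maps each
    surviving host -> free memory (MB) and is not modified.
    Returns (placement, unplaced).
    """
    free = dict(free_memory)
    placement: Dict[str, str] = {}
    unplaced: List[str] = []
    # Restart the largest VMs first, while the most capacity is available.
    for vm, need in sorted(failed_vms.items(), key=lambda kv: kv[1], reverse=True):
        # Worst fit: choose the survivor with the most free memory, which
        # spreads the restarted load; if it can't fit the VM, none can.
        host = max(free, key=free.get, default=None)
        if host is not None and free[host] >= need:
            placement[vm] = host
            free[host] -= need
        else:
            unplaced.append(vm)
    return placement, unplaced


# Five hypothetical VMs from one failed host, two surviving hosts.
failed = {"vm1": 2048, "vm2": 2048, "vm3": 1024, "vm4": 1024, "vm5": 4096}
placement, unplaced = plan_restart(failed, {"hostB": 6144, "hostC": 4096})
print(placement)
print(unplaced)
```

The same call with too little spare capacity on the survivors shows the real cost of packing five servers onto one box: some of them simply stay down until more hardware is available.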
The point is that dealing with VM sprawl is no different from dealing with
scaling up to support more physical servers. Use whatever mechanisms are
available on your given platforms, and “do it right.” A VM is, and
always will be, just another server.