The Pros & Cons of Booting from SAN
Like most things, booting from SAN rather than from the server's hard drive offers many benefits, but there are a few caveats along the way. If managed correctly, though, they may not be that big a deal.
Booting servers from data held on a storage area network (SAN) is becoming an increasingly popular alternative to the traditional process of booting servers from their own internal disks. To understand why, it's helpful to analyze the pros (and cons) of boot from SAN (BfSAN).
But first, a very simplified overview of how the boot process works: During a local boot, information in a server's BIOS points to the local boot disk, from which the operating system is loaded. BfSAN is slightly different because the boot disk, as the name suggests, is on the SAN. In this case, the server relies on the BIOS of its host bus adapter (HBA), or of its iSCSI adapter in the case of an iSCSI SAN, to find the boot device on the SAN, from which the operating system can be loaded.
So what are the benefits of Boot from SAN?
Less power, less heat, less state - Removing internal hard drives from servers means they consume less power and generate less heat. That means they can be packed more densely, and the need for localized cooling is reduced. And without local storage the servers effectively become "stateless" compute resources which can be pulled and replaced without having to worry about the data stored locally.
Less server capex - Boot from SAN enables organizations to purchase less expensive diskless servers. Further savings can be made through reduced storage controller costs, although servers still need bootable HBAs.
More efficient use of storage - Whatever the footprint of a server's operating system, it will always be over-provisioned in terms of internal storage to accommodate it. Using BfSAN the boot device can be configured to match the capacity it requires. That means a large number of servers running a range of operating systems can boot from a far smaller number of physical disks.
High availability - Spinning hard drives with moving internal components are disasters waiting to happen in terms of reliability, so removing reliance on internal hard drives can improve server availability. The servers still rely on hard drives, but SAN storage arrays are much more robust and reliable, with far more redundancy built in to ensure that servers can boot.
Rapid disaster recovery - Data, including boot information, can easily be replicated from one SAN at a primary site to another SAN at a remote disaster recovery site. That means that in the event of a failure, servers should be up and running at the remote site very rapidly indeed.
Lower opex through more centralized server management - BfSAN provides the opportunity for greatly simplified management of operating system patching and upgrades. For example, upgraded operating system images can be prepared and cloned on the SAN, and then individual servers can be stopped, pointed to their new boot images, and rebooted, with very little downtime. New hardware can also be brought up from SAN-based images without the need for Ethernet networking, and LUNs can be cloned and used to test upgrades, service packs and other patches or to troubleshoot applications.
Better performance - In some circumstances the rapidly spinning, high performance disks in a SAN may provide better operating performance than is available on a lower performance local disk.
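To put rough numbers on the "more efficient use of storage" point above, here is a minimal sketch in Python. Every figure in it (the server count, the 300 GB internal disk, the operating system footprints and the 25 percent headroom) is a hypothetical assumption chosen for illustration, not a measurement or a vendor recommendation.

```python
# Hypothetical comparison of boot capacity: local disks vs. right-sized SAN boot LUNs.
# All sizes are illustrative assumptions, not figures from any real environment.

def local_boot_capacity_gb(server_count, local_disk_gb):
    """Capacity consumed when every server carries its own internal boot disk."""
    return server_count * local_disk_gb

def san_boot_capacity_gb(os_footprints_gb, headroom=0.25):
    """Capacity consumed when each boot LUN is sized to its OS footprint plus headroom."""
    return sum(size * (1 + headroom) for size in os_footprints_gb)

# 100 servers running a mix of operating systems with different footprints (GB).
os_footprints_gb = [20, 20, 40, 12] * 25

local_total = local_boot_capacity_gb(len(os_footprints_gb), local_disk_gb=300)
san_total = san_boot_capacity_gb(os_footprints_gb)

print(f"Local boot disks: {local_total} GB")
print(f"SAN boot LUNs:    {san_total} GB")
```

Under these assumptions the internal disks add up to 30,000 GB, while the right-sized boot LUNs need 2,875 GB. The real ratio depends entirely on the disk sizes and operating systems in use.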
As always, there are some drawbacks to Boot from SAN which have to be weighed against the benefits just described. These include:
Compatibility problems - Some operating systems, system BIOSes and especially HBAs may not support Boot from SAN. Upgrading these components may change the economics in favor of local boot.
Single point of failure - If a server hard drive fails then the system will be unable to boot, but if a SAN or its fabric experiences major problems then no servers may be able to boot. Although the likelihood of this happening is relatively small because of the built-in redundancy in most SAN systems, it is nevertheless worth considering.
Boot overload potential - If a large number of servers try to boot at the same time -- after a power failure, for example -- this may overwhelm the fabric connection. In these circumstances, booting may be delayed or, if timeouts occur, some servers may fail to boot completely. This can be prevented by ensuring that boot LUNs are distributed across as many storage controllers as possible and that individual fabric connections are never loaded beyond vendor limits.
Boot dependencies - In some server environments, systems may depend on Active Directory (AD) servers, which may not be available after a power failure. To mitigate this risk it may be necessary to allow one or more AD servers to boot from local disks before the rest of the environment is booted from the SAN.
Configuration issues - Diskless servers can easily be pulled and replaced, but their HBAs have to be configured to point to their SAN-based boot devices before they boot. Unexpected problems can occur if a hot-swappable HBA is replaced in a running server: unless the HBA is configured for Boot from SAN the server will continue to run but fail to boot the next time it is restarted.
LUN presentation problems - Depending on your hardware, you may find that some servers can only BfSAN from a specific LUN number. If that's the case, then you will need to have some mechanism in place to present the unique LUN that you use to boot a given server as the LUN it (and other similar servers) expects to see.
Additional complexity - There is no doubt that BfSAN is more complex than common local booting, and that can introduce an element of operational risk. As IT staff get accustomed to the procedure, however, this risk should diminish. But the potential for problems in the early stage of Boot from SAN adoption should not be discounted.
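Two of the drawbacks above, boot overload and LUN presentation, largely come down to bookkeeping, and a short Python sketch may make them more concrete. It spreads boot LUNs round-robin across storage controllers, refuses a plan that would exceed a per-controller limit, and records which unique array LUN is presented to each server as the LUN number its HBA expects. The controller names, the limit of four boot LUNs per controller, the starting array LUN of 100 and the expected host LUN of 0 are all hypothetical. Real limits and LUN-masking mechanisms come from the storage and HBA vendors, and this code does not talk to any real array.

```python
# Hypothetical planning sketch: it only builds a table, it configures nothing.
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BootMapping:
    server: str
    controller: str
    array_lun: int  # unique LUN ID on the storage array (assumed numbering)
    host_lun: int   # LUN number the server's HBA expects to boot from

def plan_boot_luns(servers, controllers, max_per_controller,
                   host_lun=0, first_array_lun=100):
    """Spread boot LUNs round-robin across controllers, within an assumed limit."""
    if len(servers) > len(controllers) * max_per_controller:
        raise ValueError("not enough controller capacity for all boot LUNs")
    plan = []
    for index, server in enumerate(servers):
        controller = controllers[index % len(controllers)]
        plan.append(BootMapping(server, controller,
                                first_array_lun + index, host_lun))
    return plan

servers = [f"srv{n:02d}" for n in range(1, 9)]
controllers = ["ctrl-a", "ctrl-b", "ctrl-c", "ctrl-d"]
plan = plan_boot_luns(servers, controllers, max_per_controller=4)

# Count how many boot LUNs each controller would serve.
load = Counter(mapping.controller for mapping in plan)

for mapping in plan:
    print(f"{mapping.server}: array LUN {mapping.array_lun} on "
          f"{mapping.controller}, presented as host LUN {mapping.host_lun}")
print(dict(load))
```

With eight servers and four controllers, each controller ends up serving two boot LUNs, and every server sees its own unique array LUN as host LUN 0. Round-robin is only one possible placement policy. A production plan would also need to account for fabric paths and the load that data LUNs already place on each controller.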
For detailed technical instructions on booting from a Fibre Channel SAN in a Microsoft Windows environment, see this document: http://www.microsoft.com/download/en/details.aspx?id=2815
Paul Rubens has been covering IT security for over 20 years. In that time he has written for leading UK and international publications including The Economist, The Times, Financial Times, the BBC, Computing and ServerWatch.