SAN Multipathing for All Your Points of Failure
SAN's great and it spares system administrators a lot of headaches, but with five points of failure, you need multipathing to keep the support pages at bay.
Keeping up with storage requirements is a never-ending battle. Our users always want more of it and we systems administrators are always struggling to meet that demand. Thankfully, a little thing called storage area networking (SAN) came along to save the day. Now we can hook up all of our servers to 'the SAN' and retire in The Bahamas. But wait, on second thought, now that everything is SAN-attached, there is one giant, humongous, massive, colossal, awe-inspiring single point of failure!
Actually it's even worse than that. There are about five single points of failure in the path from your server to the SAN storage device. You've got the host bus adapter (HBA) in the server itself, the cable to your switch, the switch, the cable to the SAN storage device, and the controller on the SAN storage device. If any of these components fails, your server will lose its connection to the storage device. Worse yet, if the switch or controller fails, then all of your SAN-attached servers will lose their connection to the SAN storage device.
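To put rough numbers on that chain, here is a minimal Python sketch. It assumes the five components fail independently and gives each one a made-up availability of 99.9%; the figures and the function name are purely illustrative and do not come from any vendor.

```python
from math import prod

# The five components in the path, each with an assumed (hypothetical)
# availability: the fraction of time that component is working.
PATH = {
    "HBA": 0.999,
    "server-to-switch cable": 0.999,
    "switch": 0.999,
    "switch-to-storage cable": 0.999,
    "storage controller": 0.999,
}

def series_availability(availabilities):
    """Availability of a chain that works only if every link works."""
    return prod(availabilities)

single_path = series_availability(PATH.values())
print(f"Single path availability: {single_path:.4%}")
```

Under those assumptions, five components that are each up 99.9% of the time add up to a path that is up only about 99.5% of the time. Every link you add to the chain drags the whole path down a little further.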
So what can you do to fix this little problem? Before we delve into the answer let's back up a bit and take a wider look at some SAN basics and the different types of storage options available to us. Getting some perspective on different storage technologies will help us to see why (or maybe why not) it's worth the additional complexity that a SAN brings to the table.
There are three main storage options in today's market: direct attached storage (DAS), network attached storage (NAS), and SAN technology. DAS is just a fancy way of referring to the hard disks that are installed inside your server. In a small environment where growth is limited, DAS is a great way to meet your storage needs. Load up a server with several disks and you have a solid, quick storage solution. NAS solutions usually come in the form of an appliance or turnkey device that is loaded with disks. The NAS appliance typically comes with some sort of proprietary operating system that can serve files via an Ethernet port. NAS is a good plug-and-play solution, but once you fill it up, it's back to the drawing board again.
Enter SAN technology. You can choose between a fiber-based SAN (often referred to with the spelling "fibre") or an iSCSI-based SAN. The difference is that a fiber SAN is built on a fiber optic network, while iSCSI uses Ethernet technology to create your storage area network. Fiber technology has been around longer than iSCSI, so it is a little more reliable at this point.
A SAN is essentially a mini network, physically similar to a standard TCP/IP-based network. You'll need a switch with fiber or Ethernet cable connecting your SAN storage device to all of the servers that will use disk space on the storage device. I'll gloss over the details here, but with a SAN you carve up the space on your storage device into logical unit numbers (LUNs) and allocate them to each of your servers as you see fit. When one of your server volumes runs low on space, all you have to do is allocate another LUN or two to your server.
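The carve-and-allocate idea can be sketched as a toy model in Python. Everything here is hypothetical (the `StorageDevice` class, the server names, the sizes); real storage devices are managed through their vendor's own tools, so treat this only as an illustration of the bookkeeping.

```python
class StorageDevice:
    """Toy model of a SAN storage device that hands out LUNs."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.luns = {}  # LUN number -> (size in GB, server name)

    def free_gb(self):
        """Space that has not been carved into a LUN yet."""
        return self.capacity_gb - sum(size for size, _ in self.luns.values())

    def allocate_lun(self, size_gb, server):
        """Carve a new LUN out of free space and assign it to a server."""
        if size_gb > self.free_gb():
            raise ValueError("not enough free space on the storage device")
        lun_id = len(self.luns)  # next unused LUN number
        self.luns[lun_id] = (size_gb, server)
        return lun_id

    def luns_for(self, server):
        """LUN numbers currently allocated to the given server."""
        return [lun for lun, (_, owner) in self.luns.items() if owner == server]


array = StorageDevice(capacity_gb=1000)
array.allocate_lun(200, "db01")
array.allocate_lun(100, "web01")
array.allocate_lun(200, "db01")  # db01 ran low on space, so it gets another LUN
print(array.luns_for("db01"), array.free_gb())
```

The point of the model is that growing a server's storage is just another allocation against the shared pool, with no need to open the server's case.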
That's a very rosy picture, but what happens when your storage device runs out of space? This is where SAN technology really shines. Most SAN storage devices make it easy to add disks on the fly. You can carve these new disks into additional LUNs and allocate them to your servers. And even if you've reached the limit on additional disks, you can just add another storage device and assign LUNs from it to servers as you see fit.
Now that we have a basic understanding of SAN technology, let's turn back to our single points of failure. All of the components that make up a SAN are susceptible to aging, failure and the need for maintenance. This is where dynamic multipathing comes into play. For a robust SAN environment, dynamic multipathing is a must.
A SAN is comprised of an HBA in your server, a cable connecting the HBA to a switch, the switch itself, a cable connecting the switch to the storage device controller, and the controller itself. A failure anywhere in this path can lead to disaster, and this is where dynamic multipathing will save the day.
The first thing you will need to do is purchase additional hardware and software. The hardware required is an exact replica of the path from your server to the SAN storage device. You add an additional HBA to your server and connect that to the storage device through separate cabling and a separate switch. Your storage device must have a secondary controller to receive the new connection. The second piece to this equation is dynamic multipathing software on your server. This software is absolutely essential and controls which path is currently active. Some configurations will even allow an active/active connection for increased bandwidth.
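What the multipathing software decides for each I/O request can be sketched roughly as follows. This is a simplified illustration, not how any particular product works: the `Path` and `MultipathDevice` classes, the path names, and the round-robin policy for active/active mode are all assumptions made for the example.

```python
class Path:
    """One complete route: HBA, cabling, switch, and storage controller."""

    def __init__(self, name):
        self.name = name
        self.healthy = True


class MultipathDevice:
    """Chooses which path an I/O request should use."""

    def __init__(self, paths, active_active=False):
        self.paths = list(paths)
        self.active_active = active_active
        self._next = 0  # round-robin counter, used in active/active mode only

    def pick_path(self):
        healthy = [p for p in self.paths if p.healthy]
        if not healthy:
            raise OSError("all paths to the storage device are down")
        if not self.active_active:
            # Active/passive: always use the first healthy path in the list.
            return healthy[0]
        # Active/active: rotate through the healthy paths to spread the load.
        path = healthy[self._next % len(healthy)]
        self._next += 1
        return path


primary = Path("hba0 -> switchA -> controllerA")
secondary = Path("hba1 -> switchB -> controllerB")
lun = MultipathDevice([primary, secondary])

first = lun.pick_path().name          # primary path while everything works
primary.healthy = False               # simulate a failed switch or cable
after_failure = lun.pick_path().name  # traffic moves to the secondary path
print(first, "|", after_failure)
```

In active/passive mode the second path sits idle until it is needed. Passing `active_active=True` makes the sketch alternate between healthy paths, which is where the extra bandwidth comes from.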
You will probably have several options available to you for the dynamic multipathing software. The HBA vendor and/or storage device vendor may provide or sell this software. Additionally, you can purchase the software from a third party. Symantec's Veritas Storage Foundation software provides a good third party dynamic multipathing option. The Storage Foundation software is also very helpful for managing disk volumes on your servers.
Once the dynamic multipathing solution is in place, you can expose the same LUNs from both of your storage device controllers. Be careful to follow the directions and read the documentation thoroughly. If everything is not set up properly, you may find your server writing data to both paths at the same time. This is very bad, but if you've followed the directions, you can finally relax. If one of the paths fails at any point, your server will automatically switch to the other path!
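One way to picture why the setup matters: when both controllers expose the same LUNs, each LUN can show up on the server twice, once per path, and something has to recognize the two entries as a single disk. The Python sketch below illustrates that grouping step. The device names and LUN identifiers are invented, and real multipathing software does this with its own discovery mechanisms.

```python
from collections import defaultdict

# Hypothetical discovery results: (device as the server sees it, unique LUN id).
discovered = [
    ("disk_a", "lun-0001"),  # via the first HBA and controller
    ("disk_b", "lun-0002"),
    ("disk_c", "lun-0001"),  # same LUNs again, via the second HBA and controller
    ("disk_d", "lun-0002"),
]

def group_paths(devices):
    """Group server-visible devices by the LUN they actually point to."""
    by_lun = defaultdict(list)
    for device, lun_id in devices:
        by_lun[lun_id].append(device)
    return dict(by_lun)

multipath_map = group_paths(discovered)

# Any LUN with fewer than two paths still has a single point of failure.
unprotected = [lun for lun, paths in multipath_map.items() if len(paths) < 2]
print(multipath_map, unprotected)
```

A quick check like the `unprotected` list is a handy habit after any cabling or zoning change, since a LUN that silently dropped to one path looks fine right up until that path fails.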