Storage Networking 101: Planning Your Fabric
Once a SAN fabric grows beyond the initial two SAN switches, it's time to start planning and designing your network. SANs can be easy to slap together, but so are Ethernet networks. Both require some careful planning for scalability. This week we will discuss some common SAN design principles and help you plan for expansion.
Experience has shown that migrating to a SAN environment is best done in stages. Perhaps you'll still have many NAS or DAS devices lingering for a while after your initial SAN investment—that's OK. The most dangerous method of migrating to a SAN environment, barring tons of consulting dollars, is to replace everything in one fell swoop. When adding SAN equipment piecemeal, however, it helps to have a big picture end-result in mind.
The end result, for many businesses, is a total SAN environment. All servers should have two paths to the SAN, one to each distinct fabric. All storage, and servers requiring storage, are connected to the SAN. In theory, this is great, but it gets expensive quickly. Then people start to think they don't need to follow best practices in SAN design, because "that much redundancy is overkill." Sometimes, in really small businesses, it is. More often, it is required. Be extremely careful about dismissing redundant switches, even when it seems like the redundancy is becoming overkill.
Industry consensus is that a core-edge design is ideal. A core-edge design means that you have two sets of core switches, one for each fabric that individual nodes will connect to, with other switches fanned out from there. The model is simple when you can have core devices that are truly redundant and highly available. You connect the core switches to a few edge switches, and the edge switches connect to end-node devices, like servers.
A "director class" switch is a SAN switch that has built-in redundancy, performance and scalability features to meet extremely flexible needs. These SAN switches are very, very expensive, but they make great core switches. The advantage of buying your "core" as a single director (actually, it should be a pair of directors per fabric) is that you have fewer devices to manage. Without directors, the same redundancy and performance requirements can still be met with a little creative engineering.
The most common core design for people lacking director-class switches requires, of course, more standard SAN switches. Instead of a dual-director core (per fabric), each director can be replaced with a pair of switches. Each switch will connect to the others, meaning you'll have to burn three ports on each switch just to configure the core. Luckily, Fibre Channel is smart enough to deal with redundant paths, so it's safe to do this. The cost of initially giving up those ports on cheaper switches is minuscule compared to the price of a director switch.
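The port cost of meshing a non-director core is easy to work out ahead of time. The following is a minimal sketch, assuming four switches per fabric core (two pairs standing in for two directors) with every switch linked to every other one; the function name and the links_per_pair parameter are illustrative, not part of any vendor tool.

```python
def full_mesh_port_usage(switch_count, links_per_pair=1):
    """Count the ports a fully meshed core spends on inter-switch links.

    switch_count   -- number of switches in one fabric's core
    links_per_pair -- parallel links between each pair of switches (assumed)
    """
    if switch_count < 2:
        raise ValueError("a mesh needs at least two switches")
    # Every switch links to every other switch in the core.
    ports_per_switch = (switch_count - 1) * links_per_pair
    # Each pair of switches shares links_per_pair links.
    total_links = switch_count * (switch_count - 1) // 2 * links_per_pair
    return {
        "ports_per_switch": ports_per_switch,
        "total_links": total_links,
        "total_ports": total_links * 2,  # each link occupies a port at both ends
    }


# Four-switch core for one fabric, one link between each pair.
print(full_mesh_port_usage(4))
```

With four switches this reports three ports per switch, which matches the count above. Doubling up the links between switches for more throughput doubles the port cost as well.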
With both designs, director and non, the next step is to connect edge switches. The number of edge switches you require is completely dependent on the number of nodes that need to connect. Furthermore, you also need to plan for capacity. Having enough available ports doesn't mean you have the capacity. Edge switch throughput is generally good, internally, but you must be careful not to place an extremely popular set of servers behind a single 2Gb link. Yes, 4Gb links exist now, but the problems don't go away. Thankfully we can use another port between two switches, and combine the available throughput. These aggregated ISLs, or inter-switch links, allow us to continue using edge-class switches in the core, even when we've scaled our usage beyond original planning.
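To make the capacity point concrete, here is a rough sketch of the arithmetic behind it. It assumes that aggregated ISLs simply add their bandwidth together, which is an idealization; real link aggregation may not scale perfectly. The function names, the host counts and the target ratio are made up for illustration.

```python
import math


def oversubscription_ratio(host_ports, host_gbps, isl_count, isl_gbps):
    """Ratio of possible host traffic to uplink capacity on one edge switch."""
    if isl_count < 1:
        raise ValueError("an edge switch needs at least one ISL")
    return (host_ports * host_gbps) / (isl_count * isl_gbps)


def isls_needed(host_ports, host_gbps, isl_gbps, target_ratio):
    """Smallest number of ISLs that keeps the ratio at or below target_ratio."""
    return math.ceil((host_ports * host_gbps) / (isl_gbps * target_ratio))


# Hypothetical edge switch: 16 servers at 2Gb each behind a single 2Gb ISL.
print(oversubscription_ratio(16, 2, 1, 2))
# Adding a second ISL halves the ratio (under the ideal-sum assumption).
print(oversubscription_ratio(16, 2, 2, 2))
# ISLs needed to hold the same servers to a 4:1 ratio.
print(isls_needed(16, 2, 2, 4))
```

What ratio is acceptable depends entirely on how busy the servers behind the switch really are, so treat the target as something to measure rather than a fixed rule.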
Connecting the Pieces
It helps to think of the "core" as being only two-sided: one set of ports connects to hosts, the other connects to storage, backup, or everything else. Every single device will connect to two fabrics via two different edge switches, each with its own core. To provide the best availability, the edge switches must also connect to each half of the individual core. If this isn't making sense, google for "SAN design" images; there are much prettier images from Brocade and Cisco than I could ever craft. This one, from Microsoft, demonstrates the non-director core very well.
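The rule that every device reaches two fabrics is simple enough to check against an inventory. Below is a small sketch, assuming you keep a record of which fabric and edge switch each device port plugs into; the device, fabric and switch names are invented for the example.

```python
def devices_missing_redundancy(attachments):
    """Return devices that do not reach at least two distinct fabrics.

    attachments maps a device name to a list of (fabric, edge_switch) pairs.
    """
    flagged = []
    for device, ports in attachments.items():
        fabrics = {fabric for fabric, _switch in ports}
        if len(fabrics) < 2:
            flagged.append(device)
    return sorted(flagged)


# Hypothetical inventory; every name here is made up for illustration.
inventory = {
    "db01": [("fabric-a", "edge-a1"), ("fabric-b", "edge-b1")],
    # Two ports, but both land in the same fabric, so this is not redundant.
    "web01": [("fabric-a", "edge-a1"), ("fabric-a", "edge-a2")],
    "backup01": [("fabric-b", "edge-b2")],
}

print(devices_missing_redundancy(inventory))
```

Note that web01 is flagged even though it has two connections: two ports into the same fabric do not protect against losing that fabric.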
Do note that ISLs will likely be required throughout the core, depending on a few variables. If your links are all 4Gb, and data transfer rates are generally low, you're probably safe. However, if there is a storage array that serves 20 servers, as well as tons of other traffic, a bottleneck can be difficult to track down. Your core switches definitely need enough throughput to support all aggregate traffic through them, but so do the edge switches. A common mistake is to create a core capable of 10Gb/s, but fail to realize that most of that traffic comes from a single storage array. If the array is connected to the core via a 4Gb link, there's another bottleneck.
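The storage array example comes down to a simple rule: a path is only as fast as its slowest hop. Here is a minimal sketch of that check, using the 10Gb core and 4Gb array link from the example; the hop names and the 8Gb figure for the core-to-edge hop are assumptions added for illustration.

```python
def path_bottleneck(hops):
    """Return the (name, capacity_gbps) hop with the least capacity.

    hops is a list of (name, capacity_gbps) pairs along one data path.
    """
    if not hops:
        raise ValueError("a path needs at least one hop")
    return min(hops, key=lambda hop: hop[1])


# Path from a storage array, through the core, out to an edge switch.
path = [
    ("array-to-core", 4),   # single 4Gb link from the array
    ("core", 10),           # core capable of 10Gb/s
    ("core-to-edge", 8),    # assumed: two aggregated 4Gb ISLs
]

name, gbps = path_bottleneck(path)
print(f"Slowest hop: {name} at {gbps}Gb")
```

However much capacity the core has, traffic from that array cannot exceed the 4Gb link feeding it.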
New SAN deployments do not usually run into bottlenecks right away, but when they do, it's always at the most inopportune time. Ideally we would like to plan for and engineer out the possibility of bandwidth contention.
Bandwidth issues aside, we also need to plan for scalability and redundancy. Both aspects are inherent in the core-edge design, but there are, in fact, other schemes you may consider. It's very tempting, and cost-effective, to start growing "pairs of switches" everywhere, and then just begin linking them together. That's fine, but pretty soon you'll have a chain, and straight lines are easily broken, even if they're made up of two parallel lines (remember, pairs of switches).
If you'll never scale past six switches, you'll likely be fine just connecting them together in a circle, but most businesses will need to scale, eventually.
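The difference between a chain and a circle shows up as soon as you ask what happens when one link fails. The sketch below models a single fabric of six switches as a simple graph and tests every possible single-link failure; it is a simplification that ignores the second, parallel fabric, and the switch names are placeholders.

```python
from collections import deque


def is_connected(nodes, links):
    """Breadth-first search: can every switch still reach every other one?"""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def survives_any_single_link_failure(nodes, links):
    """True if the fabric stays connected no matter which one link is lost."""
    return all(
        is_connected(nodes, links[:i] + links[i + 1:])
        for i in range(len(links))
    )


switches = [f"sw{i}" for i in range(6)]
chain = [(switches[i], switches[i + 1]) for i in range(5)]
ring = chain + [(switches[5], switches[0])]  # close the loop

print("chain survives:", survives_any_single_link_failure(switches, chain))
print("ring survives:", survives_any_single_link_failure(switches, ring))
```

Closing the loop costs one extra link and removes the single point of failure, but traffic between switches on opposite sides of the circle still has to cross several hops, which is part of why this approach stops making sense as the switch count grows.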