Storage Networking 101: Understanding Storage Routing
Storage networking is not unlike IP networking, most of the time. In the IP world there are numerous routing protocols and standards; you have dozens of options. In the storage world, however, there are no official routing protocols. Routing does exist, though it may not be what you're imagining.
Storage Area Networks can generally be thought of as huge Layer 2 networks. SANs implement a mechanism not unlike Spanning Tree from the Ethernet world to keep themselves loop-free. The big problem with sprawling networks is that one problem can impact the entire network. In a SAN environment the problem is exacerbated by the nature of the FC protocol itself. One way to combat the problems associated with too-large fabrics is to isolate them into distinct networks.
Everyone knows that silos in IT are a bad thing, but sane network design often requires them. When a SAN sprawls too wide, stability needs often dictate the creation of multiple fabrics. This does not mean you've created a standard "bad" IT silo, just that you've created two separate fabrics. The good news is that in much the same way IP networks operate, we can route traffic between fabrics.
In IP networking we must have unique IP addresses, but it doesn't really matter if MAC addresses overlap so long as they aren't in the same subnet. Fibre Channel has no Layer 3 addresses, so the Layer 2 address, the World Wide Name (WWN), must be globally unique. In SAN routing, there are two ways to "route" traffic: by translating, or virtualizing, WWNs into stand-in addresses on the other fabric, or by spoofing the address outright. Since WWNs are carefully assigned, their required uniqueness shouldn't be a problem in practice, but keeping it in mind helps to visualize the difference between layering and translating.
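To make the translation idea concrete, here is a minimal Python sketch of a WWN translation table of the kind an inter-fabric router might maintain. The WWNs and the "50:00:..." proxy range are made up for illustration; real products have their own allocation schemes.

```python
# Hypothetical sketch: translate real WWNs on fabric A into proxy WWNs
# presented to fabric B, and back again. All WWNs here are invented.

def make_translator():
    """Return (to_proxy, to_real) lookup functions for one fabric boundary."""
    mapping = {}        # real WWN on fabric A -> proxy WWN shown to fabric B
    reverse = {}        # proxy WWN -> real WWN
    counter = [0]       # next free suffix for minting proxy WWNs

    def to_proxy(real_wwn):
        if real_wwn not in mapping:
            # Mint a proxy WWN from an illustrative locally managed range.
            proxy = "50:00:00:00:00:00:00:%02x" % counter[0]
            counter[0] += 1
            mapping[real_wwn] = proxy
            reverse[proxy] = real_wwn
        return mapping[real_wwn]

    def to_real(proxy_wwn):
        return reverse[proxy_wwn]

    return to_proxy, to_real

to_proxy, to_real = make_translator()
proxy = to_proxy("21:00:00:e0:8b:05:05:04")   # hypothetical HBA WWN
```

The point of the sketch is that the second fabric only ever sees the minted proxy addresses, so the two fabrics remain independent Layer 2 domains even while traffic flows between them.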
Remember, there is no protocol for SAN routing. Everything we're talking about here is vendor specific, unlikely to interoperate with other vendors' products, and subject to interpretation and bias when evaluating the effectiveness of such mechanisms.
Routing by Termination on the Fabric
The first method of SAN routing is best thought of as a proxy server, albeit an extremely smart one. McData Corporation developed a mechanism to connect multiple SAN silos together. When configured appropriately, switches can masquerade as both a target and an initiator, essentially proxying an FC connection between two SAN fabrics. When a port is terminated on a SAN switch this way, administrators retain the same flexibility in configuring access to LUNs, but in many cases not as much security.
Remember the article that covered the differences between hard and soft zones? If we need to ensure a certain amount of security, both against attackers and against configuration errors, we prefer port-based, or hard, zones. When configuring these silo connections (translations, or mappings, as we'll call them), the only choice is to configure WWN-based restrictions. While unlikely to cause a problem, it is something to be aware of. The procedure for replacing an HBA usually includes updating WWN mappings on switches and storage arrays, and now the "routing" configuration must be made aware of any changes too.
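The HBA-replacement chore can be sketched in a few lines. This is an illustrative Python model, not any vendor's tooling, and every zone name and WWN in it is hypothetical; it just shows why a single WWN change has to be propagated to every WWN-based configuration, routing mappings included.

```python
# Hypothetical model: soft zones and inter-fabric router mappings are both
# keyed on WWNs, so replacing an HBA means updating every one of them.

zones = {
    "zone_db01": {"21:00:00:e0:8b:aa:aa:aa", "50:06:0e:80:bb:bb:bb:bb"},
}
router_mappings = {
    "21:00:00:e0:8b:aa:aa:aa": "50:00:00:00:00:00:00:01",  # proxy WWN
}

def replace_hba(old_wwn, new_wwn):
    """Swap a WWN everywhere it appears: zone sets and routing mappings."""
    for members in zones.values():
        if old_wwn in members:
            members.discard(old_wwn)
            members.add(new_wwn)
    if old_wwn in router_mappings:
        router_mappings[new_wwn] = router_mappings.pop(old_wwn)

# A replaced HBA arrives with a brand-new WWN burned in:
replace_hba("21:00:00:e0:8b:aa:aa:aa", "21:00:00:e0:8b:cc:cc:cc")
```

Forget the `router_mappings` step in real life and the host can still log in to its local fabric but silently loses access to everything on the far side, which is exactly the failure mode the paragraph above warns about.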
Terminating SAN connections on each silo switch is nevertheless advantageous. The likelihood of SAN-wide outages is greatly reduced, and isolating problems can be much quicker in a routed environment. SAN routing is also heavily used in geographically dispersed networks to increase the reliability and stability of the network as a whole.
Other Routing Methods
SAN routing can also take the form of various other technologies. It turns out that routing in a SAN is really just segmentation, plus the glue needed to pass traffic between segments. It's much the same in IP networks, except that there are clear mechanisms for passing packets between domains: layers and routing protocols. SANs need to segment for the same reasons IP networks do, but with the looming stability issue added in.
Many people view protocol encapsulation and translation as routing. FC over IP (FCIP) and even the iSCSI protocol are, in a sense, routing protocols. They enable SAN extension over larger IP networks without adding to the sprawl of a fabric, and they are frequently used for remote SAN replication to a backup site. You certainly wouldn't want to extend an entire fabric to another city, especially when only one device needs connectivity. An iSCSI target is normally hosted on a storage device, so calling it a routing mechanism is a bit of a stretch. Some SAN switches can act as translators between FC and iSCSI, though, making the router role clearer. In fact, that's exactly what an IP node does: it takes Layer 2 data and adds Layer 3 (IP) data on top of it.
SAN virtualization gets even hairier. Certain applications of LUN pooling and its subsequent translation into iSCSI are clear-cut candidates for being called routing, but in general, virtualization isn't really routing. Still, storage virtualization is precisely what the first routing method above relied on. When a SAN switch presents its own LUNs, which are in fact hosted on a storage array elsewhere, it is creating a virtualized storage device. Virtual LUNs enable some of the more creative ways to route in a SAN fabric, but the concept of virtualization itself isn't really about routing.
On the other hand, if you stick to the simple definition of routing, segmenting and then gluing the parts back together, then virtualization as described above is routing. Virtualization takes many other forms, however, such as LUN pools, remote replication, and snapshots. These other uses of virtualization don't facilitate segmentation of a network.
Yes, it is confusing. There is no routing protocol, even though practice has proven routing to be necessary. Careful segmentation, glued together piecemeal with virtualized LUNs, is all we really need. When real routing is needed, for instance to ship FC frames across the Internet, we simply use IP. It works, so why not use the existing routing infrastructure? Long distances imply high latency anyway, so the main speed benefits of an FC SAN (block-level access and the avoidance of protocol encapsulation) aren't as meaningful.