Storage Networking 101: Understanding SANs and Storage
Welcome! We begin our Storage Networking 101 series with an introduction to Storage Area Networks and storage technologies. In case you missed it, be sure to read the entire Networking 101 series before embarking on the storage journey: a solid understanding of various network protocols is required.
What is a storage network?
A storage network, or Storage Area Network (SAN), is any network that's designed to transport block-level storage protocols. Hosts (servers), disk arrays, tape libraries and just about anything else can connect to a SAN. Generally, one would use a SAN switch to connect all devices, and then configure the switch to allow friendly devices to pair up. The entire concept is about flexibility: in a SAN environment you can move storage between hosts, virtualize your storage at the SAN level, and obtain a higher level of redundancy than was ever possible with direct-attached storage.
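One way to picture the switch configuration described above is as a list of allowed pairings (Fibre Channel switches generally call this zoning). The sketch below is a toy model, not a real switch API: the device names, the `zones` table, and the `can_talk` helper are all made up for illustration.

```python
# Toy model of a SAN switch that only lets configured devices pair up.
# Device names and the can_talk() helper are hypothetical.

# Each zone is a set of devices that are allowed to see one another.
zones = {
    "db_zone": {"host-db1", "array-a"},
    "backup_zone": {"host-backup", "array-a", "tape-lib1"},
}

def can_talk(a: str, b: str) -> bool:
    """Return True if at least one zone contains both devices."""
    return any(a in members and b in members for members in zones.values())

print(can_talk("host-db1", "array-a"))    # True
print(can_talk("host-db1", "tape-lib1"))  # False
```

Moving storage between hosts then amounts to editing the zone table rather than recabling anything, which is the flexibility the paragraph above is getting at.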
An FC-SAN, or Fibre Channel SAN, is a SAN built on the Fibre Channel protocol. Think of Fibre Channel (FC) as an Ethernet replacement. In fact, Fibre Channel can transport other protocols, like IP, but it's mostly used for transporting SCSI traffic. Don't worry about the FC protocol itself for now; we'll cover that in another article later on.
A fairly new type of SAN is the IP-SAN: an IP network that's been designated as a storage network. Instead of using FC, an IP-SAN uses Ethernet with IP and TCP to transport iSCSI data. There's nothing to stop you from shipping iSCSI data over your existing network, but an IP-SAN typically means that you're using plumbing dedicated to the storage packets. Operating system support for the iSCSI protocol has been less than stellar, but the state of iSCSI is slowly improving.
Another term you'll frequently see thrown around is NAS. Network Attached Storage doesn't really have anything to do with SANs—it's just file servers. A NAS device runs something like Linux, and serves files using NFS or CIFS over your existing IP network. Nothing fancy to see here; move along.
There is one important takeaway from the NAS world, however: the difference between block-level storage protocols and file-level protocols. SCSI and ATA are block-level protocols, whereas file-level protocols can be anything from NFS or CIFS to HTTP. Block protocols ship an entire disk block at once, and it gets written to disk as a whole block. File-level protocols could ship one byte at a time, and depend on the lower-level block protocol to assemble the bytes into disk blocks.
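To make the block versus file distinction concrete, here is a minimal sketch. It assumes a 512-byte block size and invents a `BlockDevice` class and a `write_bytes` helper purely for illustration; real file systems and block protocols are far more involved.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch

class BlockDevice:
    """Stores whole blocks only, like a disk behind a block protocol."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, lba: int) -> bytes:
        # lba = logical block address, i.e. the block's index on the device
        return self.blocks[lba]

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("block protocols move whole blocks")
        self.blocks[lba] = bytes(data)

def write_bytes(dev: BlockDevice, offset: int, data: bytes) -> None:
    """File-style write of arbitrary bytes, turned into whole-block writes."""
    for i, byte in enumerate(data):
        lba, pos = divmod(offset + i, BLOCK_SIZE)
        block = bytearray(dev.read_block(lba))  # read the whole block
        block[pos] = byte                       # change a single byte
        dev.write_block(lba, bytes(block))      # write the whole block back

dev = BlockDevice(num_blocks=4)
write_bytes(dev, offset=510, data=b"SAN!")  # spans blocks 0 and 1
print(dev.read_block(0)[-2:], dev.read_block(1)[:2])  # b'SA' b'N!'
```

The byte-oriented caller never sees block boundaries; the layer underneath does the assembling, one full block at a time.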
A protocol always defines a method by which two devices communicate. Block storage protocols are no different: they define how storage interacts with storage controllers. There are two main block protocols used today: SCSI and ATA.
ATA operates in a bus topology, and allows for two devices on each bus. Your IDE disk drive and CD-ROM are, you guessed it, using the ATA protocol. There are many different ATA standards, but we'll cover just the important ones here. ATA-2 was also known as EIDE, or enhanced IDE. It was the first version of the ATA protocol as we know it today. ATA-4 introduced ATAPI, or the ATA Packet Interface, which allows CD-ROM devices to speak a SCSI-like command set on the same bus as a regular ATA device.
The neat thing about ATA is that the controllers are integrated. The only "traffic" sent over the ATA bus is plain electrical signals. The host operating system is actually responsible for implementing the ATA protocol, in software. This means that ATA devices will never, ever be as fast as SCSI, because the CPU has to do so much work to just talk to these devices. As far as SANs are concerned, ATA isn't that important. There are some ATA-based devices that allow you to connect cheap disks, but they translate operations into SCSI before sending them out to the SAN.
SCSI, on the other hand, is very confusing. SCSI-1 and SCSI-2 devices were connected via a parallel interface to a bus that could support 8 or 16 devices, depending on the bus width. Don't worry about the details unless you're unfortunate enough to have some older SCSI gear lying around.
SCSI-3 separated the device-specific commands into a different category. The primary SCSI-3 command set includes the standard commands that every SCSI-3 device speaks, but the device-specific commands can be anything. This opened up a whole new world for SCSI, and it has been used to support many strange and wonderful new devices.
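The split between a shared primary command set and device-specific commands can be sketched as follows. This is a toy model: the class names and the `supports` method are hypothetical, and the command names are simplified strings rather than real opcodes.

```python
# Toy model of the SCSI-3 layering: every device speaks the primary
# commands, and each device type adds its own on top.
PRIMARY_COMMANDS = {"INQUIRY", "TEST UNIT READY", "REQUEST SENSE"}

class ScsiDevice:
    device_commands: set = set()  # overridden by each device type

    def supports(self, command: str) -> bool:
        return command in PRIMARY_COMMANDS or command in self.device_commands

class Disk(ScsiDevice):
    device_commands = {"READ", "WRITE"}

class Tape(ScsiDevice):
    device_commands = {"READ", "WRITE", "REWIND"}

for dev in (Disk(), Tape()):
    print(type(dev).__name__, dev.supports("INQUIRY"), dev.supports("REWIND"))
# Disk True False
# Tape True True
```

Adding a new kind of device means adding a new set of device-specific commands, without touching the commands everything else already understands.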
SCSI controllers normally contain a storage processor, and commands are processed on-board so the host operating system isn't burdened with that work, as it is with ATA. Such a SCSI controller is called a Host Bus Adapter (HBA). In the SAN world, the FC card is always called an HBA.
The main thing to know about SCSI is that it operates in a producer/consumer manner. One SCSI device (the initiator) will initiate the communication with another device, which is known as the target. The roles can be reversed! Most people call this a command/response protocol, because the initiator sends a command to a target and awaits a response, although it doesn't always wait. In asynchronous mode, the host (initiator) can simply blast the target with data until it's done. The SCSI bus, parallel in nature, can only support a single communication at a time, so subsequent sessions must wait their turn. SAS, or Serial Attached SCSI, does away with this limitation by automatically switching back and forth.
SCSI is tremendously more complex, but that's the gist of it.
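The initiator/target exchange described above can be sketched in a few lines. Everything here is a simplification for illustration: the `Command`, `Response`, `Target`, and `Initiator` classes are hypothetical, and the operations are plain strings rather than real SCSI command blocks.

```python
from dataclasses import dataclass

@dataclass
class Command:
    op: str            # "READ" or "WRITE" (simplified; not real SCSI opcodes)
    lba: int           # which block to act on
    data: bytes = b""

@dataclass
class Response:
    status: str        # "GOOD" or "CHECK CONDITION"
    data: bytes = b""

class Target:
    """Waits for commands and answers them."""

    def __init__(self) -> None:
        self.blocks = {}  # maps block address -> stored bytes

    def handle(self, cmd: Command) -> Response:
        if cmd.op == "WRITE":
            self.blocks[cmd.lba] = cmd.data
            return Response("GOOD")
        if cmd.op == "READ":
            return Response("GOOD", self.blocks.get(cmd.lba, b""))
        return Response("CHECK CONDITION")  # unknown command

class Initiator:
    """Starts every exchange; the target never speaks first."""

    def __init__(self, target: Target) -> None:
        self.target = target

    def send(self, cmd: Command) -> Response:
        return self.target.handle(cmd)

disk = Target()
host = Initiator(disk)
host.send(Command("WRITE", lba=7, data=b"hello"))
reply = host.send(Command("READ", lba=7))
print(reply.status, reply.data)  # GOOD b'hello'
```

This models only the synchronous command/response case; the asynchronous mode mentioned above, where the initiator keeps sending without waiting, is not shown.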
We need to understand SCSI to know how our storage network is going to ship data. The SCSI protocol plays an enormous role in storage networking, so you may even want to look at it more in-depth.
Next up, we'll begin talking about Fibre Channel itself, which, as chance would have it, is much more complex than Ethernet. This is certainly going to be a fun journey.