Storage Networking 101: Shopping for Disks
There are lots of choices when it comes to buying disk space for your networked storage.
Before jumping into fibre channel (FC) protocol details, we need to discuss the various types of disks and RAID available. There are many factors to consider before buying storage, and knowing the difference between SATA, SCSI and SAS drives is required. A storage network often bottlenecks at the storage devices themselves, so it's important to spend some time discussing RAID and disk technologies before jumping into the network aspects of it all.
SATA: it's what everyone wants. There are 1TB SATA drives now, and they're extremely inexpensive. Unfortunately, they're slow. There are two reasons: too much density in a single spindle, and just the nature of SATA disks themselves.
Density is the largest factor, especially now that some SATA drives have command queuing. Placing dense storage into a single spindle means that the heads in the drive will need to seek longer for data, and there's no getting around that. SCSI and FC disks are low capacity for a reason. They're also spinning at a faster rate, up to 15K RPM. With command queuing, which SCSI has done all along, SATA performance is actually improving for multiple-user access.
SATA is really designed for desktop use, and when multiple users are accessing a SATA device at the same time, it will slow way down. Again, command queuing helps, but it's still limited by the fact that SATA is spinning at a lower speed, and there's more storage density. The only way to improve storage performance is to add more spindles—true for SCSI as well. Just be careful about buying SATA arrays. They may seem cheap and wonderful, but generally not so much on the wonderful part, unless you're just using them for archival storage.
Archival storage generally involves the copying of large files to and from a disk. This is called sequential I/O, because data is being written sequentially. If you're copying tons of small files all over the disk, this is random I/O, and is much slower. SCSI deals with random I/O much better because it's spinning faster, it seeks faster, it queues and optimizes commands, and there's less density to deal with. Most of the time throughput isn't very important, depending on your workload, so don't pay much attention to SCSI or SATA speeds unless you're really planning on copying gigabytes of data 24x7. The nature of I/O also plays an important role in your RAID choice, which we'll talk about shortly.
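To put rough numbers on the random I/O penalty, here is a back-of-the-envelope sketch. It assumes the common approximation that one random I/O costs an average seek plus half a rotation. The seek times and the function names are illustrative assumptions, not vendor specifications or measurements.

```python
import math

def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random IOPS for one spindle.

    Each random I/O is assumed to cost one average seek plus half a
    rotation (the average rotational latency). Transfer time is ignored.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

def spindles_needed(target_iops: float, per_disk_iops: float) -> int:
    """How many spindles it takes to reach a target random IOPS figure."""
    return math.ceil(target_iops / per_disk_iops)

# Assumed, illustrative drive characteristics (not from any datasheet).
sata = random_iops(rpm=7200, avg_seek_ms=8.5)
scsi = random_iops(rpm=15000, avg_seek_ms=3.5)

print(f"7.2K SATA: ~{sata:.0f} IOPS, {spindles_needed(1000, sata)} spindles for 1000 IOPS")
print(f"15K SCSI:  ~{scsi:.0f} IOPS, {spindles_needed(1000, scsi)} spindles for 1000 IOPS")
```

Under these assumed figures the faster-spinning drive handles more than twice as many random requests per second, which is why the spindle count matters more than raw capacity for busy storage.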
FC disks are actually SCSI disks with an FC interface. They gain the advantage of some FC protocol fanciness, and perform roughly the same as SCSI in most cases. SAS, or Serial Attached SCSI, is a new type of SCSI disk. SAS drives are neat because the controller is backward compatible with SATA, so you can mix and match drives if necessary. SAS performs extremely well because it's serial in nature: each drive gets its own point-to-point link instead of sharing a parallel bus, so more than one device can talk at a time. It's not generally available in RAID arrays yet though; SAS is mostly found installed in 2.5" drives within servers themselves.
So we know what drives to buy for heavily used storage: SCSI or FC. You'll actually have a hard time finding SCSI disks in a FC array these days, so you'll likely be using FC disks. To illustrate RAID considerations, we'll talk a bit about connecting SCSI disks to a PCI RAID card, but don't worry; we'll focus on enterprise storage again quickly.
A RAID controller, first of all, is a CPU. It takes the load off the host computer, and shoves the burden down to specialized hardware. The hardware may be a PCI-E RAID card, or it may be built into your server's motherboard. It may even be part of your desktop motherboard—beware! That's possibly not a RAID controller: some require OS driver support to operate. PseudoRAID controllers are an abomination, but they're useful for light workloads.
The next step up is a PCI RAID controller. These may or may not have battery-backed cache. If the card supports RAID-5, then it had better have a cache, and said cache had better be battery-backed. If not, you'll have no write caching. A RAID-5 controller can quickly return a "done" message when data is written, and then fulfill the request later. This is called write caching, and it dramatically improves performance. Write caching aside, RAID-5 is a bad idea on RAID controllers that don't have internal memory because of the RAID-5 write hole: if power fails partway through a stripe update, the data and parity can end up out of sync.
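A toy model may make the battery requirement more concrete. The sketch below is purely illustrative: `WriteBackCache` and its methods are hypothetical names, a Python dict stands in for the disks, and no real controller firmware works this simply. It only shows why acknowledging a write before it reaches disk is safe with a battery and dangerous without one.

```python
from collections import OrderedDict

class WriteBackCache:
    """Toy controller cache (hypothetical; not any vendor's design)."""

    def __init__(self, disk: dict, write_back: bool, battery_backed: bool):
        self.disk = disk                  # stands in for the slow disks
        self.write_back = write_back
        self.battery_backed = battery_backed
        self.dirty = OrderedDict()        # acknowledged but not yet on disk

    def write(self, block: int, data: bytes) -> str:
        if self.write_back:
            self.dirty[block] = data      # fast path: say "done" now, flush later
            return "done (cached)"
        self.disk[block] = data           # write-through: wait for the disk
        return "done (on disk)"

    def flush(self) -> int:
        """Push every dirty block to disk; return how many were written."""
        flushed = len(self.dirty)
        while self.dirty:
            block, data = self.dirty.popitem(last=False)
            self.disk[block] = data
        return flushed

    def power_loss(self) -> int:
        """Simulate a power failure; return the number of acknowledged writes lost."""
        if self.battery_backed:
            return 0                      # cache survives and can be flushed later
        lost = len(self.dirty)
        self.dirty.clear()
        return lost

unsafe = WriteBackCache({}, write_back=True, battery_backed=False)
unsafe.write(0, b"data")
unsafe.write(1, b"more")
print("writes lost without battery:", unsafe.power_loss())

safe = WriteBackCache({}, write_back=True, battery_backed=True)
safe.write(0, b"data")
print("writes lost with battery:", safe.power_loss(), "then flushed:", safe.flush())
```

The host was told "done" in both cases; only the battery-backed cache can still honor that promise after the power comes back.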
A really quick RAID summary:
- Striping (RAID-0). Combine multiple disks into one device. Performance good.
- Mirroring (RAID-1). Takes pairs of disks and mirrors them. Read performance good, write performance bad. Eats half your space.
- Redundancy (RAID-5). Writes stripes of blocks, distributed across disks. A stripe contains a parity block, which can be used for data recovery. Lose one disk's worth of space per RAID-5 set. Performance is good, assuming you have a decent controller.
There are numerous other RAID levels, but the block-level RAIDs are all similar to (or a combination of) the three main RAID levels listed above.
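The parity idea in the summary above is easier to see with a few bytes. This is a minimal sketch assuming the usual XOR parity scheme; the function names and the tiny four-byte "blocks" are made up for illustration, and a real controller also rotates the parity block across disks, which is skipped here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def usable_disks(level: str, disks: int) -> float:
    """Disks' worth of usable space for the three levels summarized above."""
    if level == "striping":
        return disks
    if level == "mirroring":
        return disks / 2
    if level == "raid5":
        return disks - 1
    raise ValueError(f"unknown level: {level}")

# One stripe across three data disks, plus its parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend the second disk died: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print("rebuilt block matches original:", rebuilt == data[1])
print("usable disks in a 6-disk RAID-5 set:", usable_disks("raid5", 6))
```

Because XOR is its own inverse, any single missing block in a stripe can be rebuilt from the others, which is exactly why a RAID-5 set gives up one disk's worth of space and survives one disk failure.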
Then we come to the real stuff: arrays with 1GB or more of cache, capable of many different RAID levels. They are either DAS, direct attached storage, connected via SCSI or FC, or they're SAN-based. We'll dedicate an entire article to talking about storage arrays and their configuration, later.
The moral of the story is: enterprise-grade equals battery-backed cache. It also means that you aren't going to be using SATA drives for anything but archival purposes.
Realize that purchasing a storage array always involves sticker shock. Once that subsides, don't forget to induce another round by including backup capacity in your planning. RAID is not backup. It can help to prevent data loss in the event of a few disks failing, but it isn't to be relied on. The chances of two or more disks failing at the same time may seem slim, but that's not the case. When the array starts rebuilding lost data onto previously underutilized disks, those disks frequently fail.
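For a feel of how un-slim those chances can be, here is a rough sketch of the odds of a second failure during a rebuild. Everything in it is an assumption for illustration: the function name, the annual failure rate, the rebuild time, and especially the `stress_factor` knob, which is a hypothetical way to model the extra wear of a rebuild. It also assumes disks fail independently at a constant rate, which tends to understate the risk when the disks share an age and a workload.

```python
import math

def p_second_failure(remaining_disks: int,
                     annual_failure_rate: float,
                     rebuild_hours: float,
                     stress_factor: float = 1.0) -> float:
    """Probability that at least one surviving disk fails during the rebuild.

    Assumes independent failures at a constant rate. stress_factor is a
    hypothetical multiplier for the extra load a rebuild puts on the disks.
    """
    hourly_rate = -math.log(1.0 - annual_failure_rate) / 8760.0
    p_one = 1.0 - math.exp(-hourly_rate * stress_factor * rebuild_hours)
    return 1.0 - (1.0 - p_one) ** remaining_disks

# Illustrative inputs only: 13 surviving disks, 3% annual failure rate,
# a 24-hour rebuild, with and without an assumed 10x stress multiplier.
calm = p_second_failure(13, 0.03, 24)
stressed = p_second_failure(13, 0.03, 24, stress_factor=10.0)
print(f"second failure during rebuild: {calm:.3%} (no stress), {stressed:.3%} (10x stress)")
```

Whatever numbers you plug in, the result is never zero, and it grows with the number of disks and the length of the rebuild. That is the arithmetic behind "RAID is not backup."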