Build Your Own RAID Storage Server with Linux

By Carla Schroder | Feb 19, 2008
http://www.enterprisenetworkingplanet.com/netsysm/article.php/3728971/Build-Your-Own-RAID-Storage-Server-with-Linux.htm

If you've been thinking of building yourself a dedicated storage server, this is a good time to do it. Prices are so low now that even a small home network can have a dedicated storage and backup server for not much money. SATA hard drives have large capacities and high speeds for low prices, and you don't need the latest, greatest quad-core processor or trainloads of RAM. The ultimate in flexibility and reliability combines Linux software RAID (Redundant Array of Inexpensive Disks) and LVM (Logical Volume Manager).

These are good times for hardware geeks of all kinds: prices are low and features abundant. Most motherboards include a feast of onboard controllers that used to require separate expansion cards: audio, video, RAID, Firewire, and Ethernet. Laptops and monitors come with integrated microphones and cameras. Hordes of USB 2.0 ports mean easy connectivity for peripherals. Gigabit Ethernet? They're practically giving it away.

Data Storage and Retrieval

The best part is that it's easier than ever to store, back up, and retrieve your data. My personal favorite use for USB is connecting external re-writable storage devices, everything from little thumb drives to big hard drives. These are absolutely great for inexpensive backups and file transfers, and even my most relentlessly techno-gumby friends and relatives can copy files to a USB stick. I suffered during those awkward transition years when 3.5" diskettes were too small and there was nothing comparable to replace them. Zip drives were too unreliable and non-standard (can you read those disks now?). CD-RWs were funky: sometimes you could read them, sometimes not, and packet-writing never did work reliably on any platform. DVD-RWs offered bigger capacities, but hard drives still outstripped them. Plus there were (and still are) too many competing DVD standards, and just like their CD-RW cousins they are not reliable enough.

My favorite solution for large-capacity backups and storage is hard drives. Yes, I know that tape storage rivals hard disks for capacity, but I don't like it. It's cumbersome, expensive, and non-portable. Hard drives are fast, easy, inexpensive, and, best of all, very portable. They are readable without any special software or hardware; just stuff a drive into any PC, or into an external USB or Firewire enclosure attached to a PC. It doesn't even have to be a Linux PC, as long as you have a Linux LiveCD or USB stick. You'll be able to read nearly any filesystem, and Linux offers a number of good data-recovery utilities if you need them.

But as excellent as all of these are, there comes a time when they're not quite adequate, and that's when a dedicated storage server is the right tool for the job.

Why Use RAID?

One word: uptime. RAID protects you from drive failures. When a drive fails (and it's always "when," not "if"), the remaining disks carry on until you replace the dead one. But do not expect RAID to replace regular backups, because it doesn't. There are many things that can wipe out a RAID array: power surges, multiple drive failures, undiscovered drive failures, theft, and disk controller failures, to name just a few. Except for RAID 1, you can't read individual disks from a RAID array, so you have to rebuild the array to get at your data, and if too many drives fail you won't be able to recover anything.

A RAID array, no matter how many disks are in it, looks like a single logical storage drive to your system. There are several basic levels of RAID, from RAID 0 to RAID 6, using mirroring, striping, parity, and various combinations of these. These are the three most commonly used:

RAID 0
A striped set with no redundancy. Striping means data are split equally across all disks in the array, and it requires a minimum of two disks. It's fast and it increases your total available storage capacity by combining all the drives in the array into a single storage unit, but it's also as fragile as relying on a single hard drive: if any one disk fails, the whole array is lost. It's not really RAID at all because it's not redundant, so you definitely don't want to use it for any mission-critical application that requires high uptime. It's good for I/O-intensive jobs like video production, because you get a large storage volume and the combined bandwidth of all the drives, up to the limits of the RAID controller.
RAID 1
RAID 1 is mirroring. You need at least two disks, and each one is an exact copy of the other. If one disk fails you don't lose a thing, and there are no fancy striping or parity schemes to go haywire. I've used it successfully on client installations for that bit of extra redundancy when the client is careless about, or actively interferes with, a proper backup setup. ($Deity save us all from Knowitall Managers and their "Talented" Teen-age Nephews.)
RAID 5
This is my favorite general-purpose RAID. It uses both parity and block-level striping across at least three drives. Parity means you get data redundancy via some fancy on-the-fly calculations, and spreading the parity across all the disks in the array means you can lose any one of them and still rebuild. The cost is storage overhead equivalent to one disk out of however many are in the array: on a 3-disk array, 33 percent of your total storage volume is dedicated to parity; on a 4-disk array it's 25 percent, and so on. Reads are fast, but writes are slowed down by the parity calculations.

There are other basic RAID levels, and combinations of the various basic levels, and Google is full of information on those. We're going to stick with the basics here.
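To make those three levels concrete, here is a rough sketch of what creating such arrays looks like with Linux's mdadm tool; the real construction details come in the follow-up article. The device names (/dev/md0, /dev/sdb1 and friends) are placeholders for illustration, not a prescription; point mdadm at your own partitions.

    # RAID 1: mirror two partitions; either disk alone holds a complete copy
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

    # RAID 5: striping plus parity across three partitions; survives one disk failure
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

    # Watch the array build, then check its health
    cat /proc/mdstat
    mdadm --detail /dev/md0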

Software RAID vs. Hardware RAID Smackdown

Ever since the vi vs. Emacs wars died of boredom it's been difficult to find good flamefests. Even software RAID vs. hardware RAID has become mundane. But it's worth reviewing the merits of each, because this isn't a case of one being clearly superior to the other; it's a matter of deciding which one meets your needs best.

I wouldn't even bother with PATA RAID controllers; they're more trouble than they're worth. SATA is where it's at these days. First, the advantages of a good-quality SATA hardware controller:

  • Offloads all the RAID processing from the CPU
  • Lets you add more disks than your motherboard allows
  • No booting drama

The two disadvantages of good RAID controllers are cost and inflexibility. 3Ware controllers are first-rate, but not cheap. Hardware controllers are picky about which hard disks you can use, and the entire disk must belong to the array, unlike Linux software RAID, which lets you use individual disk partitions. Recovering from a controller failure means you need an exact replacement for the dead controller. Some admins think that using a hardware controller is riskier because it adds a point of failure.

Poor-quality hardware RAID controllers are legion. Those onboard RAID controllers and low-end PCI controllers aren't really hardware controllers at all; they do all their work in (usually crappy) software.

Linux software RAID has these advantages:

  • Cost: it's free, and these days CPU cycles cost a lot less than a good hardware RAID controller
  • Very flexible: mix and match PATA and SATA drives, and use individual partitions
  • More recovery options: any Linux PC can rebuild an array (see the sketch below)
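As a rough illustration of that last point (the array and partition names below are made-up examples, not from this article), reassembling and repairing an array with mdadm looks something like this:

    # On any Linux box with mdadm installed, scan for and reassemble existing arrays
    mdadm --assemble --scan

    # Replace a failed member: mark it failed, remove it, then add the new disk's partition
    mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1
    mdadm /dev/md0 --add /dev/sde1

    # The array rebuilds onto the new disk in the background; watch progress here
    cat /proc/mdstat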

If I were running a super-important mission-critical server that had to be up all the time, no excuses, I'd use SCSI drives and controllers. For everything else, it's Linux RAID + SATA + LVM. Why do we want LVM? So we can resize our storage volumes painlessly. Come back next week to commence construction.
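In the meantime, here is the general shape of layering LVM on top of a RAID array and growing a volume later. The volume group and logical volume names (storage, shared) and the ext3 filesystem are illustrative assumptions, not prescriptions:

    # Put LVM on top of the assembled RAID device, then carve out a logical volume
    pvcreate /dev/md0
    vgcreate storage /dev/md0
    lvcreate -L 200G -n shared storage
    mkfs.ext3 /dev/storage/shared

    # Later, when the volume fills up: grow it, then grow the filesystem to match
    lvextend -L +100G /dev/storage/shared
    resize2fs /dev/storage/shared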
