Manage a Linux RAID 10 Storage Server
Part 3: Learn how to monitor, maintain, and make changes in a Linux RAID 10 array.
Today we'll learn how to monitor, maintain, and make changes in our RAID 10 array. We'll grow it, shrink it, safely test failure recovery, and set up monitoring and failure notifications.
In part 2 of this series we learned how to create a RAID 10 array during a clean, new Kubuntu installation. The same method works with all the *buntus, Debian and CentOS 5.1. However, there is an even easier way — the Fedora 8 Anaconda installer supports RAID 10, and it recognizes existing RAID and LVM volumes without having to resort to the hacks we used last week. In fact the Fedora 8 graphical installer is sleek and fast; in my opinion the best of the batch. The graphical installers in CentOS 5.1 and Debian Lenny tie for second place, and Lenny's installer includes buttons for taking screenshots.
Another way to create a RAID array is to not have a root filesystem on the system at all, but to boot from a USB device, or even netboot. If you stuff enough RAM in the box you can do away with swap, and have 100 percent of your array devoted to data storage. However you set it up, it's best to keep your root filesystem out of both RAID and LVM for easier management and recovery.
Linux RAID and Hardware
I've seen a lot of confusion about Linux RAID, so let's clear that up. Linux software RAID has nothing to do with hardware RAID controllers. You don't need an add-on controller, and you don't need the onboard controllers that come on most motherboards. In fact, the lower-end PCI controllers and virtually all the onboard controllers are not true hardware controllers at all, but software-assisted, or fake RAID. There is no advantage to using these, and many disadvantages. If you have these, make sure they are disabled.
Ordinary PC motherboards support up to six SATA drives, and PCI SATA controllers provide an easy way to add more. Don't forget to scale up your power and cooling as you add drives.
If you're using PATA disks, use only one per IDE channel. If you have both a master and a slave on a single channel, performance will suffer, and any failure risks bringing down both the channel and the second disk.
GRUB Legacy's (v. 0.9x) lack of support for RAID is why we have to jump through hoops just to boot the darned thing. Beware your Linux's default boot configuration, because GRUB must be installed to the MBRs of at least the first two drives in your RAID 1 array, assuming you want it to boot when there is a drive failure. Most likely your Linux installer only installs it to the MBR of the drive that is first in the BIOS order, so you'll need to manually install it on a secondary disk.
First open the GRUB command shell. This example installs it to /dev/sdb, which GRUB sees as hd1 because it is the second disk on the system:
[root@uberpc ~]# grub
GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename.]
grub> root (hd1,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd1)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd1)"... 17 sectors are embedded. succeeded
Running "install /boot/grub/stage1 (hd1) (hd1)1+17 p (hd1,0)/boot/grub/stage2 /boot/grub/grub.conf"... succeeded
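The same two GRUB commands can be generated for several disks at once instead of being typed by hand. The following is only a sketch: the function name grub_setup_commands is made up for this example, and it assumes /boot is the first partition on every disk, as in the transcript above. It prints the commands so you can review them before feeding them to the GRUB shell.

```shell
#!/bin/sh
# Print GRUB Legacy shell commands that install GRUB to the MBR of each
# listed GRUB disk number (0 means hd0, 1 means hd1, and so on).
# Assumption: /boot lives on the first partition (hdN,0) of every disk.
grub_setup_commands() {
    for n in "$@"; do
        printf 'root (hd%s,0)\nsetup (hd%s)\n' "$n" "$n"
    done
    printf 'quit\n'
}

# Review the generated commands first:
#   grub_setup_commands 0 1
# Then, as root, pipe them into the GRUB shell in batch mode:
#   grub_setup_commands 0 1 | grub --batch
```

Printing the commands rather than running them directly keeps the sketch harmless until you have checked that the disk numbers match your system.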
You can do this to every disk in your RAID 1 array. /boot/grub/menu.lst (or /boot/grub/grub.conf on Fedora and CentOS) should have a default entry that looks something like this:
title Ubuntu 7.10, kernel 2.6.22-14-generic, default
kernel /boot/vmlinuz-2.6.22-14-generic root=/dev/md0 ro
Let's say hd0,0 is really /dev/sda1. If this disk fails, the next drive in line becomes hd0,0, so you only need this single default entry.
GRUB sees PATA drives first, SATA drives second. Let's say you have two PATA disks and two SATA disks:
/dev/hda /dev/hdb /dev/sda /dev/sdb
GRUB numbers them this way:
hd0 hd1 hd2 hd3
If you have one of each, /dev/hda=hd0, and /dev/sda=hd1. The safe way to test your boot setup is to power off your system and disconnect your drives one at a time.
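The numbering rule above can be written down as a small helper for working out which hdN to expect. This is a sketch of the rule as described in this article only: the function name grub_disk_numbers is invented for the example, and the real order on your machine follows the BIOS, so compare the result with /boot/grub/device.map before relying on it.

```shell
#!/bin/sh
# Print the expected GRUB Legacy disk number for each Linux disk, using the
# rule described above: PATA disks (/dev/hd*) first, then SATA (/dev/sd*).
# This is an approximation; the BIOS drive order has the final say.
grub_disk_numbers() {
    n=0
    for prefix in hd sd; do
        for dev in $(printf '%s\n' "$@" | sort); do
            case "$dev" in
                /dev/"$prefix"*)
                    echo "hd$n $dev"
                    n=$((n + 1))
                    ;;
            esac
        done
    done
}

# Example:
#   grub_disk_numbers /dev/sda /dev/sdb /dev/hda /dev/hdb
```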
Managing Linux RAID With mdadm
There are still a lot of howtos on the Web that teach the old raidtools commands (mkraid, raidstart, and friends) and the raidtab file. Don't use these. They still work, but the mdadm command does more and is easier.
Creating and Testing New Arrays
We used this command to create a new array in part 2:
# mdadm -v --create /dev/md1 --level=raid10 --raid-devices=2 /dev/hda2 /dev/sda2
You may want to have a hot spare. This is a partitioned hard disk that is connected but unused until an active drive fails; the array then rebuilds onto the hot spare automatically. (Moving a spare from one array to another requires mdadm running in daemon mode; see the Monitoring section.) This example includes one hot spare:
# mdadm -v --create /dev/md1 --level=raid10 --raid-devices=2 /dev/hda2 /dev/sda2 --spare-devices=1 /dev/sdb2
You can test this by "failing" and removing a partition manually:
# mdadm /dev/md1 --fail /dev/sda2 --remove /dev/sda2
Then run some querying commands to see what happens.
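The two usual places to look are /proc/mdstat and mdadm --detail /dev/md1. As a rough sketch, the helper below scans mdstat-style text and prints the names of arrays that are missing a member. The function name degraded_arrays is made up for this example, and it assumes the usual status field, such as [UU] or [U_], where an underscore marks a missing member.

```shell
#!/bin/sh
# Print the name of every md array whose status field (for example [U_])
# shows a missing member. Reads /proc/mdstat unless another file is given.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }        # remember the current array
        /\[[U_]*_[U_]*\]/ { print name }   # an underscore marks a missing member
    ' "${1:-/proc/mdstat}"
}

# Typical queries after failing a partition:
#   cat /proc/mdstat
#   mdadm --detail /dev/md1
#   degraded_arrays
```

When you are done testing, mdadm /dev/md1 --add /dev/sda2 returns the partition to the array.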
When you have more than one array, they can share a hot spare. You should have some lines in /etc/mdadm.conf that list your arrays. All you do is create a share group by adding a spare-group line with the same group name to each array (share1 here is just an example name):
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=004e8ffd:05c50a71:a20c924c:166190b6
   spare-group=share1
ARRAY /dev/md1 level=raid10 num-devices=2 UUID=38480e56:71173beb:2e3a9d03:2fa3175d
   spare-group=share1
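To double-check which arrays belong to which share group, the spare-group= settings can be read back out of the config file. This is only a sketch: the function name spare_groups is invented for the example, and it assumes each setting is written as spare-group=NAME, either on the ARRAY line itself or on an indented continuation line below it.

```shell
#!/bin/sh
# List each array together with its spare-group name, reading
# /etc/mdadm.conf unless another file is given as the first argument.
spare_groups() {
    awk '
        /^[ \t]*#/ { next }               # skip comment lines
        $1 == "ARRAY" { array = $2 }      # remember the current array
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^spare-group=/) {
                    group = $i
                    sub(/^spare-group=/, "", group)
                    print array, group
                }
            }
        }
    ' "${1:-/etc/mdadm.conf}"
}

# Example:
#   spare_groups /etc/mdadm.conf
```

Arrays that print the same group name can share a spare; an array that does not appear in the output has no spare-group setting.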