Tonic for your Backup Woes: CD Backups In Linux (Part Two) - Page 3

 By Carla Schroder

Good Ole Split

What do you do when your backup is larger than a single CD-R/RW? Split to the rescue: it breaks the tarball up into manageable chunks. Split needs to be told how big each chunk should be. A 650-megabyte CD-R holds about 681,000,000 bytes (1,048,576 x 650).

Leave room for filesystem overhead, such as the table of contents and other housekeeping. I like to leave a good margin, because cdrecord warns when the data is too large but does not stop, and the result is a disk full of errors. Chunks of 650,000,000 bytes leave roughly 31 million bytes to spare:

# split -b 650000000 /backup/big_backup.tar.gz
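The arithmetic behind those numbers can be checked in the shell before anything is split. This is only a sketch: chunks_needed is a hypothetical helper, not part of split, and the 2,000,000,000-byte tarball size is an invented example.

```shell
#!/bin/sh
# Hypothetical helper (not part of split): how many chunks will there be?
# $1 = size of the tarball in bytes, $2 = chunk size in bytes
chunks_needed() {
    # Integer ceiling division, so a partial last chunk still counts as one.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Nominal capacity of a 650-megabyte disk, in bytes.
echo $(( 650 * 1048576 ))

# An invented 2,000,000,000-byte tarball cut into 650,000,000-byte chunks.
chunks_needed 2000000000 650000000
```

For a real tarball, the size in bytes can be read from ls -l and passed in as the first argument.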

Split names these files xaa, xab, xac, and so on. The default two-letter suffix covers several hundred chunks; past that, GNU split moves on to longer suffixes (zaa, zab, and so forth), though the details vary by version. If you find yourself splitting a tarball across more than 676 CDs, I suggest looking for larger media.

cd to the directory containing the tarball splits and convert each x file to .iso.

# for i in xa*; do echo "$i"; mkisofs -o "$i.iso" "$i"; done

This will take some time. When it has completed, burn each iso to disk.
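The burning step might be scripted along these lines. Treat it as a sketch: the device address 0,0,0 and speed 4 are placeholders rather than values from this article (cdrecord -scanbus reports the real address), and burn_all, CDR_DEV, CDR_SPEED, and DRY_RUN are made-up names. With DRY_RUN left at 1, the script only prints the commands it would run.

```shell
#!/bin/sh
# Placeholders, not recommendations: find your writer with "cdrecord -scanbus".
CDR_DEV="${CDR_DEV:-0,0,0}"
CDR_SPEED="${CDR_SPEED:-4}"
# 1 = only print the cdrecord commands; set to 0 to really burn.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: burn each .iso named on the command line, one per disk.
burn_all() {
    for iso in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "cdrecord -v speed=$CDR_SPEED dev=$CDR_DEV $iso"
        else
            printf 'Insert a blank disk for %s and press Enter: ' "$iso"
            read -r reply
            cdrecord -v speed="$CDR_SPEED" dev="$CDR_DEV" "$iso"
        fi
    done
}

burn_all xaa.iso xab.iso
```

Label each disk with the name of the chunk it holds; the restore depends on putting them back together in the right order.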

To restore files, copy every split file from its CD to a hard drive, concatenate the pieces in order, and then unpack the newly-rebuilt big tarball. Each chunk lives on its own disk, so copy them off one disk at a time before running cat. Call the rebuilt file anything you want as long as it has the .tar.gz extension:

# cp /cdrom/xaa /restore/
# cat /restore/xaa /restore/xab /restore/xac > /restore/wholefile.tar.gz
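Before trusting a reassembled tarball, it is worth checking that it is intact. The sketch below runs the whole split-and-rejoin cycle on a tiny throwaway tarball in a scratch directory; every file name in it is invented for the demonstration, and the 100-byte chunk size is only there to force more than one piece.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small stand-in for the big tarball.
mkdir data
echo "hello" > data/a.txt
echo "world" > data/b.txt
tar -czf big_backup.tar.gz data

# Cut it into 100-byte chunks named xaa, xab, and so on.
split -b 100 big_backup.tar.gz

# Reassemble. The shell expands x?? in alphabetical order,
# which is the order split wrote the chunks.
cat x?? > wholefile.tar.gz

# cmp -s is silent and succeeds only if the files are byte-for-byte identical;
# gzip -t checks the compressed stream without unpacking it.
if cmp -s big_backup.tar.gz wholefile.tar.gz && gzip -t wholefile.tar.gz; then
    result="intact"
else
    result="damaged"
fi
echo "$result"
```

In a real restore the original tarball is gone, so cmp is not an option, but gzip -t on the rebuilt file still catches a missing or out-of-order chunk.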

Once the original mondo tarball has been reassembled, extract files as needed. tar has a huge set of command options -- here are a few of the more commonly used ones:

  • -p -- Preserve permissions when extracting files
  • -P -- Keep absolute file paths
  • -S -- Record sparse files efficiently
  • -W -- Verify archive after creation
  • -T -- Get the names of files to back up from a file
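A few of these options in action, as a sketch in a scratch directory with invented file names. -W is left out because GNU tar cannot verify a compressed archive, and this one is gzipped.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Some invented files to back up, one with restrictive permissions.
mkdir etc_copy
echo "one"   > etc_copy/hosts
echo "two"   > etc_copy/fstab
echo "three" > etc_copy/skipme
chmod 640 etc_copy/hosts

# -T reads the names to archive from a file, one per line.
printf '%s\n' etc_copy/hosts etc_copy/fstab > filelist

# -c create, -z gzip, -S store sparse files efficiently, -f archive name.
tar -czSf selected.tar.gz -T filelist

# -x extract, -p restore the recorded permissions, -C unpack into restore/.
mkdir restore
tar -xzpf selected.tar.gz -C restore

# Only the two files named in filelist come back.
ls -l restore/etc_copy
```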

tar can also perform incremental backups. If the full backup needs multiple disks, restoring means feeding piles of disks to the computer, and keeping them sorted and organized in the meantime. If the full backup uses no more than three or four disks, and a week's worth of incremental backups fits on a single disk, it is still manageable; beyond that, the disk shuffling becomes hard to manage efficiently.
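One way to do incremental backups with GNU tar is the --listed-incremental option, which keeps a snapshot file describing what was already archived. The sketch below uses invented names (data, snapshot.snar, full.tar.gz, incr1.tar.gz) in a scratch directory; the sleep calls are only there to keep the file timestamps clearly apart in such a short demonstration.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

mkdir data
echo "old" > data/old.txt
sleep 1

# Level 0: a full backup. tar creates snapshot.snar and records what it saw.
tar -czf full.tar.gz --listed-incremental=snapshot.snar data

# Something changes after the full backup.
sleep 1
echo "new" > data/new.txt

# Level 1: given the same snapshot file, tar archives only what changed.
tar -czf incr1.tar.gz --listed-incremental=snapshot.snar data

# List the incremental archive: new.txt is there, old.txt is not.
incr_listing=$(tar -tzf incr1.tar.gz)
echo "$incr_listing"
```

To restore, unpack the full backup first and then each incremental archive in the order it was made.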

The advantages here are that tar and split are standard GNU/Linux utilities and CD writers are much less expensive -- and faster -- than tape drives. The downside is that one bad bit can ruin the entire set. However, even that is not fatal, as Linux has many tools for reading and recovering data. These days I primarily rely on CDs and extra hard drives for backups, rather than tape. Of course, hard drives and rsync are wonderful backup tools, but that's a subject for another day.

info tar
info split
info cdrecord
info mkisofs

This article was originally published on Jan 16, 2003