Quick and Dirty Backups with rsync

It's not always the best tool for the job, but if you need to get a backup into the cloud quickly and easily, rsync might do the trick. Charlie Schluting steps you through how to build a script to do just that.

By Charlie Schluting | Posted Mar 8, 2010

We've all seen countless articles, blog posts, and forum posts explaining how to back up a server with rsync and other tools. While I've cringed when people talked about using non-scalable methods, there actually is a place for quick and dirty backup mechanisms. Small companies running just a few virtual machines in the cloud, or even enterprises with test instances, may want a quick and effective backup.

Here is a great way to back up a one-off server, including its MySQL database. To work well with hosted virtual machines, it is important not to store backup data on the virtual machine itself. The script below compresses all data and ships it across the wire to a backup server in real time, implementing bandwidth throttling to avoid pummeling the remote server. It will work on any Linux server, particularly a recent Debian install.

Considerations

In Unix-land, we often worry about how various archival tools will handle different types of files. Will sparse files be preserved, or will the archive tool copy all the zeroes? Will permissions (and extended ACLs) be preserved? Will hardlinks result in two copies? All good questions, and all handled fairly well by both rsync and tar with the right options, as we will see in a moment. Next is the issue of incremental backups. The great thing about centrally-managed backup software is that it generally handles incremental backups quite well. Scripting something yourself requires you to handle this manually, but not to worry: I've got a few tricks to show you.

Finally, we need to decide which backup methods to use. You can take a whole disk image if your hosting provider allows it, but that makes restoring individual files annoying, and it results in many copies of the same data.

Using rsync for backup has problems. If we don't use --delete, which tells rsync to delete files from the archive that have been deleted on the source server, then we get an ever-growing archive of every file that has ever been created; even files long since deleted will always remain in the backup. If we do use --delete, then it may be impossible to restore an accidentally deleted file the next day. Bummer. Some people work around this by starting a new backup and deleting the old one after a week or so, but that's annoying to manage.

Ideally, we'd have both the simplicity and convenience of an rsync'd file system at our fingertips, along with nightly snapshots. I prefer to rsync critical file systems with --delete nightly, which usually runs very fast, and also tar up the file systems for archiving.

Doing It

First, there are some strange tricks I'd like to show you with regard to shipping these backups off-site. I'm not going to provide a copy-and-paste script, because your paths will differ and it wouldn't work anyway; instead, I will use a script I wrote yesterday to explain every hurdle I had to overcome. This script runs on the server being backed up, and backs up critical file systems with rsync and a nightly tar, as well as a MySQL database. It also implements bandwidth throttling on every command that ships data.

First, it is important to set some variables to avoid typos and writing confusing, redundant commands below.

#!/bin/bash

The backup user and hostname. I've configured my backup server to accept key-based logins to my account using the root SSH key from the server being backed up, because this backup script will have to run as root.

BACKUP_HOST="charlie@hostname.com"
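If you haven't done that key exchange yet, a minimal one-time sketch looks like this (the port and account are the ones used throughout this script; adjust to your own hosts):

```shell
# Run once as root on the server being backed up.
# Generate a passwordless key if root doesn't have one yet,
# then install the public key in the backup account on the backup host.
test -f /root/.ssh/id_rsa || ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub -p 2022 charlie@hostname.com
```

A passwordless key is what lets cron run the script unattended; consider restricting what that key may do on the backup server.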

For rsync commands, use these options. I am enabling archive mode, compression, and hardlink preservation, as well as capping the bandwidth used at 2400KB/s (--bwlimit takes kilobytes per second, so that works out to roughly 20Mbit/s).

RSYNC_CMD="/usr/bin/rsync -azH --delete --bwlimit=2400"

This command is used with rsync's -e option, which is the only way to tell rsync to connect to a remote server on a nonstandard port, as my situation requires.

REMOTE_CMD="/usr/bin/ssh -p 2022"

When running tar backups, use the following options: create (c), compress (z), write to the file named by the next argument (f, which will be '-' for stdout below), and preserve absolute path names (P, which stops tar from stripping the leading slash).

TAR_CMD="/bin/tar czfP"

When sending tar files over ssh, use this command to wrap ssh in 'trickle' to cap the bandwidth, and to connect to my nonstandard ssh port:

TAR_SSH="/usr/bin/trickle -s -u 2400 ssh -p2022"

Where backups will be stored on the remote server:

DESTDIR="/remote/path/to/backup/storage"

Echo the date and time, so that if we're logging this script output, we have a sense of order:

/bin/date

For rsync backups, the following is all that is required. The first line prints what it's about to do, for logging purposes. This will create an /etc/ directory inside the specified remote backup directory, which is then kept in sync.

echo "running /etc backup, destination: $BACKUP_HOST"

$RSYNC_CMD -e "${REMOTE_CMD}" /etc ${BACKUP_HOST}:$DESTDIR

You can run the same commands to back up /home, /var, and /root. These are the most critical file systems, as everything else should be managed by the operating system. It may also be wise to spit out a package list and write it to a remote file in case you need to rebuild your virtual machine from scratch.
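For that package list, a minimal sketch on a Debian system might look like this (it reuses the $BACKUP_HOST, $DESTDIR, and throttled $TAR_SSH variables from above; the pkglist file name is my own invention):

```shell
# Dump the list of installed packages and ship it to the backup server.
# On a rebuilt machine, restore with "dpkg --set-selections < file"
# followed by "apt-get dselect-upgrade".
PKGLIST="pkglist-$(date +%Y_%m_%d).txt"
/usr/bin/dpkg --get-selections | $TAR_SSH $BACKUP_HOST \
"cat > ${DESTDIR}/${PKGLIST}"
```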

However, /var/ takes some careful consideration. I did not want to back up the MySQL directory with these file archive methods, since I was going to take a database dump anyway. Here is how to exclude it, assuming it lives in /var/lib/mysql. Notice that rsync requires a relative path for --exclude:

Note: lines ending with a \ are continued on the next line; it's all one line in reality.

echo "running /var backup, destination: $BACKUP_HOST"

$RSYNC_CMD -e "${REMOTE_CMD}" --exclude="lib/mysql" /var \

${BACKUP_HOST}:$DESTDIR

Now, to get those nightly snapshots of the critical directories with tar.

First check to see if any archives older than 7 days need to be deleted:

echo "deleting old tar FS backups"

/usr/bin/ssh $BACKUP_HOST -p2022 <<HERE
find /path/to/tarfiles/ -name '*.tar.gz' -and -mtime +7 \
| xargs rm -f
HERE

A heredoc probably wasn't necessary, but if you want to add more stringent checking or other commands, it's nice to be able to simply add another line. That 'find' command returns all files in the tar backup directory ending in .tar.gz and older than 7 days, feeding them to rm. Now we can start the real tar backup.

This next command invokes our tar command with two arguments: '-', instructing tar to write the archive to stdout, and '/etc', the directory to archive. The output is piped to ssh, whose final argument is the command to run on the remote server: it redirects stdin to a file in our backup directory, under "tars/", with a name that indicates the date. The resulting file will be called: etc-2010_03_07.tar.gz.

echo "tar /etc backup starting"
$TAR_CMD - /etc | $TAR_SSH $BACKUP_HOST \
"> ${DESTDIR}/tars/etc-$(date +%Y_%m_%d).tar.gz"

To skip the potentially huge MySQL directory, which is pointless to back up while MySQL is running anyway, use these tar arguments for your /var backup:

$TAR_CMD - /var --exclude "/var/lib/mysql" | $TAR_SSH ...

For our database backups, we first check to see if any need deleting, the same way as before:

echo "deleting old tar DB backups"
/usr/bin/ssh $BACKUP_HOST -p2022 <<HERE
find /path/to/db_backups -name '*.sql.gz' -and -mtime +7 \
| xargs rm -f
HERE

Then take a dump, gzip it on the fly, and write it to the remote backup location:

echo "running full DB backup"
/usr/bin/mysqldump --user=root --password='foooo' \
--all-databases | /bin/gzip | $TAR_SSH $BACKUP_HOST \
"> ${DESTDIR}/db_backups/$(date +%Y_%m_%d).sql.gz"

You'll want to run this from cron, of course, after you've added any other file systems or special items you need backed up.
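As a sketch, a root crontab entry (the script path and schedule here are hypothetical) might run it nightly at 2:30 AM and append all output, including those date and echo lines, to a log:

```
# crontab -e, as root:
# m   h  dom mon dow  command
30    2  *   *   *    /usr/local/sbin/backup.sh >> /var/log/backup.log 2>&1
```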


When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting works as the COO at Elevation Fitness, a Web-based fitness management platform. He also operates Longitude Technologies, which offers world-wide Linux & network support and consulting services. Charlie also wrote Network Ninja, a must-read for every network engineer. Follow Charlie on twitter: http://twitter.com/cschluti





