Plan a Conversion from UW IMAP

By Charlie Schluting | Oct 12, 2006
http://www.enterprisenetworkingplanet.com/netos/article.php/3637646/Plan-a-Conversion-from-UW-IMAP.htm

Many people started using the University of Washington's IMAP/POP server (UW IMAP) back when e-mail was used for e-mail, instead of its current purpose: sending Web pages and huge attachments. UW IMAP uses the Unix mbox format for storing messages, which keeps everything in one huge file. This presents performance issues, and even bigger problems with file locking. It's time to use something better, so let's identify some of the alternatives, and see what happens when we take the plunge into a Maildir conversion.

The performance issues don't become much of an impediment until you have a few hundred megabytes in a file, or a few thousand messages. Each time an operation is performed on a message, the IMAP (or POP) server must parse the entire file, looking for the message in question. These operations include reading a message, marking it as read, and even deleting it. Mail clients do a pretty good job of hiding this slowness by caching certain things, but once your mail files approach 1 GB, all hope is lost. Mail delivery to an mbox file also requires an exclusive lock, which leads to even more contention among the processes that want to write to the file.

There are basically two alternatives, each of which requires that e-mail be converted to a new storage format. Cyrus, the first, includes much more than just a simple IMAP server. It completely replaces procmail as a delivery agent, which brings benefits, but also the drawback of not being able to use procmail. Cyrus stores mail in its own special format, which is similar to Maildir. Cyrus is quite complex, and not practical for every site. For that reason, we'd like to focus on the details of using just plain Maildir.

Maildir format involves a directory with three subdirectories: "tmp," "new," and "cur." A message is written into tmp while it is being delivered, then moved into new. New e-mail messages that have never been accessed by a client stay in the new directory, and cur houses everything else. Each e-mail is an individual file in these subdirectories. No file locking is required, and thousands of messages in the same folder barely impact performance with this format.
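To make the layout concrete, here is a minimal sketch using the mailbox module from Python's standard library. The function name, addresses, and target path are made up for illustration; a real delivery agent does the equivalent work itself.

```python
import mailbox
from email.message import EmailMessage


def deliver_sample(root):
    """Create a Maildir at root (if needed) and deliver one test message."""
    # create=True makes the directory and its tmp, new, and cur subdirectories.
    box = mailbox.Maildir(root, create=True)

    msg = EmailMessage()
    msg["From"] = "alice@example.com"  # illustrative addresses only
    msg["To"] = "bob@example.com"
    msg["Subject"] = "hello"
    msg.set_content("One message, one file.\n")

    # The message is written under tmp/ first and then moved into new/,
    # so a reader never sees a half-written file and no lock is needed.
    return box.add(msg)
```

After calling deliver_sample() on an empty path, the directory holds tmp, new, and cur, and the message sits as a single file in new.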

The popular IMAP and POP servers that understand Maildir include Dovecot and Courier. Both perform well, and each has plenty of satisfied users. We like Dovecot, and we'd recommend looking into the configuration options of both before deciding which to use.

To perform this conversion, there are a few steps that must be taken:

  • Convert existing mail files into the Maildir format
  • Configure the delivery agent (procmail, most likely) to deliver to Maildirs
  • Remove the UW IMAP and POP servers; install Dovecot

Well, that doesn't sound too terribly bad. Unfortunately, there are a lot of things to consider. The conversion is the most challenging step, because of the wide variety of possible files you need to convert for users. If people have been allowed to set up procmail filtering, there's no telling where they may have been tucking away random mbox files. The standard locations include $HOME/mail, $HOME/mbox (for pine users), $HOME/Mail, and just plain random droppings in $HOME/, to name a few.

The trick to sleuthing out all possible locations for various mbox files is to know what your users run for mail applications. The vast majority of files will be in default locations. For example, Apple's Mail.app wants to store sent messages in the home directory under "Sent Items," whereas Outlook does even crazier things. Finding mbox files (or rather, scripting the search for them) is time-consuming, and don't forget that people will have many levels of subdirectories as well.
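One way to script the search is sketched below in Python. It rests on an assumption: that a file whose first bytes are "From " is an mbox, since every mbox begins with a "From " separator line. The function names are our own, and the check is only a heuristic.

```python
import os


def looks_like_mbox(path):
    """Heuristic: an mbox file starts with a 'From ' separator line."""
    try:
        with open(path, "rb") as fh:
            return fh.read(5) == b"From "
    except OSError:
        # Unreadable files are skipped rather than treated as errors.
        return False


def find_mboxes(top):
    """Yield paths under top, at any depth, that look like mbox files."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() keeps us from opening FIFOs, sockets, and devices.
            if os.path.isfile(path) and looks_like_mbox(path):
                yield path
```

Running find_mboxes() over each home directory produces a candidate list. Expect a few false positives, such as a text file that happens to begin with "From ", so review the list before converting anything.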

Once the files to convert have been identified, the conversion can begin. There are many scripts for "mbox to maildir" conversion available via a Google search. The first thing you'll notice is that everyone who uses Outlook or Mail.app will complain that the dates on all their e-mails have changed. Both mail clients employ a little trick to try to discern the "date received" of an e-mail, instead of relying on the Date header itself: they use the IMAP FETCH command to ask for the message's INTERNALDATE. With Maildir, the IMAP server will use the timestamp of the file to report this date, and a freshly converted file carries the time of the conversion. A quick fix is to tell everyone to sort by "date sent" instead of received times. A permanent fix requires another script to go through every file and set the timestamp based on the Date header in the e-mail. Verify that the conversion script you use does this automatically, to avoid headaches.
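As an illustration of a conversion that keeps the dates intact, here is a small sketch built on Python's standard mailbox module. It is not one of the scripts mentioned above, and the function names are ours. It also relies on behavior we believe holds in current CPython: the date set on a MaildirMessage becomes the modification time of the file when the message is added.

```python
import email.utils
import mailbox


def date_header_timestamp(msg):
    """Return the Date header as seconds since the epoch, or None."""
    raw = msg["Date"]
    if raw is None:
        return None
    try:
        return email.utils.parsedate_to_datetime(str(raw)).timestamp()
    except (TypeError, ValueError):
        return None  # missing or unparseable date: keep the default


def convert_mbox(mbox_path, maildir_path):
    """Copy every message in an mbox into a Maildir; return the new keys."""
    src = mailbox.mbox(mbox_path, create=False)
    dest = mailbox.Maildir(maildir_path, create=True)
    keys = []
    try:
        for msg in src:
            # Converting the message type carries over the status flags.
            out = mailbox.MaildirMessage(msg)
            stamp = date_header_timestamp(msg)
            if stamp is not None:
                out.set_date(stamp)  # becomes the file's timestamp on add
            keys.append(dest.add(out))
    finally:
        src.close()
    return keys
```

The source mbox is only read, never modified, so the conversion can be repeated into a fresh directory if something goes wrong. Spot-check a few converted messages in a real client before rolling this out.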

The other important aspect of the conversion is getting delivery to happen properly. If procmail is used, and at most sites it is, this is a simple matter of changing the default delivery destination. In a global procmailrc file, the line that used to say /var/mail/$USER (or wherever, possibly in the users' home directories) gets changed to /var/mail/$USER/. The trailing slash tells procmail to deliver to a Maildir instead of an mbox file. It's that simple, for the default delivery location.

If any users have custom procmailrc files, those must all change too. Quite a few sites implement a webmail interface to modify filters, which in turn generates simple procmailrc files for the user. If a directory structure exists in a user's mail/ directory, the change isn't very straightforward. Dovecot, by default, will only look in a user's mail/ directory, and it also requires that non-Inbox folders start with a dot, with nested folders flattened into dot-separated names. If a user previously had a directory structure of mail/foo, mail/subdir, mail/subdir/one, and mail/subdir/two, the resulting directories in mail/ would be: .foo, .subdir.one, and .subdir.two.
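The renaming rule is easy to capture in a few lines of Python. The helper below is hypothetical (the name is ours) and takes a folder path relative to mail/. It refuses names that already contain a dot, on the assumption that they would be ambiguous once the dot is doubling as the hierarchy separator.

```python
def maildir_folder_name(relative_path):
    """Map a nested folder path such as 'subdir/one' to '.subdir.one'."""
    parts = [p for p in relative_path.split("/") if p]
    if not parts:
        raise ValueError("empty folder path")
    for part in parts:
        if "." in part:
            # A dot inside a name would read as an extra folder level.
            raise ValueError("folder name contains a dot: %r" % part)
    return "." + ".".join(parts)


print(maildir_folder_name("foo"))         # prints .foo
print(maildir_folder_name("subdir/one"))  # prints .subdir.one
```

The same mapping has to be applied to delivery targets inside each user's procmailrc, so it is worth keeping in one function that both the conversion script and any filter-rewriting script can share.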

Converting can be a harrowing experience if you're supporting a Unix or Linux environment where users are dangerous enough to stray from the defaults. In a sane environment it's quite manageable, and definitely worthwhile. The results are immediate: e-mail responsiveness improves dramatically on large mailboxes, and you're the hero of the day, or possibly the week, provided you haven't accidentally lost any e-mail.