High Availability for MySQL: Considering the Options

By Charlie Schluting | Mar 9, 2009
http://www.enterprisenetworkingplanet.com/netos/article.php/3809331/High-Availability-for-MySQL-Considering-the-Options.htm

Running a basic MySQL setup is quite simple, but as soon as you begin thinking about the best way to provide redundancy, things quickly get complex. There are many layers to consider when it comes to databases, and the sheer variety of solutions makes it difficult to choose. The answer will vary depending on your availability goals, the storage back-ends you wish to use, and the type of MySQL tables used. Let's try to untangle this mess.

To complicate matters further, people often combine multiple solutions to handle both types of scaling at once: vertical and horizontal. We covered the different types of scaling in a recent article about VM Scaling, but briefly: scaling up (vertical) means adding compute power to a single server, while scaling out (horizontal) means spreading the load across multiple servers.

More often, however, database administrators have more than enough power in a single server to cope with the load. The primary need in that situation is to provide a mechanism for high availability, so a failed server won't stop MySQL from functioning. Here are the most popular solutions for both situations: scaling out and staying available.

Replication

MySQL replication is great for spreading the load across multiple servers, and it also provides a bit of redundancy. Traditionally, you would configure a master server to serve all writes and send the resulting updates to slave replicas. As many slave replicas as needed could then serve read-only queries. This setup is great for scaling when the majority of your DB transactions are read-only queries, which is true of most workloads. Master-master replication improves upon this model by allowing write transactions to happen on multiple replicas at the same time, so that scaling is not limited to just read-only queries.
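To make the read/write split concrete, here is a minimal sketch of how an application might route statements in a master/slave setup. It is only an illustration: the `ReplicationRouter` class and the host names are hypothetical, it picks a server name without opening any connection, and it classifies statements by their first keyword, which a real application would need to do more carefully.

```python
import itertools


class ReplicationRouter:
    """Send reads round-robin to replicas and everything else to the master."""

    # Statements treated as read-only; anything else is assumed to be a write.
    READ_VERBS = {"SELECT", "SHOW"}

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(replicas or [master])

    def route(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.READ_VERBS:
            return next(self._read_cycle)
        # Unknown or empty statements go to the master, the safe default.
        return self.master


# Hypothetical host names, used only for illustration.
router = ReplicationRouter("db-master", ["db-replica-1", "db-replica-2"])
for statement in ("INSERT INTO users VALUES (1)",
                  "SELECT * FROM users",
                  "SELECT * FROM orders"):
    print(statement, "->", router.route(statement))
```

Defaulting unrecognized statements to the master is a deliberate choice: a read sent to the master only costs some capacity, while a write sent to a replica would make that replica diverge from the master.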

Replication is great, but slow transactions can cause replicas to fall out of date. When a replica lags behind the master, a crash at an inopportune time can cause a data inconsistency. For this reason, and because mistakes will quickly be replicated, it is worth pointing out that replication is not a backup; it is a replica to use for scaling purposes only.
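One way to guard against serving stale data is to check replica lag before sending reads to a replica. The sketch below assumes you have already fetched one row of `SHOW SLAVE STATUS` as a dictionary; the `replica_is_fresh` helper and the five-second threshold are illustrative assumptions, not a recommendation.

```python
def replica_is_fresh(slave_status, max_lag_seconds=5):
    """Return True if a replica looks safe for reads.

    slave_status is one row of SHOW SLAVE STATUS as a dict. The function
    name and the default threshold are assumptions made for this sketch.
    """
    # Both replication threads must be running.
    if slave_status.get("Slave_IO_Running") != "Yes":
        return False
    if slave_status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = slave_status.get("Seconds_Behind_Master")
    # MySQL reports NULL (None here) when the lag cannot be determined.
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds


status = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 2,
}
print(replica_is_fresh(status))
```

A check like this only narrows the window for stale reads. It does nothing about the other warning above: a mistaken `DELETE` still replicates faithfully.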

No special hardware is required; multiple servers can quickly be configured as replicas.

Clustering

Clustering, on the other hand, is another great mechanism for scaling out MySQL. With MySQL Cluster, you configure a management node and multiple storage nodes. The data is spread out across multiple nodes, probably using multiple disk subsystems, so that disk IO issues can be alleviated. You can do the same with replication, but using NDB (Network DataBase) as the MySQL storage engine means that write transactions are synchronous, so no node's data is ever out of date. MySQL Cluster is definitely a better option for scaling queries and spreading IO load, but it does not address high availability concerns.
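The idea of spreading data across storage nodes can be illustrated with a simple hash-based placement function. This is a conceptual sketch only, not NDB's actual partitioning algorithm; the `node_for_key` helper and the node names are hypothetical.

```python
import hashlib
from collections import Counter


def node_for_key(primary_key, nodes):
    """Map a primary key to one storage node using a stable hash."""
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    # Use the first four bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


# Hypothetical storage node names.
storage_nodes = ["ndb-1", "ndb-2", "ndb-3", "ndb-4"]

# Count how 1000 consecutive keys spread across the nodes.
placement = Counter(node_for_key(key, storage_nodes) for key in range(1000))
for node in storage_nodes:
    print(node, placement[node])
```

Because the same key always maps to the same node, any SQL node can locate a row without consulting a central index, and the rows (and their disk IO) end up spread roughly evenly.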

The typical configuration for MySQL Cluster is a management node, one or more SQL nodes, and multiple back-end storage nodes. If any one of the storage nodes goes down, that data is inaccessible. To deal with this problem, you can now (in MySQL 5.1) enable replication between two clusters. Two clusters replicating between each other is the ultimate in both scale and redundancy, but it requires a large number of servers.

There are no special hardware requirements for MySQL Cluster, but it does require high bandwidth and large amounts of RAM. Be sure to read through all the documentation if you're considering the cluster solution; it covers the selection criteria very well.

High-Availability Failover

Last but certainly not least, we have good old-fashioned failover. With the previously mentioned replication-based solutions, failover would likely be used as well. In this case, however, there is a slightly different approach using either shared or replicated storage. This is certainly where it gets complex, because we begin to see how multiple choices can be combined in creative ways to build architectures that address site-specific needs.

First, with shared storage, you can have multiple MySQL servers ready to access the storage and begin running mysqld to serve requests. We're talking about a single MySQL server here, but there is no reason you cannot have multiple servers poised and ready to take on any service, including some of the roles of a MySQL Cluster. The de facto standard for service monitoring is the heartbeat application from the Linux-HA project. A daemon on each server monitors the MySQL service, and if it stops responding, the other server will start it up using the shared SAN or iSCSI storage, bring up the shared IP, and begin serving requests.
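The monitor-and-takeover behavior described above can be sketched in a few lines. To be clear, this is not how heartbeat itself is implemented: the `FailoverMonitor` class, the three-failure threshold, and the takeover steps are assumptions for illustration, and the steps here only record their names instead of mounting storage or starting mysqld. The sketch also leaves out fencing of the failed node, which a real shared-storage setup needs.

```python
class FailoverMonitor:
    """Standby-side monitor: take over after several consecutive failed checks."""

    def __init__(self, check, takeover_steps, max_failures=3):
        self.check = check                    # callable returning True if MySQL answers
        self.takeover_steps = takeover_steps  # callables, run in order on takeover
        self.max_failures = max_failures
        self.failures = 0
        self.active = False                   # True once this standby has taken over

    def tick(self):
        """Run one health check and take over if the threshold is reached."""
        if self.active:
            return
        if self.check():
            self.failures = 0                 # any success resets the counter
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            for step in self.takeover_steps:
                step()
            self.active = True


# Simulated run: one transient failure, then the primary stays down.
events = []
results = iter([True, False, True, False, False, False])
monitor = FailoverMonitor(
    check=lambda: next(results),
    takeover_steps=[
        lambda: events.append("mount shared storage"),
        lambda: events.append("bring up shared IP"),
        lambda: events.append("start mysqld"),
    ],
)
for _ in range(6):
    monitor.tick()
print(events)
```

Requiring several consecutive failures keeps a single dropped health check from triggering a takeover, at the cost of a slightly longer outage when the primary really is down.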

The second option is to have replicated storage between two MySQL servers. Most often, two standard servers with local storage use DRBD to replicate the MySQL file system. In a failover situation, the standby server will mount the replica file system before starting up MySQL, just as in the SAN-based example above. This approach also makes sense in other setups where the storage back-end supports replication, because it avoids dependence on a single storage device. You can even create two groups of clusters using creative storage replication instead of MySQL-level replication.

Standby failover with MySQL works extremely well, and data integrity is guaranteed as far as the last successful transaction before the primary server stopped responding. In that sense, failover is safer than replication.

Replica storage, replica MySQL servers, clusters failing over; as I said, it gets complex. With every replication, failover, and clustering technology used, the risk of data inconsistency becomes greater, so take care to ensure the added complexity will really pay off. In the event of a crash, even greater care must be taken with recovery procedures, because a mistake can quickly spell disaster.

Next week, we will show you how to configure DRBD with HA to quickly set up redundant MySQL servers.


When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting is the Associate Director of Computing Infrastructure at Portland State University. Charlie also operates OmniTraining.net, and recently finished Network Ninja, a must-read for every network engineer.