Hybrid Cloud Disaster Recovery Best Practices
Managing a robust, workable disaster recovery (DR) plan in an enterprise environment poses challenges. Add in a hybrid cloud architecture, where private and public assets must co-exist effectively, and planning for crises becomes even more complex. But a mixed landscape doesn't have to be an obstacle if administrators create the right plan and know how to get the best performance out of their resources. It can, in fact, become an asset.
According to Mathew Lodge, vice president of cloud services at VMware, hybrid cloud offers a couple of distinct advantages when it comes to disaster and business continuity planning. "It can make disaster recovery much cheaper, and it can also dramatically improve the flexibility and the agility of disaster recovery," he explained.
Many enterprises look at the public cloud either as a way to introduce DR into their organization for the first time, or as a way to complement or improve their existing DR plan. The infrastructure available from cloud vendors offers the potential for companies to reduce the equipment needed to run multiple data centers in multiple regions. "They can rely on the public cloud provider instead of having to build it out themselves," Lodge said. Along the same lines, enterprises can also stop duplicating data center infrastructure while retaining robust replication capabilities at each recovery site. All of that adds up to significant cost savings.
One factor that may require some additional consideration, however, is that public cloud providers rarely have hardware and software configurations identical to what an enterprise is using at its primary site. This is often the first challenge administrators face, said Bryan Che, senior director and general manager of the cloud business unit at Red Hat, Inc. "There are going to be some fundamental differences in terms of their infrastructure versus yours," he said. Even if the vendor uses the same virtualization stacks, the configuration probably won't be the same. "Having portability of your workloads into disparate and heterogeneous environments becomes a huge consideration," Che explained.
Considering the differences in architecture, how do public and private clouds usually fit together? Che typically sees enterprises working in a couple of ways. In the first, they use one site as their primary and another for DR. A second usage pattern entails the use of several sites, each acting as a primary during its region's regular business hours and as a backup during off hours. This methodology requires enterprises to "start thinking about what becomes primary, what becomes DR, and how you flip that," Che said. It also forces a more focused approach to shifting workloads efficiently and consistently between the sites.
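The second pattern Che describes, in which each site is primary during its own business hours and a backup otherwise, can be illustrated with a minimal sketch. The site names and hour windows below are hypothetical, chosen only to show how the "flip" might be decided from the clock; a real deployment would also have to move workloads and data, which this sketch does not attempt.

```python
from datetime import time

# Hypothetical sites and their business hours, expressed in UTC.
# These names and windows are illustrative assumptions, not from any vendor.
SITES = {
    "americas": (time(13, 0), time(22, 0)),
    "emea": (time(7, 0), time(16, 0)),
    "apac": (time(0, 0), time(9, 0)),
}


def in_window(now, start, end):
    """Return True if now falls in [start, end), including windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def assign_roles(now_utc):
    """Label each site 'primary' during its business hours and 'dr' otherwise."""
    return {
        name: "primary" if in_window(now_utc, start, end) else "dr"
        for name, (start, end) in SITES.items()
    }
```

Calling assign_roles(time(8, 0)) with these assumed windows marks the EMEA and APAC sites as primary and the Americas site as DR. Overlapping windows mean more than one site can be primary at once, which is exactly the situation that forces a decision about how workloads shift between them.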
Planning a failover from private to public cloud must also cover more than just applications and their associated data sets. Matt Richins, cloud solutions architect at Rackspace Hosting, said it demands a comprehensive profile of the network. "What is the storage, what are the inbound and outbound connection points to their customers and to their business partners?" he asked. True cloud applications may have resiliency designed into them, but legacy applications and other assets might require working with providers on things like VPN access or frame relay connections in the event of a disaster. "It becomes a lot more complex from that perspective," Richins said.
Bandwidth and the other resources that tie together hybrid cloud environments don't always fall under the control of enterprise administrators, either. In an event that affects a wide area of the public infrastructure (think Hurricane Katrina), failing over between private and public cloud infrastructure may create additional challenges. There are, however, ways to sidestep potential issues. "The first is having very high bandwidth connections and continuous replication of data and applications, so the application and the data that it requires is already at the recovery site," Lodge said. In addition, if a large storm or other event is brewing, administrators may find opportunities to move data prior to the event without hitting bandwidth limitations. "If you've got large amounts of data, rather than ship it over the network, you can ship a physical disc to the cloud provider," Lodge suggested.
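Lodge's point about shipping physical media comes down to simple arithmetic: how long the data would take to cross the link versus how long a courier would take. The sketch below is a rough estimate only. The 80 percent link efficiency and the shipping time passed in are assumptions an administrator would replace with measured figures.

```python
def network_transfer_hours(data_tb, link_mbps, efficiency=0.8):
    """Estimate hours needed to send data_tb terabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal link rate that is
    actually usable for replication traffic.
    """
    bits = data_tb * 8e12  # 1 TB = 10**12 bytes = 8 * 10**12 bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return bits / usable_bits_per_second / 3600


def faster_option(data_tb, link_mbps, shipping_hours, efficiency=0.8):
    """Return 'network' or 'ship', whichever is estimated to finish sooner."""
    network_hours = network_transfer_hours(data_tb, link_mbps, efficiency)
    return "network" if network_hours <= shipping_hours else "ship"
```

Under these assumptions, 50 TB over a 1 Gbps link takes roughly 139 hours, so a courier that delivers in 48 hours wins; 1 TB over the same link takes under three hours, so the network wins.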
Administrators may also discover that cloud and telecom providers can solve some large-scale disaster concerns for them. Many networks are designed with resiliency and additional capacity in mind. "Any good provider worth their salt is going to follow ratios like 10 to 1, where for every 1 meg I need of bandwidth, I have 10 megs on standby," Richins said. If transitioning between public and private cloud is part of the DR plan, enterprises should query their providers on existing contingency planning and disaster resources. Telcos often have "emergency teams whose job it is to literally stand up new LECs [local exchange carriers] and COs [central offices]," Richins said.
A handful of best practices will help administrators further prime their hybrid cloud environment for maximum durability and recoverability. The first is a focus on data. "You can rebuild your applications as you need to, you can replicate the application environments, but if you don't have access to the data, your application is basically useless," Che said. Efficient data replication technologies go a long way toward addressing that concern. He said it's also important to think about moving toward application architectures that are designed to go into hybrid cloud environments: "The more nimble, scalable, and optimized for cloud your application is, the easier it is to deal with these types of situations." The cloud is geared toward scalability, and DR in a hybrid environment is the ultimate scale scenario.
Remember, too, that bringing up a recovery site means more than just replicating data. The second part of the equation is to ensure that "your application is going to run the same way in the destination cloud that it's running in your data center today," Lodge said. This means truly understanding how the public cloud end of things is architected. It also requires that administrators understand the reliability of the cloud on the other end. "We've seen clouds have outages," Lodge said. "If that same kind of outage happened again, what's your backup plan for your backup plan?"
Perhaps the most crucial factor in successfully managing any sort of DR program is regular, methodical, and repeated testing. That's when enterprises discover whether the right files are part of the failover plan, whether the operating system image is configured the way they want it, or whether a maintenance pack was missed. "It's by testing and recovery of those applications that you validate if the backup data is still good, and if you have captured the right procedures for recovery," Richins said.
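One small piece of such a test, checking that restored files match what was backed up, might be sketched as follows. This is an illustration rather than a method any of the vendors quoted here prescribes. It assumes a hypothetical manifest, recorded at backup time, that maps each file's relative path to its SHA-256 checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Compare restored files against a manifest of expected checksums.

    manifest is an assumed dict of relative path -> expected SHA-256 hex digest.
    Returns a dict of relative path -> problem for every file that is missing
    or whose contents differ; an empty dict means every listed file matched.
    """
    problems = {}
    root = Path(restore_root)
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

A check like this only confirms that the listed files came back intact. It says nothing about whether the application starts and behaves correctly at the recovery site, which is why a full recovery exercise remains necessary.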