Five Imperatives for Extreme Data Protection in Virtualized Environments

Transforming an organization through server virtualization requires a strategic and coordinated approach.

 By Peter Eicher
Data protection, which includes not only backup but also secondary storage and disaster recovery considerations, is an area that can easily complicate virtualized data centers if implemented hastily. It is essential that data protection efforts reduce hardware purchases rather than requiring additional hardware to work. The following are five critical data protection imperatives that organizations must consider during virtual server planning.

#1: Minimize impact to host systems during backups

In virtual environments, numerous virtual machines (VMs) share the resources of a single physical VM host. Backups, which are among the most resource-intensive operations, negatively impact the performance and response time of applications running on other VMs on the same host. On a large host with many VMs, competing backup jobs have been known to bring the host to a grinding halt, leaving critical data unprotected.

There are various approaches for minimizing the impact to host systems during backups, though each has drawbacks. The simplest approach is to limit the number of VMs on a given system, making sure you do not exceed the number you can effectively back up. While effective, this runs counter to the purpose of virtualization, which is to consolidate applications onto the fewest possible physical servers. It also limits the financial benefits of consolidation, perhaps significantly.

A second approach is to stagger the scheduling of VM backups. For example, if performance is impacted when four backups are running simultaneously, limit backups to three at a time. This can solve the performance issue, but it creates other challenges. Backup jobs can no longer be scheduled without referencing all the other existing jobs. What if a particular job runs longer than expected and the next set of jobs starts? Suddenly, the performance limit has been surpassed. As data grows over time, jobs may take longer to run, creating backup overlap. There is also no clear way to account for full backups and incrementals in such a scheme. Even if the scheduling is worked out, the total backup window has been extended significantly by stretching backups over time.
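The overlap problem can be made concrete with a small sketch. The schedule below is hypothetical (two waves of three jobs, each planned to take 60 minutes), and `peak_concurrency` is an illustrative helper, not part of any backup product. It simply counts how many jobs are running at the busiest moment.

```python
from typing import List, Tuple


def peak_concurrency(jobs: List[Tuple[int, int]]) -> int:
    """Return the largest number of jobs running at the same moment.

    Each job is (start_minute, duration_minutes) and occupies the
    half-open interval [start, start + duration).
    """
    events = []
    for start, duration in jobs:
        events.append((start, 1))              # job begins
        events.append((start + duration, -1))  # job ends
    # Tuples sort with -1 before +1 at equal times, so a job that ends
    # exactly when another starts is not counted as an overlap.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Hypothetical plan: three jobs at minute 0, three more at minute 60.
planned = [(0, 60), (0, 60), (0, 60), (60, 60), (60, 60), (60, 60)]
# Same plan, but one first-wave job overruns by 20 minutes.
overrun = [(0, 60), (0, 60), (0, 80), (60, 60), (60, 60), (60, 60)]

print(peak_concurrency(planned))  # 3
print(peak_concurrency(overrun))  # 4
```

A single 20-minute overrun is enough to push the host from three concurrent backups to four, which is exactly the limit the staggered schedule was meant to avoid.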

An early technical attempt at solving the backup problem was the use of a proxy server. For VMware, this model is known as VMware Consolidated Backup, commonly called VCB. With VCB, a separate server is dedicated to running the backups directly off the storage, and the virtual machines themselves do not participate. While this seemed good in theory, in practice there was still significant performance impact due to the use of VMware snapshots. It also proved complex to configure. The result was that few users adopted this model, and VMware has dropped support for it going forward.

In response, VMware released a new set of storage APIs with vSphere 4.0, called the vStorage APIs for Data Protection. These introduced the concept of Changed Block Tracking (CBT). Simply put, CBT tracks data changes at the block level rather than the file level. As a result, significantly less data is moved during a backup, making backups faster and more efficient. CBT goes a long way toward solving the problem of backup impact, though it still relies on VMware snapshots, which create impact of their own, and the overhead of tracking changes can slow virtual machine performance. CBT also requires backup software to integrate with the APIs, which can mean upgrading the backup environment or changing to a new vendor.
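The idea behind block-level change tracking can be illustrated with a toy model. This is a minimal sketch and not the VMware API: `TrackedDisk`, its methods, and the 4 KB block size are all assumptions made for illustration. The disk remembers which blocks were written since the last backup, and an incremental backup copies only those blocks.

```python
class TrackedDisk:
    """Toy virtual disk that records which blocks change between backups."""

    def __init__(self, num_blocks: int, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed = set()  # indices written since the last backup

    def write(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.changed.add(index)

    def incremental_backup(self) -> dict:
        """Return only the changed blocks, then reset the tracking set."""
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(num_blocks=1000)
disk.write(7, b"\x01" * 4096)
disk.write(42, b"\x02" * 4096)
disk.write(7, b"\x03" * 4096)  # rewriting a block does not add a second entry

delta = disk.incremental_backup()
print(sorted(delta))                        # [7, 42]
print(sum(len(b) for b in delta.values()))  # 8192
```

Only 8,192 bytes leave the toy disk, against 4,096,000 bytes for a full copy. The tracking set is also where the overhead mentioned above comes from: every write has to update it.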

A final approach is to install an efficient data protection agent on each virtual machine, and then run backup jobs just as they would be run in a physical environment. To be efficient, the agent needs technology that tracks, captures, and transfers data streams at the block level without invoking VMware snapshots. With such an agent, no strain is placed on the resident applications, open files are not an issue, and the file system, CPU, and other VMs are minimally impacted.

#2: Reduce network traffic impact during backups to maximize backup speed

Reduction of network traffic is best achieved through very small backups, which dart across the network rapidly, eliminating network bottlenecks as the backup image travels from VM to LAN to SAN to backup target disk.  Block-level incremental backups achieve this while full base backups, and even file-level incrementals, do not.
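A rough calculation shows why the backup type matters so much for network traffic. The figures below are hypothetical (a VM holding a 20 GiB database file and a 1 GiB log file, with 60 MiB of daily change), and the sketch assumes changes are contiguous and block-aligned, which real workloads will not always be.

```python
import math

GiB = 1024 ** 3
MiB = 1024 ** 2


def file_level_bytes(file_sizes: dict, changed_files) -> int:
    """A file-level incremental re-sends every file that changed at all."""
    return sum(file_sizes[name] for name in changed_files)


def block_level_bytes(changed_bytes_per_file: dict, block_size: int = 4096) -> int:
    """A block-level incremental sends only the blocks that contain changes.

    Simplifying assumption: each file's changes are contiguous and aligned.
    """
    return sum(math.ceil(n / block_size) * block_size
               for n in changed_bytes_per_file.values())


# Hypothetical VM contents and daily change.
file_sizes = {"db.dat": 20 * GiB, "app.log": 1 * GiB}
changed = {"db.dat": 50 * MiB, "app.log": 10 * MiB}

full = sum(file_sizes.values())
file_incr = file_level_bytes(file_sizes, changed)
block_incr = block_level_bytes(changed)

print(full // MiB)        # 21504
print(file_incr // MiB)   # 21504
print(block_incr // MiB)  # 60
```

Because both files were touched, the file-level incremental moves as much data as a full backup, while the block-level incremental moves only the 60 MiB that actually changed.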

Minimal resource contention, low network traffic and small snapshots all lead to faster backups. Faster backups are more reliable, because less time in transfer means less exposure to network problems, and they allow for more frequent backups and recovery points. In a virtual environment, this also means more VMs can be backed up per server, increasing VM host density and amplifying the benefits of a virtualization investment. Technologies such as CBT and other block-level backup models are the best way to limit network impact.

This article was originally published on Aug 18, 2010