Deduplication can help your organization reduce its data storage needs and the associated costs. To take advantage of the best deduplication has to offer, make sure you’re taking the right approach.
There are several approaches to deduplication. The first choice is between source-based and target-based. Source-based deduplication happens at the client, where the data resides. Target-based deduplication, on the other hand, happens in or via a device: the data is fed from the source to the target, and deduplication takes place there.
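To make the distinction concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not any vendor's implementation: the fixed 4-byte chunk size, the `Target` class, and the function names are all assumptions made for this example, and the small amount of fingerprint traffic that a source-based client would exchange with the target is ignored.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, unrealistically small so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (real products often use variable-size chunks)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


class Target:
    """Hypothetical backup target that keeps one copy of each unique chunk."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def has(self, fp: str) -> bool:
        return fp in self.store

    def put(self, chunk: bytes) -> None:
        self.store.setdefault(fingerprint(chunk), chunk)


def backup_target_based(data: bytes, target: Target) -> int:
    """Send every chunk over the network; the target discards duplicates."""
    sent = 0
    for chunk in split_chunks(data):
        sent += len(chunk)      # every chunk crosses the network
        target.put(chunk)       # dedupe work happens on the target
    return sent


def backup_source_based(data: bytes, target: Target) -> int:
    """Hash on the client and send only chunks the target does not already hold."""
    sent = 0
    for chunk in split_chunks(data):
        if not target.has(fingerprint(chunk)):  # hashing work happens on the client
            sent += len(chunk)
            target.put(chunk)
    return sent


data = b"AAAABBBBAAAACCCCAAAA"
print(backup_target_based(data, Target()))  # bytes sent with target-based dedupe
print(backup_source_based(data, Target()))  # bytes sent with source-based dedupe
```

Both runs leave the same three unique chunks on the target. The difference is where the hashing work is done and how many bytes have to cross the network to get there.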
In addition, there are two types of target-based deduplication to choose from: inline and post-processing. Inline means the dedupe process is inserted directly between the source and the backup target, so data is deduplicated as it arrives. Post-processing takes an indirect approach: deduplication does not run continuously as data arrives but is confined to specific periods. And then, of course, many vendor options are available.
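The difference between inline and post-processing can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a description of any product: the class names, the `landing` list standing in for a raw staging area, and the `run_dedupe_window` method are all hypothetical.

```python
import hashlib


def fp(chunk: bytes) -> str:
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


class InlineTarget:
    """Deduplicates each chunk in the write path, before it is stored."""

    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}

    def write(self, chunk: bytes) -> None:
        self.store.setdefault(fp(chunk), chunk)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.store.values())


class PostProcessTarget:
    """Lands raw data first, then deduplicates during a later, separate pass."""

    def __init__(self) -> None:
        self.landing: list[bytes] = []       # raw, not yet deduplicated
        self.store: dict[str, bytes] = {}    # deduplicated chunks

    def write(self, chunk: bytes) -> None:
        self.landing.append(chunk)           # no dedupe work during the backup

    def run_dedupe_window(self) -> None:
        for chunk in self.landing:
            self.store.setdefault(fp(chunk), chunk)
        self.landing.clear()

    def stored_bytes(self) -> int:
        raw = sum(len(c) for c in self.landing)
        return raw + sum(len(c) for c in self.store.values())


stream = [b"AAAA", b"BBBB", b"AAAA", b"AAAA"]

inline = InlineTarget()
post = PostProcessTarget()
for chunk in stream:
    inline.write(chunk)
    post.write(chunk)

print(inline.stored_bytes())   # already deduplicated
print(post.stored_bytes())     # still holds the raw stream
post.run_dedupe_window()
print(post.stored_bytes())     # matches inline once the window has run
```

In this model the two targets end up storing the same data. The post-processing target temporarily needs room for the raw stream, while the inline target does its dedupe work during the backup itself.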
So which is best, and which types are better suited to specific situations? Like most things, one type doesn’t fit all. Some vendors, of course, advocate the kind of deduplication they offer. Others appear to be taking a neutral stance by investing in all aspects of the technology. EMC, for example, offers the full gamut: source-based via its Avamar acquisition, inline via its Data Domain acquisition, and post-processing via a partnership with Quantum (note that Quantum also offers inline).
“Source-based is effective when you are seeking to maximize the efficiency of dedupe or when you must dedupe before you transport data over the network,” said Mark Sorensen, senior vice president of the Enterprise Storage Division in EMC’s Storage Software Group.
To his mind, source-based is ideal for data centers running VMware, remote offices, and NAS, whereas target-based is more appropriate for SAN-based backup. Why? Sorensen said deduplication ratios are better with source-based, but it is far more CPU-intensive. Therefore, if you introduce source-based deduplication while attempting to back up a SAN, you may inhibit performance. Using target-based, he said, would place less of a burden on the SAN, although the deduplication ratios may not be as good. He also recommended target-based for databases, where dedupe typically yields smaller returns than it does for other applications. Alternatively, if bandwidth is heavily constrained, Sorensen recommended that deduplication be done at the source, as this requires far less data to be transported across the network.