In a recent article for SearchDataBackup.com, Lauren Whitehouse examines the various approaches to deduplication, diving into issues such as the granularity of the process, inline vs. post-processing dedupe, and hardware vs. software dedupe. Understanding the technology driving dedupe systems is important, but what should you, as a solution provider, look for in deduplication?
First, it's important to remember that deduplication is merely a capability, and it fits within the broader practice of data reduction, which also includes compression and archiving (and, I'd even argue, thin provisioning). Even Data Domain's appliances, often cited as the market-leading deduplication products, do much more than deduplicate. They also compress data and replicate it, and they can integrate with various backup applications. So, rather than focusing on a deduplication product to offer customers, you should instead look to establish a data reduction practice.
The data reduction process you recommend to customers will depend on data type, whether it's on secondary storage or primary storage and, if it's on secondary storage, whether it's backup data or archive data. Each of these scenarios can benefit from data reduction.
In the backup tier, you'll need to offer two types of data reduction products. First consider backup software solutions that are integrating deduplication into the application itself, like those from Avamar, Atempo and Commvault; these products allow you to use any vendor's disk hardware and may provide some bandwidth reduction by deduping data before it goes across the network. With this kind of product, only the data that goes through that backup application will be optimized, so you'll need to make sure it covers a broad spectrum of data types. Most importantly, using this kind of application means your customer will need to ditch their current backup application(s). Some customers will be unwilling or unable to do so because of cost, while others won't hesitate since they've already planned to upgrade.
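To make the bandwidth point concrete, here is a minimal Python sketch of deduplicating data at the source before it crosses the network. It is an illustration only, not how any of the products named above work: the fixed 4 KB chunk size, the in-memory dictionary standing in for the backup target, and the function names `backup` and `restore` are all assumptions made for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products vary and often use variable-size chunks


def backup(data, store, chunk_size=CHUNK_SIZE):
    """Send only chunks the target has not seen before.

    `store` is a dict standing in for the backup target (hypothetical).
    Returns (recipe, bytes_sent): the ordered list of chunk hashes needed
    to rebuild the data, and how many bytes actually had to be transferred.
    """
    recipe = []
    bytes_sent = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            # New chunk: it has to cross the network and be stored once.
            store[digest] = chunk
            bytes_sent += len(chunk)
        # Known or new, the recipe only records the hash.
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original data from its list of chunk hashes."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    target = {}
    monday = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    tuesday = monday + b"C" * CHUNK_SIZE  # same data plus one new chunk

    _, sent_monday = backup(monday, target)
    _, sent_tuesday = backup(tuesday, target)
    print("Monday: ", len(monday), "bytes protected,", sent_monday, "sent")
    print("Tuesday:", len(tuesday), "bytes protected,", sent_tuesday, "sent")
```

In this toy run, the second backup transfers only the chunk that changed, which is the effect that source-side deduplication is meant to deliver. Note that only data passing through the `backup` function benefits, mirroring the limitation described above.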
For customers that can't afford to throw away their investment in their backup application(s), you'll need a solution that augments rather than replaces the existing software -- like deduplication appliances offered by Data Domain, Exagrid or Sepaton. These products leverage the fact that most customers have multiple backup applications in their data center, allowing backup data reduction to be applied universally across data sets.
The deduplication appliance you recommend to customers should have two key capabilities: It needs to be able to compress data, since compression is a more effective means of data reduction than deduplication on some data, and it should be able to replicate data to a disaster recovery site. Replication to a secondary site is a critical feature that almost every end user I work with wants to add to their current backup strategy.
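The claim that compression beats deduplication on some data can be checked with a small Python experiment. This is a rough sketch under stated assumptions: deduplication is modeled as fixed 4 KB chunks hashed with SHA-256, compression is plain `zlib`, and the two synthetic data sets (`make_log_data` and `make_repeated_data`) are hypothetical stand-ins for real backup streams. Actual appliances use their own chunking and compression schemes.

```python
import hashlib
import random
import zlib


def deduped_size(data, chunk_size=4096):
    """Bytes left after storing each distinct fixed-size chunk only once."""
    unique = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


def compressed_size(data):
    """Bytes left after compressing the whole stream with zlib."""
    return len(zlib.compress(data))


def make_log_data(lines=5000):
    """Text where every line differs slightly, so no two chunks match exactly."""
    return "".join(
        "job %06d: backup completed without errors\n" % i for i in range(lines)
    ).encode("ascii")


def make_repeated_data(block_size=64 * 1024, seed=42):
    """An incompressible block stored twice, like two full backups of one file."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block * 2


if __name__ == "__main__":
    for name, data in (("log text", make_log_data()),
                       ("repeated block", make_repeated_data())):
        print(name, "original:", len(data),
              "deduped:", deduped_size(data),
              "compressed:", compressed_size(data))
```

On the log-style text, chunk-level deduplication finds nothing to remove while compression shrinks it substantially; on the repeated block, the result is reversed. That contrast is why an appliance that does both is the safer recommendation.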
Once data is off-site, there's good potential for solution providers to add value. For example, you could reduce customers' recovery time by scripting the recovery of the backup images so that, in the event of a real disaster, the data is already in place and can be accessed by virtual machines.
In our next article, we will discuss developing a data reduction strategy for archive.
Here is Lauren Whitehouse's story on data deduplication in disk-based backup:
Where and how to use data deduplication technology in disk-based backup
Data deduplication promises to reduce the transfer and storage of redundant data, which optimizes network bandwidth and storage capacity. Storing data more efficiently on disk lets you retain data for longer periods or "recapture" data to protect more applications with disk-based backup, increasing the likelihood that data can be recovered rapidly. Transferring less data over the network also improves performance. Reducing the data transferred over a WAN connection may allow organizations to consolidate backup from remote locations or extend disaster recovery to data that wasn't previously protected. The bottom line is that data dedupe can save organizations time and money by enabling more data recovery from disk and reducing the footprint and power and cooling requirements of secondary storage. It can also enhance data protection.
Read the fine print when selecting a data dedupe product
The first point of confusion lies in the many ways storage capacity can be optimized. Data deduplication is often a catch-all category for technologies that optimize capacity. Archiving, single-instance storage, incremental "forever" backup, delta differencing and compression are just a few technologies or methods employed in the data protection process to eliminate redundancy and reduce the amount of data transferred and stored.
Read the rest of Lauren Whitehouse's story on deduplication in disk-based backup.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.