
Primary storage deduplication's impact on backup deduplication

Learn about the issues customers should consider when implementing primary storage deduplication alongside a backup deduplication system, such as rehydration and performance penalties.

With backup deduplication entrenched at many customer sites, many IT shops are starting to consider adding primary storage deduplication to their arsenal of data reduction tools. But what impact, if any, will primary dedupe have on existing backup deduplication processes? And what are the prospects for storage systems that do end-to-end deduplication, from primary storage all the way through backup?

To answer these questions, we spoke with George Crump, president of Storage Switzerland.

You can listen to the podcast or read the transcript.

For storage VARs that are helping customers implement primary storage dedupe, how might primary storage deduplication impact a customer's backup deduplication system?

Crump: I think that's one of the things that a lot of people are struggling with. In the early days of primary storage dedupe, I think the short answer is it won't impact it much. There's an assumption that because you're deduping primary, you either don't have to [deduplicate backup data] or somehow these are going to work together. In general, the reality of today's technology [is that] you'll dedupe it on primary and then when you send it to your backup device, it'll rehydrate, or undedupe, and send it to the secondary device.
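To make the rehydration step concrete, here is a minimal sketch of a block-level dedupe store. It is an illustration only: the `DedupeStore` class, the fixed 4 KB block size, and the file names are assumptions made for this example, not any vendor's implementation. It shows that data stored once on primary is expanded back to its full logical size before it is sent to the backup device.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


class DedupeStore:
    """Toy content-addressed block store (hypothetical, not a vendor API)."""

    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes, each unique block kept once
        self.files = {}   # file name -> ordered list of block fingerprints

    def write(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def rehydrate(self, name):
        """Rebuild the full, undeduplicated byte stream for one file."""
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(block) for block in self.blocks.values())


primary = DedupeStore()
image = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 4 blocks, 2 of them unique
primary.write("vm1.img", image)
primary.write("vm2.img", image)  # an identical second copy

# The backup job reads each file through the normal read path, so the
# backup device receives the full logical data, not the deduped blocks.
backup_stream = primary.rehydrate("vm1.img") + primary.rehydrate("vm2.img")

print("stored on primary:", primary.stored_bytes(), "bytes")
print("sent to backup:   ", len(backup_stream), "bytes")
```

In this toy case the primary store holds two unique blocks, while the backup stream carries all eight logical blocks, which the backup system then has to deduplicate again on its own.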

In theory, at some point somebody's going to have a top-to-bottom deduplication system and will be able to maintain a deduped format all the way through. There are several vendors talking about this, but nobody really has delivered it yet.

What role does rehydration play in primary storage deduplication, and does that have an impact on backup data?

Crump: The primary thing is that it may actually slow down the backup process because [the data will need to be re-inflated, or undeduplicated, and users may not be expecting that]. [As for its impact on primary storage dedupe], in most cases primary storage dedupe doesn't really impact read performance. If you're doing a whole lot [of backups] at one time, which you would be doing with backup, it could negatively impact it, but I would say right now it would be something that I would measure and know before I go talk to customers about it. I think it will vary from solution to solution.

Vendors in the deduplication space have different roadmaps for their technologies. A few vendors, notably Dell/Ocarina and IBM/Storwize, are looking to enable end-to-end deduplication from primary data all the way through backup data. Do you think they're going to be out there on their own with that strategy, or are others likely to come around to their way of thinking?

Crump: I think we'll see several vendors in that list. The other one I'd add to that list is Permabit, which makes an API set that three or four major storage vendors have said they are adopting. The other thing I'd say is [that] IBM Storwize is more of a compression technology than a dedupe [technology], and that doesn't necessarily make it bad; in fact, there are some cases where it might be more effective. But VARs need to be careful nowadays because these terms are being used interchangeably, and not necessarily accurately. I think that Dell has an interesting story to tell because in theory they could integrate the deduplication technology from a laptop to a server to storage to some sort of backup device, which could be fairly interesting. IBM can clearly do that, but the API approach that Permabit's taking could be interesting because if you have three vendors that are all using the same API set, in theory they should be able to carry the deduplication efficiencies throughout the process.
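The distinction between compression and deduplication can be shown with a small experiment. The sketch below is an assumption-laden illustration: it uses generic `zlib` compression and a naive fixed-block dedupe as stand-ins, and it does not represent how Storwize, Ocarina, or Permabit products work. The helper names and sample data are invented for this example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def pseudo_random_bytes(n_blocks):
    """Deterministic bytes that are effectively incompressible."""
    digests = n_blocks * BLOCK_SIZE // 32  # each SHA-256 digest is 32 bytes
    return b"".join(
        hashlib.sha256(i.to_bytes(8, "big")).digest() for i in range(digests)
    )


def deduped_size(data):
    """Bytes left after keeping one copy of each distinct fixed-size block."""
    unique = {}
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


def compressed_size(data):
    """Bytes left after generic stream compression with zlib."""
    return len(zlib.compress(data))


# Case 1: two identical copies of incompressible data (think duplicate files).
noisy = pseudo_random_bytes(16)  # 64 KB with no repeated blocks
copies = noisy * 2

# Case 2: unique but highly patterned records (think log or database rows).
text = b"".join(b"record %06d status=ok\n" % i for i in range(2000))

for label, data in (("duplicate copies", copies), ("patterned text", text)):
    print(label, "raw:", len(data),
          "dedupe:", deduped_size(data),
          "compress:", compressed_size(data))
```

Under these assumptions, dedupe halves the duplicate copies while compression barely helps, and compression shrinks the patterned text while block dedupe finds nothing to remove. That is consistent with Crump's point that one is not simply a substitute for the other.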
