Data reduction for archive
Disk archiving can essentially be done in two ways: by either the archive software or the archive hardware. Each approach handles data reduction on that archived data differently.
When it comes to archive software, most products focus on a particular application. Exchange archiving dominates in this arena, with dozens of companies in the fray, but software tools for archiving other applications, such as SharePoint, are gaining momentum. For example, MetaLogix's Professional Archive Manager for SharePoint allows attachments to be archived out of the SharePoint environment, which can deliver a tremendous data reduction payback.
These applications reduce the size of the data store in these environments by moving file attachments out of the store and onto the less expensive disk archive. As part of this process, some applications also try to eliminate redundant data. Most often this takes the form of single instancing, in which only one copy of a given file is stored, but two files that differ even slightly are stored as separate, complete copies. Some applications have added a deduplication capability, which identifies what the files have in common, stores that common data once and then stores only the net changes.
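The difference between single instancing and deduplication can be made concrete with a minimal sketch. The code below is illustrative only (the store layout, chunk size and file contents are invented for the example, not taken from any vendor's product): single instancing hashes the whole file, so a one-byte change forces a full second copy, while block-level deduplication hashes fixed-size chunks, so two nearly identical files share their common chunks.

```python
import hashlib

def single_instance_store(store, data):
    """Single instancing: one copy per identical file; any byte
    difference forces a full second copy."""
    key = hashlib.sha256(data).hexdigest()
    if key not in store:
        store[key] = data
    return key

def dedup_store(store, data, chunk_size=8):
    """Block-level deduplication: split the file into fixed-size
    chunks and store each unique chunk only once, so two slightly
    different files share their common chunks."""
    keys = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:
            store[key] = chunk
        keys.append(key)
    return keys  # the recipe needed to reassemble the file

file_a = b"quarterly report 2009 rev A"
file_b = b"quarterly report 2009 rev B"  # differs by one byte

si = {}
single_instance_store(si, file_a)
single_instance_store(si, file_b)
# single instancing keeps two full 27-byte copies

dd = {}
dedup_store(dd, file_a)
dedup_store(dd, file_b)
# deduplication shares the three common chunks; only the small
# changed tail of each file is stored separately
```

In this toy example the single-instance store holds 54 bytes while the deduplicating store holds 30; on real archive data the gap depends entirely on how much redundancy exists between files, which is exactly the point made below about compression.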
The alternative to software-based archiving is to let the storage archive do all the data reduction work. Companies like Nexsan and Permabit have archival storage systems that can identify the duplicate data as it comes in and perform deduplication and compression on that data. Disk archiving systems typically look like a NAS head to the rest of the data center, so any software application that can write its data to a disk mount can take advantage of these solutions.
In archive, compression can be significantly more important than deduplication. Deduplication, after all, requires redundant data to achieve high storage efficiency rates. That redundancy is readily available in backup, thanks to the successive full backups customers typically perform. But archived data contains far less redundancy, since data is moved to an archive as the result of a specific one-time action.
While there will almost always be some level of duplicate data, compression can be applied universally across all data, not just to redundant data. With archive in particular, compression might capture greater efficiencies than deduplication. Deduplication in archive is a nice-to-have, but compression that does not adversely affect performance is a must-have.
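A small sketch illustrates why compression pays off even when deduplication finds nothing. The record contents below are invented for the example: every record is unique, so object-level deduplication eliminates nothing, yet a general-purpose compressor (Python's standard zlib here) still shrinks the data because the records are internally compressible.

```python
import hashlib
import zlib

# One thousand archive records, each unique: whole-object
# deduplication has no redundancy to work with.
records = [f"invoice {i:06d} net-30 terms approved".encode()
           for i in range(1000)]

# Every record hashes differently, so dedup saves nothing.
dedup_hashes = {hashlib.sha256(r).hexdigest() for r in records}

# Compression, by contrast, applies to all of the data and
# still reduces it substantially.
raw = b"".join(records)
compressed = zlib.compress(raw)
```

Here `len(dedup_hashes)` equals the number of records (zero dedup savings), while `compressed` is much smaller than `raw`; the exact ratio will vary with the data, but the asymmetry is the point.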
When determining whether software-based or hardware-based archiving is a better fit for customers that need data reduction, consider these bottom-line differences. Archive software can write its data to any disk; archive storage hardware, for its part, can accept archive data from a variety of sources and still leverage deduplication and compression across all of that data. In addition, archive software is still maturing, and there are few "all-in-one" solutions. The multi-tenant nature of disk archiving may be a better foundational offering for a reseller; add the ability of most hardware-based archive solutions to scale and to encrypt data, and it becomes an ideal platform.
In our next article, we will discuss reducing data on primary storage.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
This was first published in August 2009.