
Data reduction for disk archiving: Hardware vs. software approaches

Learn techniques to perform data reduction for disk-based archiving, for customers interested in both archiving data off primary storage and reducing the amount of space it takes once it gets there.

In a recent tip, we discussed how to develop a data reduction strategy for the backup tier. Now we'll move on to doing the same for archive data.

Data reduction for archive is an interesting aspect of a global data reduction offering for a reseller to consider. Archiving is inherently a major component of a data reduction strategy, so why should we talk about reducing its storage requirement? As we discuss in our article "Archiving Basics," much of the industry is beginning to use disk as an archive target, and the wider the price delta between the disk archive and primary storage, the more likely your customers will be to invest in disk archiving as part of an overall data reduction strategy. And while tape archiving still plays a role at many customer sites, disk archiving is gaining ground. A spring 2009 Purchasing Intentions survey showed that 74% of 502 respondents are using tape for archiving, versus 57% using disk and 14% using optical, but we're finding greater customer interest in disk archiving, especially as it becomes more economical than in the past.

Disk archiving can essentially be done in two ways: by either the archive software or the archive hardware. Each approach handles data reduction on that archived data differently.

When it comes to archive software, most products focus on a particular application. Exchange archiving dominates in this arena, with dozens of companies in the fray, but software tools for archiving other applications, such as SharePoint, are gaining momentum. For example, MetaLogix's Professional Archive Manager for SharePoint allows the archiving of attachments out of the SharePoint environment, which can deliver a tremendous data reduction payback.

These applications will reduce the size of the data store in these environments by archiving the file attachments in the stores and placing them on the less expensive disk archive. As part of this process, some applications will try to eliminate redundant data. Most often this comes in the form of single instancing, by which only one copy of a particular file is stored but two files that are slightly different will be stored as separate copies. Some applications have added a deduplication capability, which identifies the commonality between the files and stores a copy of the commonalities and then stores only the net changes.
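To make the difference between single instancing and deduplication concrete, here is a minimal Python sketch. It is an illustration only, not how any particular archiving product works: the function names, the fixed 4 KB chunk size and the sample "attachments" are all assumptions made for the example, and many real systems use variable-size chunking instead.

```python
import hashlib

def single_instance_size(files):
    """Bytes stored when only byte-identical files are collapsed into one copy."""
    unique = {}
    for data in files:
        unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(unique.values())

def chunk_dedup_size(files, chunk_size=4096):
    """Bytes stored when each distinct fixed-size chunk is kept once (assumed scheme)."""
    chunks = {}
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            chunks.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return sum(chunks.values())

# Hypothetical attachments: an original, an exact copy, and a lightly edited copy.
# Every 4 KB block of the original is distinct, so only cross-file duplication counts.
original = b"".join(bytes([i]) * 4096 for i in range(16))  # 64 KB
edited = original[:-4096] + b"\xff" * 4096                 # last block changed
files = [original, original, edited]

raw = sum(len(f) for f in files)
si = single_instance_size(files)
dd = chunk_dedup_size(files)

print(f"raw: {raw}  single-instance: {si}  chunk dedup: {dd}")
```

In this toy case, single instancing collapses the exact copy but stores the edited file in full, while chunk-level deduplication stores the shared blocks once plus the one block that changed.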

The alternative to software-based archiving is to let the storage archive do all the data reduction work. Companies like Nexsan and Permabit have archival storage systems that can identify the duplicate data as it comes in and perform deduplication and compression on that data. Disk archiving systems typically look like a NAS head to the rest of the data center, so any software application that can write its data to a disk mount can take advantage of these solutions.

In archive, compression can be significantly more important than deduplication. Deduplication, after all, requires redundant data to achieve high storage efficiency rates. Redundancy like this is readily available in backup from the successive full backups customers typically perform. But with archived data, there's far less redundancy since data is moved to an archive as the result of a specific one-time action.

While there will almost always be some level of duplicate data, compression can be universally applied across all data, not just on redundant data. With archive in particular, compression might capture greater efficiencies than deduplication. Deduplication in archive is a nice-to-have, but compression that does not adversely affect performance is a must-have.
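The point can be sketched with a small Python experiment. This is an illustrative assumption-laden example, not a benchmark: the synthetic "documents," the fixed 4 KB chunk size and the use of zlib at its default-like level 6 are all choices made for the sketch, and real archive data will behave differently.

```python
import hashlib
import zlib

def chunk_dedup_size(files, chunk_size=4096):
    """Bytes stored when each distinct fixed-size chunk is kept once (assumed scheme)."""
    chunks = {}
    for data in files:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            chunks.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    return sum(chunks.values())

def compressed_size(files, level=6):
    """Bytes stored when every file is compressed individually with zlib."""
    return sum(len(zlib.compress(data, level)) for data in files)

def make_document(doc_id, lines=500):
    """Hypothetical archived document: every line carries a unique identifier."""
    text = "".join(
        f"record id={doc_id:04d}-{n:04d} status=archived\n" for n in range(lines)
    )
    return text.encode("ascii")

# Ten documents archived once each, with no copies of one another.
documents = [make_document(d) for d in range(10)]

raw = sum(len(d) for d in documents)
deduped = chunk_dedup_size(documents)
compressed = compressed_size(documents)

print(f"raw: {raw}  after dedup: {deduped}  after compression: {compressed}")
```

Because no chunk repeats across these documents, deduplication finds nothing to remove, while compression still shrinks every file by exploiting the patterns within it.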

When determining whether software-based archive or hardware-based archive is a better fit for customers that need data reduction, consider these bottom-line differences: Archive software can write data to any disk, but it typically reduces only the data from the application it manages. Archive storage hardware can accept archive data from a variety of sources yet still apply deduplication and compression across all of that data. In addition, archive software is still maturing and there are few "all-in-one" solutions. The multi-tenant nature of disk archiving hardware may make it a better foundational offering for a reseller; add to that the ability of most hardware-based archive solutions to scale and encrypt the data, and it is the ideal platform.

In our next article, we will discuss reducing data on primary storage.

About the author

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
