Most products that focus on reducing data on primary storage classify that data into two types: the very active data set (databases and files that are currently being edited), which constitutes a very small percentage of the overall storage environment; and data that is not in use, which constitutes the bulk of primary storage.
The only practical way to reduce the footprint of active data on primary storage today is to use an inline compression appliance, like the type offered by Storwize. Despite perception to the contrary, with inline compression, you can reduce the size of the data set with little or no performance impact on most operations.
Unlike archive or backup data, primary storage contains almost no duplicate data, so deduplication has limited value there, except on virtual machine OS images. (NetApp's deduplication extension, formerly called A-SIS, is effective at reducing the size of the virtual machine footprint.) There are other exceptions, but finding the duplicate data in those cases requires specific knowledge of file formats and processing time for analysis. Ocarina Networks' Optimizer appliances use a post-process crawl technique that works well here without impacting performance. They can compress and deduplicate a file and then either store the reduced file in place or move it to a secondary tier of storage.
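For a rough feel of how much duplicate data a set of files holds, a short script can hash fixed-size blocks and count how many are unique. The sketch below is only an illustration of that idea: the function name `dedup_ratio` and the 4 KB block size are assumptions, and it does not represent how NetApp's or Ocarina's products detect duplicates.

```python
import hashlib


def dedup_ratio(blobs, chunk_size=4096):
    """Return (total_chunks, unique_chunks) for an iterable of bytes objects.

    Hypothetical helper: splits each blob into fixed-size chunks and
    counts how many distinct chunks appear across all blobs.
    """
    seen = set()
    total = 0
    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            # Identical chunks produce identical digests, so the set
            # only grows when a chunk has not been seen before.
            seen.add(hashlib.sha256(chunk).digest())
            total += 1
    return total, len(seen)


if __name__ == "__main__":
    # Two identical "OS images" plus one unrelated file.
    image = b"a" * 4096 + b"b" * 4096
    total, unique = dedup_ratio([image, image, b"c" * 4096])
    print(f"{unique} unique chunks out of {total}")
```

Run against near-identical virtual machine images, a check like this would typically show many repeated blocks; run against ordinary user files, it would show few, which is the point made above.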
That brings us to the most effective means of data reduction on primary storage: Get rid of inactive data by archiving it. Seriously, if 80 to 90 percent of the data on primary storage remains unchanged and unaccessed, what is it doing on your customer's most expensive tier of storage? It's there because most users fear the process of moving it to a less expensive tier; that's where you come in with an effective data reduction strategy that covers all tiers of storage.
For those not using Ocarina's Optimizer, the identification of that data can be made easy with products from companies like Tek-Tools or Aptare.
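Where no reporting product is in place, a first pass at identifying inactive data can be scripted. The following is a minimal sketch, not a substitute for the tools named above: the function name `find_inactive_files` and the 180-day threshold are assumptions, and it relies on file access times, which some file systems are mounted not to update.

```python
import os
import time


def find_inactive_files(root, days=180, now=None):
    """Yield (path, size_bytes) for files under root that have been
    neither accessed nor modified in the last `days` days.

    Hypothetical helper; `days` is an assumed threshold.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            # Use the more recent of last access and last modification.
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield path, st.st_size


if __name__ == "__main__":
    candidates = list(find_inactive_files("."))
    total_bytes = sum(size for _path, size in candidates)
    print(f"{len(candidates)} inactive files, {total_bytes} bytes")
```

Summing the sizes gives a quick estimate of how much primary capacity archiving could free up.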
Once identified, it can be manually moved to the secondary tier of storage, such as a disk-based archive. This is very cost-effective and simple. That's because most disk archives show up as a NAS mount point, and moving data to and from them is as simple as a copy command. If a more automated move and recovery feature is required, there are plenty of mature tools to do that, and global file systems and archiving software have built-in retrieval capabilities.
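Because the archive appears as an ordinary mount point, the manual move can be scripted with standard file operations. The sketch below assumes the archive is mounted as a normal directory; the function name `archive_file` and the copy, check, then delete sequence are illustrative choices, not a vendor procedure.

```python
import os
import shutil


def archive_file(src, primary_root, archive_root):
    """Copy src to the same relative path under archive_root, check the
    copy's size, then remove the original. Returns the archive path.

    Hypothetical helper; all three arguments are plain directory or
    file paths, with archive_root assumed to be a mounted NAS share.
    """
    rel = os.path.relpath(src, primary_root)
    dest = os.path.join(archive_root, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    # copy2 preserves timestamps along with the file contents.
    shutil.copy2(src, dest)
    if os.path.getsize(dest) != os.path.getsize(src):
        raise OSError(f"size mismatch after copying {src} to {dest}")
    # Only delete the primary copy once the archive copy checks out.
    os.remove(src)
    return dest
```

Retrieval is the same operation in the other direction: copy the file from the archive mount back to primary storage.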
Archiving has no impact on performance of the active data set. In fact, with less data on a storage system, its performance may improve. There's a chance that users will notice a performance loss when accessing data that has been migrated from primary storage to the archive tier, but that should happen rarely, and certainly that data reduction technique won't impact day-to-day work.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here. This was first published in August 2009