Tip

Data reduction techniques for primary storage

Over the past few weeks, we've addressed how to develop a data reduction strategy for your customers' backup tier and archive tier. The final area of focus is primary storage, and this is where it really gets interesting. Performance matters here -- more than space savings. The benefits of data reduction on primary storage have to be carefully measured against its impact on application performance. With few exceptions -- which we talk about below -- most primary storage data reduction products impact performance.

Most products that focus on reducing data on primary storage classify that data into two types: the very active data set (databases and files that are currently being edited), which constitutes a very small percentage of the overall storage environment; and data that is not in use, which constitutes the bulk of primary storage.


The only practical way to reduce the footprint of active data on primary storage today is to use an inline compression appliance, like the type offered by Storwize. Despite perception to the contrary, with inline compression, you can reduce the size of the data set with little or no performance impact on most operations.
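Before recommending a compression appliance, it can help to get a rough feel for how compressible a customer's active data actually is. The following is a minimal sketch, not a model of how any vendor's appliance works: it uses Python's standard zlib as a stand-in compressor, and the function names and the 1 MB sample size are assumptions made for illustration.

```python
import zlib


def estimate_compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by compressed size for a byte sample.

    zlib is only a stand-in here; a real appliance may use a different
    algorithm and will report different ratios.
    """
    if not data:
        return 1.0
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


def sample_file_ratio(path: str, sample_bytes: int = 1 << 20) -> float:
    """Estimate the ratio from the first sample_bytes of a file (assumed sample size)."""
    with open(path, "rb") as f:
        return estimate_compression_ratio(f.read(sample_bytes))
```

A ratio near 1.0 suggests the data is already compressed or encrypted, in which case inline compression is unlikely to save much space.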

Unlike with archive or backup data, primary storage has almost no duplication of data, so deduplication here has limited value, except on virtual machine OS images. (NetApp's deduplication extension, formerly called A-SIS, is effective at reducing the size of the virtual machine footprint.) Beyond that there are exceptions, but finding the duplicate data requires specific knowledge of file formats and processing time for analysis. Ocarina Networks' Optimizer appliances have a post-process crawl technique that works well here, without impacting performance. They can compress and deduplicate and store the reduced file in place or subsequently move it to a secondary tier of storage.
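To check how much plain duplication exists on a given share, a quick whole-file scan is often enough to set expectations. The sketch below is a simplification and an assumption on our part: it only finds byte-identical files by hashing them, which is far cruder than the block-level or format-aware techniques the products above use. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Return the bytes that whole-file deduplication could reclaim under root."""
    sizes_by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files so nothing is counted twice.
            if os.path.isfile(path) and not os.path.islink(path):
                sizes_by_digest[file_digest(path)].append(os.path.getsize(path))
    # Keep one copy of each distinct file; every extra copy is reclaimable.
    return sum(sum(sizes[1:]) for sizes in sizes_by_digest.values())
```

If the result is a small fraction of the share's total size, that supports the point above that deduplication has limited value on that data.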

That brings us to the most effective means of data reduction on primary storage: Get rid of inactive data by archiving it. Seriously, if 80 to 90 percent of the data on primary storage remains unchanged and unaccessed, what is it doing on your customer's most expensive tier of storage? It's there because most users fear the process of moving it to a less expensive tier; that's where you come in with an effective data reduction strategy that covers all tiers of storage.

For those not using Ocarina's Optimizer, the identification of that data can be made easy with products from companies like Tek-Tools or Aptare.
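Where no reporting product is in place, a first pass at identifying inactive data can be scripted. This is a minimal sketch under stated assumptions, not a substitute for the tools named above: the function name and the 180-day threshold are hypothetical, and it relies on file access times, which may not be reliable on file systems mounted with access-time updates disabled or reduced.

```python
import os
import time


def find_inactive_files(root, days=180, now=None):
    """Yield (path, size) for files neither accessed nor modified in `days` days.

    The 180-day default is an assumed threshold; choose one that matches
    the customer's retention and access patterns.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Use whichever is more recent: last access or last modification.
            if max(st.st_atime, st.st_mtime) < cutoff:
                yield path, st.st_size
```

Summing the sizes it yields gives a quick estimate of how much primary capacity an archive tier could free up.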

Once identified, it can be manually moved to the secondary tier of storage, such as a disk-based archive. This is very cost-effective and simple. That's because most disk archives show up as a NAS mount point, and moving data to and from them is as simple as a copy command. If a more automated move and recovery feature is required, there are plenty of mature tools to do that, and global file systems and archiving software have built-in retrieval capabilities.
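Because the archive appears as an ordinary mount point, the manual move can be sketched in a few lines. This is an illustration under assumptions, not a recommended production tool: the function name and the directory arguments are hypothetical, and the verify-before-delete step is a precaution we have added rather than something a particular archive product requires.

```python
import hashlib
import os
import shutil


def _sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_file(src, primary_root, archive_root):
    """Copy src to the same relative path under archive_root, verify, then delete src."""
    rel = os.path.relpath(src, primary_root)
    if rel.startswith(os.pardir):
        raise ValueError("src is not inside primary_root")
    dest = os.path.join(archive_root, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps where it can
    if _sha256(src) != _sha256(dest):
        os.remove(dest)
        raise OSError("checksum mismatch after copy; source left in place")
    os.remove(src)  # only remove the primary copy once the archive copy is verified
    return dest
```

Keeping the relative path intact on the archive tier makes it straightforward to copy a file back to its original location if a user asks for it.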

Archiving has no impact on performance of the active data set. In fact, with less data on a storage system, its performance may improve. There's a chance that users will notice a performance loss when accessing data that has been migrated from primary storage to the archive tier, but that should happen rarely, and certainly that data reduction technique won't impact day-to-day work.

About the author

George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.

This was first published in August 2009
