Service provider takeaway: Service providers can build a suite of services around a cost-effective combination of data deduplication and disk-based archiving for customers who are eager to gain the efficiencies of tiered storage.
Tiered storage is a storage management approach that got a lot of press a few years ago but lost steam when storage managers and service providers realized it failed the pesky affordability test. Now it's back on the scene, but with a new companion that makes it more appealing from a financial standpoint: data deduplication. Service providers are in a good position to build a number of services around tiered storage and data deduplication.
Deduplicating data for a near-line archive -- whether via a data deduplication backup appliance with added archive services or a specialized disk archive appliance with added deduplication capabilities -- brings tiered storage within the reach of your customers. Such disk-based archiving has two main advantages: The archived data stays easy to access, and deduplication can greatly reduce the amount of data that has to be stored.
Deduplication isn't new. Disk backup appliances with deduplication capabilities have become almost a requirement of the backup process. In fact, deduplication has established itself as the preferred way to store backup information on disk. As a result, IT staffs are now looking to extend dedupe functionality beyond backup to archiving. Deduplication won't reduce archive data as dramatically as it reduces backup data, because repeated full backups contain far more duplicate data than an archive does, but the savings are still impressive. It's not uncommon for a deduplicated archive to achieve a five- to tenfold reduction in stored data.
As an example of the power of data deduplication services for archiving, consider the following scenario: A customer of yours has 10 TB of primary storage. Of that 10 TB, you're able to determine that 8 TB is inactive data, perfect for archiving. If you reduce that 8 TB of inactive, primary-storage data to 800 GB of near-line archive data (a 10-to-1 reduction), you'll bring tremendous value to your customer. You would make 8 TB of primary storage available for new data, possibly delaying the purchase of new primary storage. And you would reduce the amount of secondary storage needed, from 8 TB to 800 GB, equating to big dollar savings. (Primary storage typically costs about $20 per gigabyte.)
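The arithmetic in this scenario can be sketched in a few lines of Python. The 10-to-1 deduplication ratio is an assumption taken from the top of the five-to-10-times range mentioned earlier, decimal units (1 TB = 1,000 GB) are assumed, and the function names are illustrative only; the $20-per-gigabyte figure is the one quoted above.

```python
# Worked version of the 10 TB customer scenario.
# Assumptions (not from a vendor spec): 10:1 dedupe ratio, 1 TB = 1,000 GB.
GB_PER_TB = 1000


def archive_footprint_gb(inactive_tb, dedupe_ratio):
    """Near-line capacity needed after deduplicating the inactive data."""
    return inactive_tb * GB_PER_TB / dedupe_ratio


def primary_value_freed_usd(inactive_tb, cost_per_gb):
    """Replacement value of the primary capacity that archiving frees up."""
    return inactive_tb * GB_PER_TB * cost_per_gb


inactive_tb = 8            # inactive data identified on primary storage
dedupe_ratio = 10          # assumed 10:1 reduction
primary_cost_per_gb = 20   # USD, figure quoted in the article

footprint = archive_footprint_gb(inactive_tb, dedupe_ratio)
freed = primary_value_freed_usd(inactive_tb, primary_cost_per_gb)

print(f"Archive footprint: {footprint:.0f} GB")
print(f"Primary capacity freed is worth about ${freed:,.0f}")
```

At a more conservative 5-to-1 ratio, the same 8 TB would occupy 1,600 GB of archive, so it's worth running the numbers at both ends of the range before quoting savings to a customer.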
In addition to the capital expense savings, your customers would also save money on powering and cooling storage, since less data needs to be stored. Powering and cooling primary storage typically takes about 100 watts per terabyte of data; reducing the data to be stored brings a corresponding reduction in power and cooling costs.
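As a rough sketch of that reduction, the snippet below applies the 100-watts-per-terabyte figure to the 8 TB example. It assumes, purely for illustration, that the archive tier draws power at the same rate as primary storage (no separate figure for near-line disk is given here), a 10-to-1 deduplication ratio and year-round operation. The function names are hypothetical.

```python
# Rough power estimate for the 8 TB example.
# Assumptions: archive tier draws the same watts per TB as primary storage,
# 10:1 dedupe ratio, storage powered 24 hours a day all year.
WATTS_PER_TB = 100            # figure quoted in the article for primary storage
HOURS_PER_YEAR = 24 * 365


def draw_watts(stored_tb, watts_per_tb=WATTS_PER_TB):
    """Continuous power draw for a given amount of stored data."""
    return stored_tb * watts_per_tb


def annual_kwh(watts):
    """Energy used in a year at a constant draw."""
    return watts * HOURS_PER_YEAR / 1000


before_w = draw_watts(8)        # 8 TB of inactive data left on primary storage
after_w = draw_watts(8 / 10)    # same data after an assumed 10:1 reduction
saved_kwh = annual_kwh(before_w - after_w)

print(f"Draw before: {before_w:.0f} W, after: {after_w:.0f} W")
print(f"Estimated energy saved per year: {saved_kwh:.0f} kWh")
```

Multiplying the yearly kilowatt-hour figure by the customer's own electricity rate turns this into a dollar estimate; that rate is deliberately left out because it varies widely.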
In addition to space, cost and power efficiencies, deduplicated disk archiving also delivers better data integrity capabilities than primary storage. The archive is simple to manage, it can perform consistency checks on the data it holds and, most importantly, it can replicate that data to a second device.
Service providers will find excellent opportunities to add value to this process in three ways. First, customers could use help identifying which data in the enterprise to archive. Being able to identify this data is critical.
Second, customers could use help choosing the right software to move the identified data to the archive. Unlike backup, there's no single tool that can move all the different forms of data; you might need one for file movement, another for database archive and still another for email archive. Being able to integrate all of these into a single system for a customer can provide a smart storage service provider with the differentiation they need.
Finally, customers need help deciding what type of near-line device to use for consolidation and data deduplication. Issues to consider at this stage focus on the customer's goals: Do they need performance, compliance (WORM and encryption), massive scalability or extremely high availability? In the enterprise, this near-line device and its replicated partner might very well be the only location of data that must be retained for a number of years; in a smaller environment, the customer's goal may be foremost to free up primary storage. Your customer's answer to these questions will drive the choice of device.
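For the first of those three steps, identifying archive candidates, a simple age-based scan is one possible starting point. The sketch below is only an illustration: the one-year idle threshold, the use of modification time rather than access time, and the function names are all assumptions, and a real assessment would likely also weigh file type, ownership and retention rules.

```python
import os
import time

SECONDS_PER_DAY = 86400


def is_inactive(last_modified, now, max_idle_days=365):
    """True if the file has not been modified within max_idle_days."""
    return (now - last_modified) > max_idle_days * SECONDS_PER_DAY


def find_archive_candidates(root, max_idle_days=365, now=None):
    """Walk root and return (path, size_bytes) for files past the idle threshold."""
    now = time.time() if now is None else now
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_inactive(info.st_mtime, now, max_idle_days):
                candidates.append((path, info.st_size))
    return candidates


if __name__ == "__main__":
    found = find_archive_candidates(".")
    total_gb = sum(size for _path, size in found) / 1e9
    print(f"{len(found)} candidate files, {total_gb:.2f} GB")
```

Summing the candidate sizes gives a first estimate of how much primary storage an archive project could free, which feeds directly into the kind of savings calculation described earlier.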
Tiered storage via disk-based archiving and data deduplication gives storage service providers a tremendous opportunity not only to add incremental business that highlights what they're best at -- selection, guidance and integration -- but also to provide a solution that their customers desperately need and that can deliver tremendous ROI.
About the author
George Crump, founder of Storage Switzerland, is an independent storage analyst with more than 25 years of experience.
This was first published in February 2008