In our last article, "Addressing unstructured data growth with disk-based archiving," we discussed Brian Peterson's suggestions on SearchSMBStorage.com for controlling unstructured data growth and explained how to assess whether your customer is a good candidate for disk-based archiving. Disk-based archiving fits into the broader practice of primary storage optimization, which uses a variety of techniques and tools, including archiving (whether to disk or tape), compression and deduplication, to lower current storage utilization rates and curtail future storage growth.
Companies like EMC, NetApp, Ocarina Networks and Storwize address primary storage optimization by compressing data, or by compressing and deduplicating it, on file servers or NAS file systems. Where and how they perform this optimization differentiates the products. For instance, Storwize's STN 6000 and STN 2000 products perform compression only, and they do so in real time on all the data in the file system, with no performance degradation documented so far. While the STN 6000 and STN 2000 lack deduplication, their ability to compress all data may offset that gap.
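To gauge how much compression alone might save on a given file share, you could compress a sample of its files and compare sizes. The minimal Python sketch below does that with the standard library's `zlib` module. The function name `compression_savings` is hypothetical, and DEFLATE is only a stand-in: it is not the algorithm Storwize or any other vendor actually uses, and real savings depend heavily on the data mix.

```python
import os
import zlib


def compression_savings(data: bytes, level: int = 6) -> float:
    """Return the fraction of space saved by compressing `data` with zlib.

    zlib (DEFLATE) is only a stand-in: it is not the algorithm used by
    the STN 6000, STN 2000 or any other appliance.
    """
    if not data:
        return 0.0
    compressed = zlib.compress(data, level)
    return 1.0 - len(compressed) / len(data)


if __name__ == "__main__":
    # Repetitive content (text documents, logs) tends to compress well.
    text = b"quarterly report: revenue flat, costs flat\n" * 1000
    # Random bytes stand in for data that is already compressed (JPEG, ZIP),
    # which gains little or nothing from a second pass.
    noise = os.urandom(40_000)
    print(f"repetitive text saved: {compression_savings(text):.1%}")
    print(f"random data saved:     {compression_savings(noise):.1%}")
```

Running it against representative samples from the customer's shares, rather than the synthetic data here, gives a rough sense of whether compression alone could offset the lack of deduplication.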
File system deduplication is handled by the likes of EMC, NetApp and Ocarina Networks. NetApp's and EMC's deduplication is essentially built into their file systems and is available at no or minimal cost, but it works only on storage attached to their own systems. Ocarina's Optimizer appliance, by contrast, will deduplicate data on any storage platform. Both Ocarina's and EMC's products can also compress data prior to deduplication. Unlike Storwize's STN 6000 and STN 2000, the EMC, NetApp and Ocarina products are not inline, so they can't operate on live data. (Ocarina's product can also move data from one storage tier to another, allowing a customer to ease into an unstructured data management process.)
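Because these products are not inline, their deduplication runs as a post-process pass over data already at rest. The minimal Python sketch below illustrates the basic idea under simplifying assumptions of our own: fixed 4 KiB chunks, SHA-256 fingerprints and an in-memory chunk store. The EMC, NetApp and Ocarina implementations are not described here and may differ considerably, including in how they combine compression with deduplication. `dedupe` and `rehydrate` are hypothetical names.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Split `data` into fixed-size chunks and keep one copy of each.

    Returns (store, recipe): `store` maps a SHA-256 hex digest to the
    chunk bytes, and `recipe` lists the digests in order so the original
    data can be rebuilt.
    """
    store: dict = {}
    recipe: list = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Store a chunk only the first time its digest is seen.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def rehydrate(store: dict, recipe: list) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    a, b = b"A" * 4096, b"B" * 4096
    data = a * 3 + b + a  # five logical chunks, only two unique
    store, recipe = dedupe(data)
    stored = sum(len(chunk) for chunk in store.values())
    print(f"logical {len(data)} bytes -> stored {stored} bytes "
          f"in {len(store)} unique chunks")
    assert rehydrate(store, recipe) == data
```

Fixed-size chunking is the simplest scheme; its weakness is that inserting bytes near the start of a file shifts every later chunk boundary, which is one reason variable-size (content-defined) chunking exists.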
The good news about both compression and deduplication on primary storage is that they can complement and enhance a move to an unstructured data management process.
But the reality is that compression and deduplication obscure the actual problem: There's too much data on primary storage that isn't being accessed. To address that, help your customer move that data to some form of archive storage, whether a disk- or tape-based archive system. A disk target is more expensive than tape, but it is easier to interact with, and because its recall times are so fast, customers are more likely to migrate older unstructured data to it aggressively.
The big decision for your customers around archived data is whether they want automated movement and recall of archived data or are comfortable with manual movement and recall. Automated techniques are more transparent to users; manual techniques are cheaper and don't require learning a tool: You simply identify the old data and start moving it. This is one situation where a manual approach might be acceptable, but your customer will have to decide whether it makes sense to implement a software application to enable recall of data that is unlikely ever to be recalled.
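For the manual route, "identify the old data and start moving it" can be as simple as a script. The minimal Python sketch below assumes a POSIX-style file share mounted at hypothetical paths, treats a file as stale when neither its access time nor its modification time falls within a cutoff (365 days in the example, an arbitrary figure), and moves it to the same relative path on an archive mount. `find_stale_files` and `archive_files` are illustrative names, not part of any product, and there is no recall mechanism, which is exactly the trade-off of the manual approach.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_stale_files(root: Path, days: int, now: Optional[float] = None) -> List[Path]:
    """Return regular files under `root` not accessed or modified in `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # skip symbolic links
            st = path.stat()
            # atime may not be updated on volumes mounted noatime, so also
            # check mtime; a file that is read but never written on such a
            # volume can still look stale.
            if max(st.st_atime, st.st_mtime) < cutoff:
                stale.append(path)
    return stale


def archive_files(files: List[Path], root: Path, archive_root: Path,
                  dry_run: bool = True) -> List[Path]:
    """Move each file to the same relative path under `archive_root`."""
    destinations = []
    for src in files:
        dest = archive_root / src.relative_to(root)
        if not dry_run:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
        destinations.append(dest)
    return destinations


if __name__ == "__main__":
    # Hypothetical mount points; adjust to the customer's environment.
    primary = Path("/mnt/primary/shares")
    archive = Path("/mnt/archive/shares")
    candidates = find_stale_files(primary, days=365)
    for dest in archive_files(candidates, primary, archive, dry_run=True):
        print("would move to", dest)
```

Defaulting to `dry_run=True` lets the customer review the candidate list before anything actually moves.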
Beyond this, you can help customers decide what should be migrated, when, and how long it should stay on the archive tier. This could be a valuable professional services engagement, one that can enhance the customer's opinion of your primary storage optimization capabilities.
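Writing the agreed answers down as explicit rules makes the output of such an engagement easy to review and apply. The minimal Python sketch below encodes a per-share policy as a hypothetical `ArchivePolicy` record with two thresholds: when data should migrate off primary storage, and how long it stays on the archive tier. The field names and the one-year and seven-year figures are placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical per-share archive policy agreed with the customer."""
    share: str                   # which data the policy covers, e.g. "HR"
    migrate_after_days: int      # idle time before data leaves primary storage
    retain_on_archive_days: int  # how long data stays on the archive tier


def next_action(policy: ArchivePolicy, days_idle: int,
                days_on_archive: Optional[int]) -> str:
    """Return 'keep', 'migrate', 'retain' or 'review' for one file.

    `days_on_archive` is None while the file is still on primary storage.
    """
    if days_on_archive is None:
        return "migrate" if days_idle >= policy.migrate_after_days else "keep"
    # Past the retention window, flag the file for a human decision
    # (for example, move to tape or delete) rather than acting automatically.
    return "review" if days_on_archive >= policy.retain_on_archive_days else "retain"


if __name__ == "__main__":
    hr = ArchivePolicy(share="HR", migrate_after_days=365,
                       retain_on_archive_days=7 * 365)
    print(next_action(hr, days_idle=30, days_on_archive=None))
    print(next_action(hr, days_idle=400, days_on_archive=None))
    print(next_action(hr, days_idle=900, days_on_archive=500))
    print(next_action(hr, days_idle=3000, days_on_archive=2600))
```

Keeping the final step as "review" rather than automatic deletion leaves the end-of-life decision with the customer.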
Here's what Brian Peterson had to say about unstructured data:
Controlling unstructured file data storage growth: Five storage reduction tips
By its nature, unstructured file data storage is uncontrolled and quite unruly. Unlike structured databases, which are more civilized, the world of file servers is a free-for-all land grab. In file-server land, individual users can eat through storage space from the inside out without regard for the business value of the information they store or its cost. Here are five techniques to help you control unstructured file data storage growth:
- Implement quotas. Most file servers have user, group and tree quota functionality. Network-attached storage (NAS) appliances made by EMC Corp. and NetApp Inc. support quotas, as do Windows Storage Server and Windows Server 2003 R2 file servers. It's usually best to implement user or group quotas for home directories, and tree quotas, which limit the size of a directory, for organizational shares like the "HR" document storage location. Implementing quotas seems like an easy and obvious solution, but be prepared: Taking disk space away from users is often politically charged and can be difficult.
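On EMC and NetApp NAS and on Windows file servers, quotas are configured and enforced by the platform itself, and nothing here replaces that. Before setting limits, though, it can help to show users and managers what each tree actually consumes, which may ease the politics. The minimal Python sketch below, with hypothetical names, path and limits, totals the bytes under a directory and classifies the result against a soft (warning) and a hard limit, a two-level model common in quota implementations.

```python
import os
from pathlib import Path

GIB = 1024 ** 3


def tree_usage(root: Path) -> int:
    """Total size in bytes of regular files under `root` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def quota_status(used: int, soft_limit: int, hard_limit: int) -> str:
    """Classify usage against a soft (warning) and hard (blocking) limit."""
    if used >= hard_limit:
        return "over hard limit"
    if used >= soft_limit:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical share path and limits for an organizational tree quota.
    share = Path("/mnt/nas/HR")
    used = tree_usage(share)
    print(f"{share}: {used / GIB:.1f} GiB used, "
          f"{quota_status(used, soft_limit=80 * GIB, hard_limit=100 * GIB)}")
```

The sketch sums apparent file sizes, so its figures may not match a platform that charges quotas by allocated blocks or by compressed size.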
Read the rest of Brian Peterson's techniques for controlling unstructured data storage growth.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.