Your customers' unstructured data, essentially file-based data not in databases, is out of control. In a recent SearchSMBStorage.com story, Brian Peterson suggested five techniques for storage admins to get control of their unstructured data. And there are many vendors in the industry that want you to spend your precious engineering and sales time learning how to fix it. The question is, Is it worth your attention?
Let's look at the facts: Unstructured data has been a growing problem for the past four or five years. During this time, there have been a number of potential solutions for you to learn and then offer to customers. Think back to hierarchical storage management (HSM) and information lifecycle management (ILM). Now we have disk-based archiving and storage optimization. The first two pretty much went nowhere, and most customers have simply expanded their primary storage in response to unstructured data growth. Do storage optimization and disk-based archiving make now the right time to help your customers control unstructured data? The basic answer to that question is yes. While some of the techniques Brian suggested can be applied by storage managers on their own, a few of them -- those involving archiving and storage optimization -- present an excellent opportunity for you to step in and address the unstructured data problem with new hardware and software tools.
Let's talk about disk-based archiving first.
The first issue to consider is whether your customer's end users will let their data be archived and, once it is, whether they will actually use the archived data. For a long time, the answer to this question has been no. That's because developing a process to manage unstructured data is a project that takes time, and accessing data once it's been archived has been slow. Disk-based archive targets from companies like Permabit and Nexsan can change things in this area. Data can be moved to and recalled from a disk archive almost as quickly as if it were on primary storage. As a result, there should be less resistance to the idea from users, since the performance impact on them would be minimal.
Second, there's the issue of how your customer, the storage manager, handles the capacity problem. We may have finally reached a point where primary storage can no longer be expanded at a pace that satisfies both the demands of the users and the reality of the budget. Primary, high-performance storage has hit a bit of a wall when it comes to capacity. Most high-performance drives are below 500 GB; secondary storage, on the other hand, is at 1 TB per drive and heading quickly to 2 TB. On a per-spindle basis, that is an incredible difference in the cost per gigabyte -- one that today's storage managers can't ignore.
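To make the per-spindle point concrete, here is a minimal sketch that counts how many drives it takes to hold the same data at each capacity point. The 20 TB data set is a hypothetical figure chosen for illustration, 500 GB is used as the upper bound for primary drives, and RAID overhead and spares are ignored.

```python
import math

def drives_needed(capacity_gb: float, drive_size_gb: float) -> int:
    """Raw drives (spindles) needed to hold capacity_gb, ignoring RAID and spares."""
    return math.ceil(capacity_gb / drive_size_gb)

# Hypothetical data set: 20 TB of unstructured data (decimal units, 1 TB = 1,000 GB).
data_gb = 20_000

primary_500gb = drives_needed(data_gb, 500)      # primary drives at the 500 GB upper bound
secondary_1tb = drives_needed(data_gb, 1_000)    # secondary drives at 1 TB
secondary_2tb = drives_needed(data_gb, 2_000)    # secondary drives at 2 TB

print(primary_500gb, secondary_1tb, secondary_2tb)  # prints: 40 20 10
```

Every halving of the spindle count also reduces the slots, power and cooling needed for the same data, which is where much of the cost-per-gigabyte gap comes from.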
Third, and related to that, there is the financial impact -- and that reaches well beyond primary storage. If moving data from primary storage to secondary storage delays the purchase of additional primary storage, that deferral can be used to cost-justify a disk-based archiving system. Beyond that, moving all this data off of primary storage can significantly reduce the cost of the backup infrastructure and lower the WAN bandwidth load during disaster recovery replication.
Fourth, there is almost always an improvement in staff productivity with an archive project. Unstructured data is typically scattered across many servers and several SANs. This data is backed up and protected each day, and each day, those files that have not been touched for months, if not years, have to be managed just like the files that were created yesterday. By moving all the old data out of the way, the storage administrator would only need to manage the system once a week or so. It is easier to manage one big archive than a hundred little NAS heads.
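A first step in sizing an archive project is finding out how much of that data really has gone untouched. The sketch below walks a directory tree and lists files that have not been modified within a given number of days. The function name and the 180-day threshold are assumptions made for illustration, and modification time is used because access times are often not recorded reliably on file servers.

```python
import os
import time
from typing import Iterator, Optional, Tuple

def find_stale_files(
    root: str, max_age_days: float, now: Optional[float] = None
) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for files under root not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86,400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                yield path, st.st_size
```

Summing the sizes, for example `sum(size for _, size in find_stale_files("/srv/share", 180))` with a hypothetical share path, gives a rough estimate of the capacity an archive could take off primary storage.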
Beyond disk-based archiving, there are optimization techniques -- data deduplication and compression -- that address the problem of unstructured data by increasing the density of storage. This optimization can be used to lower the price per gigabyte on both primary and secondary storage. In our next article, we will dive deeper into what optimization options are available to you.
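For a feel of how these two techniques increase density, here is a simplified sketch that combines them: it splits data into fixed-size chunks, stores each unique chunk only once (deduplication) and compresses what it stores. The 4 KB chunk size, SHA-256 fingerprints and zlib compression are assumptions chosen for illustration, not a description of any vendor's product, and the space needed for the chunk index is ignored.

```python
import hashlib
import zlib

def optimized_size(data: bytes, chunk_size: int = 4096) -> int:
    """Bytes stored after fixed-size chunk dedup plus per-chunk zlib compression."""
    seen = set()   # fingerprints of chunks already stored
    stored = 0
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest in seen:
            continue  # duplicate chunk: only a reference would be kept
        seen.add(digest)
        stored += len(zlib.compress(chunk))
    return stored

# Ten identical 4 KB blocks, standing in for ten copies of the same attachment.
block = bytes(range(256)) * 16
data = block * 10
print(len(data), optimized_size(data))
```

Only the first copy of the block is stored, so the stored size is a small fraction of the 40,960 bytes written. Real unstructured data will deduplicate and compress far less neatly than this contrived input.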
Here's what Brian Peterson had to say about unstructured data:
Controlling unstructured file data storage growth: Five storage reduction tips
By its nature, unstructured file data storage is uncontrolled and quite unruly. Unlike the more civilized nature of structured databases, the world of file servers is a free-for-all land grab. In file-server land, individual users can eat through storage space from the inside out without regard for the business value of the information they are storing or its cost. Here are five techniques to help you control unstructured file data storage growth:
- Implement quotas. Most file servers have user, group and tree quota functionality. Network-attached storage (NAS) appliances made by EMC Corp. and NetApp Inc. support quotas, as do Windows Storage servers and Windows 2003 R2 file servers. It's usually best to implement user or group quotas for home directories and tree quotas, which limit the size of a directory, for organization shares like the "HR" document storage location. Implementing quotas seems like an easy and obvious solution, but be prepared, taking disk space from your users is often politically charged and can be a difficult task.
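Before enforcing a tree quota, it helps to know which shares are already over the limit you have in mind. The sketch below is a reporting aid only, with hypothetical function names; it does not use or imitate the quota features of any NAS or Windows file server, which enforce limits themselves.

```python
import os

def tree_usage_bytes(root: str) -> int:
    """Total size in bytes of all files under root (symlinks are not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return total

def over_quota(root: str, limit_bytes: int) -> bool:
    """True if the directory tree at root already exceeds the proposed tree quota."""
    return tree_usage_bytes(root) > limit_bytes
```

Running `over_quota` against each organization share with a proposed limit shows which groups will need a conversation before the quota is switched on.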
Read the rest of Brian Peterson's techniques for controlling unstructured data storage growth.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.