Service provider takeaway: Storage service providers need to plan carefully when recommending data management products to their clients, since the market is undergoing significant change and is headed toward integration.
Data management integration -- or, as I like to call it, data supervision -- is on the horizon. When it happens and precisely what it will look like is yet to be determined. But integration of now-discrete tools is definitely on the way, and smart storage service providers will keep an eye on this market space and consider vendors' road maps when making product recommendations to customers. You don't want to be the service provider who recommends a product that doesn't keep pace with industry standards.
So what, exactly, is data supervision? I use "data supervision" to describe the blending of file auditing, file retention, version control and data leak prevention into a single, cooperative system.
Elements of data supervision
Before we explain how an integrated data supervision tool will work, let's talk about the four functions that are now handled by separate tools. First there's file auditing, which logs the creation, modification and deletion of files on storage devices. Proactive file auditing can be used to achieve litigation readiness or to identify internal vandalism of a file. For example, many companies maintain a spreadsheet that contains a payroll summary. With file auditing, the creation, modification or even viewing of that file can be logged. For companies facing a lawsuit, being able to prove who modified a file and when it happened can be invaluable.
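To make the auditing function concrete, here is a minimal sketch of how a file auditing tool might detect create, modify and delete events by comparing snapshots of a directory. This is an illustration only, not any vendor's implementation; real products hook into the file system rather than polling, and the function names are hypothetical.

```python
from pathlib import Path

def snapshot(root):
    """Map each file under root to its last-modified time."""
    return {p: p.stat().st_mtime for p in Path(root).rglob("*") if p.is_file()}

def audit_events(before, after):
    """Compare two snapshots and report created/modified/deleted files."""
    events = []
    for path, mtime in after.items():
        if path not in before:
            events.append(("created", path))
        elif mtime != before[path]:
            events.append(("modified", path))
    for path in before:
        if path not in after:
            events.append(("deleted", path))
    return events
```

In the payroll-spreadsheet scenario above, each returned event would be written to a tamper-evident log along with the acting user, giving the proof of who changed what, and when, that litigation readiness requires.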
File retention, the second function, enables the building of policies for data retention. The policies can apply to prior versions of a file or to the active copy. For example, some organizations might have retention policies specific to the HR directory. Retention also provides basic data movement functions, such as moving older data from primary storage to archive storage. This frees up primary disk storage space and reduces power and cooling costs.
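The data movement side of retention can be sketched in a few lines: move anything not touched within a policy-defined age from primary to archive storage. The threshold and function names here are hypothetical, and a production tool would preserve permissions and leave a stub behind, which this sketch does not.

```python
import shutil
import time
from pathlib import Path

def archive_old_files(primary, archive, max_age_days=365, now=None):
    """Move files not modified within max_age_days from primary to archive,
    preserving the directory layout. Returns the list of archived paths."""
    now = now or time.time()
    cutoff = now - max_age_days * 86400
    archive = Path(archive)
    moved = []
    for f in Path(primary).rglob("*"):
        if f.is_file() and f.stat().st_mtime < cutoff:
            dest = archive / f.relative_to(primary)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved
```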
The third function is version control, which preserves a copy of data before each modification, stored on a separate device. The copy could be sent to a data deduplication archive, allowing every version of a file to be retained without a significant additional storage investment. This, again, aids litigation readiness, and it also lets users roll back to a prior version of a document after corruption or a bad edit.
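The combination of versioning and deduplication described above can be sketched as a content-addressed store: each version is keyed by a hash of its contents, so identical versions are stored only once. The class and method names are hypothetical; real deduplication archives work at the block or segment level rather than whole files.

```python
import hashlib
from pathlib import Path

class VersionStore:
    """Content-addressed version archive: identical contents are kept once."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.history = {}  # file name -> ordered list of content hashes

    def save_version(self, name, data):
        """Retain a version of a file before it is overwritten."""
        digest = hashlib.sha256(data).hexdigest()
        blob = self.root / digest
        if not blob.exists():  # dedup: write each unique content only once
            blob.write_bytes(data)
        self.history.setdefault(name, []).append(digest)
        return digest

    def get_version(self, name, index):
        """Roll back: fetch the nth retained version of a file."""
        return (self.root / self.history[name][index]).read_bytes()
```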
The fourth area, data leakage, has always been a concern for organizations, but the risks are greater today than ever before. USB thumb drives, laptops, iPods and other small storage devices make it easy for data to walk out of an organization. Some organizations have taken drastic measures, such as disabling all USB-attached drives, but that approach also eliminates legitimate use of those devices. Rather than attacking the wrong end of the problem, it makes more sense to specify which files can't be moved at all, or can be moved only to approved locations. And leakage is not just an outbound concern; data leak prevention systems can also keep data off the wrong servers, restricting certain data types so that only users with a particular security clearance can place them on a given server.
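A file-level leak prevention policy of the kind described here amounts to a lookup table mapping file patterns to permitted destinations. The patterns, destination names and policy table below are all hypothetical, and a real DLP product would classify content, not just paths; this is just a sketch of the decision logic.

```python
from pathlib import PurePath

# Hypothetical policy table: file pattern -> set of allowed destinations.
# An empty set means the matching files may not be moved anywhere.
POLICIES = {
    "hr/*.xlsx": {"archive-server"},  # payroll data: archive server only
    "legal/*.docx": set(),            # legal hold: may not leave at all
}

def transfer_allowed(path, destination):
    """Return True unless a matching policy forbids this destination."""
    for pattern, allowed in POLICIES.items():
        if PurePath(path).match(pattern):
            return destination in allowed
    return True  # no policy covers this file: transfer permitted
```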
An integrated approach to data management
A holistic approach to these four functions is on the horizon as vendors become aware of the need for greater communication among these tools. Such communication would allow the components to access a common metadata directory and work cooperatively. Using integrated metadata from these data management components, a data supervision system could produce meaningful information about activity on the network, beyond what's now possible. A data supervision system could, for example, identify the owner and modifier of a certain version of a file. And any document created or modified by, for instance, an HR person could be retained in a special location for a preset period of time or be moved to a WORM (Write Once, Read Many) device. Another benefit: A data supervision system could block data from leaving an enterprise; in fact, any attempt to do so could be flagged.
Data management integration would also prevent the building of conflicting policies, such as a policy to keep every version of a file forever and a retention policy of shredding a file one year after the last update. An integrated data supervision system would avoid this conflict.
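With all policies in one place, conflict detection becomes a simple cross-check at policy-creation time. The sketch below flags the exact conflict described above: a keep-forever rule and a shred rule covering the same files. The `Policy` structure and action names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    scope: str   # which files the policy covers, e.g. "hr/*"
    action: str  # "retain_forever" or "shred_after"
    days: int = 0  # used only by shred_after

def find_conflicts(policies):
    """Flag pairs where one policy retains a scope forever
    and another schedules the same scope for shredding."""
    conflicts = []
    for a in policies:
        for b in policies:
            if (a.scope == b.scope
                    and a.action == "retain_forever"
                    and b.action == "shred_after"):
                conflicts.append((a, b))
    return conflicts
```

An integrated tool would run a check like this before committing any new policy, rejecting the second rule instead of discovering the contradiction at enforcement time.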
A side benefit of the integrated approach is disk capacity savings. Separate metadata databases each take up about 3% to 5% of disk capacity. If four separate tools are used without integration, the hit to disk capacity can reach as high as 20%. With data management integration, there's a single database, saving significant capacity.
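The capacity math works out as follows, using the worst-case 5% figure from the paragraph above (the function name is illustrative):

```python
def metadata_overhead(tool_count, per_db_fraction):
    """Fraction of disk capacity consumed by per-tool metadata databases."""
    return tool_count * per_db_fraction

# Four discrete tools at the 5% worst case, vs. one shared database.
separate = metadata_overhead(4, 0.05)    # 0.20 -> 20% of capacity
integrated = metadata_overhead(1, 0.05)  # 0.05 -> 5% of capacity
```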
Sitting in the active data path will also allow a data supervision tool to take action the moment an event triggers a policy condition. The alternative is to wait for some type of crawl of the environment; given the complexity of most IT environments, that holdup could delay adherence to a retention or versioning policy.
The bottom line
With change ahead, integrators need to be aware of how these tools are evolving. If your customers are actively looking to buy one of these components, you should talk to the vendors they're considering and make sure you understand what the vendors' future plans are. Nudge your customers toward the vendors that are clearly headed for integration, since the capabilities of an integrated product will grow increasingly important as the market develops. Don't allow your customers to hitch their wagon to a company that's not working toward data management integration.
About the author
George Crump, founder of Storage Switzerland, is an independent storage analyst with more than 25 years of experience.
This was first published in April 2008