Data classification: On the road to ILM

Learn how implementing data classification for your customers can clear obstacles on their way to information lifecycle management and create consulting opportunities for you.

Information lifecycle management, or ILM, is an appealing concept, promising to help control storage costs by migrating data to appropriate media based on its value to an organization. Your customers have heard the hype about ILM and are probably clamoring for it.

The problem, of course, is that implementing an ILM strategy is not a simple process. ILM is more philosophy than product, so it's not something with a part number that you can sell to your customers. What you can do, however, is educate them about ILM and help them get on the right path, by first undertaking the job of data classification. That's a true value-added opportunity -- and one that doesn't require you to reduce margins by cutting storage pricing, as you might otherwise be inclined to do to stay competitive.

Data classification can be a realistic goal for customers looking to reduce storage costs. Data must first be divided into structured and unstructured pools and then classified into tiers. Building a tiered storage infrastructure by moving data onto the correct type of storage clears one obstacle on the way to ILM. This tip defines structured and unstructured data and describes the mechanisms available for classifying each.

Structured data refers to content that is stored on disk in a format that is generally proprietary to the application that wrote it, inaccessible except through the application or its application programming interface (API). The data structure within such an application is highly organized and relational. An example of an application that produces structured data is a database whose tables work together to host a connected realm of data. The closed nature of structured data makes it difficult to classify individual elements within the structure. It is even more challenging to relocate these individual segments of targeted data to separate tiered storage technologies.

A single structured data application may contain several tiers of data. Unless the application supports native or third-party archiving tools, it will be difficult to split a single application instance across multiple tiers of storage. To achieve 80% of the gain with 20% of the effort, it is often best to move the entire application onto the appropriate tier of storage. Classifying data for an entire application is a broad exercise that requires an application/storage alignment study to evaluate the performance, availability and recoverability requirements of the application's data. These requirements determine the storage tier on which the data should reside. The largest challenge in keeping structured application data aligned with the right storage tier is migrating that data with low impact. You should invest effort in data migration tools and processes to ensure applications can change storage tiers with minimal or no outage. Storage virtualization or host-based migration services may provide significant value to your customers.
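
To make the alignment study concrete, the sketch below shows one way its results could be turned into a tier assignment for a whole application. The class name, field names, number of tiers and every threshold are hypothetical placeholders chosen for illustration; real values would come from the customer's own study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRequirements:
    """Findings of a hypothetical application/storage alignment study."""
    name: str
    peak_iops: int            # performance requirement
    availability_pct: float   # availability target, e.g. 99.99
    rpo_hours: float          # recoverability: tolerable data-loss window


def assign_tier(req: AppRequirements) -> int:
    """Return a storage tier from 1 (highest) to 3 (lowest).

    The thresholds are illustrative assumptions, not industry standards.
    The most demanding single requirement decides the tier, because the
    whole application moves as one unit.
    """
    if req.peak_iops >= 10_000 or req.availability_pct >= 99.99 or req.rpo_hours <= 1:
        return 1
    if req.peak_iops >= 1_000 or req.availability_pct >= 99.9 or req.rpo_hours <= 24:
        return 2
    return 3


if __name__ == "__main__":
    apps = [
        AppRequirements("order-entry database", 20_000, 99.99, 0.25),
        AppRequirements("HR database", 2_000, 99.5, 48),
        AppRequirements("reporting archive", 100, 99.0, 72),
    ]
    for app in apps:
        print(f"{app.name}: tier {assign_tier(app)}")
```

Letting the strictest requirement win keeps the rule simple, at the cost of placing some less critical data on a higher tier than it strictly needs, which is the same 80/20 trade-off described above.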

Unstructured data refers to relatively small files disconnected from one another in both purpose and format. User network file shares, content management systems and email systems containing office productivity documents, emails, images, audio and video files are good examples of this type of data. Merrill Lynch estimates that more than 85% of all business information exists as unstructured data. Fortunately, data classification mechanisms are most easily deployed and automated with unstructured data. Unstructured data has a fairly simple ILM model. Generally, storage performance does not change much across disk tiers for unstructured data; therefore, the data is in one of three states: active, archived or deleted. If unstructured data classification is executed effectively, it will be easier to maintain compliance with records retention policies going forward.

To categorize the unstructured data, first define the subtypes of data that might exist in the data center, such as communications data, human resources information or clinical data for the healthcare industry (see Figure 1). Then define the retention policy for that data subtype based on the client's regulatory or business needs. For example, the law may require that patient clinical data be retained for a patient's lifetime plus eight years. The retention rules should define a timeline for when to archive and when to delete the file data.
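
As a rough illustration of such a retention timeline, the sketch below models a policy as two age thresholds and reports whether a file should be active, archived or deleted. The 60-day deletion point for communications data follows Figure 1; the 30-day archive point, the class and function names, and the sample dates are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical retention rule for one data subtype."""
    data_type: str
    archive_after_days: int
    delete_after_days: Optional[int]  # None means never delete automatically


def lifecycle_state(policy: RetentionPolicy, trigger: date, today: date) -> str:
    """Return 'active', 'archive' or 'delete' for a file.

    `trigger` is the date the retention clock starts, for example the
    date an email was received.
    """
    age_days = (today - trigger).days
    if policy.delete_after_days is not None and age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.archive_after_days:
        return "archive"
    return "active"


# Deletion at 60 days follows Figure 1; archiving at 30 days is an assumption.
COMMUNICATIONS = RetentionPolicy("communications", archive_after_days=30, delete_after_days=60)

if __name__ == "__main__":
    received = date(2007, 9, 1)
    for today in (date(2007, 9, 15), date(2007, 10, 10), date(2007, 11, 15)):
        print(today, lifecycle_state(COMMUNICATIONS, received, today))
```

Policies tied to an event rather than a fixed age, such as the end of a patient's life, would need the trigger date supplied from the system of record, which this sketch leaves to the caller.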

Finally, establish the criteria for identifying the data type. For example, if a Social Security number is found in a file, the file might automatically be classified as human resources data. Select a data classification tool that can accommodate the repositories that store the data (file shares, email, file systems) and the mechanism by which the data will be searched (by filename, directory structure and/or file contents). The selected data classification tool should also support deletion and migration of the files to the targeted storage media, or at least integrate with the planned process. Many vendors can help with data classification and archiving, including Kazeon, Abrevity, Symantec (with Enterprise Vault) and EMC (with the Xtender product line).
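
The identification criteria can be pictured as a short list of rules checked in order. The sketch below is not how any of the products named above work; it is a minimal illustration, with made-up rules and paths, of the three search mechanisms mentioned: file contents (a Social Security number pattern), directory structure and filename.

```python
import re
from pathlib import PurePosixPath

# U.S. Social Security number written as NNN-NN-NNNN. A production rule would
# need stricter validation to limit false positives.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical rule: these file extensions indicate saved email messages.
MAIL_SUFFIXES = {".eml", ".msg"}


def classify(path: str, text: str) -> str:
    """Return a data type for one file using illustrative rules.

    Rules are checked in priority order: contents first, then directory
    structure, then filename.
    """
    p = PurePosixPath(path)
    if SSN_PATTERN.search(text):
        return "human resources"
    if any(part.lower() == "finance" for part in p.parts):
        return "financial"
    if p.suffix.lower() in MAIL_SUFFIXES:
        return "communications"
    return "unclassified"


if __name__ == "__main__":
    samples = [
        ("/shares/hr/offer.txt", "Employee SSN: 123-45-6789"),
        ("/shares/Finance/q3.xls", "revenue summary"),
        ("/mail/export/note.eml", "See you at lunch"),
        ("/tmp/scratch.txt", "call 555-1234"),
    ]
    for path, text in samples:
        print(path, "->", classify(path, text))
```

Checking contents before location means a file holding a Social Security number is treated as human resources data even when it sits in an unrelated directory, which errs toward the stricter retention policy.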

At the outset, true ILM seems like an elusive goal, but with a methodical and realistic approach, simple data classification is well within reach. Once classification is implemented, your customers will be one step closer to realizing the value of ILM. And, if you follow these steps, you might be positioned to guide them through that process as well.

Figure 1: Unstructured data retention policy by type

Data type              Retention policy
Communications data    Store for 60 days after receipt
Human resources data   Store for 13 years after financial year closes
Financial data         Store for 7 years after financial year closes
Clinical data          Store for 8 years after patient's life ends

Data types held in each source (x = present, - = not present):

Source                 Communications   Human resources   Financial   Clinical
Exchange email         x                x                 x           -
PeopleSoft HR          -                x                 x           -
Fileshare              x                x                 x           -
PAX medical imaging    -                -                 -           x
Oracle Financials      x                -                 x           -

This was first published in September 2007
