Best practices for managing archived data

Archival data management solutions require long-term management strategies that consider the different types of data being stored, how those differences affect where the data is stored, and how to give your customers easy access to it. This tip provides a launching pad for your archival data management strategy.

If you've become responsible for providing archival data management solutions, it helps to have good long-term management strategies that are more nuanced than "Archive everything two years old, and delete everything six years old." This article offers a few high-level guidelines that you can employ when making decisions about long-term archiving, retention and disposal of customer data.

Keep multiple definitions of "archiving" in mind. Archiving doesn't always mean moving data entirely offline. Sometimes it simply means moving the data somewhere access is slower but still possible, such as a tape library where requests for specific items can be queued up and delivered. The exact implementation is up to you and your customers and will depend on the tools and resources you have available, but there are intermediate steps between immediate access and putting in a request to have disks pulled from file boxes stored in a warehouse somewhere.
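As a rough illustration of those intermediate steps, the Python sketch below models three hypothetical tiers (online, nearline and offline). It serves online requests immediately and queues the rest, the way a tape library batches retrieval jobs. The tier names, the `ArchiveFrontEnd` class and its methods are assumptions made for this example, not part of any particular archiving product.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ONLINE = "online"      # immediate access, e.g. primary disk
    NEARLINE = "nearline"  # slower but automated, e.g. a tape library
    OFFLINE = "offline"    # manual retrieval, e.g. media stored off-site


@dataclass
class RetrievalRequest:
    item_id: str
    tier: Tier


class ArchiveFrontEnd:
    """Hypothetical front end: online reads are served at once,
    everything else is queued for later delivery."""

    def __init__(self, catalog: dict[str, Tier]) -> None:
        self.catalog = catalog  # item ID -> tier it currently lives on
        self.nearline_queue: deque[RetrievalRequest] = deque()
        self.offline_queue: deque[RetrievalRequest] = deque()

    def request(self, item_id: str) -> str:
        tier = self.catalog[item_id]
        if tier is Tier.ONLINE:
            return "available now"
        req = RetrievalRequest(item_id, tier)
        if tier is Tier.NEARLINE:
            self.nearline_queue.append(req)
            return "queued for nearline retrieval"
        self.offline_queue.append(req)
        return "queued for manual retrieval"

    def process_nearline(self) -> list[str]:
        """Drain the nearline queue, e.g. once per tape-robot batch."""
        served = [req.item_id for req in self.nearline_queue]
        self.nearline_queue.clear()
        return served
```

A real nearline tier would hand queued requests to the tape library's own scheduling software; the queue here only shows where the delay enters the picture.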

Not all data is alike, and not all of it should be archived the same way. Classify data before archiving it. Classification makes it easier to apply different archiving and retrieval methods to different kinds of data, and it adds another layer of metadata that can be combined with live usage statistics.

Also keep in mind that a given piece of data can be classified in more than one way, more like the now-familiar practice of tagging than a set of static categories. Whether customers want to implement data classification at all will depend on how useful it is to them and how much work it would take for resellers to classify and tag their data that thoroughly.
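To make the tagging idea concrete, here is a minimal Python sketch in which each record carries a set of tags and each tag maps to a retention period and storage tier. The tag names, the `POLICIES` table and the rule for combining policies (keep the longest retention and the fastest tier of any matching tag) are all hypothetical; a real customer would define their own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    retention_years: int
    tier: str  # "online", "nearline" or "offline"


# Hypothetical tag-to-policy table; real values come from the customer.
POLICIES = {
    "financial": Policy(retention_years=7, tier="nearline"),
    "legal-hold": Policy(retention_years=10, tier="nearline"),
    "marketing": Policy(retention_years=2, tier="offline"),
}
DEFAULT = Policy(retention_years=3, tier="offline")

TIER_SPEED = {"online": 0, "nearline": 1, "offline": 2}  # lower = faster


@dataclass
class Record:
    record_id: str
    tags: set[str] = field(default_factory=set)  # a record may carry many tags


def effective_policy(record: Record) -> Policy:
    """Combine the policies of every matching tag, keeping the longest
    retention and the fastest tier (an assumed, conservative rule)."""
    matches = [POLICIES[t] for t in record.tags if t in POLICIES]
    if not matches:
        return DEFAULT
    retention = max(p.retention_years for p in matches)
    tier = min((p.tier for p in matches), key=TIER_SPEED.__getitem__)
    return Policy(retention, tier)
```

Taking the most conservative value is one reasonable default when tags conflict; a customer with strict disposal requirements might prefer a different rule.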

For further reading, Gartner Research's papers "Data Classification Is a Vital First Step in Information Life Cycle Management" (G00149459) and "Value Content Based on Risks and Rewards" (G00148408) offer useful ideas on how to implement data classification and valuation.

Find a way to put live access statistics for archived data into your customers' hands. Live usage statistics for your data will give a better picture of what's actually being used -- and what needs to be archived -- than arbitrary cutoffs like dates. The results can be surprising and defy common sense.

Here's an example. Not long ago, film critic Roger Ebert wrote about the way Netflix and Amazon.com are changing how people obtain and watch movies. He found, to his surprise, that the top 50 or so movies on Netflix account for only a very small fraction of total rentals. The rest of the rental activity comes from back-catalog titles, a pattern that has been called the "Long Tail" effect. This doesn't mean the top sellers are strategically unimportant; it means the rest of the sales data is potentially just as useful.

In the same way, mining access data to see what people are really using can be more powerful than applying a simple date cutoff, especially if the value of the data you're retaining isn't determined solely by its age. Implementing something like this takes more work, but the result is a more effective set of criteria customers can use to decide what to keep current and what to move into low-priority or offline storage.
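A minimal Python sketch of that comparison, assuming you can get an access log of `(item_id, timestamp)` pairs from the storage system: `archive_candidates` flags items that haven't been read in a recent window, while `age_cutoff_candidates` applies the plain date rule. The function names, the 180-day window and the two-year cutoff are illustrative assumptions.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import datetime, timedelta


def archive_candidates(
    items: dict[str, datetime],
    access_log: Iterable[tuple[str, datetime]],
    now: datetime,
    window_days: int = 180,
    min_hits: int = 1,
) -> set[str]:
    """Flag items accessed fewer than `min_hits` times in the last
    `window_days` days, regardless of how old they are."""
    since = now - timedelta(days=window_days)
    hits = Counter(item for item, ts in access_log if ts >= since)
    return {item for item in items if hits[item] < min_hits}


def age_cutoff_candidates(
    items: dict[str, datetime], now: datetime, max_age_days: int = 730
) -> set[str]:
    """The simple date rule, for comparison: flag anything older than the cutoff.
    `items` maps item ID -> creation time."""
    cutoff = timedelta(days=max_age_days)
    return {item for item, created in items.items() if now - created > cutoff}
```

Given an old item that is still read regularly and a recent item nobody has opened, the two rules flag opposite items: the access-based rule picks the idle new item, and the date rule picks the busy old one.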

If you can't do this yourself, at least give your customers the tools to get that information so they can make decisions about what's worth archiving.

Delete if you must, but delete with care. Storage has gotten astonishingly cheap, but the amount of data to be socked away has also exploded, which sometimes makes it impractical to keep everything. (There's also the cost of physical storage space, which adds up over time.) This is where most customers use some kind of cutoff date: for example, all records more than X years old are permanently moved offline or simply deleted. Sometimes this is also a matter of legal compliance, depending on what kind of work the customer does.

This is where cutoff dates really are useful. If customers know that everything more than X years old will be destroyed or moved out of casual reach, both you and they can plan ahead, provided they know in advance that the data is going to be deleted. If your customers delete data themselves, how they inform their own clients is up to them. If you're doing the deleting on their behalf, make sure they fully understand what will be kept and for how long, and give them as much flexibility as you can afford. (You might, for instance, consider tiered pricing for different retention policies.)
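The sketch below shows how a date cutoff combined with tiered retention plans might be computed, with a notice window so customers hear about deletions before they happen. The plan names, retention periods and 30-day notice default are hypothetical values chosen for illustration; actual retention requirements, legal or otherwise, must come from the customer.

```python
from datetime import date, timedelta

# Hypothetical retention plans a provider might price differently.
RETENTION_YEARS = {"basic": 3, "standard": 7, "extended": 10}


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def disposal_schedule(
    records: dict[str, date], plan: str, today: date, notice_days: int = 30
) -> tuple[list[str], list[str]]:
    """Split records (ID -> creation date) into two lists:
    due      -- past the retention cutoff, ready to archive or delete
    upcoming -- within `notice_days` of the cutoff, so the customer
                can be told before anything is removed
    """
    years = RETENTION_YEARS[plan]
    due, upcoming = [], []
    for rec_id, created in sorted(records.items()):
        cutoff = add_years(created, years)
        if cutoff <= today:
            due.append(rec_id)
        elif cutoff - today <= timedelta(days=notice_days):
            upcoming.append(rec_id)
    return due, upcoming
```

For example, under the "basic" plan a record created on 2021-06-15 reaches its cutoff on 2024-06-15, so on 2024-06-01 it lands in the `upcoming` list rather than being deleted without warning.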

Finally, if you offer deletion as a service, make sure the delete process is carried out as securely as possible. Paper documents should be shredded, and digital data should either be securely deleted or stored with native encryption. Encryption is not very hard to implement, and it makes decommissioning much easier: if the encryption is implemented properly and the keys never leave the host environment, media removed from that environment is effectively erased.
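Deleting data by destroying its encryption key is often called crypto-shredding. The sketch below illustrates the idea with the `Fernet` recipe from the third-party Python `cryptography` package, using one key per dataset. The `KeyStore` class and the `archive`/`restore` helpers are hypothetical. In practice keys would live in a dedicated key manager or hardware security module rather than a Python dictionary, since removing a value from memory is not itself secure erasure.

```python
from cryptography.fernet import Fernet


class KeyStore:
    """Hypothetical per-dataset key store that stays in the host
    environment and never travels with the archive media."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def key_for(self, dataset: str) -> bytes:
        # Create a key the first time a dataset is archived.
        if dataset not in self._keys:
            self._keys[dataset] = Fernet.generate_key()
        return self._keys[dataset]

    def lookup(self, dataset: str) -> bytes:
        # Raises KeyError once the key has been destroyed.
        return self._keys[dataset]

    def destroy(self, dataset: str) -> None:
        # Crypto-shredding: without the key, ciphertext already written
        # to archive media can no longer be decrypted.
        self._keys.pop(dataset, None)


def archive(store: KeyStore, dataset: str, data: bytes) -> bytes:
    """Encrypt before writing; only ciphertext goes to archive media."""
    return Fernet(store.key_for(dataset)).encrypt(data)


def restore(store: KeyStore, dataset: str, blob: bytes) -> bytes:
    """Decrypt an archived blob; fails once the dataset's key is destroyed."""
    return Fernet(store.lookup(dataset)).decrypt(blob)
```

The design choice that matters is the separation: the ciphertext can sit on tape or disk anywhere, but because the key stays behind, pulling the media out of service requires no separate wipe.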


This was first published in August 2007
