Even before technology developed to the point where we could easily virtualize data sets, the concept of tiered storage appealed to IT managers. Putting less important data on less expensive storage has always made sense.
Tiering solutions virtualize data and move it between pools of storage, either manually or according to predefined policies. Whether at the block or file level, tiering abstracts users (and applications) from the physical location of data, which can be on storage arrays, file servers, NAS devices or a combination of these.
We’ll focus on automated tiered storage primarily as a cost optimization strategy rather than a performance strategy. While automated tiered storage can provide some performance enhancement, that goal is more often served by caching technology, in which copies of frequently accessed data subsets are placed on a faster cache tier (such as solid-state drives, or SSDs) while the full data set remains on a lower tier.
Why use tiered storage?
As mentioned above, tiered storage just makes sense, given the differences in cost among high-performance drives, capacity drives, tape and now the cloud. Since data usage usually declines as data ages, putting data onto an archive tier as it ages out is a fairly simple way to save money. Effective storage tiering can improve storage density, addressing environmental and resource issues like power consumption, floor space and management overhead on a per-gigabyte basis. Storage tiering also provides a way to place appropriate data sets in storage pools configured with better availability or protection.
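To make the per-gigabyte economics concrete, here's a minimal sketch of the savings from demoting aged data. All of the tier names, per-GB prices and capacity figures below are illustrative assumptions, not vendor pricing:

```python
# Hypothetical monthly cost per GB for each tier, in cents (illustrative only).
TIER_COST_CENTS_PER_GB = {"performance": 50, "capacity": 10, "archive": 2}

def monthly_cost_cents(gb_by_tier):
    """Total monthly storage cost (in cents) for a mapping of tier -> gigabytes."""
    return sum(TIER_COST_CENTS_PER_GB[tier] * gb for tier, gb in gb_by_tier.items())

# All 10 TB kept on the performance tier:
before = monthly_cost_cents({"performance": 10_000})
# The same 10 TB with aged data demoted: 1 TB hot, 4 TB warm, 5 TB cold.
after = monthly_cost_cents({"performance": 1_000, "capacity": 4_000, "archive": 5_000})

print(before, after)  # 500000 vs 100000 cents: $5,000 vs $1,000 per month
```

Even with these made-up numbers, the shape of the result holds: most of the cost lives in capacity that rarely needs the top tier.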
Create, classify, move, repeat
While the rationale for tiered storage is fairly consistent, the implementation is not. The process involves setting up the tiers themselves, classifying data sets, moving them to the appropriate tier at the appropriate time and then repeating this last step as often as practical. But there’s a wrinkle: it’s typically easier to create storage tiers than it is to accurately classify data and move it to the right tier. Determining business value, a common basis for classification, can be difficult for IT departments, since they typically don’t own the data. Even getting accurate costs for each storage tier is complicated, since TCO includes management, maintenance, data protection and environmental factors in addition to acquisition, installation and possibly data migration.
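The create-classify-move-repeat cycle can be sketched as a simple age-based policy. The tier names, age thresholds and `move_to_tier` callback below are hypothetical; a real classification scheme would also have to weigh business value, which is exactly the hard part described above:

```python
import time

# Hypothetical policy: maximum age (in days) mapped to a target tier.
AGE_POLICY = [(30, "performance"), (180, "capacity"), (float("inf"), "archive")]

def classify(last_access_ts, now=None):
    """Pick a target tier based on how long ago the data set was last accessed."""
    now = now or time.time()
    age_days = (now - last_access_ts) / 86400
    for max_age, tier in AGE_POLICY:
        if age_days <= max_age:
            return tier

def tiering_pass(data_sets, move_to_tier):
    """One iteration of the 'repeat' step: reclassify and move misplaced data sets."""
    for ds in data_sets:
        target = classify(ds["last_access"])
        if ds["tier"] != target:
            move_to_tier(ds, target)  # the labor-intensive step in early solutions
            ds["tier"] = target
```

In early tiering products the `move_to_tier` step was a manual or scripted copy between systems; automating this loop is what the next section is about.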
For some early tiered storage solutions, this created a “tail wagging the dog” scenario, as storage tiers were frequently available before different data sets could be defined to store on them. Also, the actual transfer of data was often labor- and resource-intensive as files were copied between storage systems on the network or between volumes on the same system.
Automated tiered storage systems speed classification, movement
Automated tiered storage solutions can provide an answer by automating the classification and movement processes while simplifying expansion and management. As a software function of the storage array controller, array-based automated tiered storage systems subdivide data volumes into block segments and move each segment to the appropriate storage tier. At a fundamental level, they move data between tiers faster and more frequently than an external tiered storage system does. Most vendors also allow further classification of data based on application, volume or data type. In most cases, automated storage tiers are internal to the storage system and limited to the storage types available from the array manufacturer, as all capacity must be purchased from that vendor. They’re also block-based, not file-based.
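Conceptually, subdividing a volume into block segments lets the controller place each segment independently of the rest of the volume. The segment size, tier names and IOPS cutoffs in this sketch are assumptions for illustration, not any vendor's actual parameters:

```python
SEGMENT_MB = 64  # hypothetical sub-volume segment size

def segment_volume(volume_mb):
    """Split a logical volume into fixed-size segment IDs (last segment may be partial)."""
    return list(range((volume_mb + SEGMENT_MB - 1) // SEGMENT_MB))

def place_segments(segment_heat):
    """Map each segment to a tier by its measured I/O rate (hypothetical cutoffs)."""
    placement = {}
    for seg, iops in segment_heat.items():
        if iops > 100:
            placement[seg] = "ssd"
        elif iops > 10:
            placement[seg] = "performance_hdd"
        else:
            placement[seg] = "capacity_hdd"
    return placement

# A 1,000 MB volume becomes 16 independently placeable segments.
print(len(segment_volume(1000)))
```

Because placement is per segment rather than per volume, a single volume can span all three tiers at once.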
Most storage array-based automated tiered storage solutions can be configured with performance- and capacity-centric hard disk drives (HDDs) and tape, as well as SSDs. Typically, new data is written to the highest HDD tier and migrated to lower tiers as it ages out, but it can also be moved up to an SSD tier. To migrate data between tiers, these systems need a warm-up period of several hours to a day or more, during which they accumulate information -- typically, usage patterns -- about the data they’re storing. Although this period does represent a performance delay, the information is required to make valid placement decisions. For downward movement driven by cost optimization, the delay is fairly insignificant. For upward movement (such as promotion to an SSD tier), the warm-up period effectively adds write latency, which is one reason caching rather than tiering is often chosen for performance enhancement.
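The warm-up period amounts to collecting enough access statistics per segment to justify a placement decision. This sketch (the sample window and promotion threshold are assumptions) shows why a segment cannot be promoted until the history fills, which is the delay described above:

```python
from collections import defaultdict

WARMUP_SAMPLES = 24   # hypothetical: one hourly IOPS sample for a full day
PROMOTE_IOPS = 100    # hypothetical promotion threshold

class HeatTracker:
    """Accumulate per-segment usage samples before making placement decisions."""

    def __init__(self):
        self.samples = defaultdict(list)  # segment ID -> recorded IOPS samples

    def record(self, segment, iops):
        self.samples[segment].append(iops)

    def decision(self, segment):
        """Return 'wait' until the warm-up window is full, then promote or demote."""
        history = self.samples[segment]
        if len(history) < WARMUP_SAMPLES:
            return "wait"  # the warm-up delay: not enough evidence yet
        avg = sum(history[-WARMUP_SAMPLES:]) / WARMUP_SAMPLES
        return "promote" if avg >= PROMOTE_IOPS else "demote"
```

A cache, by contrast, can copy a block to the fast tier on first access because the authoritative copy stays put, so a wrong guess costs little; a tiering system moving the only copy needs this evidence first.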
Storage array-based automated tiered storage systems are typically more efficient than external software tiering solutions or appliances. They usually create smaller block segments, providing more granularity than external block- or file-based systems. This means they’re less likely to move data unnecessarily. They can also move these segments without impacting the network. The ability of array-based automated tiered storage systems to move sub-volume data segments between tiers automatically can be especially effective at optimizing the most expensive storage resources.
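The granularity advantage can be quantified with simple arithmetic. If only a small working set within a volume is hot, sub-volume segments promote far less data than whole-volume tiering would; the capacities below are illustrative assumptions:

```python
VOLUME_GB = 500   # hypothetical volume size
SEGMENT_GB = 1    # hypothetical array segment size
HOT_GB = 2        # hot working set within the volume

# A coarse, whole-volume tiering scheme must promote the entire volume.
coarse_moved = VOLUME_GB

# Segment-level tiering promotes only the segments holding hot data (ceiling division).
fine_moved = -(-HOT_GB // SEGMENT_GB) * SEGMENT_GB

print(coarse_moved, fine_moved)  # 500 vs 2 GB moved to the expensive fast tier
```

Smaller segments both consume less of the premium tier and avoid the network traffic of moving cold data along for the ride.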
As part of the storage controller, the tiering function can be integrated with other storage functions that involve block data movement, like snapshots and replication. This can reduce system overhead and ensure these features aren’t purchased a second time, as they could be with an external automated tiering storage system. Capacity expansion and storage management are also simpler as this platform-based virtualization enables each tier to expand independently, without taking the system down.
Array-based automated tiered storage is typically limited to storage within the array itself and can’t be used to consolidate external storage platforms without migrating data. It’s block-based, which means it can’t support applications requiring file services natively, a significant detail since the majority of data growth is unstructured (file-based). Compared with file virtualization systems and external block-based tiered storage solutions, a storage array-based automated tiered storage system can also be less flexible since it can’t usually tier data between different storage platforms or offer a way to repurpose existing storage assets. And, as is the case with other external storage systems, these block-based systems can’t easily integrate with a cloud tier, which typically requires file-based data.
In the final analysis of automated tiered storage, a pragmatic approach is best. The idealism that accompanied the information lifecycle management (ILM) movement several years ago was part of the reason it didn’t catch on. Like a sports team that takes what the defense gives them, optimizing a storage environment with an automated tiered storage solution should focus on the gains that can be made easily and acknowledge that not every piece of data will be classified and stored according to a grand plan.
Array-based automated tiered storage is largely set-and-forget, not the ongoing hands-on exercise of earlier tiered storage and ILM products. But it’s still an integrated solution, usually involving a proprietary system, so using automated tiered storage as a justification to replace an array before its time may be difficult. If a block-based storage system is in your customer’s immediate future, though, it would make sense to have the customer consider one with an automated tiering function. For the most part, it runs in the background and improves storage density, utilization and cost efficiency.
For a customer looking for a way to consolidate existing storage, especially with multiple platforms, an external tiering solution may be the right choice. If the customer has a lot of NAS systems or is thinking about making the cloud a storage tier, they should consider a file-based tiering solution.
This was first published in February 2011