So what is thin provisioning? Thin provisioning is a function being incorporated into storage systems that addresses a specific problem in today's data center: capacity that has been allocated to a server but is unused. For example, your customer's database administrator may ask the storage administrator for 500 GB of capacity to house the new Oracle application that he is going to develop and roll out to the organization. The problem is that this request is likely based on a rough guess of what the database will eventually grow to. Before that growth, the application has to be developed, tested, rolled out and adopted -- a process that might take years. The initial capacity actually used is significantly smaller than what was requested, and the remaining space goes unused for the years that it takes the database to grow to its projected highest level.
This phenomenon creates several problems for the storage manager. First, the unused but spoken-for capacity oftentimes could be better used elsewhere. Second, that capacity has to be bought at today's prices even though it will be significantly less expensive next year. Third, the unused capacity has to be racked, powered and cooled, wasting power and data center floor space; it's likely that in the future that capacity could be delivered on fewer, higher-capacity, more power-efficient drives. Finally, this free space cannot be optimized. There is nothing to deduplicate or compress since it is just empty free space.
Thin provisioning solves these problems by allocating capacity only as the application actually consumes it. The storage administrator can grant the DBA the capacity that he requests and the application thinks it has that capacity, but the capacity is only consumed as data is written.
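The allocate-on-write idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular array implements it: the `ThinPool` and `ThinVolume` classes, their methods and the block counts are all hypothetical names chosen for the example.

```python
class ThinPool:
    """Hypothetical pool of physical blocks shared by thin volumes."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate(self):
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool is out of physical capacity")
        self.used_blocks += 1


class ThinVolume:
    """Hypothetical volume that reports its full size but maps blocks lazily."""

    def __init__(self, pool, virtual_blocks):
        self.pool = pool
        self.virtual_blocks = virtual_blocks  # the size the server sees
        self.block_map = {}                   # logical block number -> data

    def write(self, block_number, data):
        if not 0 <= block_number < self.virtual_blocks:
            raise IndexError("write beyond the end of the volume")
        if block_number not in self.block_map:
            # Physical space is consumed only on the first write to a block.
            self.pool.allocate()
        self.block_map[block_number] = data

    def read(self, block_number):
        # Unwritten blocks read back as zeros and cost no physical space.
        return self.block_map.get(block_number, b"\x00")


pool = ThinPool(physical_blocks=100)
volume = ThinVolume(pool, virtual_blocks=500)  # the DBA "sees" 500 blocks
volume.write(0, b"a")
volume.write(1, b"b")
volume.write(0, b"c")  # overwrite of an existing block: no new allocation
print(volume.virtual_blocks, pool.used_blocks)  # prints: 500 2
```

The server is told about 500 blocks, but the pool has given up only two, which is the gap between requested and consumed capacity that thin provisioning exploits.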
Thin provisioning problems and how to address them
Despite its merits, thin provisioning is not without drawbacks. First, there is the obvious concern that a thinly provisioned storage system could run out of disk space, since it allows more capacity to be allocated than physically exists. (For instance, say that a 5 TB system has allocated 10 TB of virtual capacity, of which only 3 TB is presently in use. As use of the system grows, that 3 TB will slowly grow, threatening to exceed the 5 TB physical limit.) This concern is easily addressed: In practice, running out of physical capacity is highly unlikely. Every system on the market today that does thin provisioning has extensive reporting and alerting that flags out-of-space conditions long before they actually occur. These alerts also give you a way to add value; most systems allow the reports to be automatically emailed to you, so you can take on responsibility for monitoring the system's storage use and help assure customers that they won't run out of space.
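The 5 TB example above can be worked through with a small calculation. This is a sketch only: the `pool_status` function and the 80% alert threshold are assumptions made for illustration, and real systems expose their own reports and configurable thresholds.

```python
def pool_status(physical_tb, allocated_tb, used_tb, alert_threshold=0.8):
    """Summarize an overallocated pool (the threshold is an assumed value)."""
    utilization = used_tb / physical_tb
    return {
        "overcommit_ratio": allocated_tb / physical_tb,
        "utilization": utilization,
        "headroom_tb": physical_tb - used_tb,
        "alert": utilization >= alert_threshold,
    }


today = pool_status(physical_tb=5, allocated_tb=10, used_tb=3)
later = pool_status(physical_tb=5, allocated_tb=10, used_tb=4.5)

print(today["overcommit_ratio"], today["headroom_tb"], today["alert"])
print(later["headroom_tb"], later["alert"])
```

With 3 TB in use, the pool is overallocated two to one but still has 2 TB of headroom and raises no alert. At 4.5 TB in use, only 0.5 TB remains and the assumed threshold trips, which is the point at which more physical capacity should be ordered.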
The second problem in thin provisioning: concerns about performance. It does indeed take extra processing power on the part of the storage system to manage an "allocation as it happens" scenario. One solution to this is to thinly provision in a not-so-thin fashion. Some products will allocate relatively large chunks of capacity, ranging from a few megabytes to several gigabytes, so that they are not constantly allocating new capacity. While this may reduce some of the performance impact, it also lessens the efficiency of space utilization. Other products have overallocated the storage processing capabilities of their systems to handle the thin-provisioned volumes; the manufacturers deliver systems with more storage compute performance than they would normally need. Still others will suggest a limit on how many thin-provisioned volumes you create to restrict the processing overhead impact on the system. Finally, some companies are building special ASICs to help offload the work from the storage processor itself.
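The trade-off between chunk size and space efficiency can be illustrated with a rough calculation. The function name, the volume sizes and the two chunk sizes below are hypothetical, and the sketch assumes each volume's data is written contiguously, which real workloads will not always do.

```python
import math


def allocation_cost(written_mb_per_volume, chunk_mb):
    """Return (number of allocations, MB allocated but not yet written)."""
    allocations = 0
    allocated_mb = 0
    for written_mb in written_mb_per_volume:
        chunks = math.ceil(written_mb / chunk_mb)  # round up to whole chunks
        allocations += chunks
        allocated_mb += chunks * chunk_mb
    return allocations, allocated_mb - sum(written_mb_per_volume)


volumes_mb = [10, 300, 1500]  # data actually written to three thin volumes

small_chunks = allocation_cost(volumes_mb, chunk_mb=4)
large_chunks = allocation_cost(volumes_mb, chunk_mb=1024)

print(small_chunks)  # prints: (453, 2)
print(large_chunks)  # prints: (4, 2286)
```

In this example, 4 MB chunks need 453 allocation operations but strand only 2 MB, while 1 GB chunks need just 4 allocations and strand more than 2 GB. Less allocation work is bought with less efficient use of space.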
The third problem in thin provisioning relates to data migration. Thin provisioning works best on net-new data. When migrating data from an existing system to a thin-provisioned system, the customer will often want to use a SAN copy utility to quickly copy the old data to the new system. But this type of copy function copies everything on the old volume, block by block, to the new thin volume. The thin volume does not typically have the intelligence to determine whether the old block contains actual data, deleted data or free space, so it can end up consuming as much physical capacity as the full size of the old volume. A related problem is aging data. As a volume ages, there's more and more deleted data to contend with. As with migrated data, the thin volume does not have native intelligence to understand whether blocks on those volumes hold deleted data. So while the file system frees up the space, the actual free space is not returned to the storage system.
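The migration problem can be shown with a toy model. Everything here is hypothetical: the old volume is represented as a list of blocks tagged with a file-system state that the block-level copy, like a SAN copy utility, never gets to see.

```python
# Hypothetical old volume: each entry is (file-system state, block contents).
old_volume = (
    [("live", b"data")] * 30
    + [("deleted", b"stale")] * 20
    + [("free", b"\x00")] * 50
)


def block_level_copy(source):
    """Copy every block; the copy has no view of the file-system state."""
    thin_volume = {}
    for block_number, (_state, contents) in enumerate(source):
        thin_volume[block_number] = contents  # each write allocates a block
    return thin_volume


thin_volume = block_level_copy(old_volume)
live_blocks = sum(1 for state, _ in old_volume if state == "live")

print(len(thin_volume), live_blocks)  # prints: 100 30
```

Only 30 blocks hold live data, yet all 100 are allocated on the thin volume after the copy, so the new volume starts life fully "thick."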
There are two potential ways to address both of these problems: Either the thin-provisioned system needs to communicate with the file system via an API, or the storage system needs to perform zero space detection. In the API approach, the file system communicates with the storage system during a migration or deletion to let the storage system know which blocks do not have valid data on them. Symantec's Storage Foundation includes such an API, and Microsoft is reportedly working on one as well. A zero space detection strategy, on the other hand, requires a home-grown utility to run on the file system to zero out the deleted data. The thinly provisioned volume can then reclaim space by examining blocks and determining whether they hold real data or have been zeroed out. Both the API and the zero space detection utility are time-consuming operations, and because this additional workload may greatly impact performance, they are examples of where specialized processors may be of value in a storage system.
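The zero space detection approach might look something like the following sketch. It is an illustration only: the 4 KB block size, the dictionary used as a block map and both function names are assumptions, and a real array would do this work in firmware or hardware rather than in a script.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


def zero_out_deleted(block_map, deleted_blocks):
    """Host-side step: overwrite blocks the file system has freed with zeros."""
    for block_number in deleted_blocks:
        block_map[block_number] = bytes(BLOCK_SIZE)


def reclaim_zeroed_blocks(block_map):
    """Array-side step: unmap every allocated block that contains only zeros."""
    zero_block = bytes(BLOCK_SIZE)
    reclaimed = [n for n, data in block_map.items() if data == zero_block]
    for block_number in reclaimed:
        del block_map[block_number]  # space goes back to the shared pool
    return reclaimed


# Hypothetical thin volume: block 1 was copied over as empty space,
# and block 2 holds data that the file system has since deleted.
block_map = {
    0: b"\x01" * BLOCK_SIZE,
    1: bytes(BLOCK_SIZE),
    2: b"\x02" * BLOCK_SIZE,
}

zero_out_deleted(block_map, deleted_blocks=[2])
reclaimed = reclaim_zeroed_blocks(block_map)

print(reclaimed, sorted(block_map))  # prints: [1, 2] [0]
```

Note that the reclaim step has to read and compare every allocated block, which is why this kind of scan can weigh heavily on the storage processor.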
Even with these disadvantages and perceptions of problems, thin provisioning brings great value to your customers. Being able to articulate the pros and cons of the technology provides you with a key opportunity to add value. Most of the negatives to thin provisioning can be worked around with selection of the proper system. Your job is to walk the customer through that selection process.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.
This was first published in December 2009