Server virtualization in general, and VMware in particular, is supposed to help your customers reduce costs by increasing the utilization of physical server hardware. For the most part, it works: stories of successful server virtualization projects with good ROI are commonplace. In the wake of those projects, however, lies a storage disaster area that your customers need help resolving.
The solution is a relatively simple one -- a tool for better management of customers' storage IO bandwidth combined with higher utilization rates. But you need to know how to use the tool and at what frequency.
Server virtualization brings unique challenges when it comes to storage. First, multiple workloads running simultaneously on a single physical host produce a random IO pattern -- a problem because random IO is hard to serve from cache memory, which makes its performance very difficult to improve. Second, these workloads can shift from one physical host to another through virtual machine migration, so identifying which server is using which storage device is very challenging.
Beyond those two problems, there's also the issue of wasted capacity. Most virtual machines are created from templates that allow for rapid virtual machine deployment. One of the template settings controls the default size for the virtual machine disk image (VMDK). Most customers set this at a relatively high number -- 50 GB to 100 GB -- even though the initial space required for the OS and application of the virtual machine is typically less than 20 GB. The problem, of course, is that, while similar, no two virtual machines are identical; the space required for each will vary greatly. Setting the default VMDK size for the worst-case scenario leads to a lot of wasted storage capacity. It's not uncommon to find that less than 50% of the space allocated to VMDKs is actually used. The opposite scenario is also a reality: Sometimes the preallocated space is too small, causing the virtual machine to panic.
To give you an idea of what this wasted space might translate to at a customer site, assume that your customer created their virtual machine templates with a default size of 100 GB per VMDK. If their virtual infrastructure consists of five physical machines, each hosting 10 virtual machines, that amounts to 50 VMDKs and 5 TB of space allocated to virtual machines. A 50% utilization rate means 2.5 TB of wasted capacity. It's worth noting that this example is on the low end. It's not at all uncommon for a virtual infrastructure to have 10, 15 or more physical machines, each with 20-plus virtual machines. The higher the number of virtual machines, the more critical the problem of wasted space becomes.
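The arithmetic in that example can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and it assumes every VMDK has the same size and that 1 TB equals 1,000 GB.

```python
def wasted_capacity_gb(hosts, vms_per_host, vmdk_size_gb, utilization):
    """Return (allocated_gb, wasted_gb) for uniformly sized VMDKs.

    Hypothetical helper: assumes one VMDK per virtual machine and the
    same default VMDK size everywhere, as in the article's example.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    allocated = hosts * vms_per_host * vmdk_size_gb
    wasted = allocated * (1.0 - utilization)
    return allocated, wasted


# The article's example: 5 hosts x 10 VMs x 100 GB at 50% utilization.
allocated, wasted = wasted_capacity_gb(
    hosts=5, vms_per_host=10, vmdk_size_gb=100, utilization=0.5
)
print(f"allocated: {allocated / 1000:.1f} TB, wasted: {wasted / 1000:.1f} TB")
# prints "allocated: 5.0 TB, wasted: 2.5 TB"
```

Plugging in a larger environment, such as 15 hosts with 20 virtual machines each, shows how quickly the wasted figure grows with the number of virtual machines.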
You generally can't address these problems without a tool. Software applications from companies like Akorri, Tek-Tools and Aptare can help increase customers' average utilization rates by assessing, monitoring and reporting on storage utilization in a virtualized world. These tools enable you to show your customer their environment's storage utilization from several angles: virtual cluster to the storage, physical server to its assigned storage and virtual machine to its assigned storage.
Without one of the above tools, it's difficult to see what storage is attached to what virtual machine. But once that view is enabled, your customers can move on to inventorying their environment by virtual machine.
Enabling all three of these views will allow your customer to "see" which virtual machines are the most demanding from a storage IO perspective and which physical hosts have excess storage IO bandwidth available to them. VMs that are 6 months old and have still used less than 20% of their allocation are ideal candidates for resizing. VMs that are in a critical state or are trending that way can be identified and have their allocation increased.
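As a rough sketch of the resizing rule just described, the Python below sorts a VM inventory into shrink and grow candidates. Everything here is hypothetical: the record fields, the sample data and the function name are invented for illustration, the 90% "critical" threshold is an assumption (the article gives only the 6-month and 20% figures), and trend analysis is not modeled. It is not the output format or API of any of the tools named above.

```python
from dataclasses import dataclass


@dataclass
class VmDisk:
    """Hypothetical inventory record for one virtual machine's VMDK."""
    name: str
    age_days: int
    allocated_gb: float
    used_gb: float


def classify(vm, min_age_days=180, shrink_below=0.20, grow_above=0.90):
    """Label a VM 'shrink', 'grow' or 'ok' based on used/allocated space.

    min_age_days and shrink_below follow the article's 6-month / 20% rule;
    grow_above is an assumed stand-in for a "critical state".
    """
    ratio = vm.used_gb / vm.allocated_gb
    if ratio >= grow_above:
        return "grow"
    if vm.age_days >= min_age_days and ratio < shrink_below:
        return "shrink"
    return "ok"


# Invented sample inventory.
inventory = [
    VmDisk("web01", age_days=400, allocated_gb=100, used_gb=12),
    VmDisk("db01", age_days=90, allocated_gb=100, used_gb=95),
    VmDisk("app01", age_days=30, allocated_gb=100, used_gb=10),
]

for vm in inventory:
    print(vm.name, classify(vm))
```

Here app01 is left alone despite its low usage because it is too young to judge, which is the point of the age condition in the rule.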
Some of these tools even let you play "what if" and simulate moving one virtual machine to another physical host; you can see the performance impact of rebalancing without taking the performance risk until you know it's safe.
With a tool that can capture IO statistics, you can also predict the impact of upgrading the SAN infrastructure to 8 Gbps Fibre Channel or moving low-priority VMs to SATA-based storage.
Quarterly assessment vs. active monitoring
While these tools are not new, their specialization in the server virtualization environment is, and that environment changes the frequency at which utilization assessments need to be done. In the past, it may have been acceptable to use these tools as part of a quarterly assessment that you would offer as part of your services. The problem with today's data centers and virtual servers is that, by their nature, they're very dynamic. Data centers need to be run more efficiently and closer to capacity. Doing so requires real-time monitoring by you via a remote connection or by your customer (and they may not have the time or expertise to execute on that). So while these tools can be used as part of a periodic assessment service, they're really much more useful when used for active monitoring.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.