Your customer's vSphere environment may seem healthy enough at first glance, but appearances can be deceiving. Hidden problems can lurk beneath the surface and eventually make the environment unhealthy.
Not every problem is obvious to VARs, because virtualization is much more complicated than traditional computing environments. There are many more moving parts, and therefore many more things that can go wrong. Seemingly trivial things, such as simple configuration settings, can have a ripple effect across the environment if they aren’t set correctly. Virtualization is all about sharing a limited set of resources among many virtual machines (VMs) while maintaining the performance and availability needed to keep the environment healthy.
A vSphere environment health check is one of the most valuable services you can offer customers, because vSphere needs constant maintenance to keep it running efficiently and problem-free. A good health check should be performed periodically. Like a home inspection, it documents the environment and checks the many different vSphere components to identify existing and potential problems. Health check results can also reveal optimization and performance improvement opportunities that VARs can turn into value-adds.
Performing a vSphere health check: The basics
A health check mainly consists of generating reports that gather detailed information on your customer’s vSphere environment. The two key areas of focus are configuration and performance. With this in mind, VMware created a tool, deployed as a virtual appliance, that uses pre-built reporting templates to gather this information automatically. However, the tool is available only to VMware partners, not to customers. A health check can be customized as you see fit, and there are many free tools and scripts that you can use to perform one. A VAR’s health check should include the following:
General checks:
- Check for any active snapshots and note their creation date and size.
- Check for any orphaned VMs.
- Check vCenter Server and host versions and patches to see if they are up to date.
- Check that VMware Tools is installed and up to date on all VMs.
- Check the health and size of the vCenter Server database.
- Check for VMs with connected removable devices.
- Check host and vCenter Server log files for errors.
- Check vCenter Server licensing.
- Dump the configuration of each host to a text file.
Configuration checks:
- General host configuration -- DNS, time, memory, VM startup/shutdown, VM swap files, power management, console partition size/free space and vCPUs per core.
- vMotion -- Compatibility, EVC configuration and network configuration.
- VM configuration -- CPU/memory/disk sizing, vNIC types, unnecessary hardware and guest OS settings.
- High Availability (HA) configuration -- Node states, slot sizes and admission control.
- Distributed Resource Scheduler (DRS) configuration -- Rules, power management, thresholds and deviations.
- Resource controls -- Shares/limits/reservations and resource pools.
- Network configuration -- Load balancing, redundancy, NIC speed/duplex, failover order and Network I/O control.
- Storage configuration -- Multi-pathing, free space, VM-datastore ratio, zoning, LUN size, Storage I/O control, volume/block sizes, extents and RDMs.
Performance checks:
- CPU statistics -- %Ready, used and usage.
- Memory statistics -- Active, swapped and ballooned.
- Disk statistics -- Guest and device latency, queued, commands (IOPS) and usage.
- Network statistics -- Dropped transmit and receive packets, and usage.
Security checks:
- ESXi -- Tech Support Mode enabled, root password set, AD authentication and lockdown mode.
- ESX -- Firewall, AD authentication, sudo setup, audit root access, root allowed via SSH and Web access disabled.
- Storage -- Physical LUN security and masking, CHAP authentication for iSCSI and datastore browser access.
- Network -- vSwitch/VM placement, vSwitch configurations and isolated storage/management traffic.
- vCenter Server -- Permissions, roles, database and SSL certificates.
- VM -- Remote console access, configuration and operation privileges.
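Many of the checks above reduce to simple threshold tests once the data has been exported with one of the tools listed below. The following Python sketch assumes the inventory is already loaded into hypothetical `VMRecord` and `Snapshot` records; those names, the VMware Tools status strings and the three-day snapshot age limit are illustrative assumptions, not VMware defaults. It flags old snapshots, out-of-date VMware Tools and connected removable devices. It also converts a CPU ready summation value from vCenter Server's performance charts into a percentage using the commonly cited formula: ready time in ms ÷ (sample interval in seconds × 1,000) × 100, where real-time charts use a 20-second interval.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# vCenter Server real-time performance charts use 20-second samples.
REALTIME_INTERVAL_S = 20


@dataclass
class Snapshot:
    # Hypothetical record of one VM snapshot.
    name: str
    created: datetime
    size_gb: float


@dataclass
class VMRecord:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    tools_status: str  # assumed values: "current", "outdated", "not installed"
    connected_devices: list[str] = field(default_factory=list)
    snapshots: list[Snapshot] = field(default_factory=list)


def cpu_ready_percent(ready_ms: float, interval_s: int = REALTIME_INTERVAL_S,
                      vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms of ready time per sample) to a
    percentage of the sample interval, averaged over `vcpus`."""
    return ready_ms / (interval_s * 1000) * 100 / vcpus


def check_vm(vm: VMRecord, now: datetime,
             max_snapshot_age_days: int = 3) -> list[str]:
    """Return human-readable findings for one VM."""
    findings = []
    for snap in vm.snapshots:
        age = now - snap.created
        if age > timedelta(days=max_snapshot_age_days):  # assumed threshold
            findings.append(
                f"{vm.name}: snapshot '{snap.name}' is {age.days} days old "
                f"({snap.size_gb:.1f} GB)"
            )
    if vm.tools_status != "current":
        findings.append(f"{vm.name}: VMware Tools is {vm.tools_status}")
    for device in vm.connected_devices:
        findings.append(f"{vm.name}: removable device connected ({device})")
    return findings


if __name__ == "__main__":
    vm = VMRecord(
        name="web01",
        tools_status="outdated",
        connected_devices=["CD/DVD drive 1"],
        snapshots=[Snapshot("pre-patch", datetime(2010, 5, 1), 12.5)],
    )
    for finding in check_vm(vm, now=datetime(2010, 6, 1)):
        print(finding)
    # 1,000 ms of ready time in a 20-second sample is 5% CPU ready.
    print(f"{cpu_ready_percent(1000):.1f}%")
```

Dividing by `vcpus` gives a per-vCPU average when the summation value covers the whole VM rather than a single vCPU instance. Check which counter instance you exported before relying on that figure.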
Security isn’t typically part of a health check, but it is just as important to your customer’s environment as overall health. Include these security checks if you want to offer value-added services as part of a health check engagement.
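The security items in the checklist can also be turned into simple pass/fail rules. This Python sketch assumes the per-host settings have already been collected into a hypothetical `HostSecurity` record. The rules it applies (flagging missing AD authentication, an enabled Tech Support Mode or disabled lockdown mode on ESXi, and root SSH access on classic ESX) are an assumed policy drawn from the checklist, not an official VMware hardening baseline.

```python
from dataclasses import dataclass


@dataclass
class HostSecurity:
    # Hypothetical per-host settings gathered during the health check.
    name: str
    product: str  # "ESX" or "ESXi"
    tech_support_mode: bool
    lockdown_mode: bool
    ad_authentication: bool
    root_ssh_allowed: bool = False  # only checked for classic ESX


def security_findings(host: HostSecurity) -> list[str]:
    """Flag settings that the assumed policy treats as security gaps."""
    findings = []
    if not host.ad_authentication:
        findings.append(f"{host.name}: AD authentication is not configured")
    if host.product == "ESXi":
        if host.tech_support_mode:
            findings.append(f"{host.name}: Tech Support Mode is enabled")
        if not host.lockdown_mode:
            findings.append(f"{host.name}: lockdown mode is disabled")
    elif host.product == "ESX" and host.root_ssh_allowed:
        findings.append(f"{host.name}: root login via SSH is allowed")
    return findings


if __name__ == "__main__":
    host = HostSecurity("esx01", "ESXi", tech_support_mode=True,
                        lockdown_mode=False, ad_authentication=True)
    for finding in security_findings(host):
        print(finding)
```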
Value-adds for VARs: Tools, applications and scripts
There is also a wide range of tools that you can use for a health check at a customer site. Some tools are free; others are paid products with free evaluation periods, which can also give VARs an opportunity to sell the product to interested customers. Additionally, there are free pre-built scripts that you can run to collect a wealth of information from your customer’s vSphere environment.
- VMware Health Analyzer Utility – Developed by VMware and available only to VMware partners. It deploys as a virtual appliance, uses pre-built reporting templates to gather information about the environment and combs through log files to identify any errors that may be occurring.
- VMware SiteSurvey – Produces a report on hardware and software configuration compatibility with advanced VMware features such as Fault Tolerance.
- RVTools – Uses the VMware SDKs to collect information from vCenter Servers and ESX/ESXi hosts. Supports both VI3 and vSphere and displays a wide variety of valuable information in a simple spreadsheet-like interface. The collected information can be sorted and filtered by row and column, and it includes data not found in vCenter Server, such as the number of VMs per core and the number of vCPUs per core on a host.
- esxcfg-info – Built into the ESX/ESXi management console and the management assistant (vMA). Outputs a wealth of information about the host it runs on; the output can be redirected to a text file to document the host configuration.
- vm-support – Built into the ESX/ESXi management console and the vMA. Gathers a large amount of configuration information, log files and the output of many commands into a single .tgz archive file.
- vKernel – Provides reporting information and also has a performance analyzer that can help identify and resolve performance-related issues.
- Hyper9 – A multi-function reporting tool that comes with pre-built dashboards to quickly spot problem areas in a vSphere environment.
- Veeam – Veeam Monitor is good for viewing performance data and seeing which hosts have the highest workloads. Veeam Reporter is good for documenting and analyzing an entire vSphere infrastructure.
- HealthCheck Script (PowerShell)
- vSphere Health Check Report (Perl or vMA)
- vCheck Daily Report (PowerShell)
- vSphere Security Hardening Report (Perl or vMA)
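RVTools reports VMs per core and vCPUs per core directly, but both are simple ratios that can be computed from any export listing each VM's host and vCPU count. The Python sketch below assumes a CSV with hypothetical `VM`, `Host` and `vCPUs` columns plus a separately supplied physical core count per host. These column names are assumptions, not the actual RVTools export format.

```python
import csv
import io
from collections import defaultdict


def consolidation_ratios(vm_csv: str,
                         host_cores: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Return {host: (VMs per core, vCPUs per core)}.

    `vm_csv` is assumed to have hypothetical columns VM, Host and vCPUs;
    `host_cores` maps each host name to its physical core count.
    """
    vm_count: dict[str, int] = defaultdict(int)
    vcpu_count: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(vm_csv)):
        vm_count[row["Host"]] += 1
        vcpu_count[row["Host"]] += int(row["vCPUs"])
    # Hosts come from host_cores, so a host with no VMs reports 0.0.
    return {
        host: (vm_count[host] / cores, vcpu_count[host] / cores)
        for host, cores in host_cores.items()
    }


if __name__ == "__main__":
    sample = "\n".join([
        "VM,Host,vCPUs",
        "web01,esx01,2",
        "web02,esx01,2",
        "db01,esx01,4",
        "app01,esx02,1",
    ])
    ratios = consolidation_ratios(sample, {"esx01": 8, "esx02": 8})
    for host, (vms, vcpus) in ratios.items():
        print(f"{host}: {vms:.2f} VMs per core, {vcpus:.2f} vCPUs per core")
```

Because the result is keyed by `host_cores`, VMs on a host that is missing from that mapping are left out of the result. Keep this in mind when supplying core counts.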
When performing a health check, don’t forget that one of the main deliverables is the information you gather, which can go into a simple assessment report for your customer. Health checks are a vital service that helps customers understand their vSphere environment and keep it performing well. You may not always uncover big problems, but you can at least give the customer peace of mind that their vSphere environment is healthy.
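One hypothetical way to turn collected findings into an assessment report is to group them by severity so that the most serious issues come first. In this Python sketch, the severity names, the (severity, area, description) tuple shape and the plain-text layout are all assumptions rather than a standard report format.

```python
from collections import defaultdict

# Assumed severity levels, most serious first.
SEVERITY_ORDER = ("critical", "warning", "info")


def build_report(customer: str, findings: list[tuple[str, str, str]]) -> str:
    """Render (severity, area, description) findings as a plain-text report."""
    grouped = defaultdict(list)
    for severity, area, description in findings:
        if severity not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity: {severity!r}")
        grouped[severity].append(f"- [{area}] {description}")

    lines = [f"vSphere health check: {customer}", ""]
    if not findings:
        lines.append("No issues found.")
    for severity in SEVERITY_ORDER:
        items = grouped[severity]
        if items:
            lines.append(f"{severity.upper()} ({len(items)})")
            lines.extend(items)
            lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical findings, for illustration only.
    print(build_report("Example Corp", [
        ("warning", "Snapshots", "web01 snapshot is 31 days old"),
        ("critical", "Storage", "datastore1 has less than 5% free space"),
        ("info", "VMware Tools", "3 VMs run outdated VMware Tools"),
    ]))
```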
About the expert
Eric Siebert is a 25-year IT veteran whose primary focus is VMware virtualization and Windows server administration. He is one of the 300 vExperts named by VMware Inc. for 2009. He is the author of the book VI3 Implementation and Administration and a frequent TechTarget contributor. In addition, he maintains vSphere-land.com, a VMware information site.