High-availability and clustering solutions for vSphere VMs

Eric Siebert

Clustering applications running on physical servers can be complicated and costly. For VARs looking to provide customers with clustering solutions, virtualization can help.


One way for solution providers to create new business opportunities is to implement VMware vSphere's advanced features for their customers. The advanced high-availability (HA) and fault-tolerance (FT) features in vSphere use the virtualization architecture to provide an easy clustering solution for virtual machines (VMs).

Before you can help customers decide which of these features best fits their environment, you need to understand how clustering works at different layers of the computing stack, where it protects against different types of failures:

Application clustering — This is typically built into applications that handle replication and failover to another server on their own, without involving the operating system (OS). The application is installed on two different servers, and the clustering feature is then configured within the application itself. An example of this type of clustering is Lotus Domino, which allows administrators to cluster multiple Domino servers.

OS clustering — This type of clustering is handled by the OS, which is responsible for syncing and cutting over an application from one server to another. The typical architecture is two servers that share a disk on which the application resides. An example of this type of clustering is Microsoft Cluster Server (MSCS).

Hardware clustering — This type of clustering is done at the hardware layer and uses redundant hardware components, inside and outside a server, so that a single component failure doesn't crash the server. Examples include RAID, redundant power supplies, and multiple NICs, CPUs and dual in-line memory modules (DIMMs).

HA clustering for vSphere
vSphere provides HA, and in some cases clustering, at each of these layers using built-in features. Although these features require some of the more expensive vSphere licenses, they give you a simple, inexpensive way to implement clustering and HA solutions for your customers. Let's look at how vSphere features provide HA and clustering at each computing layer.

Application clustering — vSphere cannot provide true application clustering, but it can provide HA for applications. HA Application Monitoring, a feature introduced in vSphere 4.1, enables HA to monitor the heartbeat of applications that have been modified to transmit one. If the application becomes unresponsive and the heartbeat stops, vSphere can detect this and restart the VM. This adds a third layer of the stack whose uptime HA can monitor (host, OS and application). Currently no applications support this, but some probably will in the future.

OS clustering — vSphere cannot provide true OS clustering, but it can provide HA for operating systems. Virtual Machine Monitoring (VMM), a feature introduced in vCenter Server 2.5, extends HA so it can detect guest OS failures by monitoring a heartbeat provided by VMware Tools. If the guest OS fails (for example, with a Windows blue screen), the heartbeat stops and the VM is restarted on the same host. To prevent false restarts, VMware went a step further and enabled VMM to check the VM's disk and network activity to be certain the OS is truly unresponsive. This feature works with any OS as long as VMware Tools is installed.

Hardware clustering — Although the underlying physical hardware of a vSphere host may be redundant to protect against a single component failure, vSphere also provides some virtual hardware redundancy at the networking and storage layers. Virtual switches (vSwitches) can be configured with multiple physical NICs so VMs don't lose network connectivity if a physical NIC or its link fails. For storage adapters, vSphere supports multipathing (except for NFS devices), so if one path to a storage device fails, an alternate path can be used.
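To make the layered behavior above concrete, here is a minimal Python sketch of the decisions involved. It is a toy model under stated assumptions, not VMware code or a VMware API: the function names `ha_action` and `pick_path`, the result strings and the threshold values are all hypothetical and chosen only for illustration. It shows HA restarting VMs elsewhere when a host fails, VM Monitoring requiring both a lost VMware Tools heartbeat and no recent disk or network I/O before restarting a VM, Application Monitoring acting only for applications instrumented to send heartbeats, and NIC or storage path failover walking an ordered list of paths.

```python
from typing import Optional, Sequence, Set

# Hypothetical, illustrative thresholds (not VMware's documented defaults).
HEARTBEAT_FAILURE_INTERVAL = 30.0  # seconds without a VMware Tools heartbeat
IO_ACTIVITY_WINDOW = 120.0         # seconds of disk/network I/O history to consider


def ha_action(now: float,
              host_alive: bool,
              last_os_heartbeat: float,
              last_io: float,
              app_heartbeat_ok: Optional[bool] = None) -> str:
    """Decide what HA would do for one VM, checking host, OS and application layers.

    app_heartbeat_ok is None when the application is not instrumented to send
    heartbeats, which is the common case.
    """
    # Host layer: if the host is gone, HA restarts its VMs on other hosts.
    if not host_alive:
        return "restart on another host"
    # OS layer (VM Monitoring): heartbeat lost AND no recent I/O means the OS is truly hung.
    heartbeat_lost = now - last_os_heartbeat > HEARTBEAT_FAILURE_INTERVAL
    io_recent = now - last_io <= IO_ACTIVITY_WINDOW
    if heartbeat_lost and not io_recent:
        return "reset VM on same host"
    # Application layer (Application Monitoring): only instrumented apps report a heartbeat.
    if app_heartbeat_ok is False:
        return "reset VM on same host"
    return "no action"


def pick_path(preferred: Sequence[str], standby: Sequence[str],
              failed: Set[str]) -> Optional[str]:
    """Return the first healthy NIC or storage path in failover order, or None."""
    for path in list(preferred) + list(standby):
        if path not in failed:
            return path
    return None
```

For example, `pick_path(["vmnic0", "vmnic1"], ["vmnic2"], {"vmnic0"})` returns `"vmnic1"`. The vmnic names follow ESX's naming pattern but are only examples. A real vSwitch can also spread traffic across all active NICs instead of using one at a time, and, as noted above, NFS devices do not get this kind of storage multipathing.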

FT clustering in vSphere
The true clustering feature in vSphere is FT, which maintains an identical copy of a VM on a second host. This clustering is done at the virtualization layer, and the guest OS is unaware of it.

FT is designed to protect only against a host failure, which would normally take down all the VMs on that host until the host was brought back up or the HA feature restarted the VMs on other hosts. It does not protect against guest OS failures or against a failure of the shared storage device. The primary VM on one host and the secondary VM on another host stay in sync by using a technology called Record/Replay.

FT works by creating a secondary VM on another host that shares the same virtual disk file as the primary VM. The CPU and virtual device inputs of the primary VM are recorded and sent over an FT logging NIC to the secondary VM, which replays them so it stays in sync with the primary and is ready to take over in case of a failure. Although both the primary and secondary VMs receive the same inputs, only the primary VM produces output such as disk writes and network transmits.

Because the hypervisor suppresses the secondary VM's output and keeps it off the network until it becomes the primary VM, the two VMs function as a single VM. And because the primary and secondary VMs are identical copies, a failure such as a Windows BSOD in the primary VM will also occur in the secondary VM. So even though FT provides additional protection for a VM, it doesn't provide total protection.
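The Record/Replay idea can be illustrated with a toy Python model. This is a simplified sketch, not how VMware implements FT: `ToyVM`, `run_ft_pair` and `fail_over` are hypothetical names, and the model only replays explicit inputs, whereas real FT must also capture nondeterministic events. It shows three properties described above: the secondary stays in sync by replaying the primary's inputs, only the primary produces output until a failover, and an input that crashes the primary deterministically crashes the secondary too.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyVM:
    """A deterministic toy VM: its state depends only on the inputs it is fed."""
    state: int = 0
    crashed: bool = False
    outputs: List[int] = field(default_factory=list)

    def execute(self, event: int, emit_output: bool) -> None:
        if self.crashed:
            return
        if event < 0:
            # Hypothetical "bad" input that deterministically crashes the guest,
            # standing in for a guest OS failure such as a BSOD.
            self.crashed = True
            return
        self.state += event
        if emit_output:
            # Stands in for a disk write or network transmit.
            self.outputs.append(self.state)


def run_ft_pair(events: List[int]) -> Tuple[ToyVM, ToyVM]:
    """Run each input on the primary (record), then on the secondary (replay)."""
    primary, secondary = ToyVM(), ToyVM()
    ft_log: List[int] = []  # models the FT logging link between the two hosts
    for event in events:
        primary.execute(event, emit_output=True)            # record
        ft_log.append(event)
        secondary.execute(ft_log[-1], emit_output=False)    # replay, output suppressed
    return primary, secondary


def fail_over(secondary: ToyVM, further_events: List[int]) -> ToyVM:
    """Primary host lost: the secondary takes over and its output is no longer suppressed."""
    for event in further_events:
        secondary.execute(event, emit_output=True)
    return secondary
```

Running `run_ft_pair([1, 2, 3])` leaves both VMs with the same state, but only the primary has produced output; running it with `[1, -1, 2]` crashes both, which is why FT alone cannot protect against a guest OS failure.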

Using MSCS in vSphere
To achieve maximum protection, solution providers need to use a clustering solution such as MSCS. Virtualization makes implementing MSCS easy and affordable because solution providers only need to create two VMs instead of purchasing an additional physical server.

There are two methods for implementing MSCS in vSphere: putting both VMs on the same host (cluster in a box) or placing the VMs on separate hosts (cluster across boxes). A cluster in a box protects against application and OS failures, while a cluster across boxes adds protection against host hardware failure.

Both methods require the VMs to access the same shared disk. With a cluster in a box, the disk can reside on either local disk or storage area network (SAN) disk, but with a cluster across boxes, it must reside on SAN disk.

The requirements for using MSCS on vSphere are fairly straightforward and mostly storage related. For a cluster in a box, solution providers can use either standard virtual disks (recommended) or a Raw Device Mapping (RDM) to a SAN disk in virtual compatibility mode. For a cluster across boxes, standard virtual disks cannot be used; solution providers must use RDMs in either physical compatibility mode (recommended) or virtual compatibility mode.

Only Fibre Channel SAN disk is supported for the RDM disks; iSCSI, network file system (NFS) and Fibre Channel over Ethernet (FCoE) disks are not supported because of the latency that may occur with those protocols. Although MSCS can provide maximum protection for critical applications running on VMs, it does have some limitations: the FT, Distributed Resource Scheduler (DRS), VMotion and HA features are not supported on VMs using MSCS. Losing these features isn't a huge deal, because they are mostly used to provide availability, which MSCS already provides.
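The storage and feature rules above can be captured as a small pre-deployment checklist. The Python sketch below is hypothetical (the function name, topology labels and disk-type strings are invented for illustration) and encodes only the rules stated in this article for the vSphere version it discusses. One assumption goes slightly beyond the text: it treats Fibre Channel as the only supported SAN protocol for any shared cluster disk, not just RDMs. Always confirm against VMware's own MSCS setup guide for the vSphere version in use.

```python
from typing import Iterable, List

# Rules as described in the article; verify against VMware's MSCS setup guide.
ALLOWED_DISKS = {
    "cluster_in_a_box": {"vmdk", "rdm_virtual"},              # vmdk recommended
    "cluster_across_boxes": {"rdm_physical", "rdm_virtual"},  # rdm_physical recommended
}
SUPPORTED_SAN = {"fc"}  # assumption: iSCSI, NFS and FCoE rejected for any shared disk
UNSUPPORTED_FEATURES = {"FT", "DRS", "VMotion", "HA"}


def check_mscs_config(topology: str, disk_type: str, storage: str,
                      enabled_features: Iterable[str] = ()) -> List[str]:
    """Return a list of problems with a proposed MSCS layout (empty list means OK)."""
    if topology not in ALLOWED_DISKS:
        raise ValueError(f"unknown topology: {topology!r}")
    problems: List[str] = []
    if disk_type not in ALLOWED_DISKS[topology]:
        problems.append(f"{disk_type} not supported for {topology}")
    if storage == "local":
        # Local disk only works for a cluster in a box using standard virtual disks;
        # RDMs always point to a SAN disk.
        if topology == "cluster_across_boxes" or disk_type.startswith("rdm"):
            problems.append("shared disk must be on a Fibre Channel SAN")
    elif storage not in SUPPORTED_SAN:
        problems.append(f"{storage} storage not supported; use Fibre Channel")
    for feature in sorted(set(enabled_features) & UNSUPPORTED_FEATURES):
        problems.append(f"{feature} must be disabled on MSCS VMs")
    return problems
```

For example, `check_mscs_config("cluster_across_boxes", "rdm_physical", "fc")` returns an empty list, while swapping the storage to `"iscsi"` flags the protocol. Returning a list of problems rather than a single pass/fail value lets you show a customer every issue in one pass.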

As you can see, VARs have many availability options they can offer their customers for VMs running on vSphere. Your selection will depend on your customers' requirements: some may need the maximum protection that MSCS offers, while others might be fine with the more limited protection that FT and HA provide.

When choosing a solution for your customers, keep in mind that the FT feature has some strict requirements and limitations that may not be a good fit for everyone. Make sure you are aware of them before implementing it.

Some vSphere editions don't include all the advanced features, so your customers may have to upgrade their vSphere licenses to use them. Additionally, FT and HA require shared storage, so you might have to make some architecture changes to implement them properly.

If you plan on implementing MSCS, be sure to read the VMware guide on how to properly implement it. No matter what availability solution you choose for your customers, make sure you understand its capabilities and limitations. Don’t forget to test it thoroughly to ensure that it performs as expected.

About the expert
Eric Siebert is a 25-year IT veteran whose primary focus is VMware virtualization and Windows server administration. He is one of the 300 vExperts named by VMware Inc. for 2009. He is the author of the book VI3 Implementation and Administration and a frequent TechTarget contributor. In addition, he maintains a VMware information site.
