Solution provider's takeaway: By learning the various vSphere Fault Tolerance (FT) requirements and knowing how FT logging works, you will be able to help protect your customer's environment in the event of a host failure.
Fault Tolerance (FT)
Fault Tolerance (FT) was introduced as a new feature in vSphere to provide something that was missing in VI3: continuous availability for a VM in case of a host failure. HA was introduced in VI3 to protect against host failures, but it caused the VM to be down for a short period of time while it was restarted on another host. FT takes that to the next level and guarantees that the VM stays operational during a host failure by keeping a secondary copy of it running on another host server; in case of a host failure, that VM then becomes the primary VM and a new secondary is created on another functional host. The primary VM and secondary VM stay in sync with each other by using a technology called Record/Replay that was first introduced with VMware Workstation. Record/Replay works by recording the computer execution on a VM and saving it into a logfile; it can then take that recorded information and replay it on another VM to have a copy that is a duplicate of the original VM.
The technology behind the Record/Replay functionality is built into certain models of Intel and AMD processors, and is called vLockstep by VMware. This technology required Intel and AMD to make changes to both the performance counter architecture and virtualization hardware assists (Intel VT and AMD-V) that are inside their physical processors. Because of this, only newer processors support the FT feature; this includes the third-generation AMD Opteron based on the AMD Barcelona, Budapest, and Shanghai processor families; and Intel Xeon processors based on the Core 2 and Core i7 micro architectures and their successors. VMware has published a Knowledge Base article (http://kb.vmware.com/kb/1008027) that provides more details on this.
How FT works
FT works by creating a secondary VM on another ESX host that shares the same virtual disk file as the primary VM, and then transferring the CPU and virtual device inputs from the primary VM (record) to the secondary VM (replay) via an FT logging NIC so that it is in sync with the primary and ready to take over in case of a failure. Although both the primary and secondary VMs receive the same inputs, only the primary VM produces output such as disk writes and network transmits. The secondary VM's output is suppressed by the hypervisor and is not on the network until it becomes a primary VM, so essentially both VMs function as a single VM. It's important to note that not everything that happens on the primary VM is copied to the secondary; certain actions and instructions are not relevant to the secondary VM, and to record everything would take up a huge amount of disk space and processing power. Instead, only nondeterministic events are recorded; these include inputs to the VM (disk reads, received network traffic, keystrokes, mouse clicks, etc.) and certain CPU events (RDTSC, interrupts, etc.). Inputs are then fed to the secondary VM at the same execution point so that it is in exactly the same state as the primary VM.
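The record and replay idea described above can be illustrated with a toy model. The following Python sketch is not how vLockstep is implemented; the ToyVM class, the function names, and the use of random numbers to stand in for disk reads, packets, and keystrokes are all assumptions made for the example. It only shows why logging the nondeterministic inputs is enough to bring a second copy into the same state.

```python
import random


class ToyVM:
    """A toy deterministic machine: its state changes only through inputs."""

    def __init__(self):
        self.state = 0
        self.outputs = []

    def step(self, event):
        # Deterministic transition: the same event sequence always
        # produces the same state and the same outputs.
        self.state = (self.state * 31 + event) % 1_000_003
        self.outputs.append(self.state)


def run_primary(num_events, seed=None):
    """Run the primary and record every nondeterministic input in a log."""
    rng = random.Random(seed)
    vm, log = ToyVM(), []
    for _ in range(num_events):
        event = rng.randrange(256)  # stands in for a disk read, packet, or keystroke
        log.append(event)           # record
        vm.step(event)
    return vm, log


def run_secondary(log):
    """Replay the recorded inputs on a fresh copy of the machine."""
    vm = ToyVM()
    for event in log:
        vm.step(event)              # replay
    return vm


if __name__ == "__main__":
    primary, log = run_primary(1000)
    secondary = run_secondary(log)
    print("states match:", primary.state == secondary.state)
```

In this model the secondary never sees the random source itself, only the log, yet it ends in the same state as the primary. In FT the secondary's outputs would additionally be suppressed until it takes over.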
The information from the primary VM is copied to the secondary VM using a special logging network that is configured on each host server. It is highly recommended that you use a dedicated gigabit or higher NIC for the FT logging traffic; using slower-speed NICs is not recommended. You could use a shared NIC for FT logging for small or dev/test environments and for testing the feature. The information that is sent over the FT logging network between the two hosts can be very intensive depending on the operation of the VM. VMware has a formula that you can use to determine the FT Logging bandwidth requirements:
VMware FT logging bandwidth = (Avg disk reads (MB/s) × 8 + Avg network input (Mbps)) × 1.2 [20% headroom]
To get the VM statistics needed for this formula you must use the performance metrics that are supplied in the vSphere Client. The 20% headroom is to allow for CPU events that also need to be transmitted and are not included in the formula. Note that disk or network writes are not used by FT, as these do not factor into the state of the VM. As you can see, disk reads will typically take up the most bandwidth, and if you have a VM that does a lot of disk reading, you can reduce the amount of disk read traffic across the FT logging network by adding a special VM parameter, replay.logReadData = checksum, to the VMX file of the VM; this will cause the secondary VM to read data directly from the shared disk instead of having it transmitted over the FT logging network. For more information on this, see the Knowledge Base article at http://kb.vmware.com/kb/1011965.
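As a rough illustration, the bandwidth formula above can be worked through in a few lines of Python. This is a minimal sketch: the function name and the sample figures are invented for the example and do not come from any VMware tool.

```python
def ft_logging_bandwidth_mbps(avg_disk_reads_mb_per_s, avg_network_input_mbps,
                              headroom=0.20):
    """Estimate FT logging bandwidth in Mbps using the formula quoted above."""
    # Disk reads are measured in megabytes per second; multiply by 8 for megabits.
    disk_read_mbps = avg_disk_reads_mb_per_s * 8
    # Add received network traffic, then the 20% headroom for CPU events.
    return (disk_read_mbps + avg_network_input_mbps) * (1 + headroom)


if __name__ == "__main__":
    # Hypothetical VM: 10 MB/s of disk reads and 20 Mbps of inbound network traffic.
    estimate = ft_logging_bandwidth_mbps(10, 20)
    print(f"Estimated FT logging bandwidth: {estimate:.0f} Mbps")
```

With these sample numbers, the estimate is (10 × 8 + 20) × 1.2, or about 120 Mbps, which fits well within a dedicated 1Gbps NIC.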
It is important to note that if you experience an OS failure on the primary VM, such as a Windows BSOD, the secondary VM will also experience the failure, as it is an identical copy of the primary. However, the HA VM monitor feature will detect this, and will restart the primary VM and then respawn a new secondary VM. Also note that FT does not protect against a storage failure; since the VMs on both hosts use the same storage and virtual disk file, it is a single point of failure. Therefore, it's important to have as much redundancy as possible, such as dual storage adapters in your host servers attached to separate switches (multipathing), to prevent this. If a path to the SAN fails on the primary host, the FT feature will detect this and switch over to the secondary VM, but this is not a desirable situation. Furthermore, if there was a complete SAN failure or problem with the LUN that the VM was on, the FT feature would not protect against this.
Because of the high overhead and limitations of FT, you will want to use it sparingly. FT could be used in some cases to replace existing Microsoft Cluster Server (MSCS) implementations, but it's important to note what FT does not do, which is to protect against application failure on a VM; it only protects against a host failure. If protection for application failure is something you need, a solution such as MSCS would be better for you. FT is only meant to keep a VM running if there is a problem with the underlying host hardware. If you want to protect against an operating system failure, the VMware HA feature can provide this also, as it can detect unresponsive VMs and restart them on the same host server. You can use FT and HA together to provide maximum protection; if both the primary and secondary hosts failed at the same time, HA would restart the VM on another operable host and respawn a new secondary VM.
Although FT is a great feature, it does have many requirements and limitations that you should be aware of. Perhaps the biggest is that it currently only supports single vCPU VMs, which is unfortunate, as many big enterprise applications that would benefit from FT usually need multiple vCPUs (i.e., vSMP). But don't let this discourage you from running FT, as you may find that some applications will run just fine with one vCPU on some of the newer, faster processors that are available. VMware has mentioned that support for vSMP will come in a future release. Keeping even a single vCPU in lockstep between hosts is no easy task, and VMware needs more time to develop methods to keep multiple vCPUs in lockstep between hosts.
Here are the requirements for the host.
- The vLockstep technology used by FT requires the physical processor extensions added to the latest processors from Intel and AMD. In order to run FT, a host must have an FT-capable processor, and both hosts running an FT VM pair must be in the same processor family.
- CPU clock speeds between the two hosts must be within 400MHz of each other to ensure that the hosts can stay in sync.
- All hosts must be running the same build of ESX or ESXi and be licensed for FT, which is only included in the Advanced, Enterprise, and Enterprise Plus editions of vSphere.
- Hosts used together as an FT cluster must share storage for the protected VMs (FC, iSCSI, or NAS).
- Hosts must be in an HA-enabled cluster.
- Network and storage redundancy is recommended to improve reliability; use NIC teaming and storage multipathing for maximum reliability.
- Each host must have a dedicated NIC for FT logging and one for VMotion, each with a speed of at least 1Gbps. The FT logging NICs on all hosts must be on the same network as one another, and likewise the VMotion NICs.
- Host certificate checking must be enabled in vCenter Server (configured in vCenter Server Settings → SSL Settings).
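Several of the host requirements above are simple comparisons between two hosts, so they can be sketched as a small pre-check. The following Python example is only an illustration: the Host record, its field names, and the sample values are hypothetical, and it does not query vCenter Server. It also covers only some of the listed requirements, leaving out shared storage, NIC configuration, and certificate checking.

```python
from dataclasses import dataclass

# Editions that include FT, and the clock-speed tolerance, as listed in the text.
FT_EDITIONS = {"Advanced", "Enterprise", "Enterprise Plus"}
MAX_CLOCK_DELTA_MHZ = 400


@dataclass
class Host:
    # Hypothetical inventory record; the field names are illustrative only.
    name: str
    cpu_family: str
    cpu_mhz: int
    build: str
    edition: str
    in_ha_cluster: bool
    ft_capable_cpu: bool


def ft_pair_problems(a, b):
    """Return a list of reasons why hosts a and b cannot run an FT VM pair."""
    problems = []
    for h in (a, b):
        if not h.ft_capable_cpu:
            problems.append(f"{h.name}: processor is not FT-capable")
        if h.edition not in FT_EDITIONS:
            problems.append(f"{h.name}: edition '{h.edition}' does not include FT")
        if not h.in_ha_cluster:
            problems.append(f"{h.name}: not in an HA-enabled cluster")
    if a.cpu_family != b.cpu_family:
        problems.append("hosts are not in the same processor family")
    if abs(a.cpu_mhz - b.cpu_mhz) > MAX_CLOCK_DELTA_MHZ:
        problems.append("CPU clock speeds differ by more than 400MHz")
    if a.build != b.build:
        problems.append("hosts are not running the same build")
    return problems


if __name__ == "__main__":
    # Made-up sample hosts; an empty list means no problems were found.
    h1 = Host("esx01", "family-A", 2660, "build-A", "Enterprise", True, True)
    h2 = Host("esx02", "family-A", 2930, "build-A", "Enterprise", True, True)
    print(ft_pair_problems(h1, h2))
```

An empty result only means that none of the checked conditions failed; the SiteSurvey utility and the Profile Compliance tool described later remain the authoritative checks.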
Here are the requirements for the VMs.
- The VMs must be single-processor (no vSMP).
- All VM disks must be "thick" (fully allocated) and not "thin." If a VM has a thin disk, it will be converted to thick when FT is enabled.
- There can be no nonreplayable devices (USB devices, serial/parallel ports, sound cards, a physical CD-ROM, a physical floppy drive, physical RDMs) on the VM.
- Most guest OSs are supported, with the following exceptions that apply only to hosts with third-generation AMD Opteron processors (i.e., Barcelona, Budapest, Shanghai): Windows XP (32-bit), Windows 2000, and Solaris 10 (32-bit). See VMware Knowledge Base article 1008027 (http://kb.vmware.com/kb/1008027) for more details.
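The first three VM requirements above can likewise be sketched as a quick check. This Python example assumes the VM is described by plain values that you supply yourself; the device labels and the function name are invented for the illustration and do not correspond to any vSphere API. Guest OS support is not checked.

```python
# Device kinds the text lists as nonreplayable; the labels are made up for this sketch.
NONREPLAYABLE = {
    "usb", "serial_port", "parallel_port", "sound_card",
    "physical_cdrom", "physical_floppy", "physical_rdm",
}


def vm_ft_check(num_vcpus, disks, devices):
    """Return (blockers, notes) for a VM described by plain Python values.

    disks maps a disk label to "thick" or "thin"; devices is an iterable
    of device-kind strings.
    """
    blockers, notes = [], []
    if num_vcpus != 1:
        blockers.append(f"VM has {num_vcpus} vCPUs; FT supports only one")
    for label, fmt in sorted(disks.items()):
        if fmt == "thin":
            # Thin disks are not a blocker: they are converted when FT is enabled.
            notes.append(f"{label} is thin and will be converted to thick")
    for dev in sorted(set(devices) & NONREPLAYABLE):
        blockers.append(f"nonreplayable device present: {dev}")
    return blockers, notes


if __name__ == "__main__":
    blockers, notes = vm_ft_check(2, {"disk0": "thin"}, ["usb", "sound_card"])
    print("Blockers:", blockers)
    print("Notes:", notes)
```

Thin disks are reported as a note rather than a blocker because, as stated above, they are converted to thick when FT is enabled.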
In addition to these requirements, there are also many limitations when using FT, and they are as follows.
- Snapshots must be removed before FT can be enabled on a VM. In addition, it is not possible to take snapshots of VMs on which FT is enabled.
- N_Port ID Virtualization (NPIV) is not supported with FT. To use FT with a VM you must disable the NPIV configuration.
- Paravirtualized adapters are not supported with FT.
- Physical RDM is not supported with FT. You may only use virtual RDMs.
- FT is not supported with VMs that have CD-ROM or floppy virtual devices connected to a physical or remote device. To use FT with a VM with this issue, remove the CD-ROM or floppy virtual device or reconfigure the backing with an ISO installed on shared storage.
- The hot-plug feature is automatically disabled for fault tolerant VMs. To hot-plug devices (when either adding or removing them), you must momentarily turn off FT, perform the hot plug, and then turn FT back on.
- EPT/RVI is automatically disabled for VMs with FT turned on.
- IPv6 is not supported; you must use IPv4 addresses with FT.
- You can only use FT on a vCenter Server running as a VM if it is running with a single vCPU.
- VMotion is supported on FT-enabled VMs, but you cannot VMotion both the primary and secondary VMs at the same time. SVMotion is not supported on FT-enabled VMs.
- In vSphere 4.0, FT was compatible with DRS, but the automation level was disabled for FT-enabled VMs. Starting in vSphere 4.1, you can use FT with DRS when the EVC feature is enabled. DRS will perform initial placement on FT-enabled VMs and also will include them in the cluster's load-balancing calculations. If EVC in the cluster is disabled, the FT-enabled VMs are given a DRS automation level of "disabled". When a primary VM is powered on, its secondary VM is automatically placed, and neither VM is moved for load-balancing purposes.
You might be wondering whether you meet the many requirements to use FT in your own environment. Fortunately, VMware has made this easy for you to determine by providing a utility called SiteSurvey (www.vmware.com/download/shared_utilities.html) that will look at your infrastructure and see if it is capable of running FT. It is available as either a Windows or a Linux download, and once you install and run it, you will be prompted to connect to a vCenter Server. Once it connects to the vCenter Server, you can choose from your available clusters to generate a SiteSurvey report that shows whether your hosts support FT and if the hosts and VMs meet the individual prerequisites to use the feature. You can also click on links in the report that will give you detailed information about all the prerequisites along with compatible CPU charts. These links go to VMware's website and display the help document for the SiteSurvey utility, which is full of great information about the prerequisites for FT. In vSphere 4.1, you can also click the blue caption icon next to the Host Configured for FT field on the Host Summary tab to see a list of FT requirements that the host does not meet. If you do this in vSphere 4.0, it shows general requirements that are not specific to the host.
Another method for checking to see if your hosts meet the FT requirements is to use the vCenter Server Profile Compliance tool. To check using this method, just select your cluster in the left pane of the vSphere Client, and then in the right pane select the Profile Compliance tab. Click the Check Compliance Now link and it will check your hosts for compliance, including FT.
Before you enable FT, be aware of one important limitation: VMware currently recommends that you do not use FT in a cluster that consists of a mix of ESX and ESXi hosts. This is because ESX hosts might become incompatible with ESXi hosts for FT purposes after they are patched, even when patched to the same level. This is a result of the patching process and will be resolved in a future release so that compatible ESX and ESXi versions are able to interoperate with FT even though patch numbers do not match exactly. Until this is resolved, you will need to take this into consideration if you plan to use FT, and make sure you adjust your clusters that will have FT-enabled VMs so that they consist of only ESX or ESXi hosts and not both. See VMware Knowledge Base article 1013637 (http://kb.vmware.com/kb/1013637) for more information on this.
Implementing FT is fairly simple and straightforward once you meet the requirements for using it. The first step is to configure the networking needed for FT on the host servers. You must configure two separate vSwitches on each host: one for VMotion and one for FT logging. Each vSwitch must have at least one 1Gbps NIC, but at least two are recommended for redundancy. The VMotion and FT logging NICs must be on different network subnets. You can do this by creating a VMkernel interface on each vSwitch, and selecting "Use this port group for VMotion" on one of them and "Use this port group for Fault Tolerance logging" on the other. You can confirm that the networking is configured by selecting the Summary tab for the host; the VMotion Enabled and Fault Tolerance Enabled fields should both say Yes. Once the networking is configured, you can enable FT on a VM by right-clicking on it and choosing the Fault Tolerance item, and then Turn On Fault Tolerance.
Once enabled, a secondary VM will be created on another host; at that point, you will see a new Fault Tolerance section on the Summary tab of the VM that will display information including the FT status, secondary VM location (host), CPU and memory in use by the secondary VM, secondary VM lag time (how far behind it is from the primary, in seconds), and bandwidth in use for FT logging. Once you have enabled FT, alarms are available that you can use to check for specific conditions such as FT state, latency, secondary VM status, and more.
Here is some additional information that will help you understand and implement FT.
- VMware spent a lot of time working with Intel and AMD to refine their physical processors so that VMware could implement its vLockstep technology, which replicates nondeterministic transactions between the processors by reproducing their CPU instructions. All data is synchronized, so there is no loss of data or transactions between the two systems. In the event of a hardware failure, you may have an IP packet retransmitted, but there is no interruption in service or data loss, as the secondary VM can always reproduce execution of the primary VM up to its last output.
- FT does not use a specific CPU feature, but requires specific CPU families to function. vLockstep is more of a software solution that relies on some of the underlying functionality of the processors. The software level records the CPU instructions at the VM level and relies on the processor to do so; it has to be very accurate in terms of timing, and VMware needed the processors to be modified by Intel and AMD to ensure complete accuracy. The SiteSurvey utility simply looks for certain CPU models and families, but not specific CPU features, to determine whether a CPU is compatible with FT. In the future, VMware may update its CPU ID utility to also report whether a CPU is FT-capable.
- In the case of split-brain scenarios (i.e., loss of network connectivity between hosts), the secondary VM may try to become the primary, resulting in two primary VMs running at the same time. This is prevented by using a lock on a special FT file; once a failure is detected, both VMs will try to rename this file, and if the secondary succeeds it becomes the primary and spawns a new secondary. If the secondary fails because the primary is still running and already has the file locked, the secondary VM is killed and a new secondary is spawned on another host.
- There is no limit to the number of FT-enabled hosts in a cluster, but you cannot have FT-enabled VMs span clusters. A future release may support FT-enabled VMs spanning clusters.
- There is an API for FT that provides the ability to script certain actions, such as disabling/enabling FT using PowerShell.
- VMware recommends running no more than four FT-enabled VMs per host (not per cluster); this is a recommendation for optimal performance rather than a hard limit.
- The current version of FT is designed to be used between hosts in the same datacenter, and is not designed to work over WAN links between datacenters due to latency issues and failover complications between sites. Future versions may be engineered to allow for FT usage between external datacenters.
- Be aware that the secondary VM can slow down the primary VM if it is not getting enough CPU resources to keep up. This is noticeable by a lag time of several seconds or more. To resolve this, try setting a CPU reservation on the primary VM which will also be applied to the secondary VM and will ensure that both VMs will run at the same CPU speed. If the secondary VM slows down to the point that it is severely impacting the performance of the primary VM, FT between the two will cease and a new secondary will be created on another host.
- Patching hosts can be tricky when using the FT feature because of the requirement that the hosts have the same build level, but it is doable, and you can choose between two methods to accomplish this. The simplest method is to temporarily disable FT on any VMs that are using it, update all the hosts in the cluster to the same build level, and then reenable FT on the VMs. This method requires FT to be disabled for a longer period of time; a workaround if you have four or more hosts in your cluster is to VMotion your FT-enabled VMs so that they are all on half of your ESX hosts. Then update the hosts without the FT VMs so that they are the same build levels; once that is complete, disable FT on the VMs, VMotion the primary VMs to one of the updated hosts, reenable FT, and a new secondary will be spawned on one of the updated hosts that has the same build level. Once all the FT VMs are moved and reenabled, update the remaining hosts so that they are the same build level and then VMotion the VMs around so that they are balanced among all the hosts.
- FT can be enabled and disabled easily at any time; often this is necessary when you need to do something that is not supported when using FT, such as an SVMotion, snapshot, or hot-add of hardware to the VM. In addition, if there are specific time periods when VM availability is critical, such as when a monthly process is running, you can enable it for that time frame to ensure that it stays up while the process is running, and disable it afterward.
- When FT is enabled, any memory limits on the primary VM will be removed and a memory reservation will be set equal to the amount of RAM assigned to the VM. You will be unable to change memory limits, shares, or reservations on the primary VM while FT is enabled.
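The split-brain tie-break mentioned in the list above relies on a general idea: when two parties race to rename the same file on shared storage, only one can succeed. The Python sketch below illustrates that idea on a local filesystem. It is an analogy, not VMware's implementation; the token file name, the function name, and the use of a temporary directory as "shared storage" are all assumptions made for the example.

```python
import os
import tempfile


def try_claim(shared_dir, token_name, claimant):
    """Try to win the tie-break by renaming the shared token file.

    Returns True if this claimant renamed the file, and False if the file
    was already renamed by someone else. Because the source file exists
    only once, at most one claimant can succeed.
    """
    src = os.path.join(shared_dir, token_name)
    dst = os.path.join(shared_dir, f"{token_name}.{claimant}")
    try:
        os.rename(src, dst)
        return True
    except FileNotFoundError:
        # The token is gone, so the other side has already claimed it.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        # Create the hypothetical token file that both sides will race for.
        open(os.path.join(shared, "ft-token"), "w").close()
        print("primary wins:", try_claim(shared, "ft-token", "primary"))
        print("secondary wins:", try_claim(shared, "ft-token", "secondary"))
```

In this demonstration the first caller wins and the second is refused, which mirrors the case in which the primary is still running and the secondary is then killed and respawned on another host.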
For more information on FT, check out VMware's Availability Guide that is included as part of the vSphere documentation (http://vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_availability.pdf).
In this chapter, we covered some of the more popular advanced features in vSphere. There is a lot to learn about these features, so make sure you read through the documentation and get as much hands-on experience with them as you can before implementing them in a production environment. VMware's Knowledge Base has a great many articles specifically about these features, so make sure you look there for any gotchas or compatibility issues as well as tips for troubleshooting problems.
Eric Siebert is a 25-year IT veteran whose primary focus is VMware virtualization and Windows server administration. He is one of the 300 vExperts named by VMware Inc. for 2009. He is the author of the book VI3 Implementation and Administration and a frequent TechTarget contributor. In addition, he maintains vSphere-land.com, a VMware information site.
Printed with permission from Pearson Publishing. Copyright 2010. Maximum vSphere: Tips, How-Tos, and Best Practices for Working with VMware vSphere 4 by Eric Siebert. For more information about this title and other similar books, please visit http://www.pearsonhighered.com.