Business continuity

This section of the chapter excerpt will focus on using the VMware ESX server to provide business continuity solutions and options after a disaster.

By: Edward L. Haletky

Solution provider takeaway: This section of the chapter excerpt from the book VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers will focus on using the platform for business continuity.


Business continuity (BC) with ESX can be accomplished in several ways. Like backups, BC can take multiple paths; some are automated, whereas others require human intervention. One goal of BC is to let the business continue uninterrupted, and VMware cluster technology provides part of this through VMware DRS and VMware HA. Where DR-level backups are generally geared toward re-creating the environment, BC applies clustering technology to avoid such time-consuming restoration. This is achieved using VMware HA and also through preconfigured hot sites that can come online in a fraction of the full restoration time.

The methods discussed earlier implement hot sites by means of backups. Hot servers, that is, servers in the datacenter that do nothing until they are needed, are another option, and VMware HA covers this case. VMware HA is a high-availability solution that, when an ESX Server crashes, boots the VMs that were running there on other ESX Servers, either randomly (according to VMware DRS rules) or by using defined placement and boot order rules. The HPSIM VMM plug-in module provides a similar HA capability by specifying an alternative host on which to run the VMs. Although VMware HA is available only for ESX 3, all versions can benefit from the HPSIM VMM alternative host method.

Outside of VMware HA, a myriad of other BC options are available. These include placing as many redundant components as possible in different locations within the datacenter, building, or campus, with multiple paths to all these devices from the ESX Servers. This raises hardware cost considerably, but the added availability can be the difference between a short service interruption and an absolute disaster. Consider the case of an ESX Server crashing with smoke pouring out of a vent. If you had invested in VMware HA, the software would automatically boot the VMs on another system. However, if you had purchased an HP C-class blade in a RAID blade configuration, the ESX Server would fail over using a complete hardware solution. This raises the question of which is better, a hardware or a software solution. The answer, as always, is that it depends on the cost benefit. The HP C-class blade approach has one limitation: the designated RAID blade must be in the same chassis as the failing node, and both must share disks on a disk blade. This limits the available processing power to that of the blade chassis. And what happens if the chassis itself fails?

Many sites keep identical preinstalled hardware locked in a closet to solve some of these problems. However, it is your disaster to recover from, so think of all the solutions and draft the plan for both DR and BC appropriate for you.

The Tools

Now that the theory is explained, what tools are available for performing the tasks? Each family of enterprise-class remote storage has its own name for the capability to make LUN-to-LUN copies. Refer to the SAN compatibility documentation to determine what is and is not supported, because the hot-copy mechanism for your SAN might not be supported by any version of ESX. For example, Business Copy and Continuous Access on HP SANs are supported, as is Hot Copy on EMC SANs.

Beyond remote storage-to-remote storage copies, many other tools are available from VMware and various vendors. In most cases, these require some form of agent to create the backup and place it on a data store associated with the tool. All these tools must first place a running VM in a delta mode (snapshot or REDO), which changes the file to which disk writes occur so that the backup software can make a copy of the primary VMDK, as in Figure 12.4. When the backup finishes, the delta is committed, and the primary VMDK is used once more. The delta file grows as more data is written to it, in 15MB chunks to reduce SCSI-2 Reservation conflicts. However, because the delta file is really a FIFO and not a true VMDK, it is much slower, which dictates that a backup should occur when the VM is relatively inactive. Finally, the longer a VM is in delta mode, the slower it will run and the larger the delta file becomes, which implies more locks. Each SCSI disk associated with a VMDK should be backed up separately to reduce the overall time spent in delta mode.

As discussed with snapshots in Chapter 8, "Configuring ESX from a Host Connection," it is possible to have a tree of delta files for every VM. In this case, most tools will not work, including VMware Consolidated Backup and Vizioncore's ESXRanger Pro products; they require all deltas or snapshots to be deleted first. These situations may require a specialized backup script. In addition, these tools will not back up templates, which also require a specialized backup script.
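Because these tools refuse to run while snapshots exist, a pre-check can route each VM to the right backup path. The following is a minimal sketch only. It assumes that ESX 3 snapshot deltas appear as files ending in -delta.vmdk in the VM's directory, and the function names and the two labels it prints are hypothetical, introduced here for illustration.

```shell
#!/bin/sh
# Sketch: decide whether a VM directory still holds snapshot delta files.
# Assumption: ESX 3 snapshot deltas are stored as "*-delta.vmdk" files
# next to the VM's .vmx file. Function names are hypothetical.

# has_delta_files DIR -> exit status 0 if any delta file exists in DIR
has_delta_files() {
    for f in "$1"/*-delta.vmdk; do
        # An unmatched glob stays literal, so test for a real file.
        [ -e "$f" ] && return 0
    done
    return 1
}

# backup_tool_choice DIR -> prints which backup path the VM should take
backup_tool_choice() {
    if has_delta_files "$1"; then
        echo "custom-script"    # deltas present: standard tools will not work
    else
        echo "standard-tool"    # no deltas: VCB-style tools can proceed
    fi
}
```

A wrapper could call backup_tool_choice on each VM directory and skip, or hand off to a specialized script, any VM for which it prints custom-script.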

Figure 12.4 depicts delta mode processing for ESX version 3 and earlier versions. In previous versions, delta mode was really REDO mode. Deltas in ESX 3 are now created by the snapshot mechanism.

Once in delta mode, if the disk was not quiesced, a crash-consistent backup is created, which implies that a restored VMDK will boot as if the VM had crashed. Quiescing disks is limited to Windows VMs and only those that have snapshots. In addition, LUN-to-LUN or remote storage-to-remote storage copies also produce crash-consistent backups. A crash-consistent backup, depending on how it was achieved, should not be restored to anything other than a VMFS because the VMDKs are sparse files or monolithic files that have a beginning marker, some data, and an end marker and not much else. Restoration of such files to non-VMFS can result in loss of data. However, if the VM is first exported, resides on an NFS data store, or is in 2GB sparse disk format, it can be restored to any file system and imported back into ESX with no possibility of data loss.

With Fibre connections, it is not even necessary for the host that places the VM into delta mode to be the host that actually does the backup. This generally requires extra scripting, but it is possible for the host that is running the VM to place the VM into delta mode and then signal a backup host that the backup can be made. When the backup is completed, another signal is sent to the running host to commit the delta. In this way, the backup host offloads the work from the running ESX Server. The signals could even be reversed so that the backup host does all the work and calls the running host only as necessary. An example script of this behavior for version 2.5 follows. For ESX version 3, such a script is necessary only when neither the VMware Consolidated Backup proxy server nor other tools such as Vizioncore's ESXRanger Pro or HPSIM's VMM are in use.
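Independent of any ESX version, the signaling between the running host and the backup host can be sketched with flag files on storage that both hosts can see. This is an illustration of the handshake only, not the author's script. The SIGNAL_DIR location, the flag file names, and the three hook functions (enter_delta, copy_disks, commit_delta) are hypothetical placeholders; the hooks would be replaced with the version-specific commands.

```shell
#!/bin/sh
# Sketch of the two-host handshake using flag files on shared storage.
# SIGNAL_DIR, the .ready/.copied file names, and the hook functions are
# hypothetical; none of them comes from ESX itself.
SIGNAL_DIR=${SIGNAL_DIR:-/tmp/backup-signals}

# Placeholder hooks: replace with the real, version-specific commands.
enter_delta()  { echo "enter delta mode: $1"; }
copy_disks()   { echo "copy primary VMDK: $1"; }
commit_delta() { echo "commit delta: $1"; }

# Running host: place the VM in delta mode, then signal the backup host.
running_host_begin() {
    mkdir -p "$SIGNAL_DIR" || return 1
    enter_delta "$1" && : > "$SIGNAL_DIR/$1.ready"
}

# Backup host: copy only after the ready signal, then signal completion.
backup_host_copy() {
    [ -e "$SIGNAL_DIR/$1.ready" ] || return 1
    copy_disks "$1" || return 1
    rm -f "$SIGNAL_DIR/$1.ready"
    : > "$SIGNAL_DIR/$1.copied"
}

# Running host: commit the delta only after the copy signal arrives.
running_host_finish() {
    [ -e "$SIGNAL_DIR/$1.copied" ] || return 1
    commit_delta "$1" && rm -f "$SIGNAL_DIR/$1.copied"
}
```

Each host would call its own functions from a scheduled job. Because every step checks for the preceding flag file, a copy never starts before the VM is in delta mode, and a commit never starts before the copy has finished.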

Simple Backup Scripts

The following are by far the simplest backup scripts that can be written using the tools intrinsic to ESX. For ESX version 3, the script uses VCB, and for ESX version 2, the script uses Although the latter script exists in ESX 3, it points the user to the VCB functionality and does nothing else. Some VM configurations, however, will still need similar functionality. These are for those VMs that have snapshots disabled.

ESX version 3

The VCB tool has a command-line component for the ESX service console and for a Windows 2003 Enterprise server. The VCB command-line tools will export VMs, create snapshots, and access both the contents of virtual disks and the complete virtual disks for a VM. VCB provides a way to access and back up all VMs in an organized fashion for all ESX Servers and VMs using the VCB proxy server and any version 3 ESX Server.

In addition to implementing VCB, ESX version 3 implements a form of VMware Workstation version 5 snapshot capability. VCB makes exclusive use of the snapshot functionality to make copies of VMDKs before a backup of the snapshot occurs. Under the hood, this keeps delta mode to a minimum and results in a copy of the VMDK that can then be exported or mounted to produce a valuable backup from what is now referred to as a proxy server. The proxy server in the preceding figures is not only the backup server but also forwards the backup images to another server, which talks to a tape or hot site. For example, in Figure 12.1, the proxy server is the second machine from the top on the right side of the image. This proxy server is a Windows machine or another ESX Server that has access to any of the storage devices used by ESX so that it can use the VCB commands to either mount the VMDK to the proxy server to aid in backup or export the VMDK to another location to create a backup.

If the VMDK is a physical mode raw disk map (RDM), a snapshot cannot be made, and the only method of backup of the RDM is to use the traditional methods available: backup from within the host or SAN copies. A virtual mode RDM still works as expected. If the VMDK is an independent disk, snapshots will not work, and the traditional methods to make backups must be used. ESX version 2.5.x and earlier use the independent disk VMDK format, which requires the REDO mode capabilities provided by Because snapshots are used by VCB, the disabling of the snapshot features will keep VCB from working.

ESX version 3 also includes a method of scheduling automatic snapshots to make a point-in-time backup of the guest that can then be explicitly exported using VCB, and then dropped to tape, media, or copied across to a hot site after the snapshot is fully built. This functionality will work with running and stopped VMs, which is an advance over ESX version 2.5.x. Automating a snapshot offers a method to save old data from being overwritten until the snapshot is deleted or merged with the previous snapshot. Currently, VCB-based backup tools will not work if there are existing snapshots, so this functionality has limited usage, except perhaps to automatically trigger an action to occur. No alarm will trigger when a snapshot is made, but using the open source VI Perl interface it is possible to poll for this state and take an appropriate action. It is a complex method that most tools that hook into VCB already do for you.

The other option is to use VCB with the provided scripts or write your own to back up the VMs using VCB exclusively. The scripts provided are for TSM and Veritas Networker and BackupExec. The scripts run from a Windows proxy server to access the VMs using remote VCB tools that also exist on every ESX Server. The VCB scripts provide pre- and post-backup work to be done in relation to the backup of the VM in question. However, these scripts are not useful if the VM's VMDKs are in independent mode.

VCB consolidates the need for backup client licenses to just the proxy server. That way, if there are 300 VMs running, there is no need for 300 VM licenses, or 10 or so ESX Server licenses, of the backup client. This cost savings can be significant. The single location for producing backups, the proxy server, also eases operations: placing the burden of backup there means that backups no longer impact the performance of the VM or ESX Server, and locking is cut to a minimum.

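The following script works from the ESX 3 service console. This code locks the BACKUP volume, and if the lock already exists, does not perform the backup. When the lock does not exist, it uses the service console vcbMounter command to run the backup:

#!/bin/sh
# PATH and BACKUP (the location of the backup volume) must be set
# before this point.
export PATH
: "${BACKUP:?BACKUP must name the backup volume}"

# A lock file means a backup is already in progress: do nothing.
if [ -e "$BACKUP/.lock" ]
then
    exit 0
fi

# vmware-cmd -l lists the .vmx file of every registered VM.
for x in `/usr/bin/vmware-cmd -l | sort -r`
do
    /bin/touch "$BACKUP/.lock"
    y=`/bin/basename "$x" .vmx`
    #### Keep two backups, uncomment the following two lines
    #/bin/rm -rf "$BACKUP/${y}.bak"
    #/bin/mv "$BACKUP/$y" "$BACKUP/${y}.bak"
    /usr/sbin/vcbMounter -a "name:$y" -r "$BACKUP/$y"
done

# Mark the backup run as finished.
/bin/touch "$BACKUP/.done"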

By uncommenting the two lines inside the for loop, it is possible to keep two copies of the backups. This is a simplistic approach to backups using two common ESX commands (vmware-cmd, vcbMounter). Note that normally the vcbMounter command takes a host, username, and password, but these are hard-coded into the /etc/vmware/backuptools.conf file, which for security reasons should just have read-only permissions for the root user.
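The permission requirement on the credentials file can be applied and checked in one step. This is a minimal sketch; the function name restrict_conf is hypothetical, and it only sets and verifies the mode bits. Ownership by root would still need to be confirmed separately.

```shell
#!/bin/sh
# Sketch: make a credentials file read-only for its owner and verify it.
# restrict_conf is a hypothetical helper name, not an ESX command.

# restrict_conf FILE -> sets mode 0400; exit status 0 only if it took effect
restrict_conf() {
    chmod 400 "$1" || return 1
    # The first 10 characters of "ls -ld" are the file type and mode bits.
    perms=$(ls -ld "$1" | cut -c1-10)
    [ "$perms" = "-r--------" ]
}
```

Run as root, restrict_conf /etc/vmware/backuptools.conf would leave the file readable by its owner only; a nonzero exit status indicates that the mode could not be applied.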


About the book

VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers is the definitive, real-world guide to planning, deploying, and managing today's leading virtual infrastructure platform in mission-critical environments. Purchase the book from Prentice Hall.

Reproduced from the book VMware ESX Server in the Enterprise. Copyright 2008, Prentice Hall. Reproduced by permission of Pearson Education, Inc., 800 East 96th Street, Indianapolis, IN 46240. Written permission from Pearson Education, Inc. is required for all other uses.
