Implementing and supporting a fault-tolerant, high-availability system for a customer is a complex endeavor, requiring the right software for the job as well as the right skill set within your team of employees. As a value-added reseller (VAR), you'll want to assign equal importance to both requirements as you prepare for and move through a high-availability project.
Veritas is the most widely recognized maker of high-availability software; its Data Center family includes Cluster Server, Configuration Manager and Volume Replicator, each of which tackles a different aspect of high availability.
Cluster Server aims to enable a graceful application restart on a backup server in the event of a main server failure; it even supports virtual machine architectures and replication across wide-area networks (WANs).
If disk-to-disk replication is more of a concern for your customers than application failover, Volume Replicator, which provides remote data replication capabilities, would be more appropriate.
Configuration Manager automatically discovers changes to a monitored system's hardware or software inventory, so anything that might cause downtime (or alter the status of a high-availability subsystem, such as disks) is reported back to the administrator.
Veritas's approach is aimed at customers looking for a total solution rather than a fix for a few specific problems, yet because the suite is sold as separate products, you remain free to pick the applications that best address each customer's needs.
R1Soft's CDP solution, on the other hand, is designed for hosting companies. Its scope is not as broad as the Veritas suite's, but that's mostly by design. CDP is meant to do one job and do it well: make continuous real-time replicas of on-disk data to a backup server. It's available for both Windows and Linux; on Windows, CDP integrates with the Volume Shadow Copy service, so programs that are aware of Shadow Copies (SQL Server and Exchange Server, for example) can be synchronized transparently through CDP. Customers that are not hosting providers may still find the product a good fit.
As you consider which tools might be best for you to gain expertise in, keep in mind the different target markets of the two tools. The Veritas product line is comprehensive, but also pricey in comparison to CDP, which has a narrow focus and requires a little more "heavy lifting" by administrators.
As an example of how these product differences play out, consider bare-metal recovery, which is possible with both the R1Soft product and the Veritas suite. To do this with the CDP app, you need additional software -- a Windows Preinstallation Environment (Windows PE) image or a Linux boot CD. Veritas's NetBackup, on the other hand, has built-in bare-metal recovery features. That difference in functionality has a corresponding difference in price: A five-client Windows license for NetBackup alone is $3,995, while a single CDP server license is $600 plus $150 for annual maintenance and $25 to $75 per Linux or Windows client.
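To make the price gap concrete, here is a rough back-of-the-envelope sketch using only the list prices quoted above. It assumes the per-client CDP fee is paid once per client and that one year of maintenance is included; the function name and the five-client scenario are illustrative, and the figures leave out the cost of the boot environment and the extra administrator time CDP requires.

```python
# List prices quoted in the article; the client count below is illustrative.
NETBACKUP_FIVE_CLIENT_LICENSE = 3995      # five-client Windows license, NetBackup alone
CDP_SERVER_LICENSE = 600                  # single CDP server license
CDP_ANNUAL_MAINTENANCE = 150              # per year
CDP_CLIENT_LOW, CDP_CLIENT_HIGH = 25, 75  # per Linux or Windows client


def cdp_first_year_cost(clients):
    """Return the (low, high) first-year CDP cost for a given number of clients.

    Assumption: the per-client fee is a one-time charge per client.
    """
    base = CDP_SERVER_LICENSE + CDP_ANNUAL_MAINTENANCE
    return base + clients * CDP_CLIENT_LOW, base + clients * CDP_CLIENT_HIGH


low, high = cdp_first_year_cost(5)
print(f"CDP, 5 clients, first year: ${low} to ${high}")
print(f"NetBackup, 5 Windows clients: ${NETBACKUP_FIVE_CLIENT_LICENSE}")
```

Under those assumptions, five CDP clients come to between $875 and $1,125 in the first year, against $3,995 for the NetBackup license.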
Now that you're clear on the software issues, it's time to address the skills issue. Before you take on a new high-availability project, you need to devote serious time to making sure that your staff knows how to use the software and how it should be implemented at client sites. Make sure to set aside money for training, possibly as much as several thousand dollars for each of your staff members working on the project.
Preparing to implement and support high-availability software involves more than a simple walkthrough of the interface -- what a colleague of mine calls a "point and drool" operation. Your staff needs to understand how to plan for a fault-tolerant setup. The software, whatever tool is being used, is simply an implementation of that plan. Even seasoned administrators won't always be able to just pick up the product and run with it; even if they have plenty of expertise in working with one server at a time, they may not have an understanding of how multiple components tie together in a fault-tolerant environment. Your customers should be able to rely on your expertise as a VAR, not just for implementation, but also for ongoing support by well-trained staff.
Here's why your staff needs to have complete proficiency within a high-availability environment. Say you have two database servers, A and B, both of which have load-balanced network connections and shared access to a disk array. If A suddenly reports that it can't find the disk array or can't talk to B anymore, B picks up A's IP address and resumes processing where the original server left off.
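The decision in that scenario can be sketched in a few lines. This is a toy model rather than how any particular clustering product works: the Node class, its fields and the failover_if_needed function are hypothetical names invented for illustration, and a real cluster would move the address through the operating system or a cluster agent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    sees_disk: bool = True          # can the node reach the shared disk array?
    sees_peer: bool = True          # is the heartbeat to the other node alive?
    holds_service_ip: bool = False  # does this node currently answer on the service IP?


def failover_if_needed(active, standby):
    """Return the node that should hold the service IP after one health check."""
    healthy = active.sees_disk and active.sees_peer
    if healthy or not standby.sees_disk:
        return active                # nothing to do, or nowhere safe to go
    active.holds_service_ip = False  # release the address on the failed node
    standby.holds_service_ip = True  # the standby takes over the service IP
    return standby


a = Node("A", holds_service_ip=True)
b = Node("B")
a.sees_disk = False                  # A loses the disk array
owner = failover_if_needed(a, b)
print(owner.name)                    # B
```

Even this small model forces a planning question the software cannot answer for you: what should happen when the standby cannot see the disk array either?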
Outwardly, this sort of operation doesn't seem difficult to set up, but there can be "outrider" scenarios, or extreme cases. For instance, if the heartbeat check between the two machines (carried over LLT, the Low Latency Transport, in Veritas Cluster Server) begins "flapping" -- going online and offline cyclically -- the failover system will kick in again and again, creating another kind of instability. What should have been a high-availability system will quickly become a no-availability system.
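One way to recognize flapping is to count how often the link changes state inside a sliding time window. The sketch below illustrates that idea only; the FlapDetector class, its thresholds and the sample timings are assumptions made for the example, not settings from any product discussed here.

```python
from collections import deque


class FlapDetector:
    """Flag a link as flapping when it changes state too often in a time window."""

    def __init__(self, max_transitions=3, window_seconds=60.0):
        self.max_transitions = max_transitions
        self.window_seconds = window_seconds
        self._transitions = deque()  # timestamps of recent state changes
        self._last_state = None

    def observe(self, link_up, now):
        """Record one heartbeat result; return True if the link is flapping."""
        if self._last_state is not None and link_up != self._last_state:
            self._transitions.append(now)
        self._last_state = link_up
        # Forget state changes that have fallen out of the window.
        while self._transitions and now - self._transitions[0] > self.window_seconds:
            self._transitions.popleft()
        return len(self._transitions) >= self.max_transitions


detector = FlapDetector(max_transitions=3, window_seconds=60.0)
samples = [(0, True), (10, False), (20, True), (30, False), (40, True)]
results = [detector.observe(up, now=t) for t, up in samples]
print(results)  # [False, False, False, True, True]
```

Once the detector returns True, a sensible policy is to stop failing over automatically and alert an administrator instead of letting the cluster bounce between nodes.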
Unless customers dedicate staff members to supporting high-availability systems, it's unlikely that they'll develop enough expertise to fix a problem like this one; it takes experience to understand that once a failover takes place, the failed system should remain offline until a living human being can confirm that it is safe to bring back online.
If you approach high-availability projects with a keen eye toward understanding how to plan for fault tolerance and boosting the skill set of your employees, you'll be well-positioned to keep your customers' systems up and running.