Despite shrinking IT budgets across the country, disaster recovery remains an important concern for customers, and it's clear it will garner a healthy chunk of spending dollars this year: DR was chosen as a top-three priority by about 80% of the 208 respondents to a fall 2008 survey conducted by Storage Magazine. So your customers are looking for ways to implement a DR strategy or improve the one they have. And since Step 1 in any DR strategy is determining how to get the data to the secondary site, one of the first topics to come up in conversations with customers will be remote data replication.
Hardware-based replication is typically handled by the storage system itself. A software module is activated that allows the storage system to replicate data, as it changes, asynchronously to another, often similar, storage system from the same manufacturer. This type of deployment is ideal for environments with large server counts, since all of the replication is controlled by a single entity: the storage system. And while hardware-based replication used to be difficult to configure, it's now simpler, since most storage systems have some sort of built-in IP connectivity, which WAN connections require.
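The asynchronous part of that arrangement is worth spelling out: the array acknowledges a write locally right away and ships the change to the secondary later, so the remote copy can briefly lag the primary. A minimal conceptual sketch (the class and block-level model here are illustrative, not any vendor's actual implementation):

```python
from collections import deque


class AsyncReplicator:
    """Toy model of asynchronous array replication: writes are
    acknowledged locally at once and queued for the WAN, so the
    secondary copy may briefly lag the primary."""

    def __init__(self):
        self.primary = {}       # block number -> data on the local array
        self.secondary = {}     # block number -> data on the remote array
        self.pending = deque()  # changed blocks not yet shipped

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data   # acknowledged to the host immediately
        self.pending.append(block)   # queued for later transmission

    def drain(self) -> int:
        """Ship all queued changes to the secondary; returns blocks sent."""
        sent = 0
        while self.pending:
            block = self.pending.popleft()
            self.secondary[block] = self.primary[block]
            sent += 1
        return sent
```

The gap between a write landing on the primary and `drain()` running is exactly the data-loss window customers accept when they choose asynchronous over synchronous replication.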
On the downside, if they don't already have one, your customers will need to buy an advanced storage system, such as a SAN or NAS array, plus a software license and, most often, the same or a similar hardware/software system in the remote facility; as a result, hardware-based replication is more expensive than a host-based approach. In addition, only the data on those storage systems will be replicated; if your customer is not booting directly off of those systems, they will need to deploy host-based replication to capture the boot information. Virtualization solutions like those from VMware and Citrix have solved much of this problem, however: While many customers may boot the VMware host locally, the virtual machines are all encapsulated and will boot from the storage system in a recovery process.
An alternative approach is to use a storage virtualization appliance like those available from DataCore and FalconStor to handle the remote data replication services. A storage virtualization appliance allows the customer to replicate from anything to anything. This is an ideal solution to recommend to customers if you don't happen to provide the storage that they already have in place. They can keep their current solution, and you can provide the virtualization appliance as well as the target storage on the remote site.
Using a storage virtualization appliance also drives down storage costs at the remote DR site. You can provide a lower-performing system with fewer, higher-capacity drives, for example. But be careful not to go too cheap: In the event of an actual disaster, your customer may have to run their production environment on this storage. Find out what the customer's tolerance is for running the production environment in degraded mode, and make hardware choices based on their level of comfort.
Finally, a storage virtualization appliance gives you the option to participate in the build-out of storage in the production site. If the customer wants to extend the use of the storage virtualization appliance beyond just remote data replication, the appliance often can provide the same functionality as the storage system itself. Features like logical unit number (LUN) allocation, snapshots and thin provisioning can all be handled by the appliance, freeing the customer to use hardware of their choosing.
Typically, with host-based replication, a software agent is placed on each of the servers to be protected, and that server then replicates itself to a target server in a remote location. For virtual environments, most of the suppliers in the host-based replication space, like Double-Take Software and SteelEye Technology, don't require an agent in each virtual machine -- lowering cost and increasing simplicity.
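At its core, a host-based agent detects what has changed on the protected server and ships only that to the target. The toy sketch below uses file-level hashing to stand in for that detect-and-copy loop; real products from the vendors above capture writes at the filesystem or block level, and the function and path layout here are purely illustrative:

```python
import hashlib
import shutil
from pathlib import Path


def _digest(path: Path) -> str:
    """Content fingerprint used to decide whether a file has changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_changed(source: Path, target: Path) -> list[str]:
    """Copy files from source to target only when their content differs.

    A toy stand-in for a host-based replication agent: only changed
    files cross the wire, which is what keeps WAN traffic manageable.
    Returns the relative paths that were replicated on this pass.
    """
    replicated = []
    target.mkdir(parents=True, exist_ok=True)
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        dst_file = target / src_file.relative_to(source)
        if dst_file.exists() and _digest(dst_file) == _digest(src_file):
            continue  # unchanged -- nothing to replicate
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        replicated.append(str(src_file.relative_to(source)))
    return replicated
```

A second pass over an unchanged source returns an empty list, which illustrates why steady-state replication traffic is a small fraction of the initial seed.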
Most host-based replication suppliers have also added application intelligence to their agents to make sure that replica copies of Exchange, SQL Server and other applications are in a usable state. Hardware-based systems, on the other hand, just see blocks and don't typically understand what type of data those blocks contain; that means with a hardware-based system, you might need to do reindexing or rebuild work in the event of a failure.
Other pluses: Host-based systems do not require a centralized storage system, and there is no concern about how the servers boot. Especially in environments with small server counts, these solutions can be very cost-effective.
As the server count grows, however, the purchasing cost and complexity of managing multiple agents grows as well. Also, the larger the environment, the more likely that you will have already recommended a hardware-based solution that has its own replication capability.
A few years ago, replication of the backup job was an impossibility for most customers; the amount of data created each night by the backup process was simply too large to replicate effectively. Now, with backup deduplication systems from companies like Data Domain and Sepaton, replication of these data sets is a reality. Backup replication is also one of the most cost-effective and simplest ways to get data to a remote site, and it can serve as either an alternative or an adjunct to hardware- and host-based replication. Since many customers are interested in deduplication for backups, extending the conversation to replication of that data to the DR location by the deduplication device is a natural fit. It also brings an attractive economy of scale: Capacity at the remote site benefits from the same optimization as the backup at the primary site.
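Why deduplication makes backup replication feasible comes down to a simple mechanism: the backup stream is split into chunks, each chunk is fingerprinted, and only chunks never seen before need to be stored or shipped across the WAN. A minimal fixed-size-chunk sketch (production systems typically use variable-size chunking, and the function name here is hypothetical):

```python
import hashlib


def dedupe_backup(data: bytes, seen: set[str], chunk_size: int = 4096) -> tuple[int, int]:
    """Split a backup stream into fixed-size chunks and count how many
    are new versus already stored.  Only the new chunks would need to
    be written locally -- or replicated across the WAN to the DR site.

    Returns (total_chunks, new_chunks).
    """
    total = new = 0
    for i in range(0, len(data), chunk_size):
        total += 1
        fingerprint = hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)   # first sighting: store and replicate it
            new += 1
    return total, new
```

Because nightly backups repeat most of the previous night's data, the `new_chunks` count, and therefore the WAN traffic, stays small even when the raw backup set is large.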
Before you rush to deduplication as the answer to your customer's remote data replication challenge, however, it's critical to understand what your customer's recovery expectation is at the DR site. While backup replication is a disk-based solution, unlike standard replication, the data is stored in the backup application's format, so a restore to a standard disk system will be required before servers can run. This will impact planning: Make sure that the backup application is in place at the DR site and ready to do restores, and determine how much time it will take to transfer your customer's data from the backup application to production storage.
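That transfer time is easy to estimate up front, and putting a number on it often settles the conversation about whether backup replication meets the customer's recovery expectations. A back-of-the-envelope calculation (a hypothetical helper, assuming sustained throughput with no restore overhead, so treat the result as a best case):

```python
def restore_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Rough time, in hours, to rehydrate data from the backup store
    onto production disk.  Assumes sustained throughput and ignores
    backup-application overhead, so real restores will take longer."""
    seconds = (data_gb * 1024) / throughput_mb_s  # GB -> MB, then MB / (MB/s)
    return seconds / 3600
```

For example, restoring 3,600 GB at a sustained 100 MB/s works out to just over 10 hours, which is the kind of figure that rules backup replication in or out for a given server.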
SAN or NAS consolidated storage systems are more affordable and, thanks to server virtualization, more cost-justifiable than ever; they make sense for all but the smallest of customers, so the cost of hardware-based replication is within range for most. Host-based replication, on the other hand, doesn't have a long-term role at bigger sites because its cost and complexity increase with server count. But no matter which option is chosen, look to leverage data deduplication to drive down costs.
For servers that need to be recovered in less than two hours, either hardware- or host-based replication is a more logical candidate than backup replication, but typically only a very small percentage of the servers in your customer's data center will have that requirement. And even for sites that do decide to go the traditional route across the board, using deduplication to reduce data before replication is an ideal way to bring down the cost of maintaining a DR site while addressing another project your customer is sure to ask you about: disk-to-disk backup.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.