Data replication for disaster recovery is an ideal market for storage solution providers to be in, but you should be careful about how you approach it, since there are many variables to understand. Disk array-based data replication, which Pierre Dorion wrote about recently for SearchDisasterRecovery.com, is very often a second phase in a storage system deployment; a customer will first install one array at the primary site and then think about replicating to another array at a secondary site. If you're asked to be involved in the replication project, there is a series of questions you should ask before moving forward. Here, we lay out the questions to ask and explain why they're important.
1. Does your customer like its current storage system?
Any customer that's investigating array-based replication likely already has a storage system in place; in fact, it may have been there for a while. The first question to ask is whether the customer likes its current storage system. This might seem overly basic, but it's critical not to assume the answer is yes: if you follow the normal path, you're going to sell the customer an almost-identical array for the secondary site. If the customer doesn't like its current storage system, it might make sense to talk about software-based replication rather than installing another array at a second site.
2. Is the customer's current storage system up-to-date with software and firmware?
Assuming the customer does want to keep using its existing storage system, it is also important -- again, especially if the system has been in place for a while -- to make sure the existing array is running the most recent software and firmware. Since you will be installing a newer version of the same array at the secondary site, a firmware mismatch between the two arrays can keep replication from working correctly.
3. What data will need to be replicated?
The next step is to identify what data the customer wants to replicate. Since backup devices -- especially those that can do deduplication and replicate to a DR site, like those from Data Domain and ExaGrid -- are able to handle a speedy restoration of data, SAN-level replication requirements should drop considerably. Many customers no longer need to replicate all the data on the SAN, and they shouldn't want to. Replicating everything adds unnecessary expense in both remote storage capacity and WAN bandwidth.
If these deduplication devices are in place, the customer should replicate only the data that must be instantly available at the remote site in the event of a disaster. Many customers will have data that fits into that category, but it is a very small percentage of the overall SAN capacity. For customers that don't have a deduplication system in place, the savings on primary storage replication could easily justify the purchase of one. Customers that don't have a dedupe system and don't want one could decide to replicate all the data on the SAN, or they could choose tape-based backup, instead of replication, for disaster recovery.
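To put rough numbers on that trade-off, a back-of-the-envelope calculation can help in the customer conversation. The sketch below is only an illustration: the function name, the 20 TB SAN, the 10% "must be instantly available" share and the 3% daily change rate are all assumptions made up for the example, and it ignores compression, protocol overhead and the initial full copy.

```python
def replication_footprint(capacity_tb, daily_change_rate, window_hours=24.0):
    """Estimate what replicating a data set costs at the remote site.

    Returns (remote_capacity_tb, required_mbps): the capacity needed on the
    secondary array and the average WAN rate needed to ship one day's
    changed data within the window. Uses decimal units (1 TB = 10**12 bytes).
    """
    changed_bits = capacity_tb * daily_change_rate * 1e12 * 8
    required_mbps = changed_bits / (window_hours * 3600) / 1e6
    return capacity_tb, required_mbps


# Hypothetical customer: 20 TB on the SAN, 3% of it changing per day.
san_tb = 20.0
change_rate = 0.03
critical_share = 0.10  # assumed share that must be instantly available

full_tb, full_mbps = replication_footprint(san_tb, change_rate)
crit_tb, crit_mbps = replication_footprint(san_tb * critical_share, change_rate)

print(f"Replicate everything: {full_tb:.1f} TB remote, ~{full_mbps:.1f} Mbit/s")
print(f"Critical data only:   {crit_tb:.1f} TB remote, ~{crit_mbps:.1f} Mbit/s")
```

With these assumed inputs, replicating the whole SAN needs roughly 55.6 Mbit/s sustained around the clock, while the critical subset needs roughly 5.6 Mbit/s. Real change rates vary widely by application, so measure them rather than guessing.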
4. What applications need to be replicated?
The next step is to identify what applications need to be replicated -- especially those that will be active while the replication process is underway. These applications will require specific intelligence from the storage system to enable them to be in a consistent state at the DR site. Make sure that the customer's storage system has this functionality; if it doesn't, the customer needs to understand what kind of rebuild will be required at the DR site, and that needs to be factored into their recovery plan.
5. Is the WAN reliable?
Finally, confirm the reliability of the customer's WAN segment. It may work fine for the occasional FTP or Wide Area File System (WAFS) transmission, but replication may bring it to its knees. If you don't have a tester to place a load on the WAN to simulate traffic, it might be a good idea to invest in one. If the test shows the customer has a WAN connection that won't support replication, they'll need to upgrade the WAN.
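Once a load test has produced a measured throughput figure, the pass/fail decision is simple arithmetic. The following is a minimal sketch, not a substitute for the test itself: the function name, the 20% safety headroom and the sample figures are assumptions for illustration, and the replication rate would come from the customer's own change-rate measurements.

```python
def wan_supports_replication(usable_mbps, existing_peak_mbps,
                             replication_mbps, headroom=0.2):
    """Check whether replication fits on the WAN next to existing traffic.

    usable_mbps is the throughput measured under load, not the rated line
    speed. headroom is the fraction of that throughput held in reserve.
    Returns (fits, spare_mbps); a negative spare_mbps is the shortfall.
    """
    budget = usable_mbps * (1.0 - headroom)
    spare = budget - existing_peak_mbps - replication_mbps
    return spare >= 0, spare


# Hypothetical link: 45 Mbit/s measured, 20 Mbit/s already used at peak.
for label, needed in [("critical data only", 5.6), ("entire SAN", 55.6)]:
    fits, spare = wan_supports_replication(45.0, 20.0, needed)
    verdict = "fits" if fits else "needs a WAN upgrade"
    print(f"{label}: {verdict} (spare {spare:.1f} Mbit/s)")
```

In this made-up case the smaller replication set fits with room to spare, while replicating everything would exceed the link.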
Once all these questions are asked and answered, you'll have the information you need to determine which replication software is right for the customer and whether they have the underlying WAN foundation to support the replication process. The risk and importance associated with array-based replication mean that upfront planning for it needs to be more thorough than in a typical storage project. Make sure you ask the right questions upfront to stay out of trouble.
Here's the advice Pierre Dorion had for end users around array-based data replication:
Disk array-based data replication: The pros and cons
At the highest level, there are three main types of data replication commonly used: application-based, host-based and storage array-based data replication. In fact, we could get even more granular and further subdivide array-based replication depending on whether it takes place at the controller level, the storage area network (SAN) level or is controlled by an appliance.
Purists could argue that SAN and appliance-based replication are not "true" array-based replication because they are independent of the disk array, but for the purpose of this article, we can agree that replication takes place at the storage level rather than being host or application based. What distinguishes storage-level replication is that it relieves the application and server resources from the processing overhead associated with replication.
The downsides of array-based replication
A few years ago, it was a lot easier to outline the downsides of array-based replication; it was a very low-level technology that replicated blocks of data without much ceremony. Many times you had to take applications down to preserve data integrity because the application was not aware of the replication process. The technology offered very little support for heterogeneous storage hardware, which made it pricey.
Read the rest of the story on array-based data replication by Pierre Dorion.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection. Find Storage Switzerland's disclosure statement here.