By Stephen J. Bigelow, Senior Technology Writer
Network disaster recovery (DR) relies on software products that can send critical data to the remote site or recover that remote data for return to the client. Once implemented and configured, the disaster recovery software system needs to be routinely tested to ensure that all of the parts involved work properly. All of this activity needs to happen while maintaining the client's regulatory compliance or other corporate governance position, further complicating network DR planning for solution providers.
The first part of this Hot Spot Tutorial introduced critical WAN issues and site planning points for DR. The second part detailed WAN bandwidth factors, redundant connectivity concepts and the use of other technologies like VPNs and virtualization for disaster recovery. This third installment discusses changing trends in disaster recovery software and highlights the importance of regulatory compliance.
Changes in disaster recovery software
Disaster recovery software should satisfy the data protection needs of the client and their business. The most important consideration in disaster recovery software selection is the recovery time objective (RTO) -- understanding how quickly the DR software can retrieve and restore data from the remote site. The product should accommodate the customer's data load and change rate in synchronous or asynchronous mode, pass that data within the available effective WAN bandwidth, support all of the client's mission-critical applications or data types, and still fit within the client's budget.
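The bandwidth fit described above comes down to simple arithmetic. The following Python sketch is illustrative only: the function names, the 70% link-efficiency figure and the sample numbers are assumptions, not values from any vendor or from the experts quoted here. It estimates the average throughput needed to replicate a day's changed data and compares it with the effective WAN bandwidth.

```python
def required_mbps(changed_gb_per_day: float, window_hours: float = 24.0) -> float:
    """Average megabits per second needed to ship one day's changed data
    within the replication window (decimal units: 1 GB = 8,000 megabits)."""
    megabits = changed_gb_per_day * 8 * 1000
    return megabits / (window_hours * 3600)


def fits_link(changed_gb_per_day: float, link_mbps: float,
              efficiency: float = 0.7, window_hours: float = 24.0) -> bool:
    """True if the change rate fits within the effective WAN bandwidth.

    `efficiency` is an assumed fraction of the rated link speed that is
    actually usable after protocol overhead and competing traffic.
    """
    effective_mbps = link_mbps * efficiency
    return required_mbps(changed_gb_per_day, window_hours) <= effective_mbps


if __name__ == "__main__":
    # Hypothetical client: 50 GB of changed data per day on a 10 Mbps link.
    print(round(required_mbps(50), 2))        # average Mbps over 24 hours
    print(fits_link(50, 10))                  # replicate around the clock
    print(fits_link(50, 10, window_hours=8))  # replicate only overnight
```

An average like this suits asynchronous replication, which can smooth out bursts over the window. Synchronous replication has to keep pace with the peak write rate, so a peak figure would be the better input in that case.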
"Time to recovery is the main goal, and then balancing against cost," said Dave Sobel, CEO of Evolve Technologies, a solution provider located in Fairfax, Va.
In the past, DR software solutions often proved complex and difficult to configure fully. But disaster recovery software is changing. "The trend I can clearly see is simplification of the infrastructure," Sobel said. "Customers are looking for fewer tools in the environment." Solution providers can teach clients how to use their existing infrastructure and tools for new tasks wherever possible, rather than changing or adding to their infrastructure. In other cases, solution providers must help the client ensure that any new iteration of tools they already have will solve emerging problems or changing DR needs.
The push toward simplification is reflected in a trend away from third-party products. "Movement is away from snapshots and third-party replication products such as SAN Snapshots or DoubleTake, or VMware VMotion," said Rand Morimoto, president of Convergent Computing, a network solution provider in Oakland, Calif. "The movement is to built-in replication like SQL 2005 Mirroring, or Exchange 2007 Stretch Cluster Continuous Replication, or DFS-R -- where the replication is in the application, thus failover and failback is native to the app and fully vendor- and auditor-supported." This approach reduces the number of DR tools in the environment and eliminates vendor finger-pointing when replication doesn't work as expected.
While virtualization tools like VMware may not be desirable for disaster recovery alone, clients that already employ virtualization for consolidation or management purposes may also see benefits in DR.
"Another application that we see playing a bigger role in DR … is VMware," said Bob Laliberte, analyst with the Enterprise Strategy Group in Milford, Mass. Laliberte noted that virtualization enables far greater flexibility in site design and equipment, allowing for cost savings, which some clients may leverage to establish a third DR site (such as a restoration site in addition to a traditional DR site). Additional tools like VMware's VMotion enable the live migration of a running virtual machine from one host server to another, allowing workloads to move between host servers without disruption. Similarly, VMware's Site Recovery Manager automates the recovery of virtualized environments for SMB/SME clients.
Testing and documentation in the DR environment
Testing is critical for any network disaster recovery plan, but the frequency and extent of testing depend on the client's recovery objectives. Clients with tighter recovery objectives must test more frequently. For example, a client environment with 20 users and several servers with an RTO of 24 hours may require little (if any) actual recovery testing. It may be adequate to verify several times a year that the client's data is available to be recovered. Conversely, a client that relies on recovery times of mere seconds in a critical disaster may demand much more frequent testing. No two clients are the same -- experts like Morimoto, Sobel and Laliberte cite testing rates ranging from once per year to once per quarter to once per month and even more frequently for busy enterprises.
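For the lighter end of that range, a periodic spot check that recovered data matches the original can be scripted. The Python sketch below is one possible approach, not a method described by the experts quoted here: it assumes a sample of the client's data has been restored to a local directory, and the function names and directory layout are hypothetical. It compares SHA-256 checksums of each source file against its restored copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return a list of problems: files missing from, or different in,
    the restored copy. An empty list means every source file matched."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

A check like this only confirms that files can be retrieved intact. It says nothing about how long a full recovery would take, so it does not replace a timed recovery test for clients with tight RTOs.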
When advising a client on DR testing frequency, solution providers will need to weigh the costs of labor and potential network disruption, as well as the risks of testing, such as potential restoration errors. "If you want [reliability] to be 100%, I need to continually be testing it daily," Sobel said. "That may not be what you really need." Sobel suggests that solution providers first examine the risks and costs of the client's RTO and then evaluate a testing schedule that can minimize those risks.
A major concern with any DR plan is the effect of change. New applications, new servers, additional storage resources, patches, updates and any changes to the physical network infrastructure or WAN can all adversely affect the DR plan, requiring an updated plan and new testing. Changes are often overlooked until testing reveals missing or inaccessible data, so keeping the client's DR plan current can be a significant opportunity for solution providers.
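One simple way to catch such changes before a test does is to compare the inventory recorded in the DR plan with the current environment. The Python sketch below is a minimal illustration under stated assumptions: the inventory format (item name mapped to a version label), the function name and the sample entries are all hypothetical, and it does not represent how any commercial validation product works.

```python
def find_drift(plan: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare the DR plan's inventory with the live environment.

    Both arguments map an item name (server, application, volume) to a
    version or configuration label.
    """
    shared = plan.keys() & current.keys()
    return {
        # In production but not covered by the DR plan.
        "unprotected": sorted(current.keys() - plan.keys()),
        # Still in the plan but no longer in production.
        "retired": sorted(plan.keys() - current.keys()),
        # Present in both, but patched or reconfigured since the plan was written.
        "changed": sorted(k for k in shared if plan[k] != current[k]),
    }


if __name__ == "__main__":
    # Hypothetical inventories for illustration.
    plan = {"mail-server": "v1", "db-server": "v3", "file-server": "v2"}
    current = {"mail-server": "v1", "db-server": "v4", "crm-app": "v1"}
    for kind, items in find_drift(plan, current).items():
        print(kind, items)
```

Any non-empty result is a prompt to update the plan and schedule a retest rather than proof that recovery would fail.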
DR validation software such as Onaro's Replication Assurance, CA XOsoft Assured Recovery, Continuity Software's RecoverGuard, and Symantec's Fire Drill feature in Veritas Storage Foundation 5.0 High Availability (HA) for Windows all monitor and test the DR system by reporting or accommodating changes as they're found. Laliberte said that as these validation tools mature, testing requirements will ease, because validation tools will be constantly monitoring DR readiness.
Solution providers should also be concerned with DR documentation, which includes all of the policies and procedures needed to execute the DR plan. The documentation is light for most client organizations. "Clear documentation is important, but with HA/DR combo solutions the [high availability] doc, DR doc, and general patching and updating docs are all the same … very simple," Morimoto said. There may be separate documentation for the client and the solution provider depending on who is actually performing DR activity. Solution providers can potentially generate added revenue by offering routine testing and documentation review/update services.
Compliance considerations for disaster recovery
Most client organizations must preserve data in a manner consistent with regulatory compliance rules appropriate for their business or vertical. For example, any publicly traded U.S. company is subject to the Sarbanes-Oxley Act (SOX), while U.S. healthcare businesses are also affected by Health Insurance Portability and Accountability Act (HIPAA) regulations. Solution providers that develop, implement or change DR postures will need to ensure that the changes still meet compliance objectives that may overlap multiple regulations. This can complicate DR design and testing because client data must be preserved in a manner that not only accommodates the client's business objectives (such as recovery time), but also satisfies one or more compliance auditors.
Take the time to understand the compliance needs of the client's industry, as well as the unique requirements of the particular client. Success depends on close work with the client beyond just IT tasks -- often involving input from human resources, finance, legal and other departments. "Then the solution provider can help implement the DR solution, test it and make sure it all works," Morimoto said, noting that DR features integrated into each application often pose little problem for compliance.
This was first published in September 2008