By Stephen J. Bigelow, Senior Technology Writer
Disaster recovery (DR) planning isn't just a matter of storage. It's about identifying the data that your client requires for daily operations, and then duplicating that data across the wide area network (WAN) in a timely and cost-effective manner -- striking a balance between WAN speed and connectivity costs. However, deciding just what data needs to be handled, how much WAN bandwidth is needed to accomplish that replication and how to maintain security across remote sites can be difficult for even experienced solution providers. The first installment of this Hot Spot Tutorial introduced critical WAN issues and site planning considerations for disaster recovery. This second chapter details WAN bandwidth factors, redundant connectivity concepts and the use of other technologies, like VPNs and virtualization, in disaster recovery.
WAN bandwidth requirement issues
In simplest terms, the WAN bandwidth needed for disaster recovery is the amount of data that needs to be moved divided by the time available to move it. For example, if your client must move 1,000 MB of data in 10 seconds, they would theoretically need 100 MBps (about 800 Mbps) of bandwidth. Consequently, more bandwidth is needed to accommodate greater data volumes or smaller timeframes. The trick for solution providers is to establish both numbers accurately.
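That calculation can be sketched in a few lines of code. This is a minimal illustration of the formula above, with the unit conversion (megabytes to megabits) made explicit:

```python
# Minimum WAN bandwidth = data volume / time available, in consistent units.

def required_bandwidth_mbps(data_mb: float, seconds: float) -> float:
    """Return the minimum bandwidth in Mbps to move data_mb megabytes
    within the given number of seconds."""
    megabits = data_mb * 8  # 1 byte = 8 bits
    return megabits / seconds

# The example above: 1,000 MB in 10 seconds requires 800 Mbps.
print(required_bandwidth_mbps(1000, 10))  # 800.0
```

In practice this number is a floor, not a target; protocol overhead and latency mean the purchased link must be somewhat larger.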
File sizes are fairly easy to determine through an assessment of the client's business applications, but remember that not all changing data is equally important. Different data types can be protected in different ways to reduce data loads and corresponding bandwidth needs. For example, a client may accumulate 100 MB of new or changed data each day, but if 75 MB of that consists of noncritical data that doesn't really require DR protection, only concern yourself with the remaining 25 MB of "important" data. Noncritical data that is not protected by the DR site can still be backed up for later recovery.
Similarly, it is often possible to split DR protection by data type. Suppose that 25 MB of important business data includes 10 MB of transactional data and another 15 MB of email, documents and other business communication. A solution provider can architect a DR plan that supports the transactional data with real-time synchronous replication, while protecting the remaining data with asynchronous coverage over 30 to 60 minutes or some other appropriate timeframe.
Data volumes can be greatly affected by data reduction technologies like data deduplication -- removing redundant files or blocks from data. For example, a client's mission-critical database may change by 20 MB each hour, but data reduction techniques can drop the effective volume to 8-10 MB each hour. WAN optimization appliances can also be deployed to apply compression for smaller file sizes and TCP/IP traffic assistance (e.g., fewer handshakes and jumbo packets) for lower latency and better bandwidth utilization. Solution providers may need to monitor or track application activity on the client's network to gauge the actual data types and volumes that require protection.
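The effect of data reduction on replication volume is simple arithmetic. In this sketch, the reduction ratio (the fraction of raw data eliminated by deduplication and compression) is an assumed input; real ratios vary widely by data type and must be measured on the client's actual workload:

```python
def effective_volume_mb(raw_mb: float, reduction_ratio: float) -> float:
    """Data remaining to replicate after deduplication/compression.
    reduction_ratio is the assumed fraction of raw data eliminated."""
    return raw_mb * (1 - reduction_ratio)

# 20 MB/hour of raw changes with 50-60% reduction yields roughly
# 8-10 MB/hour of effective replication traffic, as in the example above.
print(round(effective_volume_mb(20, 0.6), 1))  # 8.0
print(round(effective_volume_mb(20, 0.5), 1))  # 10.0
```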
Next is the issue of time. "If you want 100% data replicated 24/7, then you need high bandwidth," said Rand Morimoto, president of Convergent Computing, a network solution provider in Oakland, Calif. "But if you are okay with a two- to four-hour delay or even one-day delay on DR, then you can get away with really cheap and simple WAN bandwidth."
Real-time synchronous data replication will demand bandwidth that matches the peak file change activity. For example, if data within the client's organization is changing at 1 MB each second during a normal business day, expect to provide at least 8 Mbps of bandwidth in order to move those changes in real time. Asynchronous data replication can dramatically reduce these demands by spreading data changes over a longer period. So if that same client organization changes 50 MB of data in the space of an eight-hour workday, and the recovery point objective (RPO) allows that data to be replicated over 16 overnight hours, they would only need about 0.007 Mbps for the overnight job -- an almost negligible amount of bandwidth. If that same 50 MB replication job had to be completed in three overnight hours, the client would need 0.037 Mbps of bandwidth.
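The asynchronous figures above follow directly from the replication window. A short sketch confirms the arithmetic:

```python
def async_bandwidth_mbps(data_mb: float, window_hours: float) -> float:
    """Bandwidth in Mbps needed to replicate data_mb megabytes
    within a replication window of window_hours hours."""
    return (data_mb * 8) / (window_hours * 3600)

# 50 MB of daily changes over a 16-hour overnight window:
print(round(async_bandwidth_mbps(50, 16), 3))  # 0.007

# The same 50 MB squeezed into a 3-hour window:
print(round(async_bandwidth_mbps(50, 3), 3))   # 0.037
```

The key insight is that the RPO, not the total data volume, drives the bandwidth requirement: relaxing the window from 3 hours to 16 cuts the needed bandwidth by more than 80%.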
There are other considerations when determining WAN bandwidth needs. First, don't figure the initial data load into bandwidth calculations. That first full transfer always takes a significant amount of time -- moving 10, 20 or 50 TB of business data to a new DR site is almost always handled asynchronously and can take several days to complete. Second, remember that the client uses WAN bandwidth for everyday business activity, so any WAN bandwidth for disaster recovery must be added on top of the current business WAN bandwidth.
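A quick estimate shows why the initial seed is treated separately. This sketch assumes a dedicated link and decimal units (1 TB = 8,000,000 megabits); real transfers take longer once protocol overhead is included:

```python
def seed_transfer_days(data_tb: float, link_mbps: float) -> float:
    """Rough days to complete an initial full transfer of data_tb
    terabytes over a dedicated link of link_mbps, ignoring overhead."""
    megabits = data_tb * 8_000_000  # 1 TB = 8,000,000 Mb (decimal)
    seconds = megabits / link_mbps
    return seconds / 86_400  # seconds per day

# Seeding 10 TB over a dedicated 100 Mbps link takes over nine days:
print(round(seed_transfer_days(10, 100), 1))  # 9.3
```

This is why initial seeding is often done by shipping physical media to the DR site rather than over the wire.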
"The bandwidth required for DR dwarfs any communications for pure networking," said Bob Laliberte, analyst with the Enterprise Strategy Group in Milford, Mass. "We'd go from a GigE link [for networking] to an OC48 link [for DR]." Don't include the current WAN bandwidth in the total predicted for DR -- otherwise the DR activity may impinge on regular WAN activity and possibly result in poor network performance or access problems for the client.
Some clients with asynchronous replication needs may save money by provisioning extra bandwidth only during replication periods (e.g., buying burstable bandwidth from their provider) or by replicating during evenings or other periods of off-peak user demand on the network. Finally, figure in some added WAN bandwidth for future growth. This provides the client with a small buffer that ensures they can still meet data movement objectives into the future even as data loads grow.
WAN connectivity and provider involvement
In many cases, the WAN link itself presents a single point of failure that can cripple a disaster recovery plan, so solution providers are sometimes challenged to overcome this potential weakness by using multiple WAN providers. "There's a clear cost/benefit scenario that needs to be calculated out at this point," said Dave Sobel, CEO of Evolve Technologies, a solution provider located in Fairfax, Va. The goal is to determine the client's uptime requirements and how badly downtime and data loss caused by WAN connectivity problems would affect them. Ultimately, a solution provider needs to help the client determine if the cost of a second ISP is less than the cost (and risk) of downtime. It's not appropriate for every organization.
While a second ISP can help clients achieve a level of redundancy in their WAN connectivity, experts like Laliberte note that ISPs normally use local telecom providers for the "last mile" connection between the ISP and the client, and even multiple ISPs may ultimately use the same local provider's cabling and other infrastructure. "You really want to find out what the carrier has -- is it aerial fiber, is it buried, what COs are they going through?" Laliberte said. Consequently, the advice is to "do your homework." Understand each provider's cabling and infrastructure and opt for providers that do not share any common cabling or other resources if possible. This kind of separation may not always be possible, limiting your available site selections.
But the issue isn't just limited to local carriers. "We've seen WAN providers [that] hub out of San Francisco or out of the World Trade Center locations go down and bring down an entire region, so it is important that the WAN provider be evaluated for their basic resiliency," Morimoto said, noting that potential outages can affect multiple data centers within that same region. Review each SLA carefully and be sure that the WAN provider is able to guarantee the level of service that your client needs.
Multiple WAN connections are often used simultaneously, aggregating the available bandwidth for improved throughput. But it's important to balance network traffic across aggregated WAN connections just as you would with aggregated (redundant) LAN connections. Otherwise one or more WAN links may be underutilized, resulting in wasted bandwidth. WAN link balancers like the Link LB family of appliances from Elfiq Networks or the Edge series of appliances from XRoads Networks are designed to accommodate multiple WAN links, but solution providers must weigh their cost to the client against the cost of additional bandwidth.
VPNs, virtualization and hosted services in disaster recovery
Any DR plan should include some consideration of data security. Client data must often be moved and stored in a secure manner. Virtual private networks (VPNs) ensure secure data transfers between two points, but they are primarily an end-user technology and not widely used to synchronize data between DR sites. For example, an end user may employ a VPN to recover a lost file from backup storage at a secondary data center or out-of-region recovery site, but solution providers won't connect the main data center and DR site across a VPN. "VPNs are one solution to that problem," Sobel said, noting that SSL encryption for on-the-fly security or other encryption products to secure data before it's sent to the DR site are often more effective alternatives to VPNs.
Virtualization also plays an indirect role in disaster recovery by simplifying the hardware requirements at both the data center and the DR site. "It's easier to DR 25 servers than 100 servers, so if you can consolidate servers and then virtualize them, the recovery process is greatly simplified," Morimoto said. DR hardware is also simplified. For example, DR sites traditionally had to duplicate the hardware found at the main site. Virtualization removes this requirement, allowing protected workloads to run on a diverse range of hardware -- reducing costs and providing the client with dramatically more deployment flexibility.
Finally, solution providers may want to recommend hosted DR services for smaller clients. Solution providers that already have a DR services infrastructure -- or resell hosting services for a larger provider -- may have a revenue advantage over other providers that would have to refer hosting services to a third party.
Still, the use of hosted DR services can alleviate significant costs for the client. "It plays a big role if the org doesn't already have two or more internal data centers to be used, so it's good for small businesses," Morimoto said. Hosted services are not necessarily a good fit for organizations that already have two or more staffed data centers. In this case, it's probably better to implement high-availability DR, since the basic infrastructure is already in place.