If anything can keep an IT manager up at night, it’s a lack of confidence in the company’s disaster recovery plan. Will the investment in backup, replication and server availability work when it matters most? According to a January Snapshot survey by Storage magazine, most of your customers would answer that question in the negative. Only 46% of 116 survey respondents said they’re very confident that their current storage DR plan would allow them to weather a disaster without a significant business impact. When it comes to disaster readiness, a plan you are not very confident in has little real value.
Customers appear to be working toward remedying that problem. Forty-three percent of 359 respondents to Storage magazine’s spring 2011 storage Purchasing Intentions survey said that they would increase their spending on DR products and services in 2011, compared with only 7% who said they would decrease it. As a storage VAR, one way to take advantage of this increased spending is to understand what can go wrong when a disaster is declared and then provide the tools that will monitor for those conditions.
What can go wrong
The foundation of any disaster readiness plan, other than the people needed to execute it, is data. The right data needs to be in place at the remote location in order for any recovery to take place. The state of that data determines how quickly the recovery will occur. Data that needs to be immediately accessible typically needs to be replicated to a live file system at the remote site. Data that can be unavailable for a few hours can be replicated via a disk backup system, and data that can be unavailable for a day or more can be moved to the DR site via tape and truck. Replication, disk backup and tape each have roles in an effective and affordable DR plan. But it’s always necessary to balance availability and cost containment; if money were not an issue, you could replicate everything. Unfortunately, money is always an issue.
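The tiering described above can be sketched as a simple lookup on how long a data set can be unavailable. This is only an illustration: the function name and the one-hour and 24-hour cutoffs are assumptions chosen to stand in for "immediately," "a few hours" and "a day or more," and each customer will draw those lines differently.

```python
# Assumed cutoffs, in hours of tolerable downtime. The article gives only
# qualitative tiers, so these numbers are placeholders to tune per customer.
REPLICATION_LIMIT_HOURS = 1
DISK_BACKUP_LIMIT_HOURS = 24


def choose_dr_method(tolerable_downtime_hours: float) -> str:
    """Map tolerable downtime to the least expensive method that still meets it."""
    if tolerable_downtime_hours < 0:
        raise ValueError("tolerable downtime cannot be negative")
    if tolerable_downtime_hours < REPLICATION_LIMIT_HOURS:
        # Needs to be accessible almost immediately: replicate to a live file system.
        return "replication"
    if tolerable_downtime_hours < DISK_BACKUP_LIMIT_HOURS:
        # Can be down for a few hours: replicate via a disk backup system.
        return "disk backup"
    # Can be down for a day or more: tape and truck.
    return "tape"


if __name__ == "__main__":
    for name, hours in [("order database", 0), ("file shares", 4), ("archive", 48)]:
        print(name, "->", choose_dr_method(hours))
```

Starting from the cheapest method that satisfies the downtime tolerance reflects the cost-containment point: replication is reserved for data that genuinely needs it.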
Each of these processes (remote replication, disk backup and tape transport) needs to be monitored to make sure that the data at the remote site matches the data that is in production at the primary data center. Without the data, no customer recovery time objective (RTO) will be met. Monitoring is important because a mismatch of data between the primary site and the recovery site is common, typically because new servers or new volumes are brought online at the primary site but not the secondary one. For example, in almost every data center we have studied, there are volumes at the primary site that have not been added to the DR process and therefore don’t exist at the recovery site.
The most common scenario where this happens is with replication. While most backup environments are set to "back up all," replication environments are set up to replicate only specific volumes within a server or application. That’s because the cost of the bandwidth to perform real-time replication is high, as is the cost of the mirror copy of data required at the DR site. But since specific volumes are selected for replication, when new volumes are added to a server or application, those volumes need to be manually added to the replication process.
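At its core, the check for this kind of gap is a comparison of two inventories: the volumes that exist on each production server and the volumes the replication job actually covers. The sketch below assumes both inventories have already been collected into plain dictionaries keyed by server name; how they are collected depends on the operating system and replication product, and the function name and sample data are hypothetical.

```python
def find_unreplicated_volumes(primary_volumes, replicated_volumes):
    """Return {server: [volumes]} for volumes that exist at the primary site
    but are not in the replication set. Servers with no gaps are omitted."""
    gaps = {}
    for server, volumes in primary_volumes.items():
        # A server missing from the replication inventory has nothing protected.
        protected = set(replicated_volumes.get(server, ()))
        missing = set(volumes) - protected
        if missing:
            gaps[server] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Hypothetical inventories for illustration only.
    primary = {"db01": ["/data", "/logs", "/archive"], "web01": ["/www"]}
    replicated = {"db01": ["/data", "/logs"], "web01": ["/www"]}
    print(find_unreplicated_volumes(primary, replicated))
```

A server that is entirely absent from the replication inventory is reported with all of its volumes, which covers the case of a new server brought online at the primary site only.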
Besides this mismatch problem, disaster readiness plans can fail if backup data, usually tape-based data, does not arrive at the remote location. With replication, specific error messages are generated when the replication process is interrupted; most replication products have a consistency checking routine that executes when the communication link is re-established. Backup processes, however, seldom include a check-in routine, and so there’s no easy way to ensure that the backup data has made it to the remote location.
The reseller opportunity
Resellers have an opportunity to show significant value when confronted with a customer or prospect who does not have confidence in their ability to recover from a disaster. In the pre-virtualized data center, the first step was often to design some sort of service, like a disaster recovery audit, to determine DR preparedness. But in today's highly dynamic virtualized data center, your customers need real-time or near-real-time tools to know the moment their DR site is out of sync with their production systems. In a virtualized environment, by the time you finish the audit, the customer environment may have already changed.
Today’s DR monitoring tools watch servers for newly created volumes, as well as other change conditions, and alert the user to them. The customer or you can then acknowledge each alert in one of two ways: mark the volume as one that does not need to be replicated, or add it to the replication process.
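The acknowledge step matters as much as the alert, because a tool that keeps re-raising volumes someone has already decided not to replicate will soon be ignored. A minimal sketch of that bookkeeping follows. The class and method names are hypothetical and do not correspond to any particular product, and the sketch only records decisions; a real tool would also reconfigure the replication software.

```python
class VolumeAlertTracker:
    """Tracks which discovered volumes still need a replicate-or-exclude decision."""

    def __init__(self, replicated):
        self.replicated = set(replicated)  # volumes already in the replication set
        self.excluded = set()              # volumes acknowledged as not needing replication

    def pending_alerts(self, discovered):
        """Volumes seen on the server that nobody has made a decision about yet."""
        return sorted(set(discovered) - self.replicated - self.excluded)

    def acknowledge(self, volume, replicate):
        """Record the decision for one alerted volume."""
        if replicate:
            # Assumption: in practice this would also update the replication job.
            self.replicated.add(volume)
        else:
            self.excluded.add(volume)


if __name__ == "__main__":
    tracker = VolumeAlertTracker(replicated=["/data"])
    seen = ["/data", "/scratch", "/orders"]
    print(tracker.pending_alerts(seen))   # both new volumes are pending
    tracker.acknowledge("/scratch", replicate=False)
    tracker.acknowledge("/orders", replicate=True)
    print(tracker.pending_alerts(seen))   # nothing left to decide
```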
In a similar fashion, some backup reporting tools can monitor tapes that have left the primary data center and then “watch” for them to be inserted into a library at the remote DR location. This confirms that the tape is available in the remote location and that tape-based recoveries at least have the media needed for data recovery.
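The tape "watch" amounts to reconciling two lists: tapes logged as shipped from the primary data center and barcodes seen in the remote library. The sketch below flags tapes that have not checked in within an expected transit window. The function name, the two-day default window and the sample barcodes are assumptions for illustration; a real reporting tool would pull both lists from the backup application and the library inventory.

```python
from datetime import date, timedelta


def overdue_tapes(shipped, checked_in, today, max_transit_days=2):
    """Return barcodes that left the primary site but have not appeared in the
    remote library within the allowed transit time.

    shipped:    dict mapping barcode -> date the tape left the primary site
    checked_in: set of barcodes seen in the remote library
    """
    allowed = timedelta(days=max_transit_days)
    return sorted(
        barcode
        for barcode, ship_date in shipped.items()
        if barcode not in checked_in and today - ship_date > allowed
    )


if __name__ == "__main__":
    # Hypothetical shipment log for illustration only.
    shipped = {
        "A001": date(2011, 3, 1),
        "A002": date(2011, 3, 1),
        "A003": date(2011, 3, 3),
    }
    print(overdue_tapes(shipped, checked_in={"A001"}, today=date(2011, 3, 4)))
```

Tapes still inside the transit window are not reported, so the alert list stays limited to media that should have arrived by now.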
George Crump is president of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments.