Offering remote backup services can be a great way for a storage solution provider to take advantage of a customer's growth. If you plan to offer remote backup services as part of your business portfolio, make sure you know the best course of action for your customer. In this Project FAQ, James Brissenden, senior consultant for storage and data protection at GlassHouse Technologies in Framingham, Mass., examines the most frequently asked questions about offering remote backup services, including current challenges, choosing products for a customer and understanding the design limitations of a customer's network.
•What are the current challenges with traditional backup for remote or branch office (ROBO) data protection?
•How can I reduce the equipment and management costs of backup for my customers in their ROBO sites?
•What are my customer's design options for ROBO data protection?
•What are hash collisions, and should my customer and I be worried about corruption due to hash collisions?
•What are the design limitations for ROBO data protection on my customer's network?
•What about protecting my customer's corporate laptops?
•What should I look for in a vendor when comparing products?
•There are a number of software and hardware solutions designed for ROBO data protection. With many solutions comes a lot of FUD thrown around in the industry by competing vendors. How do I get honest information about them that will help me and my customer through the product selection process?
•Many of these solutions seem to be a departure from tape backup. What if my customer is wary of a tapeless solution?
Data protection becomes challenging as companies become more stratified. The original approach to ROBO backup was to install traditional backup equipment (e.g., backup servers, tape libraries and tape drives) at each site and run backups locally. In the beginning, this was often moderately manageable through remote console services. But as these environments age and grow in size and number, the ever-increasing backup management and IT costs associated with ROBO backup become a nightmare. Backup in these sites is often characterized by aging hardware, high error rates, limited resources and low network bandwidth to a hub or central site. Given the lack of qualified onsite resources, the risk of unrecoverable data increases, not only because of failed backups but because of media mismanagement as well.
Where network bandwidth permits, a centralized data protection model is the key to eliminating ROBO backup costs. There are many options, such as backing up directly across the metropolitan area network (MAN)/WAN to a hub or central site. But if your connectivity lacks the bandwidth necessary for backup traffic, a fairly new option has become available. Data deduplication paired with replication allows you to perform a backup of all your data once; subsequent backups send only the blocks or chunks of data that have changed since the previous backup. The result is that just a fraction of the data is sent over the MAN/WAN to the central site each night.
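The idea of sending only new chunks can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the fixed 4 KB chunk size, the function names and the in-memory set standing in for the central site's index are all assumptions made for the example, and SHA-1 is used only because it is a commonly cited hash for this purpose.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, in bytes


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the backup stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup_chunks(data: bytes, central_index: set, size: int = CHUNK_SIZE) -> list:
    """Return only the chunks the central site has not seen before.

    central_index is a set of hex digests standing in for the
    central site's chunk index (an assumption for this sketch).
    """
    to_send = []
    for chunk in split_chunks(data, size):
        digest = hashlib.sha1(chunk).hexdigest()
        if digest not in central_index:
            central_index.add(digest)   # remember the chunk for future backups
            to_send.append(chunk)       # only this chunk crosses the MAN/WAN
    return to_send


if __name__ == "__main__":
    index = set()
    # Day 1: ten distinct chunks, so everything is sent.
    day1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
    print(len(backup_chunks(day1, index)), "chunks sent on day 1")
    # Day 2: one chunk changed, so only that chunk is sent.
    day2 = day1[:3 * CHUNK_SIZE] + b"\xff" * CHUNK_SIZE + day1[4 * CHUNK_SIZE:]
    print(len(backup_chunks(day2, index)), "chunks sent on day 2")
```

Real products typically use more sophisticated chunking and keep the index on disk, but the bandwidth saving comes from the same comparison of chunk hashes against what the central site already holds.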
Given the nature of ROBO sites, traditional backup to tape across the MAN/WAN is usually not an option. One newer option is target deduplication via an intelligent disk target or virtual tape library (IDT/VTL). This option is a costly one but provides a high level of restorability and performance. Another is source deduplication software, which is typically a low-cost and highly workable solution that can integrate nicely with your backup software. Be sure to test, because not all source deduplication applications are created equal. There are plenty of other approaches to ROBO backup, but these are the new and interesting solutions on the horizon.
Some deduplication software and hardware use what's called hashing to identify duplicate data within the system. If the system finds a duplicate chunk of data, the duplicate is discarded and a small pointer is put in its place. A hash collision occurs when a new chunk of data comes into the system and the hashing algorithm (typically SHA-1-based) produces the same hash as an existing but different chunk, so the system finds a match and discards the new data even though there really was no match. With some really complex math, the probability turns out to be so infinitesimally small that you have a better chance of a cyclic redundancy check (CRC) sum causing data to be stored incorrectly on disk than of having a hash collision. But I guess someone eventually wins the lottery. That being said, I'm not worried.
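To get a feel for the numbers, the standard birthday-bound approximation can be evaluated directly. The sketch below assumes the hash behaves like a uniformly random function and considers only accidental collisions, not deliberately constructed ones; the function name and the example chunk count are hypothetical.

```python
import math


def collision_probability(num_chunks: int, hash_bits: int = 160) -> float:
    """Approximate chance that any two distinct chunks share a hash.

    Uses the birthday-bound approximation
        p ~= 1 - exp(-n * (n - 1) / 2**(bits + 1))
    which assumes hash outputs are uniformly distributed.
    """
    exponent = -num_chunks * (num_chunks - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(exponent)


if __name__ == "__main__":
    # 2**40 chunks is roughly a trillion; at a hypothetical 8 KB per
    # chunk that is about 8 PB of unique data behind one 160-bit index.
    print(collision_probability(2 ** 40))
```

For a 160-bit hash such as SHA-1 and about a trillion unique chunks, the result is on the order of 4e-25, which is the kind of figure behind the "I'm not worried" conclusion.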
Networking limitations may require deduplication in order to back up to a central site. This in combination with dataset size limitations will be a factor in determining whether source or target deduplication with replication is a viable solution. Another consideration is server-level recovery. Be sure you account for the need to restore. If an entire server needs to be restored and you don't have the bandwidth to run the restore from the hub site in a reasonable amount of time, you'll need to have either a local copy in the ROBO site to restore from, or you'll need to rebuild a server at the hub site and ship it off to the ROBO site.
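A back-of-the-envelope calculation helps decide whether a full server restore over the WAN is realistic. The sketch below is only a rough estimate under stated assumptions: the function names are hypothetical, and the 70% link efficiency is an illustrative guess rather than a measured value.

```python
def restore_hours(dataset_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to pull a full restore across the WAN.

    dataset_gb  -- size of the server's data in gigabytes (10**9 bytes)
    link_mbps   -- nominal link speed in megabits per second
    efficiency  -- fraction of the link actually usable (assumed value)
    """
    bits_to_move = dataset_gb * 8 * 1000 ** 3
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


def needs_local_copy(dataset_gb: float, link_mbps: float, max_hours: float) -> bool:
    """True if a WAN restore would exceed the acceptable restore window."""
    return restore_hours(dataset_gb, link_mbps) > max_hours


if __name__ == "__main__":
    # A 500 GB server over a 10 Mbps link, with a 24-hour restore window.
    print(round(restore_hours(500, 10), 1), "hours")
    print("local copy needed:", needs_local_copy(500, 10, max_hours=24))
```

With these example inputs the estimate comes to roughly 159 hours, or more than six days, which is the situation where a local copy in the ROBO site or a server rebuilt at the hub site and shipped out becomes the practical choice.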
Most enterprise backup software has laptop backup agents available. There is also the option for online backups so that whenever you're online, you're backing up. Finally, source deduplication software is well equipped to protect your laptops across the Internet. Deciding which is the right solution is a balance between licensing costs, whether you are comfortable with outsourcing your laptop backups, and embracing another technology like source deduplication software.
Whenever you're looking at a relatively new technology like deduplication, market share is always a key consideration. That said, there are a few competitors in the deduplication space that have been around for a while now. This is one of those technologies where market share highlights those vendors with the best products. Another indicator to look for is how well a deduplication product integrates with the major backup applications (Symantec NetBackup, EMC NetWorker, IBM TSM,…). If your goal is to minimize management costs of ROBO backup, the solutions that dovetail nicely into your existing backup solution (API integration -- think OST and GUI integration) will make your life much easier. Finally, look to see which solution breaks most easily and which one is easiest to fix. Remember, if the only way to get to your data is through the product's deduplication engine, it had better be robust.
There are a number of software and hardware solutions designed for ROBO data protection. With many solutions comes a lot of FUD thrown around in the industry by competing vendors. How do I get honest information about them that will help me and my customer through the product selection process?
These products change quickly, and tracking them is a difficult task. One thing is certain: You can't learn what you need to know by listening to competitors positioning against one another. Take advantage of the vendors' competitive nature and compare their products side by side with your data. There are also some great resources out there to compare products, like www.backupcentral.com.
Tape backup isn't going away anytime soon. Given that many of these deduplication vendors are new to the market, be sure to test their solutions. If you're looking at post-process deduplication, you can reap the benefit of fast backups to disk, then copy to tape, and finally run your deduplication. You may have some challenges in scheduling this, but you'll have your backups on tape just like you did before. The solution you put in place should allow you to make tape copies of your backup data, even after it has been deduplicated. A solution that doesn't lend itself to making tape copies or is slow to copy to tape has a severe design flaw.
About the author
James Brissenden is an expert in backup systems supporting disaster recovery and data protection strategies, with a specific focus on aligning technical design to business continuance needs. James has a long history in assessing, designing and implementing data storage and backup systems, leveraging deduplication and other leading edge technologies. James joined GlassHouse in 2003 and has recently been leading disaster recovery, data classification and information lifecycle management (ILM) projects.
This was first published in December 2008