Service provider takeaway: Service providers can help customers choose from three methods for cutting data at the source to reduce their backup windows: incremental backup, data deduplication and block-level incremental backup.
With the ever-increasing volume of data and the need for 24/7 access to company resources, the time your customers have available to perform backups is becoming more scarce. So most, if not all, of them are looking for ways to reduce the backup window. Many are turning to disk-to-disk backup as a solution. And while disk-to-disk backup can improve overall reliability, speed of recovery and electronic vaulting of backup data, reducing the backup window requires changing more variables than just the medium. That's because reducing backup windows is an infrastructure issue, not a backup target issue.
The problem is that data is growing at a much faster rate than infrastructure. With an uncertain economy, few IT budgets have room for a total upgrade to SAN-based backups or a 10 Gigabit Ethernet network.
But even in the absence of big allocations on infrastructure, there are still techniques your customers can implement to cut their backup windows.
Source-side data reduction
Short of an infrastructure upgrade, one of the more practical ways to reduce the backup window is source-side data reduction, which sends less redundant data across the network. Doing so reduces both the network and the server resources a backup needs. Source-side and target-side data reduction strategies should complement each other; target-side consolidation through backup virtualization is an effective approach to the latter.
To cut the backup window by reducing data at its source, there are three primary methods. We'll examine those methods and point you toward the one that's likely to give your customers the best return.
- Incremental backup: An incremental backup copies everything that has changed since the last backup of any kind, whether that backup was a full or not. It differs from a differential backup, which copies everything that has changed since the last full backup. Over the years, many backup applications have added the ability to create a new, updated full backup by comparing the incremental backups to the previous full backup. Files that have not changed are moved from the old full to the new full, and files that have changed are moved from the appropriate incremental backup to the new full. This produces a new consolidated full backup without needing to bring all of the data across the network again.
While this capability certainly has its merits, especially since it's included with many backup applications, it also has limitations. For example, incremental backup is a file-level technology. If you have a large database or email environment, the large files that make up that environment change throughout the day. When the incremental backup process protects these environments, it copies every changed file in its entirety, even though the changes may represent only a small percentage of each file's contents. Also, the consolidation process is time-consuming, especially if the new full is written to tape. Even with disk, it can take hours to ascertain which data from which incremental backups should be written to the new full. And the original baseline full must always be part of the recovery set, which complicates implementations of incremental backup.
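The file-level behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular backup application works: a backup is modeled as a plain dictionary of path to contents, and the names `changed_since` and `synthesize_full` are hypothetical. File deletions are ignored.

```python
from typing import Dict, List

# Hypothetical model: a backup is a mapping of file path -> file contents.
Backup = Dict[str, bytes]


def changed_since(mtimes: Dict[str, float], since: float) -> List[str]:
    """Return the paths modified after the given reference time."""
    return sorted(path for path, mtime in mtimes.items() if mtime > since)


def synthesize_full(previous_full: Backup, incrementals: List[Backup]) -> Backup:
    """Build a consolidated full from an old full plus incrementals (oldest first)."""
    new_full = dict(previous_full)   # unchanged files carry over from the old full
    for incremental in incrementals:
        new_full.update(incremental)  # newer copies replace older ones, whole file at a time
    return new_full


# Modification times (arbitrary units) and the two reference points.
mtimes = {"a.txt": 100.0, "db.dat": 250.0, "b.txt": 310.0}
last_full_time, last_backup_time = 200.0, 300.0

incremental_set = changed_since(mtimes, last_backup_time)   # since the last backup of any kind
differential_set = changed_since(mtimes, last_full_time)    # since the last full backup

# Consolidation: no data crosses the network, but db.dat was copied whole twice.
old_full = {"a.txt": b"v1", "db.dat": b"v1"}
monday = {"db.dat": b"v2"}
tuesday = {"db.dat": b"v3", "b.txt": b"v1"}
new_full = synthesize_full(old_full, [monday, tuesday])
```

Note that `db.dat` appears in every incremental in full, which is the file-level limitation: a small change inside a large file still sends the entire file.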
- Data deduplication: This method uses a new backup agent at the host. EMC's Avamar is an example of a source-side deduplication product. The agent on each host connects to a centralized consolidated store of previously backed-up data. The agent makes comparisons of what's on the local server and what's on the backed-up store. It then sends only the unique blocks across the network. While this works well in remote office backup situations, it does not scale well in the data center, especially if there is a moderate-sized server count, say, more than 20. Having that many servers doing the redundant data analysis consumes time and server resources. In some cases, if the change rate is high -- such as in a database -- the amount of additional resources required by the deduplication process can render the application itself unusable while it is being scanned for duplicate data.
Because source-side data deduplication is a new type of backup application, make sure that the system you recommend can protect your customer's environment as effectively as its current software solution. You should examine, for example, whether the product has support for the messaging, database and virtualization platforms your customer uses.
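As a rough illustration of the agent-and-store comparison described above, the sketch below hashes fixed-size chunks and sends only those the central store has not seen. Everything here is an assumption for illustration: the chunk size, the use of SHA-256, the in-memory dictionary standing in for the central store, and the function names. Commercial products use their own chunking and indexing schemes.

```python
import hashlib
from typing import Dict, List, Tuple

CHUNK_SIZE = 4  # tiny on purpose, for illustration; real systems use much larger chunks


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, store: Dict[str, bytes]) -> Tuple[List[str], int]:
    """Back up data to the store; return (ordered chunk digests, bytes sent)."""
    recipe: List[str] = []
    bytes_sent = 0
    for piece in split_chunks(data):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in store:       # only chunks unknown to the store cross the network
            store[digest] = piece
            bytes_sent += len(piece)
        recipe.append(digest)         # the recipe is enough to rebuild the data later
    return recipe, bytes_sent


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble data from its ordered list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


store: Dict[str, bytes] = {}
recipe_a, sent_a = backup(b"AAAABBBBCCCC", store)  # first host: every chunk is new
recipe_b, sent_b = backup(b"AAAABBBBDDDD", store)  # second host: only DDDD is new
```

The hashing loop runs on the protected host for every chunk it reads, which is the server-side cost the article warns about when many servers or high change rates are involved.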
- Block-level incremental backup: This type of backup functionality is available from Syncsort through its Backup Express Advanced Recovery product and from NetApp in the form of Open Systems SnapVault. These tools create a block-level image of each protected server's disk on a secondary disk. When a subsequent backup of the protected server's disk is performed, only the blocks that have changed are transferred across the network. Unlike the above example with consolidated fulls, when the backup runs against a database, VMware image or email system, only the blocks of data that have changed since the last backup are transferred across the network -- and not the full files. This represents a significant reduction in network utilization.
Because block-level incrementals work on blocks of a predefined size, identifying redundant data is substantially easier, which means there's less impact on the server but still a significant reduction in the data footprint on the storage device. From a software perspective, it is much less resource-intensive to identify the changed blocks between two versions of a volume than to identify identical byte stream patterns across the enterprise.
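A minimal sketch of the idea, assuming two same-sized volume images held in memory and compared at fixed offsets. The block size and the function names are hypothetical, and real products typically track changed blocks as writes happen rather than comparing whole images.

```python
from typing import Dict

BLOCK_SIZE = 4  # tiny on purpose, for illustration only


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> Dict[int, bytes]:
    """Return {offset: block} for every block that differs between the two images."""
    changes: Dict[int, bytes] = {}
    for offset in range(0, len(current), block_size):
        block = current[offset:offset + block_size]
        if previous[offset:offset + block_size] != block:
            changes[offset] = block
    return changes


def apply_blocks(replica: bytearray, changes: Dict[int, bytes]) -> None:
    """Write the changed blocks into the secondary-disk image in place."""
    for offset, block in changes.items():
        replica[offset:offset + len(block)] = block


previous = b"AAAABBBBCCCCDDDD"   # volume image at the last backup
current = b"AAAAXXXXCCCCDDDD"    # volume image now: one block changed

delta = changed_blocks(previous, current)   # this is all that crosses the network
replica = bytearray(previous)               # image held on the secondary disk
apply_blocks(replica, delta)
```

The comparison is a simple equality check per offset, with no enterprise-wide index to consult, which is why the per-server cost stays low.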
In fact, block-level incrementals are so resource-efficient that multiple backups can be performed throughout the day with little to no disruption of the application servers being protected. This expands the backup application's abilities into the realm of continuous data protection (CDP) or near-CDP functionality. Like CDP, block-level incrementals make use of snapshots: Before each backup, a snapshot is taken of the backup storage area to preserve different versions of files.
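The snapshot-before-backup pattern can be sketched as follows. This is an assumption-laden illustration: a full in-memory copy stands in for what would normally be a space-efficient snapshot on the backup storage, and `backup_with_snapshot` is a hypothetical name.

```python
from typing import Dict, List


def backup_with_snapshot(target: bytearray, snapshots: List[bytes],
                         delta: Dict[int, bytes]) -> None:
    """Preserve the current backup image, then apply the changed blocks to it."""
    snapshots.append(bytes(target))   # full copy used here as a stand-in for a snapshot
    for offset, block in delta.items():
        target[offset:offset + len(block)] = block


target = bytearray(b"AAAABBBB")   # backup image on the secondary disk
snapshots: List[bytes] = []

backup_with_snapshot(target, snapshots, {4: b"XXXX"})  # first backup of the day
backup_with_snapshot(target, snapshots, {0: b"YYYY"})  # second backup, hours later
```

After the two runs, `target` holds the latest image and `snapshots` holds each earlier version, so any of the day's recovery points can be selected.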
One downside to block-level incrementals compared to source-side data deduplication is that the redundancy is only on a volume-by-volume basis, not across the enterprise. Your customers may not see the reduction that some of the deduplication suppliers report, but the savings are still substantial.
In exchange for this capacity tradeoff, the block-level incremental technology creates an active backup target. This target is a real file system; data isn't stored in a proprietary backup format but in its native form, so it can be accessed directly. In the case of an array failure, the backed-up copy of that array can be mounted directly via iSCSI and the application can rapidly be back in production, with no data movement for recovery. The active backup target can be used beyond the backup process, as well. For example, you can take a snapshot of data on this volume and mount it to a test or development server.
Of the three methods for reducing data at its source to meet your customer's backup window, block-level incremental backups provide the most substantial improvement in the backup process that we have seen in years. They not only reduce network resource requirements, but they also decrease the server resources required, which makes a near-CDP level of protection possible. By providing a unique presentation of the backup target, block-level incremental backup allows for new capabilities like zero data movement recoveries and active targets that expand the use of the backup process beyond just backup and recovery.
About the author
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for data centers across the United States, he has seen the birth of such technologies as RAID, NAS and SAN. Prior to founding Storage Switzerland, George was chief technology officer at one of the nation's largest storage integrators, where he was in charge of technology testing, integration and product selection.