Continuous data protection (CDP) is a topic that often comes up for VARs during discussions with customers that are having backup window problems. Developed initially as a replacement for existing backup products, CDP solutions garnered a lot of interest, especially in the enterprise space. Now mostly implemented as an option within existing backup products, CDP can provide real benefits in the right use case. This tip will focus on some of those use cases and provide details around recovering data backed up with CDP.
Most CDP solutions use a process that captures data writes and records them to a second location, where they’re applied to a recovery image to keep it in sync with the primary data set. In case of a failure in the primary data set, this recovery image can be mounted and used as a failover copy. The CDP system also keeps a log of these changes and saves each one so that a current image can be rolled back to any previous point in time. Since it results in two synchronized copies of a given data set, CDP eliminates the need for a separate backup cycle, making the backup window a thing of the past. And, since the recovery image is always up to date with the primary, CDP can support a recovery point objective (RPO) of near zero, meaning there will be essentially no gap between the time a failure occurs and the last known good backup.
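To make the mechanics concrete, here is a minimal Python sketch of the journaling idea described above. It is a toy model, not any vendor's implementation: the class name CdpJournal, its methods, and the use of integer timestamps and block numbers are all assumptions made for illustration. It also assumes writes arrive in time order.

```python
class CdpJournal:
    """Toy CDP model: a baseline image plus a timestamped log of writes.

    All names here are hypothetical; real products work at the block,
    file or application level and are far more involved.
    """

    def __init__(self, baseline):
        self.baseline = dict(baseline)        # block number -> data at time zero
        self.journal = []                     # (timestamp, block, data), in time order
        self.recovery_image = dict(baseline)  # kept in sync with the primary

    def capture_write(self, timestamp, block, data):
        """Log the write, then apply it to the recovery image."""
        self.journal.append((timestamp, block, data))
        self.recovery_image[block] = data

    def image_at(self, timestamp):
        """Rebuild the data set as it looked at the given point in time."""
        image = dict(self.baseline)
        for ts, block, data in self.journal:
            if ts > timestamp:
                break  # journal is in time order, so later writes can be skipped
            image[block] = data
        return image


cdp = CdpJournal({0: "a", 1: "b"})
cdp.capture_write(10, 0, "A1")
cdp.capture_write(20, 1, "B1")
cdp.capture_write(30, 0, "corrupt")

print(cdp.recovery_image)  # current failover copy, including the bad write
print(cdp.image_at(25))    # state just before the bad write at t=30
```

The recovery image gives the near-zero RPO failover copy, while the journal is what allows a rollback to a moment before a corruption event.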
Originally, CDP solutions were developed and marketed largely by startup companies as a replacement or augmentation for traditional backup applications, but many iterations of CDP technology have since been acquired and integrated into existing backup products. Snapshot and replication technologies have also been used to provide a function similar to true continuous data protection, an approach referred to as “near-CDP.” Snapshots capture the state of a data set at predetermined times so that state can be re-created later, whereas CDP records write operations continuously. For the purposes of this tip, we’ll focus on “true” CDP solutions.
Which customers are a good fit for true CDP?
True CDP’s primary use case is in larger environments where an always up-to-date backup is critical or where the IT organization needs the ability to dial back to the right moment in time before an event, such as data corruption, occurred. Many of these IT shops are using traditional backup methods but are running out of time to complete regular backups. As data explodes, global companies find themselves in a perfect-storm situation, with an increasing volume of data to back up and no window in which to do it. CDP can essentially remove the requirement for a backup window and, if the CDP volume can be accessed live, it also provides a near-instant recovery capability.
Another use case for CDP technology is as a standalone backup application. Compared with traditional backup solutions, these products can be simpler to implement and easier to use. This fits the smaller customers this use case targets, who also tend to be less averse to replacing their existing backup infrastructure. However, some of these products actually use the near-CDP process of successive snapshots and replication.
How does true CDP recovery work?
Depending on the use case, recovery with CDP can be instantaneous, where applications or customers simply mount a LUN or file system and start using it as the primary data set. As described above, this is a common scenario for large customers that have no backup window and an RPO of near zero. For file data, recovery is pretty straightforward, and file versions from almost any point in time can be accessed. Once the primary system has been repaired, the recovery data set is used to reverse the process, resynchronizing the primary and “failing back” users and applications.
For databases, recovery is trickier, as they usually must remain “application consistent” to be used in place of the primary data set during a failure. Sometimes this includes synchronizing with multiple applications and data volumes. Some CDP solutions leverage APIs such as Microsoft’s Volume Shadow Copy Service (VSS) to create recovery points by putting the database into a consistent state prior to capturing changes. This is essentially a snapshot process, which is triggered by the CDP software, although not every second. Other products track events such as file save operations and database startup and shutdown to indicate consistent recovery points. For both methods, these consistent-state recovery points are marked in the change journal. The downside is that fewer recovery points are available for database applications, compared with restoring files. This makes CDP recovery for databases similar to regular snapshot-based recovery.
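The practical effect of those journal markers can be sketched in a few lines of Python. This is an illustration only: the function name latest_consistent_point and the idea of storing markers as a sorted list of integer timestamps are assumptions, not how any particular product records them.

```python
from bisect import bisect_right


def latest_consistent_point(markers, requested_time):
    """Return the newest consistent-state marker at or before requested_time.

    `markers` is a sorted list of timestamps at which the database was known
    to be application consistent (hypothetical representation). Returns None
    if no marker is old enough.
    """
    idx = bisect_right(markers, requested_time)
    return markers[idx - 1] if idx else None


# Consistent recovery points marked in the change journal (example values).
markers = [100, 400, 900]

# A file could be restored to t=650, but the database must fall back to t=400.
print(latest_consistent_point(markers, 650))
```

A file restore can use any journal entry, but a database restore in this model snaps back to the nearest earlier marker, which is why fewer recovery points are available.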
With most CDP solutions, recovery of files is easy: Simply browse the directory of recovery images, locate the file versions needed and copy them back to the primary storage location. For the standalone backup use case, this can even be a self-service operation, one that’s especially appealing when compared with traditional products’ restores, which required IT intervention.
On the downside, CDP solutions capture every change made to a data set, which means that they basically never delete anything. The amount of storage consumed can make CDP feasible only as a short- to medium-term solution. To address this, some products reduce the frequency of recovery points as backups age; others combine all recovery points older than a certain time into a “synthetic full” backup. A legacy backup application that has integrated a CDP company’s technology may offer your customers the best solution: It uses CDP for only the most recent time frame and then retains the data for the long term by backing up the CDP data store as it would any other client.
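The first of those approaches, thinning recovery points as they age, might look something like the following Python sketch. The function name thin_recovery_points, the tier format and the sample numbers are all hypothetical; actual products expose their own retention settings.

```python
def thin_recovery_points(points, now, tiers):
    """Keep fewer recovery points as they get older.

    points: recovery point timestamps, in whole seconds (assumed format)
    now:    current time, in the same units
    tiers:  list of (max_age, spacing) pairs sorted by max_age; within a
            tier, only the newest point in each `spacing`-second bucket is
            kept. A spacing of 1 keeps every point.
    Points older than the last tier are dropped here; a product might
    instead fold them into a synthetic full backup.
    """
    kept = {}
    for t in sorted(points):
        age = now - t
        for max_age, spacing in tiers:
            if age <= max_age:
                # Newer points overwrite older ones in the same bucket.
                kept[(max_age, t // spacing)] = t
                break
    return sorted(kept.values())


now = 10_000
tiers = [(100, 1), (1000, 100)]  # keep everything for 100s, then one per 100s
points = [8990, 9050, 9060, 9120, 9190, 9910, 9950, 9990]

print(thin_recovery_points(points, now, tiers))
```

The trade-off is the one described above: older data stays recoverable, but at coarser granularity, in exchange for a bounded journal size.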
Eric Slack is a senior analyst with Storage Switzerland.