This varies by CDP product. As a rule of thumb, network-based products are volume- and data-agnostic, so it is safe to assume that products such as Revivio Continuous will handle system
Network-based providers like Revivio have developed specific algorithms to reduce physical I/O for this purpose. Host-based providers have as well, but buffering these writes delays the writes themselves.
In host-based CDP, for data to be replicated off site, does the CDP system need to be taken offline, or can it continue to function and be replicated at the same time?
In nearly every case, the CDP product can continue to function and replicate at the same time. Mendocino Software, for example, collects data for the protected server while asynchronously replicating that data to a remote site over IP. It uses two separate and independent processes within the management server to perform these tasks.
How does CDP manage flow control for peak load?
Host-based products typically let the CDP send queue back up and spill to local storage without interruption, but they currently lack any method of responding to peak write I/O periods. If the local staging resources for these I/Os are depleted, CDP stops. Revivio claims to allow users to provision as many CDP resources as needed for peak periods and gives administrators the flexibility to expand that capacity as peak loads increase over time. It also has QoS provisions that give the processing of incoming writes priority over other processes.
When is the I/O complete status passed to write?
This depends on whether the approach is in-band or out-of-band, and on whether the writes occur synchronously or asynchronously. For host-based products, the write acknowledgement occurs when the write hits the secondary storage (the CDP management server). For network-based products, a write acknowledgement is returned from both the primary storage and the CDP appliance. The I/O complete status matters for three things: performance, synchronous mirrors, and the risk of data loss. On performance, network-based CDP appliances tend to be as fast as primary storage, since most CDP appliances cache writes before writing them to disk. Host-based CDP solutions like Storactive and Mendocino run in asynchronous mode, so the issue is performance impact, not data loss. With regard to data loss, even a catastrophic failure of either the network-based appliance or the management appliance with which the host-based CDP agents communicate causes no data loss on the primary storage.
What are "side files"? Is this just a facility for a "break" of process to allow backup of files, LUNs to tape, etc., and a resync to continue?
"Side files" go by different names at different CDP vendors. Revivio calls them TimeImages, while Mendocino Software refers to them simply as snapshots. Regardless of the name, most CDP vendors support them and cite the ability to create them as one of their primary value adds. Taking a snapshot with Mendocino allows administrators to present it to another server. When the snapshot is presented to another server, it is not attached to the protected server, and the other server does not access it through the management appliance. Backups can be run from these snapshots without affecting the protected server's data in any way. This feature does not work the same way on all CDP products. On Storactive's Liveserv, for example, when this "break" occurs, it will halt CDP and force a resync on restart because of the tight coupling that exists between Liveserv and Exchange.
What is the overhead associated with the agent installation on each production host?
Most of the vendors in this space reported an average overhead of 2-3%. In practice, read-intensive applications will consume much less than that, while write-intensive applications will likely see more.
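To make the earlier flow-control answer more concrete, the following is a minimal Python sketch of how a host-based send queue with a fixed local staging area might behave: writes queue up locally while the link is busy, and CDP stops once staging is depleted. It is an illustration under stated assumptions, not any vendor's implementation. The class `HostCdpSendQueue`, the exception `StagingDepleted`, and the parameters `staging_capacity_bytes` and `max_bytes` are all hypothetical names invented for this example.

```python
from collections import deque


class StagingDepleted(Exception):
    """Raised when local staging is exhausted and CDP capture stops."""


class HostCdpSendQueue:
    """Hypothetical model of a host-based CDP agent's send queue.

    Captured writes are staged locally until they can be sent to the
    CDP management server. The model has no way to slow the application
    down during peaks; it only stops when staging runs out.
    """

    def __init__(self, staging_capacity_bytes):
        self.capacity = staging_capacity_bytes  # assumed fixed staging size
        self.used = 0                           # bytes currently staged
        self.pending = deque()                  # writes waiting to be sent
        self.stopped = False                    # True once CDP has stopped

    def capture(self, write):
        """Stage one captured write (a bytes object)."""
        if self.stopped:
            raise StagingDepleted("CDP has already stopped")
        if self.used + len(write) > self.capacity:
            # Staging depleted: in this sketch, CDP simply stops.
            self.stopped = True
            raise StagingDepleted("local staging depleted")
        self.pending.append(write)
        self.used += len(write)

    def drain(self, max_bytes):
        """Send whole queued writes, oldest first, up to max_bytes in total."""
        sent = 0
        while self.pending and sent + len(self.pending[0]) <= max_bytes:
            write = self.pending.popleft()
            sent += len(write)
            self.used -= len(write)
        return sent
```

In this sketch, a burst of `capture` calls that outruns `drain` fills the staging area, and the first write that no longer fits stops CDP. Sizing `staging_capacity_bytes` for the peak write rate is the only protection the model offers, which mirrors the limitation described for host-based products.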