Data deduplication in the storage channel
What role will data deduplication play in backup?
The role of data deduplication (also known as single instancing, data differencing and commonality factoring) in backup is to reduce the amount of data that is backed up and stored for recovery. Reducing the amount of data has two effects: less data must be collected and moved from a server to another storage medium for backup, and less storage space is used to hold the backup. Some data deduplication solutions address how data is gathered at the source of the backup (pre-processing), so that duplicate or common files do not have to be copied repeatedly to a backup device.
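To make the source-side (pre-processing) idea concrete, here is a minimal Python sketch of file-level single instancing. It is an illustration only, not any vendor's implementation: the function name `plan_backup`, the `(name, content)` input shape and the choice of SHA-256 as the content fingerprint are all assumptions made for the example.

```python
import hashlib


def plan_backup(files, known_digests):
    """Decide which files actually need to be copied to the backup device.

    files: iterable of (name, content_bytes) pairs (hypothetical input shape).
    known_digests: set of SHA-256 hex digests already held by the backup
                   target; it is updated in place as new content is seen.
    Returns (to_copy, references).
    """
    to_copy = []      # names of files whose content is not yet backed up
    references = {}   # every file name -> digest, so a restore can find it
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        references[name] = digest
        if digest not in known_digests:
            # First time this content is seen: it must be sent once.
            known_digests.add(digest)
            to_copy.append(name)
        # Otherwise only the reference is recorded; no data is moved.
    return to_copy, references
```

Two files with identical content produce the same digest, so only the first is copied and the second is recorded as a reference. Running the same plan again against an unchanged set of files copies nothing.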
Other data deduplication solutions focus on the backup stream itself, reducing the data either as it is ingested (inline) or after it has been stored (post-processing), in order to cut the storage capacity needed to hold backups. Some virtual tape libraries (VTLs) have implemented a form of data deduplication to reduce the storage capacity needed for backups, in addition to traditional data compression and compaction techniques. VTLs with deduplication should be compatible with existing host server backup software. Server-based data deduplication software, on the other hand, would require new software to be installed or existing software to be updated.
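The inline approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any particular VTL works: the class name `DedupStore`, the fixed 4 KB chunk size and the use of SHA-256 are hypothetical choices for the example, and real products may chunk and index data quite differently.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Toy backup target that keeps each unique chunk exactly once."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def ingest(self, data):
        """Inline dedup: fingerprint each chunk as it arrives, store new ones.

        Returns a "recipe" (ordered list of digests) describing the stream.
        """
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            # setdefault stores the chunk only if the digest is not yet known.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe):
        """Rebuild the original stream from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        """Capacity actually consumed by unique chunks."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

If two nightly backup streams share most of their chunks, the second `ingest` call adds only the chunks that changed, yet both streams can still be restored in full from their recipes. A post-processing design would write the whole stream to disk first and run the same fingerprint comparison afterwards, trading temporary capacity for less work during the backup window.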
This was first published in January 2007