Data de-duplication prevents CrashPlan from backing up duplicate data when two or more files in your backup selection include the same information. This reduces the amount of storage space needed for your backup and speeds up the backup process.
What is de-duplication?
Data de-duplication is a form of single-instance storage, which means that we do not store the same information twice, even if you have the exact same file (or part of a file) duplicated in two or more places on your computer.
For instance, your email may contain many instances of the signature file that is attached to your emails. Without de-duplication, each instance of your signature would be saved independently when backing up your email, taking up additional space with multiple duplicate copies. With CrashPlan's data de-duplication, only one instance of the signature is actually stored in your backup, and additional instances point to the stored copy.
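The signature example can be illustrated with a minimal sketch of single-instance storage. This is not CrashPlan's implementation: the `SingleInstanceStore` class, its method names, and the use of SHA-256 to identify content are assumptions chosen only to show the idea of storing one copy and pointing additional instances at it.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store (hypothetical): identical content is kept once."""

    def __init__(self):
        self.blobs = {}       # content digest -> bytes, stored only once
        self.references = {}  # item name -> content digest (the "pointer")

    def put(self, name: str, data: bytes) -> bool:
        """Record data under name; return True only if new bytes were stored."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        # Every instance, new or duplicate, just points at the stored copy.
        self.references[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Follow the pointer back to the single stored copy."""
        return self.blobs[self.references[name]]


store = SingleInstanceStore()
signature = b"-- \nJane Doe\nExample Corp\n"

# The same signature appears in three different emails.
for i in range(3):
    store.put(f"email-{i}/signature", signature)

print(len(store.references), "instances,", len(store.blobs), "stored copy")
```

In this sketch, three instances of the signature are recorded, but only one copy of its bytes is kept; each instance can still be read back in full.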
How does CrashPlan de-duplication work?
CrashPlan uses block-level de-duplication when backing up your files, which splits the files into smaller blocks of data before sending them to your backup destination. During the initial backup of your files, all of the unique blocks of data are transferred to the destination.
If there are duplicate versions of the same file on your computer, CrashPlan detects the duplicate blocks of data and does not send them again. If the file changes, only the changed blocks are transferred. In the example below, only the shaded blocks of data would be sent to the destination.
This makes block-level de-duplication very efficient. The CrashPlan app uses block-level data de-duplication in conjunction with compression to optimize storage space at each destination and reduce the bandwidth required for your backup.
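The behavior described in this section can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not CrashPlan's actual algorithm: the fixed 4-byte block size, SHA-256 block identifiers, zlib compression, and the `back_up` and `restore` helpers are all hypothetical choices made to keep the example small.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks so the example is easy to follow


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split a file's bytes into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def back_up(data: bytes, destination: dict) -> tuple:
    """Send only blocks the destination lacks.

    Returns (recipe, sent): the ordered list of block digests needed to
    rebuild the file, and the number of blocks actually transferred.
    """
    recipe = []
    sent = 0
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in destination:
            # New block: compress it, then "transfer" it to the destination.
            destination[digest] = zlib.compress(block)
            sent += 1
        recipe.append(digest)
    return recipe, sent


def restore(recipe: list, destination: dict) -> bytes:
    """Rebuild a file from its recipe of block digests."""
    return b"".join(zlib.decompress(destination[d]) for d in recipe)


destination = {}

original = b"AAAABBBBCCCCDDDD"
recipe_v1, sent_v1 = back_up(original, destination)      # initial backup

recipe_copy, sent_copy = back_up(original, destination)  # duplicate of the same file

edited = b"AAAAXXXXCCCCDDDD"                              # one block changed
recipe_v2, sent_v2 = back_up(edited, destination)

print(sent_v1, sent_copy, sent_v2)
```

Here the initial backup transfers all four unique blocks, the duplicate file transfers none, and the edited file transfers only the one changed block. Both versions remain restorable because each keeps its own recipe of block identifiers.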
When does de-duplication occur?
CrashPlan de-duplicates your data any time a backup occurs. In addition, there are times when de-duplication is triggered as part of a file verification scan:
- Changing the file selection: If you update your backup file selection, either to add or remove files, the scan runs to look for new, changed, or deleted files.
- Computer adoption: After adopting a previous archive (for example, if you recently changed computers), the scan runs to compare the files on your computer to the files in your existing backup archive.
- Clearing your cache: The cache includes information about your destinations and the data on your computer. When the cache is cleared, a file verification scan initiates to help rebuild this information.
- Manually: You can start the file verification scan at any time from Settings > Backup > Now.
Is my backup starting over?
Occasionally, CrashPlan's data de-duplication needs to re-scan your files to see what's already been backed up. When this happens, it may look like CrashPlan is backing up all your files from the beginning, but it is actually reviewing each block to see what's been backed up already. If CrashPlan is re-scanning your files, you may see one or more of the following:
- Progress is much faster than a full initial backup because information that has already been backed up is not re-sent.
- All your files are available for restore during this process.
- The amount of space used by your backed-up files at the destination is consistent with the size of your file selection and backup completion percentage. To verify the amount of space used:
  - Select Destinations and choose a destination type (for example, Cloud).
  - Select a destination and note the Space used.
CrashPlan's cache includes information on de-duplicated data. You'll experience the above behavior if CrashPlan needs to rebuild its cache for any reason. This happens occasionally under normal use.
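To show why a re-scan reviews every block yet re-sends almost nothing, here is a minimal sketch of a cache rebuild. It is an assumption-laden illustration, not CrashPlan's internals: the `rescan` function, the representation of the cache and destination index as sets of SHA-256 digests, and the 4-byte block size are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks for readability


def block_digests(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Return the digest of each consecutive fixed-size block."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def rescan(files: dict, destination_index: set) -> tuple:
    """Rebuild the local cache by reviewing every block of every file.

    Returns (cache, to_send): digests confirmed to be at the destination,
    and digests of blocks that still need to be transferred.
    """
    cache = set()
    to_send = []
    for data in files.values():
        for digest in block_digests(data):
            if digest in destination_index:
                cache.add(digest)          # already backed up: nothing to send
            elif digest not in to_send:
                to_send.append(digest)     # genuinely new block
    return cache, to_send


files = {"a.txt": b"AAAABBBB", "b.txt": b"AAAACCCC"}

# Pretend both files were fully backed up before the cache was cleared.
destination_index = set(block_digests(b"AAAABBBB") + block_digests(b"AAAACCCC"))

cache, to_send = rescan(files, destination_index)
print(len(cache), "blocks confirmed,", len(to_send), "blocks to send")

# A file added since the last backup is the only thing left to transfer.
files["c.txt"] = b"DDDD"
cache2, to_send2 = rescan(files, destination_index)
print(len(cache2), "blocks confirmed,", len(to_send2), "blocks to send")
```

In the first scan every block is reviewed, but none needs to be transferred, which is why progress moves much faster than an initial backup. Only the block from the newly added file is queued in the second scan.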