Data deduplication prevents the Code42 app from backing up duplicate data when two or more files in your backup selection include the same information. This reduces the amount of storage space needed for your backup and speeds up the backup process.
What is deduplication?
Data deduplication is a form of single-instance storage, which means that we do not store the same information twice, even if you have the exact same file (or part of a file) duplicated in two or more places on your computer.
For instance, your email may contain many instances of the signature file that is attached to your emails. Without deduplication, each instance of your signature would be saved independently when backing up your email, taking up additional space with multiple copies. With data deduplication, only one instance of the signature is actually stored in your backup, and additional instances point to the stored copy.
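To illustrate single-instance storage, the Python sketch below keys each distinct piece of content by a SHA-256 digest and records every file as a pointer to that digest. It is a hypothetical illustration, not Code42's implementation: the class name, the use of SHA-256, and the whole-file granularity are assumptions (the Code42 app itself works at the block level, as described in the next section).

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical store that keeps each distinct piece of content once.

    Not part of the Code42 app; names and hashing choice are illustrative.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # digest -> the single stored copy
        self.files: dict[str, str] = {}    # path -> digest (a pointer, not a copy)

    def add(self, path: str, content: bytes) -> bool:
        """Record a file; return True only if its content had to be stored."""
        digest = hashlib.sha256(content).hexdigest()
        self.files[path] = digest
        if digest in self.blobs:
            return False  # duplicate content: only the pointer is recorded
        self.blobs[digest] = content
        return True

    def read(self, path: str) -> bytes:
        """Follow the file's pointer back to the stored copy."""
        return self.blobs[self.files[path]]


store = SingleInstanceStore()
signature = b"Jane Doe | Example Corp | 555-0100"
for i in range(3):
    store.add(f"mail/message{i}/signature.txt", signature)

print(len(store.files), len(store.blobs))  # 3 1
```

Three messages reference the signature, but only one copy is kept; every path still reads back the full content.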
How does deduplication work?
The Code42 app uses block-level deduplication when backing up your files, which splits the files into smaller blocks of data before sending them to your backup destination. During the initial backup of your files, all of the unique blocks of data are transferred to the destination.
If there are duplicate copies of the same file on your computer, the Code42 app detects the duplicate blocks of data and does not send them again. If the file changes, only the changed blocks are transferred. In the example below, only the shaded blocks of data would be sent to the destination.
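The following Python sketch shows the general idea of block-level deduplication: split data into blocks, identify each block by a hash, and send only blocks the destination does not already have. The block size, the use of fixed-size blocks, SHA-256, and the function names are all hypothetical; the article does not say how the Code42 app chooses block boundaries or identifies blocks.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and tiny for readability; not Code42's block size


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def blocks_to_send(data: bytes, stored: set[str]) -> list[bytes]:
    """Return blocks whose digests are not yet at the destination,
    recording each one as stored so it is never sent twice."""
    new_blocks = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in stored:
            stored.add(digest)
            new_blocks.append(block)
    return new_blocks


destination: set[str] = set()
original = b"AAAABBBBCCCCDDDD"
print(blocks_to_send(original, destination))  # [b'AAAA', b'BBBB', b'CCCC', b'DDDD']
print(blocks_to_send(original, destination))  # [] (duplicate copy: nothing sent)
edited = b"AAAABBBBXXXXDDDD"
print(blocks_to_send(edited, destination))    # [b'XXXX'] (only the changed block)
```

Fixed-size blocks keep the sketch short, but inserting a byte near the start of a file shifts every later block boundary, which is one reason many deduplicating systems use content-defined chunking instead. Which approach the Code42 app uses is not stated here.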
Because unchanged and duplicate blocks are never re-sent, block-level deduplication is very efficient. The Code42 app uses block-level data deduplication in conjunction with compression to optimize storage space at each destination and reduce the bandwidth required for your backup.
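The article does not describe the compression algorithm or how it is ordered relative to deduplication. The sketch below assumes zlib and assumes that each block is identified by the digest of its uncompressed content and compressed only the first time it is stored; the function name is hypothetical. Under those assumptions, ten identical copies of a block cost the compressed size of a single copy.

```python
import hashlib
import zlib


def store_blocks(blocks: list[bytes], store: dict[str, bytes]) -> int:
    """Compress and store each unique block; return the bytes added to the store.

    Hypothetical helper: blocks are matched on their uncompressed content,
    so identical blocks deduplicate regardless of how they compress.
    """
    added = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            compressed = zlib.compress(block)
            store[digest] = compressed
            added += len(compressed)
    return added


store: dict[str, bytes] = {}
block = b"signature line\n" * 64      # 960 bytes of repetitive text
raw_size = len(block) * 10             # ten identical copies, uncompressed: 9600
stored_size = store_blocks([block] * 10, store)
print(raw_size, stored_size)           # stored_size is a small fraction of 9600
```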
When does deduplication occur?
The Code42 app deduplicates your data any time a backup occurs. In addition, there are times when deduplication is triggered as part of a file verification scan.
By default, the scan runs automatically every day, but you can change this schedule by updating the Preferred time for verification scan setting in device backup settings. The Code42 app periodically runs additional verification scans to detect data corruption, purge files that are no longer selected for backup, and prune file versions and deleted files according to your frequency and version settings.
A file verification scan also runs in the following situations:
- Changing the file selection: If you update your backup file selection, either to add or remove files, the scan runs to look for new, changed, or deleted files.
- Replacing your device: After you replace a device, the scan runs to compare the files on your new device to the files in your existing backup archive.
- Clearing your cache: The cache includes information about your destinations and the data on your device. When the cache is cleared, a file verification scan initiates to help rebuild this information.
- Attaching an external drive: When you attach an external drive, the scan runs to compare the files on the drive to the files in your existing backup archive.
- At device reconnection: If the device is powered off or asleep at the scheduled scan time, the scan runs 15 minutes after the device reconnects.
- Manually: The file verification scan can be triggered at any time from the Backup Set Settings Menu.
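The daily schedule and the reconnection rule from the list above can be sketched as two small functions. This is a hypothetical illustration under stated assumptions: the function names, the example dates, and the 03:00 preferred time are invented, and the other triggers (file selection changes, cache clearing, and so on) are not modeled.

```python
from datetime import datetime, time, timedelta
from typing import Optional

# Stated in the list above: a missed scan runs 15 minutes after reconnection.
RECONNECT_DELAY = timedelta(minutes=15)


def next_scheduled(now: datetime, preferred: time) -> datetime:
    """Hypothetical helper: next daily occurrence of the preferred scan time."""
    candidate = datetime.combine(now.date(), preferred)
    return candidate if candidate > now else candidate + timedelta(days=1)


def scan_start(scheduled: datetime, reconnected_at: Optional[datetime]) -> datetime:
    """Hypothetical helper: when the scan actually starts.

    reconnected_at is when the device came back after being powered off or
    asleep at the scheduled time, or None if it was awake then.
    """
    if reconnected_at is None or reconnected_at <= scheduled:
        return scheduled  # device was available at the scheduled time
    return reconnected_at + RECONNECT_DELAY


# Hypothetical example: preferred time 03:00, laptop asleep until 08:30.
scheduled = next_scheduled(datetime(2024, 5, 5, 22, 0), time(3, 0))
print(scheduled)                                           # 2024-05-06 03:00:00
print(scan_start(scheduled, datetime(2024, 5, 6, 8, 30)))  # 2024-05-06 08:45:00
```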
Does my backup start over?
Occasionally, the Code42 app needs to re-scan your files to determine what is already backed up so that it can deduplicate your data. When this happens, it may look like the Code42 app is backing up all of your files from the beginning, but it is actually reviewing each block to see whether it has already been backed up. If the Code42 app is re-scanning your files, you may see one or more of the following:
- Progress is much faster than a full initial backup because information that is already backed up is not re-sent.
- All your files are available for download during this process.
- The amount of space used by your backed up files is consistent with the size of your file selection and backup completion percentage. To verify the amount of space used:
  - Open the Code42 app.
  - Select Settings > Destinations to navigate to the list of destinations.
  - From the list of existing destinations, select the destination containing the archive you are verifying.
  - Verify that the Space used is reasonable for your file selection size and previous backup completion.
The Code42 app's cache includes information on deduplicated data. You'll experience the behavior described above whenever the Code42 app needs to rebuild its cache for any reason. Cache rebuilds happen occasionally under normal use.
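Why a re-scan is much faster than an initial backup can be sketched as follows: after the local cache is lost, every block is hashed and checked again, but only blocks the destination is missing are transferred. This is a hypothetical model, not the Code42 app's protocol: the function names, the 4-byte block size, SHA-256, and the use of a Python set to stand in for the destination's record of stored blocks are all assumptions.

```python
import hashlib


def digests(data: bytes, size: int = 4) -> list[str]:
    """Hypothetical helper: SHA-256 digest of each fixed-size block."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def rescan(files: dict[str, bytes], destination: set[str]) -> tuple[set[str], int]:
    """Rebuild a local cache of backed-up block digests.

    Every block is reviewed, but only blocks missing at the destination are
    "sent" (added to the set). Returns the rebuilt cache and blocks sent.
    """
    cache: set[str] = set()
    sent = 0
    for content in files.values():
        for d in digests(content):
            if d not in destination:
                destination.add(d)  # stands in for uploading the block
                sent += 1
            cache.add(d)
    return cache, sent


files = {"a.txt": b"AAAABBBB", "b.txt": b"CCCCDDDD"}
destination = set(digests(files["a.txt"]) + digests(files["b.txt"]))  # already backed up
files["c.txt"] = b"EEEE"                                              # one new file
cache, sent = rescan(files, destination)
print(len(cache), sent)  # 5 1
```

All five blocks are reviewed, but only the one new block is transferred, which matches the fast progress and consistent space usage described above.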