Deduplication, compression, and your backup
Who is this article for?
- Instructor: no
- Incydr Professional, Enterprise, Horizon, and Gov F2: no
- Incydr Basic, Advanced, and Gov F1: yes
Overview
This article describes how data deduplication and data compression increase the efficiency of the Code42 agent's backup process and decrease the amount of storage space needed while still preserving your ability to restore the files to their original state.
Data deduplication
Data deduplication is a form of single-instance storage, which means that we do not store the same information twice, even if you have the exact same file (or part of a file) duplicated in two or more places on your computer.
For instance, your email may contain many instances of the signature file that is attached to your emails. Without deduplication, each instance of your signature would be saved independently when backing up your email, taking up additional space with multiple copies. With data deduplication, only one instance of the signature is actually stored in your backup, and additional instances point to the stored copy.
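The signature example can be sketched in a few lines of Python. This is only an illustration of single-instance storage in general, not the Code42 agent's actual implementation: the `SingleInstanceStore` class, its method names, and the use of SHA-256 as the content fingerprint are all assumptions made for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store (hypothetical): identical content is kept once."""

    def __init__(self):
        self.blobs = {}  # fingerprint -> the single stored copy of the content
        self.refs = []   # one fingerprint per item saved, pointing at a stored copy

    def save(self, data: bytes) -> str:
        # SHA-256 is an assumption; any strong content hash would do here.
        digest = hashlib.sha256(data).hexdigest()
        # Store the content only if this fingerprint has not been seen before.
        self.blobs.setdefault(digest, data)
        self.refs.append(digest)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


signature = b"Jane Doe\nExample Corp\n"
store = SingleInstanceStore()
for _ in range(100):  # the same signature appears in 100 emails
    store.save(signature)

print(len(store.refs))   # 100 references
print(len(store.blobs))  # 1 stored copy
```

Each of the 100 saves records a reference, but the signature bytes themselves are stored only once.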
How does deduplication work?
The Code42 agent uses block-level deduplication when backing up your files, which splits the files into smaller blocks of data before sending them to your backup destination. During the initial backup of your files, all of the unique blocks of data are transferred to the destination.
If there are duplicate versions of the same file on your computer, the Code42 agent detects the duplicate blocks of data and does not send them again. If the file changes, only the changed blocks are transferred. In the example below, only the shaded blocks of data would be sent to the destination.
This makes block deduplication very efficient. The Code42 agent uses block-level data deduplication in conjunction with compression to optimize storage space at each destination and reduce the bandwidth required for your backup.
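As a rough sketch of the idea, the following Python splits data into fixed-size blocks and sends only the blocks whose fingerprints the destination has not seen. The block size, the SHA-256 fingerprint, and the function names `split_blocks` and `blocks_to_send` are assumptions chosen for illustration; the article does not specify how the Code42 agent sizes or identifies its blocks.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to read


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def blocks_to_send(data: bytes, known_hashes: set) -> list:
    """Return only the blocks whose fingerprint is not yet at the destination."""
    to_send = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:
            known_hashes.add(digest)  # the destination now holds this block
            to_send.append(block)
    return to_send


destination = set()  # fingerprints of blocks already stored at the destination

version_1 = b"AAAABBBBCCCCDDDD"
initial = blocks_to_send(version_1, destination)   # initial backup: all 4 blocks

version_2 = b"AAAABBBBXXXXDDDD"                     # third block was edited
changed = blocks_to_send(version_2, destination)   # only the changed block

duplicate = blocks_to_send(version_1, destination)  # a copy of the original file

print(len(initial))    # 4
print(changed)         # [b'XXXX']
print(len(duplicate))  # 0
```

Fixed-size blocks keep the sketch short. A real backup tool may divide files differently, so treat this only as a model of the behavior described above.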
When does deduplication occur?
The Code42 agent deduplicates your data any time a backup occurs. In addition, there are times when deduplication is triggered as part of a file verification scan.
Does my backup start over?
Data compression
The Code42 agent analyzes, compresses, and encrypts your data before sending it to your backup destinations. Data compression is the process of reducing file size by encoding the data in a more efficient way. Compression algorithms fall into two categories: lossless and lossy (the latter is used to make MP3s and other media smaller). The Code42 agent losslessly compresses your data before sending it for backup.
How does compression help?
When the Code42 agent identifies new or changed data in a file, it breaks the data into blocks and compresses each block. The smaller file size increases your effective transfer rate, which makes both backing up and restoring the files faster.
By reducing the file size, compression also reduces the amount of storage space needed at the destination. Between compression and deduplication, the Code42 agent can save a significant amount of disk space. However, data savings can vary greatly based on the type of files being backed up. For example, text documents compress extremely well, but movies do not. Typically, we observe 10-30% savings in disk space as a result of compression and deduplication.
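To see why savings depend so heavily on file type, here is a small Python sketch that compresses data block by block and reports the space saved. It uses the standard `zlib` module purely as a stand-in; the article does not say which algorithm or block size the Code42 agent uses, so `compress_blocks`, `savings`, and the 4096-byte block size are hypothetical. Random bytes stand in for media such as movies, which are already compressed and leave little redundancy to remove.

```python
import os
import zlib


def compress_blocks(data: bytes, block_size: int = 4096) -> list:
    """Split data into blocks and losslessly compress each block on its own."""
    return [
        zlib.compress(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def savings(data: bytes) -> float:
    """Fraction of space saved; negative if compression made the data larger."""
    compressed_size = sum(len(block) for block in compress_blocks(data))
    return 1 - compressed_size / len(data)


# Repetitive text, similar to a plain text document.
text = b"The quick brown fox jumps over the lazy dog.\n" * 200

# Random bytes as a stand-in for already-compressed media (an assumption).
media_like = os.urandom(len(text))

text_savings = savings(text)
media_savings = savings(media_like)

print(f"text:       {text_savings:.0%} saved")
print(f"media-like: {media_savings:.0%} saved")
```

The text shrinks dramatically, while the random data saves essentially nothing and can even grow slightly because of per-block overhead.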
Does compression harm my data?
The Code42 agent uses lossless compression when backing up your data, so your data can be restored to the state that it was in prior to being backed up. This differs from lossy compression:
- Lossless compression reduces file size by identifying and eliminating any redundant data within the file and minimizing wasted space. No data is lost in lossless compression.
- Lossy compression reduces file size by permanently discarding some of the original detail, such as reducing the number of colors displayed in an image. This lowers the quality of images, videos, and music, and the discarded data cannot be recovered.
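The difference between the two types can be demonstrated with a short Python sketch. The lossless half uses the standard `zlib` module as an example algorithm (not necessarily the one the Code42 agent uses). The lossy half uses a deliberately crude, hypothetical `lossy_quantize` function that keeps only the top four bits of each byte, loosely mimicking a reduction in the number of colors in an image.

```python
import zlib

# Stand-in for image data: every possible byte value, repeated.
original = bytes(range(256)) * 4

# Lossless: decompressing returns exactly the bytes that went in.
restored = zlib.decompress(zlib.compress(original))
print(restored == original)  # True


def lossy_quantize(data: bytes) -> bytes:
    """Hypothetical lossy step: drop the low 4 bits of every byte."""
    return bytes(value & 0xF0 for value in data)


# Lossy: detail is discarded and cannot be recovered afterwards.
approximate = lossy_quantize(original)
print(approximate == original)  # False
print(len(set(original)))       # 256 distinct values before
print(len(set(approximate)))    # 16 distinct values after
```

Because the backup relies on lossless compression, restored files match the originals byte for byte, as in the first half of the sketch.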