Code42 Support

File Search

Available in:

  • CrashPlan PRO
    • Standard
    • Premium
    • Enterprise

Overview

Use file search to search a user's backed up files based on file name, file content, and file metadata. File search allows authorized security and legal personnel to determine if an employee:

  • Had access to files containing sensitive information
  • Obtained unauthorized access to confidential information
  • Has data that should be subject to legal hold

How file search works

File search relies on indexing users' files. When indexing is enabled, each storage server processes the archives in its store points to generate searchable indexes. These indexes are stored inside each archive.

Types of indexing

The Code42 platform supports two types of indexing:

  • Metadata indexing: Information about the file is indexed, such as filename, created date, and modified date. File contents are not indexed.
  • Content indexing: File contents are indexed in addition to file metadata.
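The difference between the two types can be illustrated with a toy inverted index. This is a conceptual sketch only. The `ToyIndex` class and its methods are hypothetical and do not reflect how Code42 actually builds or stores indexes inside archives.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical inverted index: maps a lowercased term to the paths that contain it."""

    index_contents: bool  # False = metadata indexing, True = content indexing
    postings: dict[str, set[str]] = field(default_factory=dict)

    def _add_terms(self, text: str, path: str) -> None:
        # Split on non-word characters and record each term for this path.
        for term in re.findall(r"\w+", text.lower()):
            self.postings.setdefault(term, set()).add(path)

    def add_file(self, path: str, created: str, modified: str,
                 contents: str | None = None) -> None:
        # Metadata is always indexed: the file name plus created/modified dates.
        self._add_terms(path.rsplit("/", 1)[-1], path)
        for date in (created, modified):
            self.postings.setdefault(date, set()).add(path)
        # File contents are indexed only when content indexing is enabled.
        if self.index_contents and contents is not None:
            self._add_terms(contents, path)

    def search(self, term: str) -> set[str]:
        # Return a copy so callers cannot modify the index.
        return set(self.postings.get(term.lower(), set()))


metadata_only = ToyIndex(index_contents=False)
full = ToyIndex(index_contents=True)
for idx in (metadata_only, full):
    idx.add_file("/home/alice/budget.txt", "2024-01-05", "2024-02-01",
                 "Confidential salary data")

print(metadata_only.search("budget"))  # {'/home/alice/budget.txt'}
print(metadata_only.search("salary"))  # set()
print(full.search("salary"))           # {'/home/alice/budget.txt'}
```

A keyword that appears only inside a file is found only when content indexing is on; file names and dates are searchable either way.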

Your Code42 servers automatically perform full content indexing for supported file types.

File types supported for content indexing

The Code42 platform can perform content indexing for the following file types:

  • TXT
  • HTML
  • XML
  • PDF
  • Microsoft Word (DOC, DOCX)
  • Microsoft Excel (XLS, XLSX)
  • OpenDocument (ODT)
  • RTF
  • EPUB
  • iWork
  • Most plain text files, such as source code
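To estimate which of a user's files are likely to get content indexing, you could classify them by extension. The sketch below is a hypothetical helper, not part of the Code42 platform. The iWork entry is mapped to its common document extensions (.pages, .numbers, .key), and the plain-text set is an illustrative sample standing in for "most plain text files", not the server's actual list.

```python
from pathlib import PurePath

# Extensions from the supported-types list. iWork is represented by its
# common document extensions (.pages, .numbers, .key).
CONTENT_INDEXABLE = {
    "txt", "html", "xml", "pdf", "doc", "docx", "xls", "xlsx",
    "odt", "rtf", "epub", "pages", "numbers", "key",
}
# Hypothetical sample standing in for "most plain text files, such as source code".
PLAIN_TEXT_SAMPLE = {"py", "java", "c", "h", "js", "go", "csv", "md"}


def expected_indexing(filename: str) -> str:
    """Return "content" if the extension is in the sets above, otherwise "metadata"."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in CONTENT_INDEXABLE or ext in PLAIN_TEXT_SAMPLE:
        return "content"
    return "metadata"


print(expected_indexing("Q3 Report.PDF"))  # content
print(expected_indexing("holiday.png"))    # metadata
```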

Requirements

To use file search, your Code42 environment must meet the following requirements:

Authority server
Storage server
  • Located on-premises
  • Runs version 5.1 or later of the Code42 server software
  • Meets the recommended system specifications
  • Has store points with enough free space to accommodate 10–20% archive size expansion (for archive indexes)
Backup encryption key policy
  • Users' archives must use the Standard archive encryption key policy. The Archive key password and Custom key policies are not supported.
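The free-space requirement for store points is simple arithmetic. The sketch below uses hypothetical function names and defaults to the 20% upper bound as a conservative estimate; integer math avoids floating-point rounding on large byte counts.

```python
TB = 10**12  # decimal terabyte, in bytes


def index_space_needed(archive_bytes: int, expansion_pct: int = 20) -> int:
    """Extra bytes to reserve for archive indexes (10-20% expansion; defaults to 20%)."""
    return archive_bytes * expansion_pct // 100


def has_room_for_indexes(free_bytes: int, archive_bytes: int,
                         expansion_pct: int = 20) -> bool:
    """True if the store point's free space covers the expected index growth."""
    return free_bytes >= index_space_needed(archive_bytes, expansion_pct)


# Example values: a store point holding 4 TB of archives with 0.5 TB free.
print(index_space_needed(4 * TB) / TB)        # 0.8
print(has_room_for_indexes(TB // 2, 4 * TB))  # False
```

In this example the store point only has enough room if the expansion stays near the 10% end of the range, so planning for 20% is the safer choice.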

Performance optimization recommendations

Indexing consumes Code42 server system resources. Do not enable indexing if your Code42 servers have an average load of 50% or greater.

We recommend the following storage server configuration to optimize indexing performance:

  • At least one CPU core per store point
  • As much RAM as possible
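The load limit and the one-core-per-store-point guideline can be expressed as a quick pre-check. This is a hypothetical helper, not a Code42 tool: the load percentage and core counts are values you gather yourself, and "as much RAM as possible" is not something a fixed threshold can capture.

```python
def indexing_readiness(avg_load_pct: float, cpu_cores: int, store_points: int) -> list[str]:
    """Return reasons not to enable indexing yet; an empty list means no issues were found."""
    issues = []
    # Do not enable indexing when average load is 50% or greater.
    if avg_load_pct >= 50:
        issues.append("average server load is 50% or greater")
    # Recommend at least one CPU core per store point.
    if cpu_cores < store_points:
        issues.append(f"{store_points} store points but only {cpu_cores} CPU cores")
    return issues


# 8 cores and 4 store points match the test hardware described later;
# the 35% load is an example value.
print(indexing_readiness(avg_load_pct=35, cpu_cores=8, store_points=4))  # []
```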

Ideally, your Code42 servers should be able to index inbound files in real time or catch up over a 24-hour period by indexing during off-peak hours. See our performance testing results for information about expected performance.

Add store points to disk-bound servers
For storage servers that are disk bound (have underutilized CPU cores), add more store points to increase indexing performance.

Enable and use file search

Setting up indexing and file search involves:

  1. Enabling file search functionality.
  2. Configuring a destination to allow indexing.
  3. Enabling indexing for an organization or for specific users.
  4. Granting authorized users access to search users' files.

After indexing is configured, an administrator or security professional uses the Code42 File Search web app to search users' backed up files.

Best practices for indexing

To avoid overloading your Code42 servers, we recommend a phased approach when you enable indexing in your Code42 environment:

  1. Review our performance testing results to understand how Code42 server configuration and file types impact indexing performance.
  2. Enable indexing for one organization.
  3. Use the administration console to monitor indexing performance for one week.
  4. If the organization is fully indexed and indexing keeps up with inbound backup data, enable indexing for an additional organization.
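The decision in steps 3 and 4 can be sketched as a small check. The `OrgIndexingStatus` fields are hypothetical stand-ins for values an administrator would read from the administration console. "Keeps up with inbound backup data" is approximated here as an indexing rate at least equal to the backup rate.

```python
from dataclasses import dataclass


@dataclass
class OrgIndexingStatus:
    """Hypothetical snapshot of values read from the administration console."""
    name: str
    days_monitored: int
    fully_indexed: bool
    index_rate: float   # files indexed per minute
    backup_rate: float  # inbound backed-up files per minute


def ok_to_enable_next_org(status: OrgIndexingStatus) -> bool:
    # A full week of monitoring, no indexing backlog, and indexing keeps pace with backups.
    return (
        status.days_monitored >= 7
        and status.fully_indexed
        and status.index_rate >= status.backup_rate
    )


pilot = OrgIndexingStatus("Finance", days_monitored=7, fully_indexed=True,
                          index_rate=6_500, backup_rate=6_000)
print(ok_to_enable_next_org(pilot))  # True
```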

Detailed configuration and usage instructions

The following articles describe how to configure indexing and use the File Search web app in detail:

How Code42 server configuration and file types impact performance

Code42 tested specific file types with baseline hardware to determine expectations for indexing performance. Use this data to understand the variables that impact indexing performance.

Test configuration

This section summarizes the backup rate, storage server hardware, and file types that Code42 used for performance testing.

Backup rate

For testing, 2,000 devices each backed up 50 files every 15 minutes, so approximately 6,500 files were backed up per minute. Code42 based this configuration on the typical inbound backup load for a Code42 server in the Code42 cloud.
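The arithmetic behind the test load can be checked with a short sketch (function names are hypothetical). The exact result is about 6,667 files per minute, which the article states as approximately 6,500. The 12-hour totals in the analysis tables later in this article (4,680,000 files) are based on 6,500 files per minute.

```python
def backup_rate_per_minute(devices: int, files_per_interval: int,
                           interval_minutes: float) -> float:
    """Files backed up per minute when every device backs up on the same interval."""
    return devices * files_per_interval / interval_minutes


def files_backed_up(rate_per_minute: float, hours: float) -> int:
    """Total files backed up at a constant rate over the given number of hours."""
    return round(rate_per_minute * 60 * hours)


print(round(backup_rate_per_minute(2_000, 50, 15)))  # 6667
print(files_backed_up(6_500, 12))                    # 4680000
```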

Storage server hardware

Code42 conducted indexing performance baseline testing with the following storage server hardware:

Component | Configuration
--- | ---
CPU | AMD Opteron 6212 (8 cores)
RAM | 32 GB (8 GB allocated to the Code42 server software)
Database | Hosted on a dedicated volume
Archive storage | 5.5 TB volume
Code42 performed testing with a single store point and with four store points on a single volume.

Test file types

Code42 simulated three types of files to test indexing performance:

File type | Description | Performance impact | Examples
--- | --- | --- | ---
Metadata only | Binary files whose contents cannot be examined for keywords | Low | PNG, WMV, MP3
Plain text | ASCII files that do not need to be parsed to extract keywords from their contents | Medium | TXT
Content indexable | Files that must be parsed to extract keywords | High | PDF, RTF, DOC

File sets used for the tests

Code42 tested indexing performance using three specific sets of test files:

File set | Metadata only | Plain text | Content indexable
--- | --- | --- | ---
Mostly metadata only files | 70% | 10% | 20%
Even mix of files | 50% | 10% | 40%
Mostly content indexable files | 30% | 10% | 60%

Observed indexing performance

The following table summarizes the observed indexing performance for each file set, in files per minute:

File set | Single store point, during backup activity¹ | Single store point, without backup activity | Four store points, during backup activity¹ | Four store points, without backup activity
--- | --- | --- | --- | ---
Mostly metadata only files | ~2,000–3,000 | ~3,500 | ~6,500 (keeping up with the backup rate) | ~6,500
Even mix of files | ~2,000 | ~2,000 | ~4,000–5,000 | ~4,000–5,000
Mostly content indexable files | ~1,000 | ~1,500 | ~3,500–4,000 | ~4,000

¹ The backup rate for these tests was approximately 6,500 files per minute.

Single store point test analysis

The single store point configuration offers lower indexing performance because all archives are assigned to a single store point and one CPU core.

  • This configuration cannot index backed up files in real time for any tested file set.
  • Assuming each 24-hour day has 12 hours of backup activity (8-hour work day across 4 time zones) and 12 hours without backup activity, this configuration cannot index the files backed up on a given day within that same day. Because the backlog of unindexed files grows every day, it is unlikely that this configuration will ever finish indexing all backed-up files during off-peak hours.
  • Based on observed performance, the single store point configuration is not appropriate for this scenario.
File set | Files backed up in 12 hours | Files indexed over 24 hours | New files not indexed each day
--- | --- | --- | ---
Mostly metadata only files | 4,680,000 | 4,320,000 | 360,000
Even mix of files | 4,680,000 | 2,880,000 | 1,800,000
Mostly content indexable files | 4,680,000 | 1,800,000 | 2,880,000
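The daily figures can be derived from the observed rates. The sketch below assumes 12 hours with backup activity and 12 hours without, and uses the midpoint of any observed range (for example, 2,500 for ~2,000–3,000). With those assumptions it reproduces the single store point table. The function name is hypothetical.

```python
def daily_indexing(backup_rate: int, rate_during_backup: int, rate_without_backup: int,
                   backup_hours: int = 12) -> tuple[int, int, int]:
    """Return (files backed up, indexing capacity, files left unindexed) for one 24-hour day.

    All rates are in files per minute.
    """
    idle_hours = 24 - backup_hours
    backed_up = backup_rate * 60 * backup_hours
    # Indexing runs all day, at a lower rate while backups are also running.
    indexed = (rate_during_backup * backup_hours + rate_without_backup * idle_hours) * 60
    # Whatever is not indexed today adds to the backlog.
    return backed_up, indexed, max(0, backed_up - indexed)


# Single store point rates from the observed performance table.
print(daily_indexing(6_500, 2_500, 3_500))  # (4680000, 4320000, 360000)
print(daily_indexing(6_500, 2_000, 2_000))  # (4680000, 2880000, 1800000)
print(daily_indexing(6_500, 1_000, 1_500))  # (4680000, 1800000, 2880000)
```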

Four store point test analysis

The four store point configuration performs better because the archives are spread across multiple store points and CPU cores.

  • This configuration can index backed-up files in real time for the "Mostly metadata only files" file set.
  • Assuming each 24-hour day has 12 hours of backup activity (8-hour work day across 4 time zones) and 12 hours without backup activity, this configuration can index backed-up files within the same day. Indexing during off-peak hours makes this possible for the file sets that cannot be indexed in real time.
  • Based on observed performance, the four store point configuration is appropriate for this scenario.
File set | Files backed up in 12 hours | Files indexed over 24 hours | New files not indexed each day
--- | --- | --- | ---
Mostly metadata only files | 4,680,000 | 9,360,000 | N/A
Even mix of files | 4,680,000 | 6,480,000 | N/A
Mostly content indexable files | 4,680,000 | 5,040,000 | N/A