Code42 Support

eDiscovery integration guide

Applies to:
  • Code42 CrashPlan (previously CrashPlan PROe)

Overview

The Code42 platform provides powerful tools for performing eDiscovery. This article explains the conceptual foundations of eDiscovery, how the Code42 platform can be leveraged to support it, and then guides you through concrete examples that you can adapt to your needs.

Considerations

Requirements

You must have access to a user account with the necessary privileges for the task or information that you need.

Recommended skills

We recommend that you have the following knowledge and skills (or be willing to learn):

The most important skill is the desire to experiment with the flexible Code42 API and to integrate it into your environment in order to make full use of the Code42 platform.

Data-related tasks supported by this article

There are overlaps between eDiscovery, data governance, analytics, and data visualization. Since these tasks have similarities and support each other, it is important to understand the following definitions:

  • eDiscovery refers to the process of discovery in legal cases when the information is in electronic format.
  • Data governance refers to the ways in which an organization attempts to minimize its compliance risk as well as to make sure that data is properly managed, kept secure, utilized effectively, etc.
  • Analytics is the search for and presentation of useful patterns and information in data. Business intelligence, planning, metrics, and many other business activities are supported by analytics.
  • Data visualization is one of the ways to present the findings and data gathered by analytics. It uses graphs, charts and other visual aids to communicate the significance of patterns in data.

The definitions given above should not be taken as strict lines of separation. For example:

  • Analytics may be used in the process of eDiscovery.
  • Data visualization may be used to plan and implement a data governance project.
  • Data governance policies can affect eDiscovery.

This article provides the conceptual tools and examples needed to leverage the Code42 platform for eDiscovery. For information on legal holds, see our article Putting A Hold On Users and Collecting Data For eDiscovery.

How the Code42 platform supports eDiscovery

The Code42 platform supports eDiscovery with the following components:

  • Code42 CrashPlan app
  • CrashPlan web app
  • Administration console
  • LDAP user management
  • Code42 API

eDiscovery summary

eDiscovery consists of a number of steps and functions. The following diagram depicts the general workflow:

E-discovery process chart

You may engage in some but not all of the steps, elect to carry out the steps in a different order, or cycle back to earlier steps.

Here is a list of the steps with associated sub-goals, for easy review:

  • Identification:
    • Beginning the legal hold process.
    • Locating and verifying potential sources.
  • Preservation: ensuring protection against inappropriate alteration or destruction.
  • Collection: gathering for further use in the eDiscovery process.
  • Processing: searching and converting into forms more suitable for review and analysis.
  • Review: evaluating for relevance and privilege.
  • Analysis: evaluating for content and context, including key patterns.
  • Production: delivering data in appropriate forms.
  • Presentation: displaying results / reports.

eDiscovery functional mapping

The following list explains how the Code42 platform's features can be used to accomplish tasks, listed as functional requirements, for each of the steps in the eDiscovery process. Each step is followed by its functional requirements, and the Code42 features or resources for a requirement are nested beneath it.

  • Identification
    • Verify or find users
      • User resource of the Code42 API
      • Search using administration console
    • Move users into a "legal hold" organization
  • Preservation / Collection
    • Change deleted file retention period
      • Inherit "legal hold" organization settings
      • Computer resource of the Code42 API
        • PUT method used to change settings
    • Change archive retention period (also known as cold storage)
    • Change user roles
    • Change file inclusions/exclusions
    • Deactivate users
    • Deactivate computer
      • ComputerDeactivation resource of the Code42 API
      • Manually via administration console
    • Deactivate plans
      • PlanDeactivation resource of the Code42 API
      • Manually via administration console
    • Deauthorize computer (require user to sign in again)
      • ComputerDeauthorization resource of the Code42 API
      • Manually via administration console
  • Process / Review / Analyze
    • Search for filename or folders of interest across archives
      • The following Code42 API resources:
        • ArchiveMetadata
        • WebRestoreSearch
        • PlanEvent
        • PlanSummary
    • Search for MD5 across archives
      • The following Code42 API resources:
        • ArchiveMetadata (with custom script)
        • PlanEvent
        • PlanSummary
        • FileInfo
    • Analyze restore activity
      • The following Code42 API resources:
    • Analyze version history
      • The following Code42 API resources:
        • ArchiveMetadata
        • WebRestoreSearch
        • PlanEvent
        • PlanSummary
  • Production
    • Restore files
      • CrashPlan app
      • Administration console
      • The following Code42 API resources:
        • PushRestoreJob
        • WebRestore
        • File
    • Restore archives
      • CrashPlan app
      • Administration console
      • The following Code42 API resources:
        • PushRestoreJob
        • File (GET method)
      • pushRestore.sh script using the Code42 API
    • Restore versions
      • CrashPlan app
      • Administration console
      • The following Code42 API resources:
        • PushRestoreJob
        • File (GET method)
  • Presentation
    • Generate MD5 report
      • Integrated product
      • The following Code42 API resources:
        • ArchiveMetadata
        • FileInfo
    • Generate files and versions report
      • Integrated product
      • The following Code42 API resources:
        • ArchiveMetadata
        • PlanEvent
        • PlanSummary
    • View user restore history
      • Integrated product
      • The following Code42 API resources:
  • After Hold
    • Release from legal hold
    • DoD-wipe/shred (secure delete)
      • Set system property using the Code42 API:
        • C42.shred.enabled
    • Purge Archive
      • Add to file exclusions
      • ComputerDeactivation resource of the Code42 API

Code42 API resources

All Code42 API resources are potentially useful for eDiscovery, but the following subset is known to be particularly useful in eDiscovery tasks:

  • ArchiveMetadata
  • Computer
    • Retrieve all endpoint device information.
    • Configure particular devices with new settings needed to implement eDiscovery processes.
  • DeviceBackupReport
  • PushRestoreJob
  • RestoreHistory
    • Retrieve histories for all restore jobs that occurred during a defined period of time.
    • Retrieve histories for all restore jobs involving a specific user, device, organization, etc.
    • Requires an ID parameter, such as orgId, userId, or computerId in addition to the 'days' parameter.
  • RestoreRecord
    • Query a Code42 server for information about restore jobs.
    • The query can be used to determine who performed a restore, when it was performed, what the source and destination devices were, etc.
    • See a Data Leak Prevention solution that uses this resource.
  • User
    • Retrieve information about users.
    • Perform actions on users, such as deactivation.
  • WebRestoreInfo
    • Provides important metadata about archives, such as user info, source device, destination identity, security type used to protect archive, etc.
  • ComputerDeactivation, UserDeactivation
    • Purge archives.
  • PlanSummary, PlanEvent
    • Information about plans and activity occurring with plans.
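
As an illustration of the RestoreHistory parameters described above (a 'days' value plus an ID such as orgId, userId, or computerId), the following Python sketch assembles a query URL. It only builds the URL and does not contact a server. The "/api" path prefix, the host name, and the helper name restore_history_url are assumptions made for this example; consult the API Documentation Viewer for the actual request format.

```python
from urllib.parse import urlencode, urlunsplit

# Assumption: resources are served under an "/api" path on the Code42 server.
API_PREFIX = "/api"

# ID parameters named in the RestoreHistory description above.
ID_PARAMS = ("orgId", "userId", "computerId")


def restore_history_url(host, days, scheme="https", **ids):
    """Build a RestoreHistory query URL from 'days' plus at least one ID."""
    unknown = set(ids) - set(ID_PARAMS)
    if unknown:
        raise ValueError(f"unsupported ID parameter(s): {sorted(unknown)}")
    chosen = {name: value for name, value in ids.items() if value is not None}
    if not chosen:
        raise ValueError("an ID such as orgId, userId, or computerId is required")
    query = urlencode({"days": days, **chosen})
    return urlunsplit((scheme, host, f"{API_PREFIX}/RestoreHistory", query, ""))


if __name__ == "__main__":
    # Hypothetical host and organization ID.
    print(restore_history_url("code42.example.com", days=30, orgId=42))
```

Rejecting a request that lacks an ID parameter before it is sent mirrors the requirement stated above and avoids a round trip to the server.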

Additional API information

  • Code42 API Documentation Viewer
    • The online API Documentation Viewer provides you with the latest documentation.
    • All resources are described in detail, including methods, arguments, parameters, and examples.
  • Sample Code on the Code42 GitHub site
    • These code samples provide starting points that you can adapt to your needs.
    • Please contact sales about engaging our PRO Services team for help with adapting code examples, or for the creation of customized scripts.
  • Code42 API Overview

Examples

The following examples are meant to provide insight into how the Code42 platform can be integrated with eDiscovery functions. As examples, they are not guaranteed to be suitable for any eDiscovery process without modification, review, and approval by your organization's compliance officer.

Restore history report with the administration console

As part of the eDiscovery process, you may need to determine who has restored files from a particular organization and when the restores occurred. To do this, perform the following steps:

  1. Sign in to the administration console.
  2. Select Organizations.
  3. Select an organization.
    The Organization Details appear.
  4. Click the number of Restores to view the Restore History page.
  5. From the action menu, select Export All to download the restore history as a CSV file.
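
Once the restore history has been exported, the CSV file can be summarized with a short script. The following is a minimal Python sketch and is not part of the Code42 product. The column name "username" and the sample rows are hypothetical, so check the header row of your own export and pass the matching column name.

```python
import csv
import io
from collections import Counter


def count_restores(csv_file, column):
    """Count rows of an exported restore history CSV, grouped by one column."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical export contents; the real column names may differ.
    sample = io.StringIO(
        "username,startDate\n"
        "joe.johnson,2014-03-05\n"
        "joe.johnson,2014-03-06\n"
        "ann.lee,2014-03-06\n"
    )
    for user, total in count_restores(sample, "username").most_common():
        print(user, total)
```

With a real export, open the downloaded file with open(path, newline="") and pass the file object in place of the in-memory sample.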

Search logs

As part of the eDiscovery process or for other forensic needs, you may need to search the system logs of your Code42 server or the logs stored on endpoint devices running the CrashPlan app.

CrashPlan app logs

You can access CrashPlan app logs in the following ways:

CrashPlan app log example

The endpoint file system is the only place to find a persistent copy of the path names of the files restored by a CrashPlan app-initiated restore. The information is stored in the file restore_files.log.*, which can be retrieved using the administration console as described above, or by accessing the endpoint device file system. Here is an example of the information available about the path names of restored files:

I 03/05/14 06:01AM 622091232443159553 Starting restore from CrashPlan PROe Server: 1 file (80KB)
I 03/05/14 06:01AM 622091232443159553 Restoring files to /Users/joe.johnson/Desktop
I 03/05/14 06:01AM 622091232443159553 /Users/joe.johnson/Desktop/test.pdf 
I 03/05/14 06:01AM 622091232443159553 Restore from CrashPlan PROe Server completed: 1 file restored  @ 26.6Kbps

Search logs from the command line

Search the logs using sed, grep, egrep, or another utility.
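
If you prefer a script to a one-line search, the following Python sketch extracts restored path names from lines shaped like the restore_files.log example shown earlier. It rests on assumptions drawn only from that single example: that every line has the form "level date time number message", that the long number groups the lines of one restore, and that any message not starting with one of the three status phrases is a restored path. Verify these against your own logs before relying on the output.

```python
import re
from collections import defaultdict

# Line layout inferred from the example: level, date, time, number, message.
LINE_RE = re.compile(
    r"^(?P<level>\S+) (?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}[AP]M) "
    r"(?P<restore_id>\d+) (?P<message>.*)$"
)

# Status messages seen in the example; anything else is treated as a path.
STATUS_PREFIXES = ("Starting restore", "Restoring files to", "Restore from")


def restored_paths(lines):
    """Return {restore_id: [path, ...]} for log lines that match the layout."""
    restores = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.rstrip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        message = match["message"].strip()
        if message and not message.startswith(STATUS_PREFIXES):
            restores[match["restore_id"]].append(message)
    return dict(restores)


if __name__ == "__main__":
    sample = [
        "I 03/05/14 06:01AM 622091232443159553 Restoring files to /Users/joe.johnson/Desktop",
        "I 03/05/14 06:01AM 622091232443159553 /Users/joe.johnson/Desktop/test.pdf ",
    ]
    print(restored_paths(sample))
```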

Advanced log file analytics with third party tools

Log files can also serve as a data source for third-party data analysis and visualization tools such as Splunk. We provide instructions on integrating your Code42 platform with Splunk. You can even forward your Code42 server's logs to a Splunk server for analysis.

Devices report with the administration console

You may need to produce a list of all active (or deactivated) devices as part of the eDiscovery process.

To create and download a list of all active devices in your Code42 environment:

  1. Sign in to the administration console.
  2. Select Devices.
  3. From the action menu, select Export All.

To download a CSV list of all deactivated devices:

  1. Sign in to the administration console.
  2. Select Devices.
  3. From the action menu, select Show Deactivated.
  4. From the action menu, select Export All.

Custom scripts

The code examples below illustrate ways of using the Code42 API that can support eDiscovery. Code42 does not provide any guarantee on the suitability of any script or code example for any particular application.

Script 1: Find archive files based on MD5 hash signature

Purpose

This script searches CrashPlan archives for directories or files with a specific MD5 hash (signature). You can limit the search to one device or org, or you can search all archives that are managed by your master server. This script can be run from any device that is able to communicate with the master server via the required ports.

In depth

Because each version of a file has its own MD5 signature, that signature can be used to find copies of a specific file version in any archive.
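
To obtain the MD5 signature of a local copy of a file before searching for it, any standard tool will do. As one option, here is a minimal Python sketch that uses the standard hashlib module; it is independent of the Code42 script, and the function name md5_of_file is chosen for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at 'path'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(md5_of_file(name), name)
```

Run it with one or more file paths as arguments; each output line contains the digest followed by the file name.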

Technical notes
  • This script is compatible with any device or workstation that supports the bash shell environment.
  • The script requires you to enter certain parameters:
    • The username and password of a user with the necessary roles and permissions. A user with the SYSADMIN role is recommended.
    • The IP address or fully qualified domain name of your master server.
    • Whether you wish to use http or https for communication with the master server. HTTPS is recommended for security.
    • The MD5 checksum to search for. You can calculate this with a built-in or external tool or utility. For example, on a Mac, you can use the "md5" command from a terminal window.
    • The scope of your search:
      • All: search all archives managed by the master server, including archives in cold storage.
      • Org: search all archives managed by a particular organization. In this case, you must provide the organization's registration key.
      • Device: search all the archives associated with a particular endpoint device. In this case, you must provide the device's GUID.
  • The script outputs a results file in the same directory that the script is run from.
Source code

You can download the latest version of the MD5 hash script from the Code42 GitHub site.

Script 2: Traverse the restore tree for a specific restore session

Purpose

This script can be used to traverse or "walk" a restore tree for a particular restore session that is located on a storage server, in order to gather metadata for any file that was restored during the session.

In depth

The technique used in this script could be used as part of an effort to investigate which files have been restored from the archives of a particular storage server. This could be useful in an eDiscovery effort to find out which files and versions of files have been accessed by initiating a restore. Restores initiated by a user using the CrashPlan app, as well as restores initiated by an admin from the administration console (web restores and push restores), can be analyzed in this manner.

The process of traversing the restore tree is summarized in the outline below. Please see the actual script for the full details. This script is a good basis for an application for searching the restore tree for a particular destination or storage server.

  1. Enter the master server's URL and your admin credentials.
  2. Enter the source GUID of the device you wish to investigate.
    • The source device GUID must be known, but you can retrieve that using the search feature of the administration console, or use the "User" resource of the Code42 API to get a list of devices belonging to a user.
  3. Enter the GUID of the destination you are searching. This can be found on the Destinations Details of the administration console.
  4. The script outputs the data in JSON format for the storage server that contains the source device's archive.
  5. Enter the URL of the storage server you wish to search.
  6. Enter the authentication token if necessary.
  7. Enter the data key token for the session, to allow the storage server to use the data encryption key. The encryption key is necessary in order for the storage server to retrieve the desired metadata.
  8. From the list of restore sessions, choose and enter a particular restore session ID.
  9. The script outputs data in JSON format that includes the file ID for the root level folder.
  10. Enter the file ID for the root level folder of the restore session.
  11. The script outputs data in JSON format that includes all of the files and folders under the root folder.
  12. Choose and enter a file ID for a folder or file of interest.
  13. If you want to view metadata for a subfolder, simply repeat step 12 until you reach the level of the directory tree you wish to view.
    • The file metadata in JSON format includes all available information for the file or folder, including:
      • Creation date
      • Last modification date
      • Checksum
      • Full path
      • File size
Source code

You can download the latest version of the restore tree script from the Code42 GitHub site.

An in-depth example with sample output is also available.
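
The repeated "enter a file ID, list its contents" steps in the outline above amount to a tree traversal. The following Python sketch shows that idea in isolation and is not the Code42 script. The fetch_children callable stands in for whatever request returns the entries under a file ID, and the field names "id", "type", and "path" are hypothetical placeholders for the JSON that the storage server actually returns.

```python
def walk_restore_tree(root_id, fetch_children):
    """Yield every entry below root_id, descending into directories.

    fetch_children(file_id) must return a list of dicts describing the
    entries directly under file_id (hypothetical keys: id, type, path).
    """
    pending = [root_id]
    visited = set()
    while pending:
        file_id = pending.pop()
        if file_id in visited:
            continue  # guard against visiting the same folder twice
        visited.add(file_id)
        for entry in fetch_children(file_id):
            yield entry
            if entry.get("type") == "directory":
                pending.append(entry["id"])


if __name__ == "__main__":
    # In-memory stand-in for server responses, for demonstration only.
    fake_tree = {
        "root": [
            {"id": "d1", "type": "directory", "path": "/Users/joe.johnson"},
            {"id": "f1", "type": "file", "path": "/notes.txt"},
        ],
        "d1": [{"id": "f2", "type": "file", "path": "/Users/joe.johnson/test.pdf"}],
    }
    for item in walk_restore_tree("root", lambda fid: fake_tree.get(fid, [])):
        print(item["type"], item["path"])
```

Passing the lookup in as a function keeps the traversal separate from authentication and HTTP details, so the same loop can be exercised against test data before it is pointed at a storage server.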

Script 3: Automate push restores

Purpose

It may be necessary to perform push restores of select files or user data, in order to secure and preserve the data for eDiscovery.

In depth

Please read our article on automating push restores, which includes a sample script, detailed examples, and sample output with explanations.

Source code

You can download the latest version of the push restore script from the Code42 GitHub site.

Script 4: Data leak prevention and detection

Purpose

This script monitors and protects the CrashPlan archives of selected users in your Code42 environment against unauthorized or suspicious restore activity.

In depth

Read the detailed article.

Source code

You can download the latest version of the restore watch script from the Code42 GitHub site.

Other Code42 API examples

Please browse the Code42 API examples on our GitHub site for more examples of ways to use the Code42 API in your eDiscovery projects.

Powerful data analysis with the Code42 platform

Please read about how the Code42 platform can work with Splunk for powerful data analysis.

Still unsure?

Please contact Sales for information on our consulting options.