Who is this article for?
CrashPlan for Enterprise, no.
Code42 for Enterprise, yes.
CrashPlan for Small Business, no.
This article applies to Code42 cloud environments.
When you enable file metadata collection, Code42 scans and indexes all files on endpoints and in any monitored cloud data sources. This article answers frequently asked questions about this process. It uses the following terms:
- Endpoint data source: A single user device running the Code42 app.
- Cloud data source: A corporate cloud service you authorize Code42 to monitor. Examples include Box, Google Drive, and Microsoft OneDrive for Business. This does not include email data sources (such as Gmail and Office 365 email) because there is no initial file scan for email services.
- Initial ingest: The file scan process that indexes all files on an endpoint data source. This scan is performed by the Code42 app installed on each device.
- Initial index: The file scan process that indexes all files in a cloud data source. This scan is performed via a direct connection between Code42 and the cloud service and does not involve the Code42 app on user devices.
What is the initial scan?
Enabling File Metadata Collection initiates different scans for endpoint and cloud data sources:
- Endpoint data source: The initial ingest scans and indexes all files on the device.
- This ingest creates a record of all files on the device at the time file metadata collection was enabled. As a result, user devices might temporarily use a high percentage of CPU resources. Once the initial scan is complete, Code42 only monitors new and incremental file changes, using significantly fewer CPU resources.
- If you disable and then re-enable File Metadata Collection, the initial ingest file scan on the device starts over.
- Cloud data sources: The initial index scans and indexes all files in your organization's cloud drives. This scan does not affect user devices at all. Code42 connects directly to the cloud service to capture this data.
- After the initial index of your Google Drive environment, Code42 processes new files in existing drives immediately, and looks for new drives every 24 hours.
- For Box and Microsoft OneDrive, Code42 starts monitoring file activity right away, while the initial index is still in progress. New files are typically discovered a few minutes after they are created.
Shared libraries in Microsoft are not indexed, discovered, or monitored
Code42 can only monitor drives in Microsoft OneDrive. While you can create a shared library within OneDrive, such libraries are actually created as Team Sites in SharePoint. Because Code42 cannot monitor sites in SharePoint, any shared libraries listed in your OneDrive environment are excluded.
How long does the scan take?
- Endpoint data source: Ingest times vary based on the number and size of files on each device. The processing power of the device and your Code42 CPU usage settings also affect how long it takes to complete. With the recommended CPU usage settings of 50% When user is away and 20% When user is present, devices can take several hours to several days to scan every file on the device.
- Cloud data sources: The time it takes for initial indexing to complete depends on the number of drives in your environment.
- For environments that contain hundreds of drives, initial indexing may take between 24 and 72 hours depending on the number of files in each user's drive.
- For environments that contain thousands of drives, initial indexing completes over a longer period. Typically, drives in larger environments complete the indexing process over these time frames:
- 60% of total drives complete between 24 and 72 hours
- 25% of total drives complete between 3 and 5 days
- 15% of total drives complete between 6 and 10 days
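As a rough planning aid, the typical percentages above can be applied to a drive count. The sketch below is only arithmetic on the figures listed in this article. The function name and the example drive count are hypothetical, and actual indexing times in any given environment may differ.

```python
# Typical completion shares for larger environments, as listed above.
# Each entry is (time window, percent of total drives).
COMPLETION_WINDOWS = (
    ("24 to 72 hours", 60),
    ("3 to 5 days", 25),
    ("6 to 10 days", 15),
)


def estimate_drive_completion(total_drives):
    """Return the approximate number of drives expected to finish in each window."""
    return {
        window: round(total_drives * percent / 100)
        for window, percent in COMPLETION_WINDOWS
    }


# Hypothetical environment with 4,000 drives.
estimate = estimate_drive_completion(4000)
```

For the hypothetical 4,000-drive environment, this yields about 2,400 drives in the first window, 1,000 in the second, and 600 in the third.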
Can I start viewing file activity before the scan is fully complete?
- Endpoint data sources: Yes. As soon as a file is scanned and indexed, file events for that file are visible in Code42. In addition, file activity that may indicate an exposure risk (such as moving files to removable media or uploading files to personal cloud services or email) is given priority over indexing all files on the device and is reported in near real-time.
- Cloud data sources: Yes.
- In Google Drive environments, file events become visible on a drive-by-drive basis. As soon as scanning completes for a drive, file events for files in that drive are reported in Code42.
- Code42 starts monitoring file activity in Box and Microsoft OneDrive environments right away while scanning and indexing drives. File events for all drives become available in Code42 soon after they occur, even if drives have not completed indexing.
How do I know when the scan is complete?
Endpoint data source
Scan status is visible in the Code42 app logs on each device:
- On the device, open the Code42 app.
- Enter the keyboard shortcut Ctrl+Shift+C (Windows) or Option+Command+C (Mac) to open the Code42 Commands interface.
- Enter the command getlogs and press Enter.
The Code42 app compiles the logs and displays the location of the compressed archive.
- Navigate to the location of the exported log archive and open the archive.
- Locate and open the service.log.0 file.
- Search the service.log.0 file for these strings, which indicate the initial scan is complete:
Transitioned FFS ingest state from INITIAL_INGEST to SCAN_SUCCESS
Transitioned FFS ingest state from SCAN_SUCCESS to STEADY_STATE
If the above strings do not appear in the log file, either the scan is still in progress, or the scan completed long enough ago that the messages are in an older version of the log file (for example, service.log.1 or service.log.2).
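If you prefer to search the exported logs with a script, the following is a minimal sketch rather than a Code42 tool. It assumes you have already extracted the log archive to a folder and that the service.log files are stored as plain text somewhere under that folder. The function name and the example path are hypothetical.

```python
from pathlib import Path

# Strings quoted in this article that indicate the initial scan is complete.
COMPLETION_MARKERS = (
    "Transitioned FFS ingest state from INITIAL_INGEST to SCAN_SUCCESS",
    "Transitioned FFS ingest state from SCAN_SUCCESS to STEADY_STATE",
)


def find_completion_lines(log_dir):
    """Return (file name, line) pairs for each marker found in service.log.* files."""
    hits = []
    # Search service.log.0 and any older versions (service.log.1, service.log.2, ...).
    # Assumption: the files sit somewhere below log_dir, so search recursively.
    for log_file in sorted(Path(log_dir).rglob("service.log.*")):
        with open(log_file, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if any(marker in line for marker in COMPLETION_MARKERS):
                    hits.append((log_file.name, line.rstrip("\n")))
    return hits
```

For example, calling find_completion_lines("/path/to/extracted/logs") returns an empty list when none of the files contain the completion strings.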
Contact our Customer Champions for support if you need help determining the scan status for a device.
Cloud data sources
In the Code42 console, go to Administration > Integrations > Data Connections and review the Status column.
- A status of Monitoring indicates initial indexing of all drives is complete. For Box and Microsoft OneDrive, this status may change to Monitoring, indexing in progress when new drives are discovered.
- A status of Initializing or Monitoring, indexing in progress indicates that initial indexing has not yet completed.
- For Google Drive cloud data sources, a status of Initializing indicates that Code42 is discovering and indexing all in-scope drives for the first time.
- For Box and Microsoft OneDrive cloud data sources, a status of Monitoring, indexing in progress indicates that all drives in your cloud environment are being monitored for file activity. At the same time, each drive is being scanned and indexed as part of the initial index process.
- To view more detailed status information, click a row in the Data Sources table to open the details panel for that data source.
- For Google Drive, under Status, this panel lists the total number of drives found by Code42, the number of drives for which indexing is still in progress, and the number of drives that have completed the initial index process and are currently being monitored. Once a drive has been indexed, its file events become available in Code42. These same numbers for shared or team drives in your Google Drive environment are also listed.
- For Box and Microsoft OneDrive, under Status, this panel lists the total number of drives found by Code42 that are currently being monitored for file activity. File events for all drives become available in Code42 soon after they occur, even if drives have not completed indexing. This total is then broken down into the number of drives for which the initial index is still in progress and the number of drives that have completed the initial index process.
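The status values described above can be summarized as a simple lookup. This sketch only restates the Status column values named in this article. It does not call any Code42 API, and the function name is hypothetical.

```python
# Status column values described in this article, mapped to whether
# the initial index of all drives is complete.
INITIAL_INDEX_COMPLETE = {
    "Monitoring": True,
    "Initializing": False,
    "Monitoring, indexing in progress": False,
}


def initial_index_complete(status):
    """Return True if the given status means initial indexing has finished."""
    key = status.strip()
    if key not in INITIAL_INDEX_COMPLETE:
        # Any status not described in this article is treated as unknown.
        raise ValueError(f"Unrecognized status: {status!r}")
    return INITIAL_INDEX_COMPLETE[key]
```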
Why is a device using more CPU than the max allowed?
Because the initial scan reviews and indexes all files on a device, user devices might temporarily use a high percentage of CPU resources.
Code42 app CPU settings apply to the amount of CPU processing time dedicated to Code42, not to total CPU processing capacity. Therefore, if the CPU limit is set to 20% (for example), the device's Task Manager or Activity Monitor may report that the Code42 app is using more than 20% of the CPU at a particular point in time.
The processing time of the CPU is measured in instruction cycles. When you limit CPU use for the Code42 app to X%, you are specifying that the Code42 app is allowed to use as much of the CPU capacity as it needs for up to X% of the available cycles. For example, if the CPU limit is set to 20%, the Code42 app can use up to 100% of the CPU 20% of the time. The remaining 80% of the time, the CPU prioritizes other process requests. This allows the Code42 app to work as efficiently as possible when it requests CPU resources, but limits the overall impact to the device.
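The difference between a point-in-time reading and the overall limit can be illustrated with a toy model. The sketch below is a deliberate simplification: it assumes the app uses 100% of the CPU during its allotted time slots and 0% otherwise. It is not a description of how the Code42 app actually schedules its work, and the function name is hypothetical.

```python
def sample_timeline(limit_percent, slots=10):
    """Build a toy timeline: 100% CPU in the app's allotted slots, 0% in the rest."""
    active_slots = round(slots * limit_percent / 100)
    return [100] * active_slots + [0] * (slots - active_slots)


# CPU limit set to 20%, modeled over 10 equal time slots.
timeline = sample_timeline(20)

# What a point-in-time monitor might show during an active slot.
peak = max(timeline)

# The app's share of CPU time across the whole timeline.
average = sum(timeline) / len(timeline)
```

In this model, the peak reading is 100 while the average over the whole timeline is 20.0, which is why Task Manager or Activity Monitor can briefly show a value above the configured limit.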
For the best mix of performance and speed, we recommend setting the When user is away, use up to setting to 50% and the When user is present, use up to setting to 20%.
How much memory does Code42 use?
The Code42 app dynamically sets memory allocation to use 25% of the physical memory on the device. For example, if the device has 8 GB of RAM, the Code42 app can use up to 2 GB.