Creating an Image Deduplication Configuration

Important: Image Analytics Package / Image Deduplication: This functionality has been deprecated and is no longer supported and/or available for new installations. This documentation is retained as a reference only for customers already using the functionality and for whom it remains available in the current state. The functionality will be removed in the future so customers using this should make plans to transition away from their implementation of it.

An Image Deduplication Configuration defines the group of images to be evaluated, the workflow used for error handling and for managing duplicate sets sent to clerical review, whether auto-handling is enabled, and the matching threshold for images sent to clerical review.

Prerequisites

  1. Perform the required initial setup as defined in the Initial Setup for Image Deduplication topic (here).
  2. Users who run the right-click options on an image deduplication configuration must have the STEP Workflow Administrator privilege. Any tasks already in the workflow must be removed when a new run starts, and removing tasks from a workflow requires that privilege. This requirement applies to the user running the process in System Setup, not to users handling clerical review tasks.

Configuration

Use the following steps to create a new Image Deduplication Configuration.

  1. In System Setup, select and right-click the Image Deduplication Configurations node, and then click the Create Image Deduplication Configuration option. The New Image Deduplication Configuration wizard displays.

Note: Hover over a parameter label to display information for its use.

  1. Enter an ID and Name for the configuration.
  2. For the Classification parameter, click the ellipsis button (...) and choose the image classification to deduplicate. All images in the selected folder and its children are considered. Click the Select button.

Note: If the classification on a saved configuration changes, you should clear the stored values, as defined in the Clearing Stored Values section of the Running the Image Deduplication Process topic here.

  1. For the Clerical Review Workflow parameter, click the ellipsis button (...) and choose the image deduplication workflow. Click the Select button.

Activating the image deduplication component creates an image deduplication workflow named 'Image Deduplication.' A custom workflow can also be used for image deduplication, provided it meets the requirements outlined in the Workbench Configuration section of the Configuring Web UI for the Image Deduplication Clerical Review Workflow topic here.

  1. For the Auto-Handling Threshold parameter, select an option from the dropdown:
  • Select Yes to automatically handle images when possible.
  • Select No to manually handle all images via the Clerical Review workflow.

This parameter works together with the Clerical Review Threshold parameter below. Refer to the following Threshold Settings section for details.

  1. For the Clerical Review Threshold parameter, select an option from the dropdown:
  • Select No Clerical Review to prevent any images from being sent to clerical review when the Auto-Handling Threshold is set to 'Yes.' When all images in the group are a pixel-to-pixel match with the system-selected master, they are considered duplicates and are marked for deletion.

If any image in the group does not match the master pixel-to-pixel, the whole group is sent to clerical review, even with this 'No Clerical Review' selection.

  • Select Near Matches to send duplicate sets with a Hamming Distance = 0 to clerical review. Pixel-to-pixel matches are auto-handled if the Auto-Handling Threshold is set to 'Yes.'
  • Select Very Similar Images to send duplicate sets with a Hamming Distance = 1 or less to clerical review. Pixel-to-pixel matches are auto-handled if the Auto-Handling Threshold is set to 'Yes.'
  • Select Similar Images to send duplicate sets with a Hamming Distance = 2 or less to clerical review. Pixel-to-pixel matches are auto-handled if the Auto-Handling Threshold is set to 'Yes.'

This parameter works together with the Auto-Handling Threshold parameter above. Refer to the Threshold Settings section below for details.

  1. Click the Finish button to complete and save the configuration.
  2. Continue the process by following the steps in the Configuring Web UI for the Image Deduplication Clerical Review Workflow topic here.
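The Clerical Review Threshold options in the steps above are expressed as Hamming Distances. As a minimal sketch of what that measure means, the Python below counts the differing bits between two image fingerprints and checks the result against the distance limit for each option. This topic does not describe how the product computes its fingerprints, so the integer hash values, the `MAX_DISTANCE` mapping, and both function names are illustrative assumptions, not part of the product.

```python
# Hypothetical mapping of Clerical Review Threshold options to the maximum
# Hamming Distance stated in this topic. The dictionary itself is an
# illustration, not a product setting.
MAX_DISTANCE = {
    "Near Matches": 0,
    "Very Similar Images": 1,
    "Similar Images": 2,
}


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions at which two non-negative hashes differ."""
    # XOR leaves a 1 bit wherever the two values disagree.
    return bin(hash_a ^ hash_b).count("1")


def within_threshold(hash_a: int, hash_b: int, option: str) -> bool:
    """Return True if the two hashes are close enough for the given option."""
    return hamming_distance(hash_a, hash_b) <= MAX_DISTANCE[option]


if __name__ == "__main__":
    master = 0b1011  # assumed fingerprint of the master image
    other = 0b1000   # assumed fingerprint of a candidate duplicate
    print(hamming_distance(master, other))
    print(within_threshold(master, other, "Similar Images"))
```

A Hamming Distance of 0 means the fingerprints are identical, which is a weaker condition than a pixel-to-pixel match, so the two checks are kept separate in this topic.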

Threshold Settings

The Auto-Handling Threshold parameter and the Clerical Review Threshold parameter work together to determine how duplicates are identified and processed. The possible settings and results are defined in the table below.

For additional information, refer to the Handling Duplicate Images topic here.

| Auto-Handling Threshold | Clerical Review Threshold | Result |
|---|---|---|
| No | No Clerical Review | No image deduplication processing happens. An error is displayed when the configuration attempts to run. |
| No | Near Matches | Only duplicate sets of images with a Hamming Distance = 0 are sent to clerical review. |
| No | Very Similar Images | Only duplicate sets of images with a Hamming Distance = 1 or less are sent to clerical review. |
| No | Similar Images | Only duplicate sets of images with a Hamming Distance = 2 or less are sent to clerical review. |
| Yes | No Clerical Review | When all images in the group have a Hamming Distance = 0 and are a pixel-to-pixel match to the system-selected master, all images are auto-handled. Otherwise, if more than one image remains that does not match the master pixel-to-pixel, all images are sent to clerical review (even though the Clerical Review Threshold is set to 'No Clerical Review'). |
| Yes | Near Matches | When all images in the group have a Hamming Distance = 0 and are a pixel-to-pixel match to the system-selected master, all images are auto-handled. Otherwise, if more than one image remains that does not match the master pixel-to-pixel, all images are sent to clerical review. |
| Yes | Very Similar Images | When all images in the group have a Hamming Distance = 0 and are a pixel-to-pixel match to the system-selected master, all images are auto-handled. Otherwise, if more than one image remains with a Hamming Distance = 1 or less that is not a pixel-to-pixel match to the master, all images are sent to clerical review. |
| Yes | Similar Images | When all images in the group have a Hamming Distance = 0 and are a pixel-to-pixel match to the system-selected master, all images are auto-handled. Otherwise, if more than one image remains with a Hamming Distance = 2 or less that is not a pixel-to-pixel match to the master, all images are sent to clerical review. |
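
The combinations in the Threshold Settings table can be read as a small decision rule. The Python sketch below is one simplified interpretation of that table, not the product's implementation. It assumes the duplicate group has already been built and contains only images within the configured Hamming Distance of the master, and it follows the wording in the Configuration steps that any image failing the pixel-to-pixel check sends the whole group to clerical review. The `Candidate` class, the `resolve` function, and the returned strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List

# Clerical Review Threshold options as named in the table above.
REVIEW_OPTIONS = {
    "No Clerical Review",
    "Near Matches",
    "Very Similar Images",
    "Similar Images",
}


@dataclass
class Candidate:
    """One image in a duplicate group, compared against the master (assumed shape)."""
    hamming_distance: int  # distance to the system-selected master
    pixel_match: bool      # True if pixel-to-pixel identical to the master


def resolve(auto_handling: bool, review_threshold: str,
            group: List[Candidate]) -> str:
    """Return 'auto-handled' or 'clerical review' for an already-built group."""
    if review_threshold not in REVIEW_OPTIONS:
        raise ValueError(f"Unknown Clerical Review Threshold: {review_threshold}")

    # Row 1 of the table: nothing can be processed, so the run fails.
    if not auto_handling and review_threshold == "No Clerical Review":
        raise ValueError("No auto-handling and no clerical review: nothing to do")

    # 'Yes' rows: auto-handle only when every image is an exact match.
    if auto_handling and all(
        c.hamming_distance == 0 and c.pixel_match for c in group
    ):
        return "auto-handled"

    # All remaining cases go to clerical review, including the
    # 'Yes' + 'No Clerical Review' fallback described in the table.
    return "clerical review"


if __name__ == "__main__":
    exact = [Candidate(0, True), Candidate(0, True)]
    mixed = [Candidate(0, True), Candidate(1, False)]
    print(resolve(True, "No Clerical Review", exact))
    print(resolve(True, "Very Similar Images", mixed))
```

The sketch deliberately leaves out the table's "more than one image remains" condition, because the topic does not state what happens when exactly one non-matching image remains.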