List Processing Remove Duplicates Operations

The 'Remove duplicates' operation uses a preconfigured matching algorithm to merge and remove duplicate records in a list. To be valid in this list operation, the selected matching algorithm must be configured to use the 'List Processing Deduplicate Records' match action; otherwise, a validation error is displayed.

Match groups are created based on the match codes assigned to every record in the list. If a record has multiple match codes, it is included in each of the corresponding match code groups.
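The grouping described above can be illustrated with a minimal sketch. The record shape (`id`, `matchCodes`), the function name, and the sample match codes are hypothetical and serve only to show how one record can land in several match code groups; this is not the product's implementation.

```javascript
// Hypothetical record shape: { id: string, matchCodes: string[] }.
// Builds one group per match code; a record with several match codes
// appears in several groups.
function groupByMatchCode(records) {
  const groups = new Map();
  for (const record of records) {
    for (const code of record.matchCodes) {
      if (!groups.has(code)) {
        groups.set(code, []);
      }
      groups.get(code).push(record.id);
    }
  }
  return groups;
}

// Record A has two match codes, so it is a member of both groups.
const matchGroups = groupByMatchCode([
  { id: "A", matchCodes: ["SMITH-1010", "SMI-555"] },
  { id: "B", matchCodes: ["SMITH-1010"] },
  { id: "C", matchCodes: ["SMI-555"] },
]);
```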

Within a match group, each record is compared individually with the other records. A record is added to the match group as a duplicate candidate when the matching algorithm scores it against the source record above the configured auto-merge threshold. If the source record is also in another match code group and new duplicate candidates are found there, these records are added to the first match group.

For performance reasons, each record is compared only with the first hundred records. To decrease the size of match groups, consider tuning the matching algorithm that the list process uses. For more information, refer to the Match Tuning topic in the Matching, Linking, and Merging documentation here.
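The threshold comparison and the hundred-record cap can be sketched as follows. The `score` callback is a hypothetical stand-in for the matching algorithm, the 0 to 100 score scale is an assumption, and the sketch assumes the cap applies to the other records in the same group; none of these details are confirmed by this topic.

```javascript
// Cap taken from the text: a record is compared with at most 100 records.
const MAX_COMPARISONS = 100;

// score(a, b) is a hypothetical stand-in for the matching algorithm.
// Returns the records that score strictly above the auto-merge threshold.
function findDuplicateCandidates(source, group, score, autoThreshold) {
  const others = group
    .filter((record) => record !== source)
    .slice(0, MAX_COMPARISONS); // assumption: cap applies to the other group members
  return others.filter((other) => score(source, other) > autoThreshold);
}

// Toy scorer (assumption): 95 for a case-insensitive name match, 10 otherwise.
const toyScore = (a, b) =>
  a.name.toLowerCase() === b.name.toLowerCase() ? 95 : 10;

const sampleGroup = [
  { id: "A", name: "Ann Smith" },
  { id: "B", name: "ann smith" },
  { id: "C", name: "Bob Jones" },
];
const candidates = findDuplicateCandidates(sampleGroup[0], sampleGroup, toyScore, 90);
```

With an auto-merge threshold of 90, only record B qualifies as a duplicate candidate of record A in this sample.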

Within each match group, the system selects an arbitrary record as the surviving record. If a 'Merge Keep First Handler' business condition is configured on the 'List Processing Deduplicate Records' match action, it is executed and selects the surviving record instead. The non-surviving records are removed from the remaining match groups, so they are no longer compared with other records.
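Survivor selection and the removal of non-surviving records might look like the sketch below. The `keepFirst` callback is a hypothetical stand-in for the Merge Keep First Handler, and the convention that it returns true when its first argument should survive is an assumption. Taking the first record as the default survivor is just one way to model an arbitrary choice.

```javascript
// keepFirst(a, b) is a hypothetical stand-in for the Merge Keep First Handler.
// Assumption: it returns true when `a` should survive over `b`.
function pickSurvivor(duplicateIds, keepFirst) {
  let survivor = duplicateIds[0]; // arbitrary default: here simply the first record
  if (keepFirst) {
    for (const candidate of duplicateIds.slice(1)) {
      if (!keepFirst(survivor, candidate)) {
        survivor = candidate;
      }
    }
  }
  return survivor;
}

// Removes every non-surviving duplicate from all match groups so that
// those records are not compared again.
function removeNonSurvivors(groups, duplicateIds, survivor) {
  const removed = new Set(duplicateIds.filter((id) => id !== survivor));
  for (const [code, members] of groups) {
    groups.set(code, members.filter((id) => !removed.has(id)));
  }
  return groups;
}

// Toy handler (assumption): the alphabetically smaller ID survives.
const remainingGroups = new Map([
  ["K1", ["A", "B"]],
  ["K2", ["B", "C"]],
]);
const survivorId = pickSurvivor(["B", "A"], (a, b) => a < b);
removeNonSurvivors(remainingGroups, ["B", "A"], survivorId);
```

In this sample, record A survives, and record B disappears from both K1 and K2.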

To use this operation, users must configure a matching algorithm for deduplication as defined in the Configure a Matching Algorithm for the List Deduplication Operation section below. For more information, refer to the Configuring Matching Algorithms topic in the Matching, Linking, and Merging documentation here.

Survivorship rules can be configured to allow moving individual values from multiple non-surviving records to the surviving record. Only the Business Action survivorship rule is supported for list records; changes to the surviving record will not trigger a new matching action on that record. For more information, refer to the Configure the Business Action Survivorship Rule for List Processing section below.

Configure a Matching Algorithm for the List Deduplication Operation

  1. On the matching algorithm object, click the Edit Match Action link and select the 'List Processing Deduplicate Records' match action.

  2. On the 'List Processing Deduplicate Records' operation configuration dialog, configure the parameters as follows:

    • For the Auto Threshold parameter, set the value above which records are automatically merged.

    • For the Merge Keep First Handler parameter, optionally set a business condition that decides which of two duplicate records survives. If no business condition is configured, the surviving record is selected arbitrarily.

      The following JavaScript binds may be used to configure a Merge Keep First Handler business condition:

      • Current Object - the current record

      • Secondary Object - the identified duplicate record

        Note: Duplicate and Non-Duplicate types are not necessary for this match action and are ignored if configured.

        The following is an example of a configured Merge Keep First Handler condition.

  3. Configure the survivorship rules for the matching algorithm as defined in the following section.
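As an illustration of the kind of logic a Merge Keep First Handler business condition could contain, the sketch below prefers the record with more populated values. The plain objects standing in for the Current Object and Secondary Object binds, the `values` property, and the convention that returning true keeps the Current Object are all assumptions made for this example; the product's actual bind objects and return contract are not described in this topic.

```javascript
// Hypothetical stand-ins for the two binds: plain objects with a `values` map.
// Assumption: returning true means the Current Object survives.
function keepCurrent(current, secondary) {
  // Count the attributes that hold a non-empty value.
  const populated = (record) =>
    Object.values(record.values).filter(
      (value) => value !== null && value !== undefined && value !== ""
    ).length;
  // Prefer the more complete record; on a tie, keep the current record.
  return populated(current) >= populated(secondary);
}

const currentRecord = { values: { name: "Ann Smith", phone: "" } };
const secondaryRecord = { values: { name: "Ann Smith", phone: "555-0100" } };
const currentSurvives = keepCurrent(currentRecord, secondaryRecord);
```

Here the secondary record is more complete, so the condition evaluates to false and, under the stated assumption, the secondary record would survive.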

Configure the Business Action Survivorship Rule for List Processing

Survivorship rules are used to move data values from the non-surviving records to the surviving record. The survivorship rule source objects bind can contain multiple source records. For more information, refer to the Survivorship in Match and Merge topic in the Matching, Linking, and Merging documentation here.

Note: Only business action survivorship rules are supported for this operation. If any other survivorship rules are configured, they are ignored and cause configuration validation warnings.

When configuring the survivorship rules for the list processing deduplication algorithm, only two JavaScript binds are acceptable:

  • Current Object - the record that survives

  • Survivorship Rules Source Objects - the non-surviving duplicate records
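A business action survivorship rule that works with these two binds could follow the pattern sketched below, which copies values from the non-surviving records only where the surviving record has none. The plain objects, the `values` property, and the function name are hypothetical stand-ins for the binds and do not represent the product's scripting API.

```javascript
// Hypothetical stand-ins: `current` is the surviving record and
// `sourceObjects` is the list of non-surviving duplicates.
function fillMissingValues(current, sourceObjects) {
  const isEmpty = (value) => value === undefined || value === null || value === "";
  for (const source of sourceObjects) {
    for (const [attribute, value] of Object.entries(source.values)) {
      // Only fill gaps; never overwrite a value the survivor already has.
      if (isEmpty(current.values[attribute]) && !isEmpty(value)) {
        current.values[attribute] = value;
      }
    }
  }
  return current;
}

const survivor = { values: { name: "Ann Smith", phone: "" } };
const nonSurvivors = [
  { values: { name: "A. Smith", phone: "555-0100" } },
  { values: { email: "ann@example.com", phone: "555-0199" } },
];
fillMissingValues(survivor, nonSurvivors);
```

In this sample, the survivor keeps its own name, takes the phone number from the first source record that has one, and gains the email address from the second.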

The following is an example of a fully configured List Processing Deduplication matching algorithm. For more information, refer to the Configuring Matching Algorithms topic in the Matching, Linking, and Merging documentation here.

After adding all necessary operations, continue with Web UI setup as defined in the Configuring List Processing in Web UI topic here.