Match and Link
Using an asynchronous process, Match and Link creates and maintains a set of 'golden records' as an aggregation of matching 'source records'.
- In Product MDM, Match and Link is commonly used in automating the creation and maintenance of sell-side products as golden records, based on buy-side products as source records.
- In Customer MDM, Match and Link is commonly used for resolving household entities as golden records using individual customer entities as source records.
A detailed setup using Match and Link is described in the Accelerator for Retail Data Onboarding topic of the Accelerator for Retail section of the Product MDM Solution Enablement documentation.
Data Model
In a Match and Link solution, source records and golden records will be separate records of different object types.
The golden records are created by survivorship rules, and every source record belongs to exactly one golden record.
Confirming a duplicate or non-duplicate in a Match and Link solution results in a reference being created on the source record level. In the Match and Link solution, the Confirmed Duplicate is a reference between two source records which permanently identifies two specific source records as duplicates. The Confirmed Non-Duplicate is the opposite, permanently confirming that two source records should never belong to the same golden record object.
Match Score
In a Match and Link solution, thresholds determine whether records can be linked automatically or whether manual review is required. The match score (also called the 'rank score' in Web UI) is the percentage of equality between the two records being compared as potential duplicates. Configuring the solution includes setting thresholds that define the percentage of equality required for records to be linked.
- The Auto Threshold is the equality percentage for automatic linking. Two source objects that meet the defined percentage are automatically linked to the same golden record.
- The Clerical Review Threshold is the equality percentage equal to or below the Auto Threshold setting that triggers a manual review. Two objects that are within this range are sent to the clerical review workflow to be manually reviewed as potential duplicates.
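The two thresholds can be pictured as cut points on the match score. The following is a minimal Python sketch of that routing, not product code: the names `Outcome` and `classify_match` are hypothetical, scores are assumed to be percentages from 0 to 100, and treating both boundaries as inclusive is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    AUTO_LINK = "auto_link"              # link to the same golden record
    CLERICAL_REVIEW = "clerical_review"  # send to the clerical review workflow
    NO_LINK = "no_link"                  # not treated as potential duplicates


def classify_match(score: float, auto_threshold: float,
                   clerical_threshold: float) -> Outcome:
    """Route a pair of source records by match score (hypothetical helper)."""
    # The Clerical Review Threshold is equal to or below the Auto Threshold.
    if clerical_threshold > auto_threshold:
        raise ValueError("Clerical Review Threshold must not exceed the Auto Threshold")
    if score >= auto_threshold:          # assumption: boundary is inclusive
        return Outcome.AUTO_LINK
    if score >= clerical_threshold:      # assumption: boundary is inclusive
        return Outcome.CLERICAL_REVIEW
    return Outcome.NO_LINK
```

With an Auto Threshold of 90 and a Clerical Review Threshold of 70, a pair scoring 80 would be routed to clerical review under these assumptions.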
Potential duplicates enter the selected clerical review workflow where a user then sets one of the following reference types:
- Confirm Duplicate - A user manually confirms the records are duplicates. The duplicate source records are linked together by a 'Confirmed Duplicate' reference and will remain part of the same golden record from that point.
- Reject Duplicate - A user manually rejects the records as duplicates. The source records are linked by a 'Confirmed non-duplicate' reference and will never again be made part of the same golden record.
Information Flow
When a user or a source system updates a source record, events are written to a Matching event processor. The Matching event processor lets the matching algorithm run a match on the source record against all existing source records that share a match code.
Source records with a match score above the Auto-Link Threshold will be linked to the same golden record. The golden record will be updated with information from all linked source records, according to a set of survivorship rules. For more information, refer to the Survivorship in Match and Link topic. The resulting golden record updates can trigger events that export the golden record to external systems.
Records with match scores between the Auto-Link Threshold and the Clerical Review Threshold are added to a Clerical Review Workflow. This allows a data steward to manually decide whether the pair is a Confirmed Duplicate or a Confirmed Non-Duplicate. A decision by the data steward is considered an update to the source record and can invoke the flow again, depending on the triggering events configured on the Matching event processor.
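The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Matching event processor itself: `find_candidates` and `process_update` are hypothetical names, each source record is assumed to carry a set of match codes, the scoring function is supplied by the caller, and inclusive threshold boundaries are an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


def find_candidates(updated_id: str, match_codes: Dict[str, Set[str]]) -> Set[str]:
    """Return the other source records sharing at least one match code."""
    by_code = defaultdict(set)
    for record_id, codes in match_codes.items():
        for code in codes:
            by_code[code].add(record_id)
    candidates: Set[str] = set()
    for code in match_codes.get(updated_id, set()):
        candidates |= by_code[code]
    candidates.discard(updated_id)  # a record is never compared with itself
    return candidates


def process_update(updated_id: str,
                   match_codes: Dict[str, Set[str]],
                   score: Callable[[str, str], float],
                   auto_threshold: float,
                   clerical_threshold: float) -> Tuple[List[str], List[str]]:
    """Score the updated record against its candidates and route each pair.

    Returns (ids to auto-link, ids to send to clerical review).
    """
    auto_link: List[str] = []
    review: List[str] = []
    for other in sorted(find_candidates(updated_id, match_codes)):
        s = score(updated_id, other)
        if s >= auto_threshold:        # assumption: inclusive boundary
            auto_link.append(other)
        elif s >= clerical_threshold:  # assumption: inclusive boundary
            review.append(other)
    return auto_link, review
```

Restricting the comparison to records that share a match code keeps the number of scored pairs small; records with no code in common are never compared.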
The golden record in a Match and Link solution should be considered a system-owned object. Users should not update the golden record manually, since survivorship rules overwrite this information and the golden record may be deleted by the Matching event processor.
It is common to enrich golden records with information through an additional 'internal data' source record (sometimes referred to as a 'silver record' or an 'enrichment record') that is created and maintained in association with the golden record.
Information from an internal data source record is promoted to the golden record with survivorship rules by the Matching event processor.
Internal Data Source Objects
In Match and Link setups, there is often a need to maintain data on the golden record. Since the golden record is a system-owned object, data maintenance is performed on 'enrichment records' or 'internal data source objects' according to the following rules:
- A unique object type is required, one that is different from the object types of golden record and other source objects.
- Do not generate match codes for internal data source objects.
- In the Matching component model configuration, Source Object Type aspect, add the object type of the internal data source object.
- Golden records should use the same reference types for internal source objects and for other source objects.
To update the golden record automatically when an internal data source object changes:
- Configure the event processor to listen on events for internal data source objects.
- Create a business action to find the golden record for the internal data source object, identify one of the other source objects for the golden record, and then generate an event for that object for the event processor.
- Create an event filter condition that is always false, so that the original event for the internal data source object does not go onto the queue.
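The lookup performed by the business action in the steps above can be sketched as follows. This is plain Python over hypothetical dictionaries, not the product's business action API: `redirect_target`, `golden_of`, `sources_of`, and `internal_ids` are illustrative names only.

```python
from typing import Dict, List, Optional, Set


def redirect_target(internal_id: str,
                    golden_of: Dict[str, str],
                    sources_of: Dict[str, List[str]],
                    internal_ids: Set[str]) -> Optional[str]:
    """Pick the source object that should receive the generated event.

    internal_id  -- the internal data source object that changed
    golden_of    -- hypothetical map: source object id -> golden record id
    sources_of   -- hypothetical map: golden record id -> linked source ids
    internal_ids -- ids of all internal data source objects
    """
    golden_id = golden_of.get(internal_id)
    if golden_id is None:
        return None  # not linked to any golden record
    for source_id in sources_of.get(golden_id, []):
        if source_id not in internal_ids:
            return source_id  # first ordinary source object of that golden record
    return None  # the golden record has no other source objects
```

The event for the returned object is the one that reaches the event processor; the always-false filter condition keeps the original event for the internal data source object off the queue.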
User Actions
Match and Link is supported by a range of tools in the workbench and Web UI so that the expert user can analyze the results of the matching algorithm and take action.
The Match and Link specific actions are:
Confirm Duplicates: If two objects are confirmed as duplicates, a reference of the 'Duplicate Reference Type' specified in the component model and in the matching algorithm will be created, the pair will be removed from the 'Match Result' tab, and instead, will show up on the 'Confirmed Duplicates' tab on the matching algorithm.
Confirm Non Duplicates: If two objects are rejected as being duplicates, a reference of the 'Non-Duplicate Reference Type' will be created and the pair will be shown on the 'Confirmed Non Duplicates' tab on the matching algorithm.
It is important to understand that once a pair has been confirmed as duplicate or non-duplicate, the pair will not be considered when the matching algorithm is reapplied, regardless of whether the data on the objects has changed. The confirmed duplicate / non-duplicate relationship can be changed either via the 'Remove From List' options or by deleting the references.
Manual Merge of source records: If two source objects are confirmed as duplicates, either through Identify Duplicates or through 'Link golden record', they can be manually merged into a single object.
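The rule that confirmed pairs are skipped when the matching algorithm is reapplied can be illustrated with a short Python sketch. The function name `pairs_to_evaluate` and the representation of references as pairs of record ids are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def pairs_to_evaluate(candidate_pairs: Iterable[Pair],
                      confirmed_duplicates: Iterable[Pair],
                      confirmed_non_duplicates: Iterable[Pair]) -> List[Pair]:
    """Drop pairs that already carry a confirmed reference (hypothetical helper).

    A pair is unordered, so ("A", "B") and ("B", "A") count as the same pair.
    """
    decided = {frozenset(p) for p in confirmed_duplicates}
    decided |= {frozenset(p) for p in confirmed_non_duplicates}
    return [p for p in candidate_pairs if frozenset(p) not in decided]
```

Deleting a reference, or using 'Remove From List', corresponds here to removing the pair from the confirmed collections, after which the pair is evaluated again.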