Match and Merge Flow Details
The level of detail for the following selected match and merge flows is intended to help integrators and administrators understand and troubleshoot a match and merge solution.
This topic includes the following flow charts:
-
Inbound Record Flow
-
Event Processor Flow
-
Merge Flow
Inbound Record Flow
The Configuring the Match Data Exchange Method topic (here) describes that the inbound source records can come from either an inbound integration endpoint or a web service. The flow is quite similar in both cases.
The Information Flow section of the Match and Merge topic (here) explains that the incoming source record is matched against the existing golden records, and if a match is found, the information from the source record is merged into the relevant golden record using survivorship rules. If no match is found, or if the match is uncertain, a new golden record is created.
The actual identification flow is detailed in the image below. This identification flow is the same for asynchronous Integration Endpoints and synchronous web service endpoints.
During an import, source records listed in the imported file are created as temporary STEP objects, which can be acted on by business rules, with some limitations. The references to and from the temporary object are not fully established, and the golden record ID is a temporary ID. A permanent ID is assigned later.
The diagram above details the decision to either create a new Golden Record or identify the existing Golden Record to be updated. This constitutes the 'matching' part of match and merge, even though in many systems the majority of source records are identified by an ID.
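The decision described above can be sketched roughly as follows. This is a minimal Python illustration, not STEP code: the `SourceRecord` and `GoldenRecord` types, the `score` function, and the 0.9 threshold are hypothetical stand-ins, and the lookup by source ID is an assumption about how 'identified by an ID' works. The real flow contains more steps than are shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GoldenRecord:
    golden_id: str
    values: dict
    # IDs of source records already linked to this golden record (assumed model).
    source_ids: set = field(default_factory=set)


@dataclass
class SourceRecord:
    source_id: str
    values: dict


def identify_golden_record(
    source: SourceRecord,
    golden_records: List[GoldenRecord],
    score: Callable[[SourceRecord, GoldenRecord], float],
    match_threshold: float = 0.9,  # hypothetical value, for illustration only
) -> Optional[GoldenRecord]:
    """Return the golden record to update, or None if a new one is needed."""
    # Many source records are identified directly by a known ID.
    for golden in golden_records:
        if source.source_id in golden.source_ids:
            return golden
    # Otherwise, fall back to matching against the existing golden records.
    best = max(golden_records, key=lambda g: score(source, g), default=None)
    if best is not None and score(source, best) >= match_threshold:
        return best
    # No match, or an uncertain match: the caller creates a new golden record.
    return None
```

A return value of `None` corresponds to the 'create a new Golden Record' branch; any other return value is the existing Golden Record that the survivorship rules then update.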
After identifying where the information in the source record belongs, the survivorship rules part of the merge determines which (if any) source values get promoted to existing golden records. When the import process is complete, the temporary object is either discarded or saved as part of the source traceability, depending on the 'Keep Source Records for Golden Record Object Types' setting in the Matching - Merge Golden Record component model, as defined in the Configuring the Matching - Merge Golden Record Component Model topic here.
Note: During both matching and merging, the incoming source record is accessed as a temporary STEP object. As a consequence, business functions of both matching and merging are run before references to other objects are established. It is not possible to query for references to and from the source record during the import. Furthermore, match codes can only depend on values on the current object, and survivorship rules are only allowed to update values on the current object.
Event Processor Flow
The recommendation is to have one matching event processor process events across a multitude of matching algorithms if event triggering can be correctly shared between them. As a result, the matching event processor includes a number of matching algorithms and loops over them.
-
For every matching algorithm, the event processor loops through the nodes in the event batch and updates all match code values across the event batch for that matching algorithm.
-
For every such event node, the event processor updates the match codes on the current object (again). That ensures that match codes for the entire batch are not outdated and also that if anything is updated during the event batch processing, the match code on the current event node is current.
-
For every event node, after the match codes are updated, the match scores are calculated, and the matching algorithm is applied.
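The loop structure described in the list above can be sketched as follows. This is an illustrative Python model only: `MatchingAlgorithm`, its methods, and the dictionary-based nodes are hypothetical and do not correspond to any STEP API.

```python
from typing import Callable, Dict, List


class MatchingAlgorithm:
    """Hypothetical stand-in for a configured matching algorithm."""

    def __init__(self, name: str, match_code_of: Callable[[dict], str]):
        self.name = name
        self.match_code_of = match_code_of
        self.match_codes: Dict[str, str] = {}  # node ID -> match code
        self.processed: List[str] = []         # node IDs handled, in order

    def update_match_code(self, node: dict) -> None:
        self.match_codes[node["id"]] = self.match_code_of(node)

    def score_and_apply(self, node: dict) -> None:
        # Placeholder for match score calculation and the algorithm's action.
        self.processed.append(node["id"])


def process_event_batch(algorithms: List[MatchingAlgorithm], batch: List[dict]) -> None:
    for algorithm in algorithms:
        # Pass 1: refresh the match codes for every node in the batch.
        for node in batch:
            algorithm.update_match_code(node)
        # Pass 2: per node, refresh its match code again, then score and act.
        for node in batch:
            algorithm.update_match_code(node)
            algorithm.score_and_apply(node)
```

The second `update_match_code` call mirrors the repeated update in the description: a node's match code is recomputed immediately before scoring, in case an earlier node in the same batch changed something.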
Note: These flow diagrams and the descriptions in this topic do not describe the consequences of using multiple match and merge matching algorithms on the same golden record object type separated by Category, as defined in the Configuring Matching Algorithms topic here.
The part of the flow in which potential duplicates are identified, their match scores are calculated, and action is taken is a bit more advanced, as shown below.
As illustrated above, when more than 100 objects share the same match code, they are not guaranteed to be compared using the matchers of the matching algorithm.
When the score exceeds the auto merge threshold, the Event Processor invokes the Merge Flow shown in the following section.
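The two rules above (the 100-object limit per match code and the auto merge threshold) can be illustrated with a small Python sketch. All names here are hypothetical. The clerical review threshold in particular is an assumption added for illustration; this topic only states what happens when the auto merge threshold is exceeded.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List

# From the text above: beyond this group size, comparison is not guaranteed.
MAX_GUARANTEED_GROUP = 100


class Action(Enum):
    AUTO_MERGE = "auto merge"
    CLERICAL_REVIEW = "clerical review"  # assumed outcome, not stated in this topic
    NO_ACTION = "no action"


def oversized_match_codes(match_codes: Dict[str, str]) -> List[str]:
    """Return the match codes shared by more than 100 objects."""
    counts = Counter(match_codes.values())
    return sorted(code for code, n in counts.items() if n > MAX_GUARANTEED_GROUP)


def decide_action(
    score: float,
    auto_merge_threshold: float,
    clerical_review_threshold: float,  # hypothetical second threshold
) -> Action:
    # "Exceeds" is modeled as strictly greater than the threshold.
    if score > auto_merge_threshold:
        return Action.AUTO_MERGE
    if score > clerical_review_threshold:
        return Action.CLERICAL_REVIEW
    return Action.NO_ACTION
```

A check like `oversized_match_codes` could help when troubleshooting why two obvious duplicates were never compared: a match code that is too coarse puts them in a group too large to be compared exhaustively.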
Merge Flow
The merge flow can be invoked either with standard survivor selection or with manual survivor selection from Advanced Merge.
Standard Survivor
In the standard version, the survivor is selected based on the Merge Keep First Handler, if configured. Otherwise, the oldest Golden Record is the survivor.
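As a rough illustration of this selection rule, the Python sketch below picks the survivor of two golden records. The `GoldenRecord` type, the `created` field, and the handler signature are assumptions made for the example; the actual Merge Keep First Handler is configured in STEP and may receive different inputs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class GoldenRecord:
    golden_id: str
    created: date  # assumed field used to decide which record is oldest


# Hypothetical handler shape: given two records, return the one to keep.
KeepFirstHandler = Callable[[GoldenRecord, GoldenRecord], GoldenRecord]


def select_survivor(
    a: GoldenRecord,
    b: GoldenRecord,
    keep_first_handler: Optional[KeepFirstHandler] = None,
) -> GoldenRecord:
    # A configured handler takes precedence over the default rule.
    if keep_first_handler is not None:
        return keep_first_handler(a, b)
    # Default: the oldest golden record survives.
    return a if a.created <= b.created else b
```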
The standard version is invoked:
-
Automatically by the event processor, when the match score of two potential duplicate records exceeds the auto merge threshold.
-
Manually, when the user chooses the merge action directly on the clerical review task list.
-
To populate the Advanced Merge UI with the initial merge result.
When clicking a merge button in the Advanced Merge dialog, the user can manually:
-
Select a survivor, which bypasses the Merge Keep First Handler survivor selection.
-
Select surviving values from source records which are applied to the survivor.
-
Add values that should survive, which are applied to the survivor.
Survivorship rules are run on a temporary object, which has some consequences when writing JavaScript survivorship rules. JavaScript survivorship rules must account for the fact that references towards the survivor point to the permanent object, while updates from already-run survivorship rules are only available on the temporary object.
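The consequence can be modeled in a few lines. The sketch below is written in Python purely for illustration (actual survivorship rules are written in JavaScript against the STEP API), and every name in it is hypothetical. It shows a later rule reading a stale value because it navigates a reference instead of reading the temporary object.

```python
import copy

# The permanent survivor as stored, and the temporary copy the rules work on.
permanent_survivor = {"id": "GR-1", "name": "ACME", "city": "Oslo"}
temporary_survivor = copy.deepcopy(permanent_survivor)

# A reference from another object still resolves to the permanent object.
reference_to_survivor = {"owner": "CONTACT-7", "target": permanent_survivor}


def rule_normalize_name(temp: dict) -> None:
    """An earlier survivorship rule; it writes only to the temporary object."""
    temp["name"] = "Acme Corp"


def name_seen_via_reference(reference: dict) -> str:
    """A later rule that navigates a reference reads the permanent object."""
    return reference["target"]["name"]


def name_seen_on_temporary(temp: dict) -> str:
    """A later rule that reads the temporary object sees earlier updates."""
    return temp["name"]


rule_normalize_name(temporary_survivor)
```

After the first rule has run, `name_seen_on_temporary` returns the updated name, while `name_seen_via_reference` still returns the value stored on the permanent object.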
The Deactivate nonsurvivor step in the flow diagram performs the following actions:
-
Removes all Confirmed Duplicate relations from the non-survivor
-
Removes all Confirmed Non-duplicate relations from the non-survivor
-
Deactivates the non-survivor Golden Record – that is, sets the deactivated attribute to deactivated. If using the starter package, deactivated corresponds to Value = "Yes" with Value ID = true.
-
Re-targets references that pointed to the non-survivor to now point to the survivor
-
This can fail, for example, if the reference is a data container key on a data container, in which case the update could otherwise result in the survivor having several data containers with the same key.
-
The re-target may also discover the reference is already present, which results in no change.
-
Ensures a major revision on the non-survivor and on the survivor, with appropriate revision comments describing the merge
-
Adds a Merged Into relation from the non-survivor to the survivor
-
Removes Unmerged From references from the non-survivor
-
Copies non-survivor source information to the survivor
-
Moves Source Record traceability tracking to the survivor
-
Removes all Match Codes from the non-survivor. Match codes are never created on deactivated objects, ensuring they are not part of matching in the future.
-
Adjusts clerical review workflow tasks to account for the deactivation. If the now-deactivated record was the last potential duplicate, the review task needs to be closed, even if the merge was initiated from the event processor. This may also result in a new task being created for the survivor, which could now match entirely new potential duplicates.
-
Removes all match scores for the non-survivor
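Of the actions listed above, the reference re-targeting is the one with the most edge cases. The Python sketch below models it under stated assumptions: the `Reference` type is hypothetical, a reference that is already present on the survivor is treated as a no-op, and the redundant copy is assumed to be dropped. The data container key failure case is not modeled.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Reference:
    owner: str     # object holding the reference
    ref_type: str  # reference type ID
    target: str    # ID of the golden record being referenced


def retarget_references(
    references: Set[Reference], non_survivor: str, survivor: str
) -> Tuple[Set[Reference], int]:
    """Point references at the survivor; return the new set and the number
    of references that were already present on the survivor."""
    # Start from every reference that does not involve the non-survivor.
    result = {r for r in references if r.target != non_survivor}
    already_present = 0
    for ref in references:
        if ref.target != non_survivor:
            continue
        moved = Reference(ref.owner, ref.ref_type, survivor)
        if moved in result:
            already_present += 1  # same reference exists: no change needed
        else:
            result.add(moved)
    return result, already_present
```

For example, if two objects reference the non-survivor and one of them already references the survivor with the same reference type, only one new reference is added and the other is counted as already present.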