During clerical review for a match and merge solution, a data steward could face thousands of records that must be either merged or rejected. The Machine Learning Match Recommendations (MLMR) ease the workload by providing recommendations for merging or rejecting based on the data steward's previous decisions. This functionality works entirely on the Clerical Review Task List and does not influence the matching algorithm. When using the recommendations combined with the filtering and merge / reject bulk update capabilities, the data steward can resolve the task list more rapidly.
The MLMR uses the data steward's merge / reject decisions within the Clerical Review Task List to train a machine-learning model. The model then provides a merge or reject recommendation as a label on each task, which the data steward can either heed or disregard.
Within a matching algorithm, the user can create a matching agent. Once enabled, the matching agent collects merge and reject decisions made in the Clerical Review Task List, which are stored as a local copy. The ASPiRE environment then uses these decisions to train a machine-learning model using the matching agent data model configured on the matching algorithm. With the trained model, the ASPiRE environment produces merge and reject recommendations, which are shown in the Clerical Review Task List.
Note: The machine-learning model and recommendation algorithm run as a multitenant microservice in ASPiRE and are maintained outside of the normal STEP release cycle. The MLMR feature is new, and adjustments to the machine-learning algorithm will be made, which could lead to changes in the number of recommendations given in the Clerical Review Task List.
The matching agent starts providing recommendations once a minimum of 30 reject decisions and 30 merge decisions have been made. After that, it provides an updated set of recommendations every time 10 percent more tasks are completed. Depending on the number of tasks in Clerical Review, it might take some time before the recommendations are shown in the Web UI. The training and recommendation processes run as background processes (BGP) that you can monitor in the workbench.
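The thresholds above can be illustrated with a minimal sketch. This is not the ASPiRE implementation: the function and constant names are hypothetical, and the sketch assumes that "10 percent more tasks" is measured against the number of tasks completed at the time of the previous training.

```python
# Illustrative sketch only; all names here are hypothetical, not part of STEP or ASPiRE.
MIN_MERGE_DECISIONS = 30
MIN_REJECT_DECISIONS = 30


def has_enough_decisions(merges: int, rejects: int) -> bool:
    """First recommendations require at least 30 merge AND 30 reject decisions."""
    return merges >= MIN_MERGE_DECISIONS and rejects >= MIN_REJECT_DECISIONS


def should_refresh(completed_now: int, completed_at_last_training: int) -> bool:
    """Assumed rule: refresh once 10 percent more tasks are completed
    than at the previous training. Integer arithmetic avoids rounding issues."""
    return completed_now * 10 >= completed_at_last_training * 11


if __name__ == "__main__":
    print(has_enough_decisions(30, 30))   # both minimums met
    print(has_enough_decisions(29, 100))  # too few merge decisions
    print(should_refresh(66, 60))         # 10 percent more than 60
    print(should_refresh(65, 60))         # not yet 10 percent more
```

Under these assumptions, a data steward who has completed 60 tasks at the last training would see refreshed recommendations after completing 6 more.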
The number of recommendations provided depends on the decisions made by the data steward. If the decisions are very inconsistent, meaning that similar tasks are both merged and rejected, then it is likely that only a few recommendations are given. Conversely, if decisions are consistent for similar patterns in the data, then the matching agent gives more recommendations. In the beginning, when the data steward has made fewer than 200 to 300 decisions, the number of recommendations can vary from training to training, but it stabilizes over time as the data steward makes more decisions.
Note: When more than 20 merge or reject decisions are performed in one operation, those decisions are not included as training data and have no influence on future recommendations.
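The exclusion of large bulk operations described in the note above can be sketched as follows. The data layout (each operation as a list of "merge" or "reject" strings) and the names `BULK_LIMIT` and `training_decisions` are assumptions made for illustration only; they do not reflect how STEP stores decisions.

```python
# Illustrative sketch only; the data layout and names are hypothetical.
BULK_LIMIT = 20  # operations with more decisions than this are excluded


def training_decisions(operations: list[list[str]]) -> list[str]:
    """Collect decisions usable as training data.

    Each operation is a list of "merge" / "reject" decisions made together.
    Operations with more than BULK_LIMIT decisions are skipped entirely.
    """
    kept: list[str] = []
    for decisions in operations:
        if len(decisions) > BULK_LIMIT:
            continue  # bulk update too large: no influence on training
        kept.extend(decisions)
    return kept


if __name__ == "__main__":
    operations = [
        ["merge"] * 20,   # exactly 20 decisions: still included
        ["reject"] * 21,  # more than 20 decisions: excluded
        ["reject"],       # single decision: included
    ]
    print(len(training_decisions(operations)))
```

In this sketch, an operation with exactly 20 decisions still counts as training data, because the note excludes only operations with more than 20.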
The matching event processor updates new and changed tasks with a new merge / reject recommendation. This happens when an enabled matching agent exists that has successfully completed the training process.
The following topics outline the setup and function of the MLMR: