Match Codes

The matching algorithm is typically configured first, followed by the match codes.

The purpose of the match criteria is to determine whether the current record matches another record in the database. The purpose of match codes is to provide a fast and efficient way to find the records that are potential matches, that is, records likely to score above the auto merge and clerical review thresholds. Records with at least one match code in common are compared, and the resulting match score is described in the Match Scores topic here. Because the database can contain a very large amount of data, the matching algorithm uses match codes to limit the number of comparisons it must perform and to process records quickly.

A match code is essentially a string (a text value) that represents an object. Once generated, match codes populate a table sorted alphabetically, so equal match codes end up next to each other. Rather than comparing every object with every other object in the dataset, only objects with at least one equal match code are compared.
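The sketch below illustrates the idea in Python, purely for illustration; it is not STEP's implementation. It assumes a hypothetical in-memory mapping from object ID to its list of match code strings, sorts the (match code, object) rows alphabetically so equal codes are adjacent, and then pairs up only the objects inside each match code group.

```python
from itertools import combinations, groupby


def candidate_pairs(match_codes_by_object):
    """Return the pairs of object IDs that share at least one match code.

    match_codes_by_object is a hypothetical stand-in for the match code
    table: it maps an object ID to an iterable of match code strings.
    """
    # Flatten into (match_code, object_id) rows; empty values produce no code.
    rows = [(code, obj_id)
            for obj_id, codes in match_codes_by_object.items()
            for code in codes if code]

    # Sort alphabetically by match code so equal codes become adjacent.
    rows.sort()

    pairs = set()
    for _code, group in groupby(rows, key=lambda row: row[0]):
        # One match code group: every object that produced this code.
        members = sorted({obj_id for _, obj_id in group})
        # Only objects within the same group are compared with each other.
        pairs.update(combinations(members, 2))
    return pairs
```

For example, with objects A and B sharing an EAN-based code, C having a different code, and D having an empty value, only the pair ("A", "B") is produced, so the full match criteria would run once instead of six times.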

In the example above, the product with STEP ID Item-548456 is the current record. Reviewing the product in the match code table shows that one other object has an identical match code.

Typically, several different match codes are needed to ensure that matching records are compared. Choosing which match codes to use, and how many, is a balance. Matching records must share at least one match code, while non-matching records should not share match codes, since running full match criteria comparisons on those records wastes system resources.

Note: A match code definition can be exported as comments and submitted to an external source control system for comparison purposes. For details, refer to the Configuration Management documentation here.

Match Code Values

On a running system, match code values can be examined in the workbench using the match code values tab on the matching algorithm. Match codes are expected to be relatively unique. A group of objects with equal match codes is referred to as a match code group, and such groups should be small: no match code group should contain more than 100 objects, and generally most objects (95 percent) should be in a match code group of 10 or fewer.
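The guideline above can be expressed as a simple check. The following Python sketch is illustrative only and is not a STEP feature; it assumes match codes have been exported into a hypothetical mapping from object ID to a list of codes. Because the guideline does not say how to count an object that belongs to several groups, the sketch assumes an object counts as "in a small group" only if its largest group has 10 or fewer members, and it ignores objects without any match code.

```python
from collections import Counter


def group_size_report(match_codes_by_object, max_group=100,
                      small_group=10, target_share=0.95):
    """Check match code group sizes against the sizing guideline."""
    # Group size = number of distinct objects that produced each match code.
    sizes = Counter()
    for codes in match_codes_by_object.values():
        for code in set(codes):
            if code:
                sizes[code] += 1

    # Groups larger than the hard limit are candidates for a match code filter.
    oversized = sorted(code for code, n in sizes.items() if n > max_group)

    objects_with_codes = 0
    in_small_groups = 0
    for codes in match_codes_by_object.values():
        group_sizes = [sizes[c] for c in set(codes) if c]
        if not group_sizes:
            continue  # objects without match codes are never compared
        objects_with_codes += 1
        # Assumption: judge each object by its largest group.
        if max(group_sizes) <= small_group:
            in_small_groups += 1

    share = in_small_groups / objects_with_codes if objects_with_codes else 1.0
    return {
        "oversized_codes": oversized,
        "small_group_share": share,
        "meets_guideline": not oversized and share >= target_share,
    }
```

A shared phone number that appears on 101 records, for instance, would be reported in `oversized_codes`, which points to the kind of exceptional case a match code filter is meant to remove.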

Use the following points to closely examine the data before configuring a match code:

  • The data profiling tool provides valuable information. If you plan to use a specific attribute in a match code, verify how well the attribute is populated. If values are missing on many objects, the attribute is likely not a good candidate, or at least should not be used alone. Objects with an empty value for a match code are not compared based on that match code.
  • If an attribute is sufficiently unique, like an EAN number, the match code can be based on just that single piece of data.
  • If an attribute is less unique, like a name, it should be combined with other values to generate good match codes. An example is the Person Name and Address match code generator, which is available for customer data.
  • When a match code combines several pieces of data, always put the most significant data first. For example, when deduplicating address objects, put the ZIP code before the street and street number: ZIP codes are geographic, standardized, and mutually exclusive, so they most effectively separate addresses into discrete groups.
  • Normalize the data used in match codes. For example, if a manufacturer name is often abbreviated, the match code definition should ensure the name is represented the same way in the match codes, regardless of whether the source object uses the abbreviation.
  • Several match codes can be generated per source object, even by the same match code generator. When using STEP functions, resolve to a list of match codes; when using JavaScript, return an array. In either case, each element is a separate match code. Consider, for example, a customer with several email addresses: each email address should result in a separate email match code.
  • Sometimes an otherwise good identifier has exception cases that should be filtered out. Phone numbers are often very good match code candidates, but multiple contacts at a customer business may have provided the main reception number, resulting in a single match code group with hundreds of records. In this case, a match code filter can be applied to the phone match code to remove this exceptional case. For more information, refer to the Match Criteria Match Code Filter topic here.
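The guidelines above can be combined in one match code generator. The Python sketch below is illustrative only; it does not use STEP's function or JavaScript APIs. The record layout (`emails`, `phone`, `zip`, `street`), the `ABBREVIATIONS` table, and the `EXCLUDED_PHONES` set are all hypothetical. It normalizes values, emits one match code per email address, puts the ZIP code first in the address code, and drops a known shared phone number as a simple stand-in for a match code filter.

```python
import re

# Hypothetical abbreviation table so "St." and "Street" give the same code.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}

# Hypothetical shared numbers (e.g., a main reception line) to filter out.
EXCLUDED_PHONES = {"4533000000"}


def normalize_text(value):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def customer_match_codes(customer):
    """Return a list of match codes; each element is a separate match code."""
    codes = []

    # One match code per email address; empty values produce no code.
    for email in customer.get("emails", []):
        email = email.strip().lower()
        if email:
            codes.append("EMAIL:" + email)

    # Phone: keep digits only, then filter out known shared numbers.
    digits = re.sub(r"\D", "", customer.get("phone", ""))
    if digits and digits not in EXCLUDED_PHONES:
        codes.append("PHONE:" + digits)

    # Address: the most significant, mutually exclusive part (ZIP) goes first.
    zip_code = re.sub(r"\s", "", customer.get("zip", "")).upper()
    street = normalize_text(customer.get("street", ""))
    if zip_code and street:
        codes.append("ADDR:" + zip_code + "|" + street)

    return codes
```

The type prefixes (`EMAIL:`, `PHONE:`, `ADDR:`) are a design choice in this sketch that keeps codes from different generators from colliding by accident. With this normalization, "12 Main St." and "12 main street" in ZIP 90210 produce the same address match code.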

Creating Match Code Values

On the matching algorithm, match code values can be created using the methods listed below. For information about each, refer to the following topics:

  • Match Criteria Match Code Generators (here) on the Match Criteria tab
  • Match Criteria Match Code Filter (here) on the Match Criteria tab
  • Configuring a Legacy External Match Code (here) on a separate Match Code object

Evaluator

The matching algorithm evaluator tool can be used to verify results and to investigate unexpected ones. In the evaluator, select the two objects you want to compare and click the Evaluate button. Detailed information is displayed, including how the result was obtained. Additionally, the evaluators on individual subcomponents of the algorithm can be used to expose more details.