Configuring Matching Algorithms and Match Codes

The matching algorithm is typically configured first, followed by the match codes.

Matching Algorithm Configuration

The matching algorithm should be tailored to the data and strive for precision. The initial configurations include pre-configured Matching Algorithms that can be used as a foundation to build a client organization's Matching solution.

When configuring the matching algorithm, it is important to consider the impact that Thresholds have on match results. If the Clerical Review threshold is set too high, a large number of false negatives may be generated. Similarly, if the Auto Threshold is set too low, false positives could be generated. If the initial Matching configuration produces false negatives and/or false positives, the Thresholds should be reevaluated during the Match Tuning sessions.
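The effect of the two thresholds can be sketched in a few lines of Python. This is an illustration only, not STEP configuration: the function name, the 0-100 score scale, the threshold values, and the assumption that pairs scoring below the Clerical Review threshold are treated as non-matches are all hypothetical.

```python
def classify_match(score: float, clerical_threshold: float, auto_threshold: float) -> str:
    """Classify a pair score against two thresholds (hypothetical helper)."""
    if clerical_threshold > auto_threshold:
        raise ValueError("clerical_threshold must not exceed auto_threshold")
    if score >= auto_threshold:
        return "auto-match"        # accepted without human review
    if score >= clerical_threshold:
        return "clerical-review"   # queued for a data steward
    return "non-match"             # never surfaced to anyone


# Hypothetical scores for three candidate pairs.
scores = [95.0, 82.0, 60.0]

# Assumed baseline thresholds.
baseline = [classify_match(s, clerical_threshold=75.0, auto_threshold=90.0) for s in scores]
print(baseline)

# Clerical Review threshold set too high: the 82.0 pair is no longer reviewed,
# so a true duplicate at that score becomes a false negative.
strict = [classify_match(s, clerical_threshold=85.0, auto_threshold=90.0) for s in scores]
print(strict)

# Auto Threshold set too low: the 82.0 pair is now accepted automatically,
# so a non-duplicate at that score becomes a false positive.
loose = [classify_match(s, clerical_threshold=75.0, auto_threshold=80.0) for s in scores]
print(loose)
```

Comparing the three result lists against known duplicates and non-duplicates is the kind of check that a Match Tuning session would perform on real data.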

As part of a proper matching strategy, confirmed duplicate and confirmed non-duplicate reference types must be configured and specified in the matching component model. It is often considered best practice to use a unique reference type per algorithm rather than a shared reference for all matchable entity types. For example, when a dedicated confirmed duplicate reference is used for Organizations, only the decisions relevant to the Organization entity type are displayed. If a shared reference is used for all matchable entities, the tabs that list these decisions display all confirmed duplicates and non-duplicates, regardless of entity type.

For more information on configuring a matching algorithm, refer to the Configuring Matching Algorithms topic in the Matching, Linking, and Merging documentation.

Match Codes Configuration

Match codes should ensure that every pair of records the algorithm would score highly is included in the comparison pool, and that only records that may score highly are compared. Match codes should consider the same data points as the Match Criteria. This ensures that the comparison pool that match codes generate is relevant to the data points the Match Criteria is matching on.

For example, if the Match Criteria is matching on a combination of Person Name and Address, it is not recommended to generate match codes based on unrelated attributes (e.g., Phone and Email).
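The following sketch illustrates why match codes should follow the Match Criteria. It assumes that two records are compared only when they share at least one match code. The record layout, the helper names, and the simplified code formats are hypothetical and are not the formats STEP generates.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, code_fn):
    """Return the pairs of record IDs that share at least one match code."""
    buckets = defaultdict(set)
    for record_id, record in records.items():
        for code in code_fn(record):
            buckets[code].add(record_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def name_zip_codes(record):
    # Hypothetical code aligned with a Name + Address Match Criteria.
    return [f"{record['last_name'].upper()}+{record['zip']}"]


def phone_codes(record):
    # Hypothetical code based on an attribute the Match Criteria ignores.
    return [record["phone"]]


# Made-up records: A and B share name and ZIP, A and C share only a phone number.
records = {
    "A": {"last_name": "Kaine", "zip": "30046", "phone": "19318399039"},
    "B": {"last_name": "Kaine", "zip": "30046", "phone": "14045550100"},
    "C": {"last_name": "Jung", "zip": "10001", "phone": "19318399039"},
}

aligned = candidate_pairs(records, name_zip_codes)
unrelated = candidate_pairs(records, phone_codes)
print(aligned)
print(unrelated)
```

With the phone-based codes, the A/B pair that a Name and Address criteria would score highly is never compared, while the A/C pair is compared even though it is unlikely to score highly.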

Example match codes

Typical attributes used to generate match codes for Individual (B2C) solutions include Name, Address, Email, and Phone. For Organization (B2B), these attributes potentially include Name, Address, Phone, DUNS (D&B Number), and Tax Identifier.

Email

Email is often used to narrow the pool of potential match candidates. The Email match code generator can be used in conjunction with an Email normalizer to auto-generate email match codes.

Example: A customer with the email "InesJung@armyspy.com" becomes the match code "EMAIL#INESJUNG@ARMYSPY.COM".
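A minimal sketch of this transformation, assuming the normalization consists only of trimming whitespace and upper-casing. The function name is hypothetical, and the actual Email normalizer may apply additional rules.

```python
def email_match_code(email: str) -> str:
    """Build an email match code (hypothetical helper, simplified normalization)."""
    normalized = email.strip().upper()
    return f"EMAIL#{normalized}"


print(email_match_code("InesJung@armyspy.com"))  # EMAIL#INESJUNG@ARMYSPY.COM
```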

Phone

Phone offers many of the same benefits as email.

Example: A customer with the phone number "(931) 839-9039" becomes the match code "PHONE#19318399039".

In this example, the phone number normalizer is configured to default the country code to US.
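The phone example can be reproduced with a short sketch. It assumes that normalization strips all non-digit characters and prepends the default country code to 10-digit numbers. The function name and the 10-digit rule are hypothetical simplifications of what a real phone normalizer does.

```python
import re


def phone_match_code(raw: str, default_country_code: str = "1") -> str:
    """Build a phone match code (hypothetical helper, US default country code)."""
    digits = re.sub(r"\D", "", raw)
    # Assumption: a 10-digit number is a national number without a country code.
    if len(digits) == 10:
        digits = default_country_code + digits
    return f"PHONE#{digits}"


print(phone_match_code("(931) 839-9039"))   # PHONE#19318399039
print(phone_match_code("+1 931-839-9039"))  # PHONE#19318399039
```

Both spellings of the number produce the same match code, which is what allows the two records to land in the same comparison pool.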

Address

Address is often used in combination with person name or organization name. It is fairly unique if the full address is used but has low uniqueness if only part of the address is used.

The accuracy of addresses varies (e.g., one entry may include a suite number while another may not). When matching, techniques such as edit distance are used on city and street. Because matching tolerates such differences and address accuracy varies, the full address does not make a good match code, as it will likely lead to false negatives.
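To illustrate the point, the sketch below implements the standard Levenshtein edit distance and applies it to a misspelled city name. It assumes that match codes are compared for exact equality. The function is a generic textbook implementation, not the comparator used by the product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


city_a = "LAWRENCEVILLE"
city_b = "LAWRENCVILLE"  # one letter missing

print(edit_distance(city_a, city_b))  # 1
print(city_a == city_b)               # False
```

A matching algorithm can treat a distance of 1 as a near match, but a match code containing the full city string would differ between the two records, so the pair would never be compared.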

Addresses are often abbreviated ('st' for street or station, etc.) in complex patterns that cannot easily be resolved unambiguously. For high accuracy of address matching, it is therefore recommended to use STEP’s Address standardization capabilities, which are integrated with Loqate.

The match code generator for addresses provides the following address combinations:

  • ZIP code + Street Name
  • Metaphone3 City + Street Name

Example: The match codes for "134 Trace Lane, Lawrenceville, GA, 30046" would be "ADDRESS#30046+TRACELANE" or "ADDRESS#LRNSFL+TRACELANE".
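The ZIP code + Street Name combination can be sketched as follows. The sketch assumes a comma-separated address with the street line first and the ZIP code last, and a street line that begins with the house number. The function name and parsing rules are hypothetical. The Metaphone3 variant is not shown because it requires a Metaphone3 encoder, which is not reproduced here.

```python
import re


def address_match_code(address: str) -> str:
    """Build a ZIP + street name match code (hypothetical helper)."""
    parts = [part.strip() for part in address.split(",")]
    street_line, zip_code = parts[0], parts[-1]
    # Drop a leading house number such as "134 ".
    street_name = re.sub(r"^\d+\s+", "", street_line)
    # Keep letters and digits only, upper-cased.
    street_key = re.sub(r"[^A-Za-z0-9]", "", street_name).upper()
    return f"ADDRESS#{zip_code}+{street_key}"


print(address_match_code("134 Trace Lane, Lawrenceville, GA, 30046"))
# ADDRESS#30046+TRACELANE
```

Real-world addresses rarely parse this cleanly, which is one reason standardizing addresses before generating match codes is recommended.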

Name and Address

Combining elements of a person name and elements of an address is often a good way to create match codes that are sufficiently unique, without causing false negatives. However, multiple permutations are often required to avoid false negatives.

Example: Kimberly Kaine resides at 134 Trace Lane, Lawrenceville, GA, 30046. Their corresponding match code would be "INDIVIDUAL#K+K+30046+134 TRACE".

Other examples of Name and Address combination match codes:

  • First Name initial + Metaphone3 Last Name + ZIP code
  • Last Name initial + Metaphone3 First Name + ZIP code
  • First Name initial + Metaphone3 Last Name + Metaphone3 City
  • Last Name initial + Metaphone3 First Name + Metaphone3 City
  • First Name initial + Last Name initial + ZIP code + Street Name
  • First Name initial + Last Name initial + Metaphone3 City + Street Name
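The sketch below shows how generating several permutations helps avoid false negatives when one attribute is misspelled. It is an illustration under stated assumptions: the function name, the "+" separator layout, and the pre-normalized street key are hypothetical, and the phonetic encoder is passed in as a parameter. The stand-in used here simply upper-cases its input and is not Metaphone3; a real Metaphone3 encoder would tolerate more spelling variation.

```python
from typing import Callable, Set


def name_address_codes(first: str, last: str, zip_code: str, city: str,
                       street: str, phonetic: Callable[[str], str]) -> Set[str]:
    """Generate the six name/address permutations listed above (hypothetical)."""
    f, l = first[0].upper(), last[0].upper()
    parts = [
        (f, phonetic(last), zip_code),
        (l, phonetic(first), zip_code),
        (f, phonetic(last), phonetic(city)),
        (l, phonetic(first), phonetic(city)),
        (f, l, zip_code, street),
        (f, l, phonetic(city), street),
    ]
    return {"INDIVIDUAL#" + "+".join(p) for p in parts}


# Placeholder encoder, NOT Metaphone3: it only upper-cases the input.
stand_in = str.upper

record_a = name_address_codes("Kimberly", "Kaine", "30046",
                              "Lawrenceville", "TRACELANE", stand_in)
# Same person, first name spelled differently.
record_b = name_address_codes("Kimberley", "Kaine", "30046",
                              "Lawrenceville", "TRACELANE", stand_in)

shared = record_a & record_b
print(sorted(shared))
```

The two permutations that include the full first name differ between the records, but the remaining four are shared, so the pair is still compared.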

For more information on configuring match codes, refer to the Match Codes topic in the Matching, Linking, and Merging documentation.