Algorithm & Match Codes - Individual Customer

Matching Algorithm

The Individual Matching Algorithm delivered as part of the initial configuration is designed as a match algorithm for Individual Customer solutions. The data most commonly available for matching Individual Customers are First Name, Middle Name, Last Name, Address, Emails, and Phone Numbers.

This algorithm consists of four normalizers and four matchers, with an Auto Threshold of 90.0 and a Clerical Review Threshold of 60.0.

For more information on configuring a Matching Algorithm, refer to the Configuring Matching Algorithms topic of the Matching, Linking, and Merging documentation here.

Normalizers

Normalizers are used to standardize values that are being compared. This ensures equal formatting is applied, increasing the accuracy of the comparisons being made. For more information, refer to the Match Criteria Data Elements topic of the Matching, Linking, and Merging documentation here.

Person Name Normalizer

The Person Name Normalizer is configured to normalize the corresponding first, middle, and last name attributes (e.g., FirstName, MiddleName, LastName).

For customers with a large non-English-speaking consumer base, it is recommended to normalize accents and diacritic characters. Such characters are not handled by the phonetic encoding of words (e.g., Soundex or Metaphone 3) during the match process.
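As an illustration of what normalizing accents and diacritics means, the following is a minimal Python sketch that uses Unicode decomposition. It is not the Person Name Normalizer itself; the function name strip_diacritics is hypothetical and exists only for this example.

```python
import unicodedata


def strip_diacritics(value: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks, e.g. 'é' -> 'e'."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_diacritics("José Muñoz"))
```

Letters that have no Unicode decomposition (for example 'ø' or 'ß') pass through this sketch unchanged and would need an explicit mapping.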

Address Normalizer

Because it is recommended to model addresses as data containers, configure the Address Normalizer to normalize data container attributes as defined within the Address component model.

Email Normalizer

Because it is recommended to model emails as data containers, configure the Email Normalizer to normalize data container attributes as defined within the Email component model.

Phone Normalizer

Because it is recommended to model phone numbers as data containers, configure the Phone Normalizer to normalize data container attributes.

Matchers

For general information on configuring Matchers, refer to the Match Criteria Matchers topic of the Matching, Linking, and Merging documentation here.

Person Name Matcher

The Person Name Matcher is largely left with the default settings. However, an alias table is used to provide an equivalent names table.

Third-party sources may be leveraged to build and enhance the equivalent names table within STEP. However, it is recommended that the client’s expertise with their customer data be consulted to account for industry- or business-specific patterns. For example, if a company is based in the southwestern United States, its equivalent names table may emphasize Hispanic name equivalents. Equivalent name values may also be added as a result of a match tuning exercise with the client’s customer data set.

Regarding middle names, an evaluation exercise with the customer is recommended to review the quality of data they have for customer middle names. It is possible that middle names are not collected from the consumers, or only middle initials are required but rarely provided. In such cases, it is recommended to reconsider the weight of MiddleName.

Note: The names in the equivalent names table provided by the initial configuration target the US market.

Address Matcher

The Address Matcher utilizes default configuration values.

Email Matcher

The Email Matcher utilizes default configuration values.

Phone Matcher

The Phone Matcher utilizes default configuration values.

Rules

When considering match rules, the recommended strategy is to dissect the customer’s information into the smallest possible portions of data. These rules should not weigh the sum of all the customer’s input data and should instead be split so that it is possible to optimize each rule. Careful analysis of the customer dataset is required to determine what combinations of attributes present the best chance of uniqueness.

Three rules are provided to calculate the final score of the Individual Matching Algorithm. According to the configured conditions, if all respective Matchers resolve to true (i.e., score above 70), the algorithm takes the highest scoring rule as its final score. Each rule combines the Name score with the Address score, the Email score, or the Phone score. The rules are then standardized to resolve to a value between 0 and 100.

These rules specifically include:

  • Name & Address
  • Name & Email
  • Name & Phone

In this scenario, Name is used in all three rules because it is common for family members to live at the same address and to share an email and/or phone number. Thus, email, phone number, and address are not reliably unique by themselves. Adding Name to these rules ensures that the individual’s name is taken into consideration in addition to the other contact information elements.
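The rule evaluation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the delivered configuration: the text does not say how the two matcher scores in a rule are combined, so a plain average is assumed, and treating the Auto and Clerical Review thresholds as inclusive is also an assumption. All function and variable names are hypothetical.

```python
MATCHER_MINIMUM = 70.0            # a matcher "resolves to true" above this score
AUTO_THRESHOLD = 90.0
CLERICAL_REVIEW_THRESHOLD = 60.0

# Each rule names the matchers whose scores it combines.
RULES = {
    "Name & Address": ("name", "address"),
    "Name & Email": ("name", "email"),
    "Name & Phone": ("name", "phone"),
}


def rule_score(scores, matchers):
    """Return 0 unless every matcher in the rule scores above the minimum."""
    values = [scores[m] for m in matchers]
    if any(v <= MATCHER_MINIMUM for v in values):
        return 0.0
    # Assumption: a plain average keeps the rule score between 0 and 100.
    return sum(values) / len(values)


def final_score(scores):
    """The highest scoring rule becomes the final score of the algorithm."""
    return max(rule_score(scores, matchers) for matchers in RULES.values())


def classify(score):
    """Assumption: both thresholds are treated as inclusive lower bounds."""
    if score >= AUTO_THRESHOLD:
        return "auto"
    if score >= CLERICAL_REVIEW_THRESHOLD:
        return "clerical review"
    return "no match"


if __name__ == "__main__":
    scores = {"name": 95.0, "address": 40.0, "email": 100.0, "phone": 80.0}
    print(final_score(scores), classify(final_score(scores)))
```

In the sample input, the low Address score disqualifies the Name & Address rule, and the Name & Email rule supplies the final score.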

It is possible to extend a rule by including other combinations of matchers. This should be considered if there are specific requirements or use cases that require specific combinations of matchers.

Adding another parameter to a rule can help identify false positives. Unique identifiers such as Social Security, Passport, or Driver’s License Numbers may be used as veto rules to further enhance the quality of the match process.

Other extensions include Date of Birth (DOB), which can be used in combination with other rules to be less strict on equality. For example, a Name & Address rule runs the risk of matching a father and son who share the same name, which can be resolved by also considering the date of birth. Additionally, lower name and address scores can be accepted if the DOB is equal.
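The two extensions above could look roughly like the following Python sketch. It is illustrative only: the function names are hypothetical, the relaxed minimum of 60 is an invented value, and applying a veto only when both records carry an identifier is an assumption.

```python
from typing import Optional


def apply_veto(score: float, id_a: Optional[str], id_b: Optional[str]) -> float:
    """Force a non-match when both records carry an identifier and the values differ."""
    if id_a and id_b and id_a != id_b:
        return 0.0
    return score


def name_address_minimum(
    dob_a: Optional[str],
    dob_b: Optional[str],
    default_minimum: float = 70.0,
    relaxed_minimum: float = 60.0,  # hypothetical value, for illustration only
) -> float:
    """Accept lower name and address scores when both dates of birth are equal."""
    if dob_a and dob_b and dob_a == dob_b:
        return relaxed_minimum
    return default_minimum


if __name__ == "__main__":
    # Same name and address, but different identifiers: the veto wins.
    print(apply_veto(95.0, "123-45-6789", "987-65-4321"))
```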

Survivorship

The following survivorship rules are used by the Individual Customer Algorithm:

  • Value: Most Recent
    • Attribute / Attribute Group: Individual - Most Recent
    • Last Edit Date Attribute: Last Edit Date - Record
  • Data Container: Most Recent (Emails)
    • Business Condition: DataContainer Survivorship Email
    • Data Container Type: Emails
    • Last Edit Date Attribute: Last Edit Date - Email
  • Data Container: Most Recent (Phones)
    • Business Condition: DataContainerSurvivorshipPhone
    • Data Container Type: Phones
    • Last Edit Date Attribute: Last Edit Date - Phone
  • Data Container: Most Recent (Main Address)
    • Business Condition: DataContainer Survivorship Address
    • Data Container Type: Main Address
    • Last Edit Date Attribute: Last Edit Date - Main Address

Note: Data Containers require their own survivorship rules. Additionally, each Survivorship rule requires a unique Last Edit Date attribute.
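The idea behind a Most Recent survivorship rule can be sketched in Python as follows. This is a simplified illustration, not the product's implementation: the function name is hypothetical, and skipping empty values is an assumption.

```python
from datetime import date


def most_recent(candidates):
    """Pick the value whose last edit date is latest.

    candidates is a list of (value, last_edit_date) pairs, one per source record.
    Assumption: empty values never survive over populated ones.
    """
    populated = [c for c in candidates if c[0] not in (None, "")]
    if not populated:
        return None
    return max(populated, key=lambda c: c[1])[0]


if __name__ == "__main__":
    emails = [
        ("old@example.com", date(2020, 3, 1)),
        ("new@example.com", date(2023, 7, 15)),
        ("", date(2024, 1, 1)),
    ]
    print(most_recent(emails))
```

Because each value is compared on its own last edit date, the sketch also shows why every survivorship rule needs a Last Edit Date attribute of its own.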

Match Codes

For the Individual Customer entity type, three separate Match Codes are generated. They are based on the demographics of the customer record and are composed of: Email, Phone Number, and a combination of Individual Name and Address.

For information on how to configure Match Codes which are housed in matching algorithms, refer to the Match Codes section of the Matching, Linking, and Merging documentation here.

Email Match Code

The Email Match Code is the normalized value of the email address attribute. Both the username (or local part) of the email and the domain are normalized to ensure variations of the same email address (because of differing cases or special characters) are accounted for.

It is recommended to add a discernible prefix to each Match Code so the end-user may easily identify what attribute(s) were used for the Match Code. The email Match Code contains the prefix 'EM~'.
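A minimal Python sketch of an email Match Code with the 'EM~' prefix follows. The exact handling of special characters is not specified in the text, so the rules here are assumptions: the value is trimmed and lowercased, the local part keeps only letters and digits, and the domain keeps letters, digits, dots, and hyphens. The function name is hypothetical.

```python
import re
from typing import Optional


def email_match_code(email: str) -> Optional[str]:
    """Build an 'EM~' Match Code from a raw email value, or None if unusable."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep:
        return None
    # Assumption: special characters in the local part are dropped.
    local = re.sub(r"[^a-z0-9]", "", local)
    domain = re.sub(r"[^a-z0-9.-]", "", domain)
    if not local or not domain:
        return None
    return f"EM~{local}@{domain}"


if __name__ == "__main__":
    print(email_match_code(" John.Doe@Example.COM "))
```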

Phone Number Match Code

The Phone Number Match Code is the normalized value of the individual’s phone number attribute. The normalization removes any parentheses and hyphens between digits. Additionally, the area code is removed, leaving only the last 7 digits as the Match Code. The Phone Number Match Code contains the prefix 'PH~'.
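The phone normalization described above can be sketched in Python as follows. The function name is hypothetical, and returning no Match Code for values with fewer than 7 digits is an assumption.

```python
import re
from typing import Optional


def phone_match_code(phone: str) -> Optional[str]:
    """Strip non-digits, keep the last 7 digits, and add the 'PH~' prefix."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) < 7:
        # Assumption: too few digits means no usable Match Code.
        return None
    return "PH~" + digits[-7:]


if __name__ == "__main__":
    print(phone_match_code("(512) 555-0199"))
```

With this approach, the same number entered with and without its area code produces the same Match Code.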

Address and Name Match Code

The Address and Name Match Code is a combination of elements of the individual’s name and address. For example, the provided Match Code within the initial configuration is composed of zip code + the first letter of the individual’s first name + Metaphone 3 representation of the individual’s last name.

The Address and Name Match Code contains the prefix 'ZINM~'.
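The composition of the Address and Name Match Code can be sketched in Python as follows. Metaphone 3 is not part of the Python standard library, so the phonetic encoder is passed in as a parameter. The default placeholder_encoder is NOT Metaphone 3; it is a stand-in that only makes the example runnable. Concatenating the parts without separators is also an assumption, and all names are hypothetical.

```python
from typing import Callable, Optional


def placeholder_encoder(name: str) -> str:
    """NOT Metaphone 3: keeps the first letter and drops later vowels (illustration only)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    return letters[0] + "".join(ch for ch in letters[1:] if ch not in "AEIOU")


def address_name_match_code(
    zip_code: str,
    first_name: str,
    last_name: str,
    encode: Callable[[str], str] = placeholder_encoder,
) -> Optional[str]:
    """zip code + first letter of first name + encoded last name, prefixed with 'ZINM~'."""
    zip_part = zip_code.strip()
    first = first_name.strip()
    encoded = encode(last_name)
    if not zip_part or not first or not encoded:
        return None
    return f"ZINM~{zip_part}{first[0].upper()}{encoded}"


if __name__ == "__main__":
    print(address_name_match_code("78701", "matthew", "Johnson"))
```

A real Metaphone 3 implementation would be supplied through the encode parameter in place of the placeholder.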

Configuration Considerations

It is worth considering the use of the Equivalent Values Lookup Table and the Anonymous Values Lookup Table. The Equivalent Values Lookup Table is used by both Match Codes and the Match Criteria to ensure that values that mean the same thing are evaluated as such. Equivalent values will score appropriately high, as if the values were identical.

Example:

  • Name: Matt = Matthew

Note: Equivalent Values are only used for person and organization names.

The Anonymous Values Lookup Table is also used by both Match Codes and the Match Criteria to ensure that values that are anonymous, or not meaningful, do not contribute to identifying potential duplicates. Determining what these values should be is highly dependent on the organization's dataset.

Typically, these values are default values that users of a Source System enter when they do not have the correct value, or do not want to enter a value. The actual anonymous values are not included in the baseline build of the Customer MDM configuration.

Examples:

  • Phone: 999999999
  • Address: DO NOT USE
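The effect of the two lookup tables can be sketched in Python as follows. The table contents reuse the examples above; the dictionary layout, the function names, and the case-insensitive comparison are assumptions made for illustration and do not reflect how the tables are stored in the product.

```python
# Equivalent values: each variant maps to one canonical form.
EQUIVALENT_NAMES = {"matt": "matthew"}

# Anonymous values: placeholders that should not contribute to matching.
ANONYMOUS_VALUES = {
    "phone": {"999999999"},
    "address": {"do not use"},
}


def canonical_name(name: str) -> str:
    """Replace a name with its equivalent so 'Matt' and 'Matthew' compare as equal."""
    key = name.strip().lower()
    return EQUIVALENT_NAMES.get(key, key)


def is_anonymous(field: str, value: str) -> bool:
    """Return True when the value is a known placeholder for the given field."""
    return value.strip().lower() in ANONYMOUS_VALUES.get(field, set())


if __name__ == "__main__":
    print(canonical_name("Matt") == canonical_name("Matthew"))
    print(is_anonymous("address", "DO NOT USE"))
```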