Matcher: Words

The Words Normalizer data element (as defined in the Data Element: Words Normalizer topic here) normalizes word data for two objects. The Words Matcher compares the normalizer output and generates a match score (also called the 'rank score' in the Web UI) based on the weighted sum of relevant data elements and match factors. The weights let you define which elements are more important during matching.

When a match score is applied to the defined rules (refer to the Match Criteria Rules topic here), a final match score is determined to rank the likelihood of a match between the two objects.

Note: The Words Normalizer and Words Matcher are generic and can handle multi-word values across a wide range of data, such as customer names and social security numbers.

Considerations

An Unmatched Word Factor Table assigns weights to individual words that may routinely be missing.

A Word Alias Table can be used to perform case-insensitive matching by alias. The Customer & Supplier MDM Configuration Guide in the Solution Enablement documentation describes a Word Alias Table that lets the matcher handle common name substitutions like Jasmine with Jasme or Jefferson with Jeff. A similar lookup table can be configured for words commonly encountered by this matcher.

Input

The Words Matcher takes input from the selected Words Data Element configured as List<String> for the two objects under comparison. The word-string provided as input to the Words Matcher may consist of several individual word-tokens.

Functionality

The Words Matcher compares each word-token from the first object with every word-token from the second object.

The comparison of each set of two word-tokens includes:

  1. Using the Word Splitter Regex to split the word-string value into word-tokens for comparison, or creating a single word-token identical to the word-string when the Word Splitter Regex parameter is blank.

  2. Defining pairs based on word-token using the following methods:

    • Exact matching – applies the Exact Word Match Factor as a multiplier to the score.

    • Word Alias Table, if configured, performs case-insensitive matching by alias – Each word-token is scored individually. Any word-tokens that match based on the Word Alias Table are scored with a multiplier equal to the Alias Word Match Factor. For example, 'Ajax Cleaning Supplies Co' compared to 'Ajax Cleaning Supplies Company' results in three (3) exact matches, and the Word Alias Table allows 'Co' to match with 'Company', so the Alias Word Match Factor is applied once. In other words, word-tokens that do not match exactly but share an alias are paired, with the Alias Word Match Factor as the score multiplier.

    • Metaphone 3 matching – The algorithm (which expands on Soundex) compares names based on their pronunciation. It works well on English words, non-English words familiar to Americans, first names, and family names commonly found in the United States. The Metaphone 3 Word Match Factor multiplier is applied to a match by Metaphone 3. For more information on Metaphone 3, search the web.

    • Edit distance matching (adjusting for a few wrong characters due to typographical errors) – If both word-tokens are at least 3 characters long, and one can be made identical to the other by adding, deleting, or changing a single character, the score multiplier is equal to the Edit Distance Word Match Factor.

    • Sequence matching – If tokens are out of order, a further penalty multiplier is determined by the Word Out Of Order Factor. For example, 'Ajax Cleaning Supplies' compared to 'Cleaning Supplies Ajax' contains the same tokens in a different order.

    • Unmatched / Missing matching – If there are missing tokens, the score is penalized by multiplying by the Missing Word Factor. For example, 'Ajax Company Cleaning Supplies Inc' compared to 'Ajax Cleaning Supplies'. Since two words are missing, the factor is applied twice. The Customer & Supplier MDM Configuration Guide in the Solution Enablement documentation includes an Unmatched Word Factor Table that assigns the word 'Company' a special weight of 0.98 if exactly that word is missing, since it is often left out by people writing company names. If more than half the word-tokens are unpaired, the word-strings are considered not matching.

  3. Determining the final score by identifying the best score of matching any word-token on the first object to any word-token on the second object, as defined by the following calculation:

     WordString Score = PairScore * MissingTokensMultiplier * OutOfOrderMultiplier * 100
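The pairing methods and the final calculation above can be sketched in Python. This is an illustration only, not the product's implementation: the factor values, the alias entries, the function names, and the use of the mean of the pair factors as PairScore are all assumptions made for the example (the source does not define how PairScore is aggregated). Tokens are assumed to be lowercased already, and Metaphone 3 matching is left out because it is not available in the Python standard library.

```python
from math import prod
from typing import Iterable, Optional

# Hypothetical factor values, chosen only for illustration.
FACTORS = {"exact": 1.0, "alias": 0.95, "edit": 0.9}
MISSING_WORD_FACTOR = 0.8
# Per-word override, mirroring the 'Company' example (0.98).
UNMATCHED_WORD_FACTORS = {"company": 0.98}
# Hypothetical alias table: alias -> canonical word (lowercase).
ALIASES = {"co": "company", "jeff": "jefferson"}


def one_edit_apart(a: str, b: str) -> bool:
    """True if both tokens have 3+ characters and differ by exactly one
    inserted, deleted, or changed character."""
    if len(a) < 3 or len(b) < 3 or a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) token
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # a single changed character
    return a[i:] == b[i + 1:]  # a single added or deleted character


def pair_factor(a: str, b: str) -> Optional[float]:
    """Return the multiplier for a token pair, or None if they do not pair."""
    if a == b:
        return FACTORS["exact"]
    if ALIASES.get(a, a) == ALIASES.get(b, b):
        return FACTORS["alias"]
    if one_edit_apart(a, b):
        return FACTORS["edit"]
    return None


def word_string_score(pair_score: float,
                      missing_words: Iterable[str] = (),
                      out_of_order_multiplier: float = 1.0) -> float:
    """PairScore * MissingTokensMultiplier * OutOfOrderMultiplier * 100."""
    missing_multiplier = prod(
        UNMATCHED_WORD_FACTORS.get(w.lower(), MISSING_WORD_FACTOR)
        for w in missing_words
    )
    return pair_score * missing_multiplier * out_of_order_multiplier * 100


# 'ajax cleaning supplies co' vs 'ajax cleaning supplies company':
# three exact pairs and one alias pair (paired by position for simplicity).
first = "ajax cleaning supplies co".split()
second = "ajax cleaning supplies company".split()
factors = [pair_factor(a, b) for a, b in zip(first, second)]
alias_example = word_string_score(sum(factors) / len(factors))

# 'Ajax Company Cleaning Supplies Inc' vs 'Ajax Cleaning Supplies':
# 'Company' uses its table factor, 'Inc' uses the Missing Word Factor.
missing_example = word_string_score(1.0, missing_words=["Company", "Inc"])
```

With these assumed values, the alias example scores roughly 98.75 and the missing-word example roughly 78.4. The real results depend on the factors configured on the Advanced tab.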

Configuring a Words Matcher

After adding the Words Matcher in the Matchers flipper of the Decision Table dialog (defined in the Match Criteria topic here), configure it as follows:

  1. Click into the Matcher column and click the ellipsis button (...) to access the configuration dialog.

  2. On the Not Configured dialog, the Settings tab is displayed.

    • For the required Input Normalizer, use the dropdown to select the associated Words Normalizer or enter a case-sensitive ID for the normalizer.

    • For the optional Word Alias Table, click the ellipsis button (...) and select a Transformation Lookup Table to substitute words with the same or similar meaning. The optional Word Splitter Regex runs before the Word Alias Table is applied. Refer to the Considerations section above.

    • For the optional Condition Threshold, enter the minimum score required for the matcher to return 'True' on a rule.

      Note: Leave the Condition Threshold parameter empty when this matcher is used in more than one rule and the threshold varies based on the rule. For example, if one rule requires a match score of 70 while another rule requires 75, a default condition threshold can be confusing while troubleshooting. In that case, it is better to add the thresholds in the rules.

  3. Click the Advanced tab and update the default weights and factors as needed.

    • For the optional Word Splitter Regex, determine the Regex based on the data being processed:

      Data such as social security numbers (SSN) or DUNS numbers that should not be split: Remove the Word Splitter Regex parameter value so no splitting is performed, and the word-strings are identical to the word-tokens.

      Data such as location name, customer names, or sentence-like constructs: Add a Word Splitter Regex to split, such as the default which splits the word-string into word-tokens based on white spaces.

    • For the required Exact Word Match Factor, enter how greatly exact matches influence the final score.

    • For the required Alias Word Match Factor, enter how greatly words that are paired via aliases influence the final score.

    • For the required Metaphone3 Word Match Factor, enter how greatly pairs via Metaphone 3 influence the final score.

    • For the required Edit Distance Word Match Factor, enter how greatly pairs via edit distance influence the final score.

    • For the required Missing Word Factor, enter how much unpaired or missing words penalize the final result. To modify the factor for specific words, select an Unmatched Word Factor Table in the parameter below.

    • For the required Word Out of Order Factor, enter how much words that appear out of order penalize the final result.

    • For the optional Unmatched Word Factor Table, click the ellipsis button (...) and select a Transformation Lookup Table to assign factors to certain words and increase or decrease the significance of the unmatched word. Unmatched words that are included in this lookup table use the factor in the table instead of the Missing Word Factor from the parameter above. Refer to the Considerations section above.

  4. To test the configuration, use the Select Nodes parameters:

    • Click the ellipsis button (...) for each field and select two objects for comparison.

    • Click the Evaluate button.

      0.0 is displayed when a value is not available in one of the selected nodes or when the words do not match. Adjust as indicated by the Evaluator results and repeat the evaluation.

      When red text is displayed, hover over it to review information about the record. For example, a record that has been deactivated produces no match code and thus no match score.

  5. Click OK to save and display the configuration in the Matchers flipper.
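The effect of the Word Splitter Regex choice in step 3 can be illustrated with a short Python sketch. The function name and the `\s+` pattern are assumptions for the example. The source says only that the default splits on white spaces, and the product's regex flavor may differ from Python's `re` module.

```python
import re
from typing import List


def split_word_string(word_string: str, splitter_regex: str = r"\s+") -> List[str]:
    """Split a word-string into word-tokens.

    A blank splitter_regex means no splitting: the single word-token is
    identical to the word-string.
    """
    if not splitter_regex:
        return [word_string]
    # Drop empty strings produced by leading or trailing separators.
    return [token for token in re.split(splitter_regex, word_string) if token]


# Sentence-like data: split on white space.
name_tokens = split_word_string("Ajax Cleaning  Supplies")

# Identifier-like data (for example an SSN): blank regex, so no splitting.
ssn_tokens = split_word_string("123 45 6789", splitter_regex="")
```

Here `name_tokens` holds three word-tokens, while `ssn_tokens` holds the whole value as a single word-token, so the identifier is compared as one unit.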