Matcher: Organization Name

The Organization Name Normalizer data element (defined in the Data Element: Organization Name Normalizer topic) normalizes the organization name data for two objects. The Organization Name Matcher compares the normalizer output and generates a match score (called the 'rank score' in Web UI). The final score calculation is based on a number of match factors, available under the Advanced tab in the configuration, that let you tune the matcher to your specific data set and business need.

When the match score is applied to the defined rules (refer to the Match Criteria Rules topic), a final match score is determined that ranks the likelihood of a match between the two objects.

Considerations

A Word Alias Table can be used to perform case-insensitive matching by alias. The Customer & Supplier MDM Configuration Guide in the Solution Enablement documentation refers to a Word Alias Table (illustrated below) that allows the matching to handle common organization word substitutions, such as 'co' for 'company'. For more information, refer to the Transformation Lookup Tables topic in the Resource Materials section of the online help.
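
As a rough illustration of case-insensitive alias matching, the following Python sketch maps each token to a canonical word before comparing. The table contents and the function names (`WORD_ALIASES`, `canonical`, `alias_match`) are hypothetical; in the product, the aliases come from a Transformation Lookup Table, and the matcher's internal logic may differ.

```python
# Hypothetical alias table: alias -> canonical word (not the product's actual table).
WORD_ALIASES = {"co": "company", "corp": "corporation", "inc": "incorporated"}


def canonical(token: str) -> str:
    """Lowercase the token and replace it with its canonical word, if any."""
    lowered = token.lower()
    return WORD_ALIASES.get(lowered, lowered)


def alias_match(a: str, b: str) -> bool:
    """True when two tokens differ as written but share a canonical word."""
    return a.lower() != b.lower() and canonical(a) == canonical(b)


print(alias_match("Co", "Company"))  # True: paired through the alias table
print(alias_match("Ajax", "Ajax"))   # False: an exact match, not an alias match
```

In this sketch, a pair for which `alias_match` returns true would be the kind of pair that receives the Alias Word Match Factor.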

An Unmatched Word Factor Table assigns weights to individual words that are routinely omitted when company names are written in a hurry. For example, compare 'Ajax Company Cleaning Supplies Inc' to 'Ajax Cleaning Supplies'. Typically, missing words penalize the score according to the 'Missing Word Factor' parameter. However, if the missing words ('Company' and 'Inc') are in the Unmatched Word Factor Table, the designated factor for each word is taken from that table instead. In the Unmatched Word Factor Table below, many commonly missing words are set to a factor of 0.98, which impacts the score much less than the default 'Missing Word Factor' of 0.7. The Customer & Supplier MDM Configuration Guide in the Solution Enablement documentation refers to an Unmatched Word Factor Table that is illustrated below. This table can also be used to give certain words an even harsher score impact when they are missing. For more information, refer to the Transformation Lookup Tables topic in the Resource Materials section of the online help.
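
The effect of the table on the 'Ajax' example can be sketched in Python. This is only an illustration: it assumes that one factor is multiplied in per missing word, which the text above does not state explicitly, and the names `UNMATCHED_WORD_FACTORS` and `missing_word_penalty` are hypothetical.

```python
import math

DEFAULT_MISSING_WORD_FACTOR = 0.7  # default 'Missing Word Factor' named above

# Hypothetical table entries, using the 0.98 factor mentioned above.
UNMATCHED_WORD_FACTORS = {"company": 0.98, "inc": 0.98}


def missing_word_penalty(missing_words, table=None,
                         default=DEFAULT_MISSING_WORD_FACTOR):
    """Multiply one factor per missing word (assumed combination rule).

    A word found in the table uses its own factor; any other word
    falls back to the default Missing Word Factor.
    """
    table = table or {}
    return math.prod(table.get(word.lower(), default) for word in missing_words)


missing = ["Company", "Inc"]
print(round(missing_word_penalty(missing), 4))                          # 0.49
print(round(missing_word_penalty(missing, UNMATCHED_WORD_FACTORS), 4))  # 0.9604
```

Under these assumptions, the two missing words cut the score roughly in half without the table but reduce it by only about four percent with it.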

Input

The Organization Name Matcher takes input from the selected Input Normalizer. This is usually an Organization Name Data Element. The matcher retrieves all organization names for the two objects under comparison.

Functionality

The Organization Name Matcher compares every organization name of the first object in the match context with every organization name of the second object in the match context. The final score of the Organization Name Matcher is the highest score found for any pair of organization names. Refer to the Matching Algorithms and Match Expressions topic.
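
The 'highest score of any pair' behavior can be sketched as follows. The pair scorer here (`toy_score`) is a deliberately trivial stand-in and not the matcher's scoring logic; `best_pair_score` is likewise a hypothetical name, and returning 0.0 for an object without names is an assumption.

```python
from itertools import product


def best_pair_score(names_a, names_b, score_pair):
    """Score every name of object A against every name of object B; keep the best."""
    scores = (score_pair(a, b) for a, b in product(names_a, names_b))
    # Assumption: an object with no organization names yields 0.0.
    return max(scores, default=0.0)


def toy_score(a, b):
    """Stand-in scorer: 100 for a case-insensitive exact match, otherwise 0."""
    return 100.0 if a.lower() == b.lower() else 0.0


first_object = ["Ajax Ltd", "Ajax Cleaning Supplies"]
second_object = ["AJAX CLEANING SUPPLIES"]
print(best_pair_score(first_object, second_object, toy_score))  # 100.0
print(best_pair_score([], second_object, toy_score))            # 0.0
```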

The comparison of each set of two organization names includes:

  1. Using the Name Word Splitter Regex to split the organization name value to create name-tokens

  2. Defining pairs of name-tokens between the two objects

    Possible pairings for organization name tokens are:

    • Exact match – Scores 100.

    • Word Alias Table – If configured, performs case-insensitive matching by alias. Any name-tokens that match based on the Word Alias Table are scored with a multiplier equal to the Alias Word Match Factor. For example, 'Ajax Cleaning Supplies Co' compared to 'Ajax Cleaning Supplies Company' scores the first three (3) tokens at 100 as exact matches. The final token 'Co' matches 'Company' by the alias table and is scored with a multiplier according to the Alias Word Match Factor.

    • Concatenation matching – If two name-tokens in one organization name can be concatenated to match one name-token in the other organization name, the pair receives a score multiplier equal to the Concatenation Word Match Factor. For example, 'Ajax Cleaning Supplies Co' compared to 'Ajax Cleaningsupplies Co' scores the first and last words as 100. The middle name-tokens of the first object can be concatenated to match the name-token of the second object. Concatenated name-tokens must match exactly, so good normalization is important for this comparator to work.

    • Edit distance matching – Adjusts for a few wrong characters due to typographical errors. If both name-tokens are at least three (3) characters long, and one can be made identical to the other by adding, deleting, or changing a single character, the Edit Distance Word Match Factor is applied.

    • Acronym matching – If a name-token in one organization name is an acronym of the list of name-tokens in the other organization name, the Acronym Word Match Factor is applied. For example, 'Ajax Cleaning Supplies' compared to 'ACS' is a match, and 'Ajax C S' compared to 'Ajax CS' is a match. The order of the acronym letters is important, so 'Ajax Cleaning Supplies' compared to 'ASC' is not a match.

  3. Determine score penalties

    • Sequence matching – If tokens are out of order, a further penalty multiplier is determined by the Word Out Of Order Factor. For example, 'Ajax Cleaning Supplies' compared to 'Cleaning Supplies Ajax'.

    • Unmatched / Missing matching – If there are missing tokens, the score is penalized by multiplying it by the Missing Word Factor. If more than half the name-tokens in either organization name are unpaired, the two names are considered not matching. Specific words can be assigned a higher or lower missing word penalty by using the Unmatched Word Factor Table, described in the Considerations section above.

  4. Determine the final score by comparing an organization name from the first object to an organization name from the second object

    The final score of an Organization Name Matcher is the best score of matching any organization name on the first object to any organization name on the second object.
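
The token-level checks in steps 1 through 3 can be sketched as small Python predicates. This is a simplified illustration, not the matcher's implementation: the `\s+` splitter default, the lowercasing, the out-of-order check, and all function names are assumptions made for the example.

```python
import re


def split_tokens(name, splitter=r"\s+"):
    """Split a name into lowercase name-tokens (the splitter default is assumed)."""
    return [t for t in re.split(splitter, name.strip().lower()) if t]


def is_concatenation(token, words):
    """Two name-tokens joined together must equal the single token exactly."""
    return len(words) == 2 and token == "".join(words)


def within_one_edit(a, b):
    """True if one token becomes the other by adding, deleting, or changing
    a single character; both tokens must be at least three characters long."""
    if a == b or min(len(a), len(b)) < 3 or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter token
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one changed character
    return a[i:] == b[i + 1:]          # one added or deleted character


def is_acronym(token, words):
    """The token must spell the first letters of the words, in order."""
    return len(words) > 1 and token == "".join(w[0] for w in words)


def is_out_of_order(tokens_a, tokens_b):
    """Simplified check: the same tokens in a different sequence."""
    return tokens_a != tokens_b and sorted(tokens_a) == sorted(tokens_b)


first = split_tokens("Ajax Cleaning Supplies")
print(is_concatenation("cleaningsupplies", first[1:]))                 # True
print(within_one_edit("suplies", "supplies"))                          # True
print(is_acronym("acs", first))                                        # True
print(is_acronym("asc", first))                                        # False
print(is_out_of_order(first, split_tokens("Cleaning Supplies Ajax")))  # True
```

Each predicate corresponds to one pairing type or penalty described above; how the resulting factors are combined into the final score is not covered by this sketch.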

Configuring an Organization Name Matcher

After adding the Organization Name Matcher in the Matchers flipper of the Decision Table dialog (defined in the Match Criteria topic), configure it as follows:

  1. Click in the Matcher column and click the ellipsis button to access the configuration dialog.

  2. On the Not Configured dialog:

    • For the required Input Normalizer, use the dropdown to select the associated Organization Name Normalizer or enter a case-sensitive ID for the normalizer.

    • For the optional Word Alias Table, click the ellipsis button and select a Transformation Lookup Table to substitute words with the same or similar meaning. Refer to the Considerations section above.

    • For the required Exact Word Match Factor, enter how greatly exact matches influence the final score.

    • For the required Alias Word Match Factor, enter how greatly words that are paired via aliases influence the final score.

    • For the required Concatenation Word Match Factor, enter how greatly pairs where one is concatenated and the other is not concatenated influence the final score.

    • For the required Edit Distance Word Match Factor, enter how greatly pairs via edit distance influence the final score.

    • For the required Acronym Word Match Factor, enter how greatly pairs where one is an acronym and the other is not an acronym influence the final score.

    • For the required Missing Word Factor, enter how much unpaired or missing words penalize the final result.

    • For the required Word Out of Order Factor, enter how much words that appear out of order penalize the final result.

    • For the optional Unmatched Word Factor Table, click the ellipsis button and select a Transformation Lookup Table to assign factors to certain words. Refer to the Considerations section above.

    • For the optional Name Word Splitter Regex, leave the default, which splits the value on space characters, or enter a different regular expression to split the value into words.

    • For the optional Condition Threshold, enter the minimum score required for the matcher to return 'True' on a rule.

      Note: Leave the Condition Threshold parameter empty when this matcher is used in more than one rule and the threshold varies based on the rule. For example, if one rule requires a match score of 70 while another rule requires 75, a default condition threshold can be confusing while troubleshooting. In that case, it is better to add the thresholds in the rules.

  3. To test the configuration, for the Select Nodes parameters:

    • Click the ellipsis button for each field and select two objects for comparison.

    • Click the Evaluate button.

      0.0 is displayed when a value is not available in one of the selected nodes or when the organization names do not match. Adjust as indicated by the Evaluator results and repeat the evaluation.

      When red text is displayed, hover over it to review information about the record. For example, a record that has been deactivated produces no match code and therefore no match score.

  4. Click OK to save and display the configuration in the Matchers flipper.
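
The Condition Threshold note in step 2 can be illustrated with a small Python sketch in which each rule carries its own threshold. The rule names, the score, and the `passes` helper are hypothetical, and the sketch assumes that a score equal to the threshold counts as meeting the minimum.

```python
def passes(score, threshold):
    """A score meets the condition when it is at least the threshold (assumed)."""
    return score >= threshold


# Hypothetical per-rule thresholds, mirroring the 70 and 75 example in the note.
RULE_THRESHOLDS = {"rule_a": 70.0, "rule_b": 75.0}

matcher_score = 72.0  # illustrative Organization Name Matcher score
results = {rule: passes(matcher_score, t) for rule, t in RULE_THRESHOLDS.items()}
print(results)  # {'rule_a': True, 'rule_b': False}
```

Keeping the thresholds with the rules, as here, makes it clear while troubleshooting which value decided each outcome.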