Matcher: Address
The Address Matcher compares the normalized address data of two objects and generates a match score (sometimes referred to as a 'rank score') based on the weighted sum of relevant data elements and match factors.
The Address Matcher defines weights and factors that will be applied to the match score based on different criteria. As described in this documentation, these allow you to adjust the address matching score to fit specific use cases.
Input
The Address Matcher takes input from the selected address data element and retrieves data for the two objects under comparison.
The intent is to use an Address Normalizer data element (as defined in the Data Element: Address Normalizer topic here) to normalize address data and use that data element as input to the address matcher.
Functionality
The Address Matcher considers every address in the set of input addresses of the first object and compares each of those with the set of addresses of the second object. The final score of the address matcher will be the highest score of the comparisons.
The comparison of each set of two addresses includes:
-
Using the Street Word Splitter Regex to split the Street attribute value to create street-tokens
-
Separating street-tokens into number-street-tokens and text-street-tokens
-
Defining internal temporary scores for the comparison:
-
Text Score – All text-street-tokens from the first address are paired up with all text-street-tokens on the second address. These pairings try to find exact matches, and if that is not possible, attempts to match within an edit distance. After these pairings, unmatched text-street-tokens and paired text-street-tokens where the order is different receive further penalties to the score. (Edit distance adjusts for a few different characters due to typographical errors and is only applied when the text-street-tokens are at least three (3) characters long.)
-
Number Score – Calculated exactly like the text score but can be assigned different multiplier factors. By default, the number score weighs heavier in the final score, but this can be modified on the Advanced tab.
Note: If both comparable street addresses do not contain any numbers, a Number Score will not be evaluated. Instead, a Text Score only will result in a final Street Score.
-
Street Score – Compiles the text and number scores. By default, numbers in an address are assigned a heavier weight than the text, but this can be modified on the Advanced tab.
-
Postcode / City Score – Determines the score based on the following:
-
If both addresses have postal codes, the City / Postcode score is 100 if the normalized post codes are an exact match, or 0 if there is not an exact match.
-
If either address has an ISO country code of 'US', the comparison of postal codes considers only the first 5 digits.
-
If at least one address lacks a postal code, then the cities are compared. An exact city match results in City / Postcode score 100.
-
If a single insertion, deletion, or substitution of a character could make the cities equal, the City / Postcode score is 0.9—unless the city name is shorter than 5 characters, in which case the City / Postcode score is heavily penalized according to the actual length of the name.
-
-
Refer to the Token and Scoring Examples section at the end of this topic for a detailed explanation.
Important: The Address Matcher performs best when each customer has fewer than 100 addresses. For example, comparing organization customers with many addresses results in a lot of comparisons and can degrade performance.
Configuring an Address Matcher
After adding the Address Matcher in the Matchers flipper of the Decision Table dialog (defined in the Match Criteria topic here), configure it as follows:
-
Click into the Matcher column and click the ellipsis button ()) to access the configuration dialog.
-
On the Not Configured dialog, the Settings tab is displayed.
-
For the required Input Normalizer, use the dropdown to select the associated Address Normalizer or enter a case-sensitive ID for the normalizer.
-
For the optional Condition Threshold, enter the default minimum score required for the matcher to return 'True' on a rule.
Note: Leave the Condition Threshold parameter empty when this matcher is used in more than one rule and the threshold varies based on the rule. For example, if one rule requires a match score of 70 while another rule requires 75, a default condition threshold can be confusing while troubleshooting. In that case, it is better to add the thresholds in the rules.
-
-
Click the Advanced tab and update the default weights and factors as needed.
-
For the required Postcode and City Weight, enter the relative weight of the Postcode / City score versus the Street score.
-
For the required Street Weight, enter the relative weight of the Street score versus the Postcode / City.
Note: The Street score is a weighted sum of the Number Words score and the Text Words score.
-
For the required Text Words Weight, enter the relative weight of the Text Words score versus the Number Words score.
-
For the required Number Words Weight, enter the relative weight of the Number Words score versus the Text Words score. By default, the number score weighs heavier in the final score than text words.
-
For the required Text Exact Word Match Factor, enter how greatly exact matches influence the final score.
-
For the required Text Edit Distance Word Match Factor, enter how greatly words that are paired via edit distance influence the final score.
-
For the required Number Exact Word Match Factor, enter how greatly pairs that are exact matches influence the final score.
-
For the required Number Edit Distance Word Match Factor, enter how greatly words that are paired via edit distance influence the final score.
-
For the required Missing Word Factor, enter how much unpaired or missing words penalize the final result.
-
For the required Word Out of Order Factor, enter how much words that appear out of order penalize the final result.
-
For the optional Street Word Splitter Regex, leave the default to split on white spaces or enter a different RegEx to split the Street value into words.
-
-
To test the configuration, for the Select Nodes parameters:
-
Click the ellipsis button () for each field and select two objects for comparison.
-
Click the Evaluate button.
0.0 is displayed when a value is not available in one of the selected nodes or when the addresses do not match. Adjust as indicated by the Evaluator results and repeat the evaluation.
When red text is displayed, hover to review information about the record. For example, a record that has been deactivated, and so it produces no match code and thus no match score.
-
-
Click OK to save and display the configuration in the Matchers flipper.
Token and Scoring Examples
The following shows the process of compiling a score when comparing two entities with similar addresses in Germany.
The default weights and factors are used in this example.
-
The Street Word Splitter Regex creates number-street-tokens and text-street-tokens.
Address attribute value |
Street-tokens |
---|---|
22 Damm Spandauer, 14059 Berlin, Germany |
|
Spanduer Damm 22, Berlin, Germany |
|
Text-street-token pairing |
Object 1 Value |
Object 2 Value |
Result |
---|---|---|---|
Text Exact Word Match Factor |
Damm |
Damm |
Text-street-token is an exact match |
Text Edit Distance Word Match Factor |
Spandauer |
Spanduer |
Text-street-token has a text-edit-distance of 1 |
-
Text score: Determine exact matches and edit distances for text-street-tokens between two objects.
The edit distance is only applied when the text-street-tokens are at least 3 characters long.
Text Score Elements |
Setting |
Object 1 Value |
Object 2 Value |
Result |
---|---|---|---|---|
Text Exact Word Match Factor |
1.0 |
Damm |
Damm |
exact match |
Text Edit Distance Word Match Factor |
0.8 |
Spandauer |
Spanduer |
text-edit-distance of 1 |
Text score before penalties |
1.0 * 0.8= |
|
|
0.8 |
-
Text score: Consider all text-street-tokens for sequence and missing words.
Text Score Elements |
Setting |
Object 1 Values |
Object 2 Values |
Result |
---|---|---|---|---|
Word Out of Order Factor |
0.8 |
Damm Spandauer |
Spanduer Damm |
Order is not the same |
Missing words |
0.8 |
|
|
All tokens are matched |
Total text score |
|
|
|
0.64 |
Note: If there were missing tokens ('Spandauer 10-22' compared to 'Spandauer Damm 10-22') the score would be further penalized by multiplying with the Missing Word Factor, which defaults to 0.8.
-
Number score:
Number Score Calculation |
Setting |
Object 1 Value |
Object 2 Value |
Result |
---|---|---|---|---|
Number Exact Word Match Factor |
1.0 |
22 |
22 |
Numbers are exact matches |
Total number score |
|
|
|
1.0 |
-
Street score:
(TextWordsWeight*textScore + NumberWordsWeight *numberScore) /
(TextTokensWeight + NumberTokensWeight)
Street Score |
Elements |
Calculation |
Result |
---|---|---|---|
(TextTokensWeight*textScore = 5 |
Text Words Weight = 30.0 Text Score = 0.64 |
30.0 * 0.64 = |
19.2 |
|
|
|
+ |
NumberTokensWeight *numberScore) |
Number Words Weight = 70.0 Number Score = 1.00 |
70.0 * 1.00 = |
70 |
|
|
|
/ |
(TextTokensWeight + NumberTokensWeight) |
Text Words Weight = 50.0 Number Words Weight = 50.0 |
50.0 + 50.0 |
100 |
Total street score |
|
|
0.892 |
-
City / Postcode score:
If at least one address lacks a postal code, then the cities are compared. An exact match results in City / Postcode score 1.00.
City / Postcode Score Calculation |
Score |
Object 1 Value |
Object 2 Value |
Result |
---|---|---|---|---|
Post Code |
1.00 |
14059 Berlin |
Berlin |
Only one postcode |
Total city / postcode score |
|
|
|
1.00 |
-
Final Single Address score:
(Postcode and City Weight* City/Postcode score + Street Weight * Street score) /
(Postcode and City Weight + Street score)
Final Single Address score |
Elements |
Calculation |
Result |
---|---|---|---|
(Postcode and City Weight* City/Postcode score |
Postcode and City Weight = 50.0 Text Score = 1.00 |
50.0 * 1.00 = |
50 |
|
|
|
+ |
Street Weight * Street score) |
Street Weight = 50.0 Street Score = 0.892 |
50.0 * 0.892 = |
44.6 |
|
|
|
/ |
(Postcode and City Weight + Street score) |
Postcode and City Weight = 50.0 Street Weight = 50.0 |
50.0 + 50.0 |
(100) |
Final Single Address score |
|
|
94.6 |