Legacy Match Criteria Options

Match codes defined outside the matching algorithm are legacy functionality but are still supported.

The following are supported legacy alternatives to decision tables. They are available when Match Criteria is presented as a flipper on the Matching Algorithm tab, rather than as the decision table option, which is available on a Match Criteria tab.

Important: These match criteria cannot be used by a matching algorithm with embedded match codes.

String Comparison Algorithms

While developing a matching, linking, and merging strategy, a string comparison algorithm can serve as the foundation for the matching process. The available string comparison algorithms include:

  • Levenshtein distance – A metric for how many edits (substitutions, insertions, deletions) it takes to turn one string into another. For example, the Levenshtein distance between the strings 'AXR55487' and '8XRT5487' is 2 because the first and fourth characters are different. In STEP terms, the strings would be 75 percent alike (6/8*100).
  • Damerau-Levenshtein distance – Like the Levenshtein distance except that the transposition of two adjacent characters counts as one edit, not two. For example, the Levenshtein distance between the strings 'AA67' and 'A6A7' is 2 while the Damerau-Levenshtein distance is 1.
  • Jaro / Jaro-Winkler distance – Outputs a value between 0 and 1, where 0 is no similarity and 1 is an exact match. These algorithms can be made available in STEP via JavaScript but are not included in the STEP core.
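Because Jaro and Jaro-Winkler are not part of the STEP core, they would have to be supplied as script code. The following is a minimal sketch of the textbook definitions, not a STEP-provided implementation. The function names, the prefix limit of 4, and the scaling factor of 0.1 are assumptions taken from the commonly used form of the algorithm.

```javascript
// Jaro similarity: 0 means no similarity, 1 means an exact match.
function jaro(a, b) {
  if (a.length === 0 && b.length === 0) return 1;
  if (a.length === 0 || b.length === 0) return 0;

  // Characters only count as matching when they are at most `range` positions apart.
  var range = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  var aMatched = [];
  var bMatched = [];
  var matches = 0;
  var i, j;

  for (i = 0; i < a.length; i++) {
    var lo = Math.max(0, i - range);
    var hi = Math.min(b.length - 1, i + range);
    for (j = lo; j <= hi; j++) {
      if (!bMatched[j] && a.charAt(i) === b.charAt(j)) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;

  // Walk the matched characters of both strings in order; every pair that
  // differs is half a transposition.
  var k = 0;
  var outOfOrder = 0;
  for (i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a.charAt(i) !== b.charAt(k)) outOfOrder++;
    k++;
  }
  var transpositions = outOfOrder / 2;

  return (matches / a.length +
          matches / b.length +
          (matches - transpositions) / matches) / 3;
}

// Jaro-Winkler: boosts the Jaro score for strings that share a common prefix.
// Assumption: prefix capped at 4 characters, scaling factor 0.1.
function jaroWinkler(a, b) {
  var score = jaro(a, b);
  var maxPrefix = Math.min(4, a.length, b.length);
  var prefix = 0;
  while (prefix < maxPrefix && a.charAt(prefix) === b.charAt(prefix)) {
    prefix++;
  }
  return score + prefix * 0.1 * (1 - score);
}
```

For the classic pair 'MARTHA' and 'MARHTA', jaro returns approximately 0.944 and jaroWinkler approximately 0.961. To use either value as an equality percentage, multiply it by 100.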

Note: The Levenshtein / Damerau-Levenshtein distance must be manually converted into a percentage.
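The distances and the manual percentage conversion described above can be sketched as follows. This is an illustrative implementation, not STEP's internal code. The Damerau-Levenshtein part uses the common "optimal string alignment" variant, and the conversion assumes the percentage is taken relative to the longer of the two strings, which agrees with the 6/8*100 example for strings of equal length. All function names are hypothetical.

```javascript
// Edit distance between two strings. When allowTransposition is true, swapping
// two adjacent characters counts as one edit (optimal string alignment variant
// of Damerau-Levenshtein); otherwise this is the plain Levenshtein distance.
function editDistance(a, b, allowTransposition) {
  var d = [];
  var i, j;
  for (i = 0; i <= a.length; i++) {
    d[i] = [i]; // cost of deleting the first i characters of a
  }
  for (j = 1; j <= b.length; j++) {
    d[0][j] = j; // cost of inserting the first j characters of b
  }
  for (i = 1; i <= a.length; i++) {
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // substitution (or match when cost is 0)
      );
      if (allowTransposition && i > 1 && j > 1 &&
          a.charAt(i - 1) === b.charAt(j - 2) &&
          a.charAt(i - 2) === b.charAt(j - 1)) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // transposition
      }
    }
  }
  return d[a.length][b.length];
}

function levenshtein(a, b) {
  return editDistance(a, b, false);
}

function damerauLevenshtein(a, b) {
  return editDistance(a, b, true);
}

// Assumption: equality percentage relative to the longer string.
function toPercent(distance, a, b) {
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 100; // two empty strings are identical
  return (longest - distance) * 100 / longest;
}
```

With the examples from the list above, levenshtein('AXR55487', '8XRT5487') is 2 and converts to 75 percent, while 'AA67' and 'A6A7' give a Levenshtein distance of 2 and a Damerau-Levenshtein distance of 1.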

When the preferred string comparison algorithm is insufficient, it is possible to apply the Levenshtein / Damerau-Levenshtein distance directly to strings built using STEP functions and automatically output an equality metric. Several criteria can be added and assigned weights to calculate the total equality. The available criterion types are described as follows.
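This topic does not state how the weights are combined into the total equality. As an illustration only, the sketch below assumes a weighted average of the individual criterion percentages. The function name and the shape of the input objects are hypothetical.

```javascript
// criteria: array of { percent: <0-100>, weight: <non-negative number> }
// Assumption: total equality is the weighted average of the criterion results.
function totalEquality(criteria) {
  var weightedSum = 0;
  var weightSum = 0;
  for (var i = 0; i < criteria.length; i++) {
    weightedSum += criteria[i].percent * criteria[i].weight;
    weightSum += criteria[i].weight;
  }
  if (weightSum === 0) return 0; // no criteria, or all weights are zero
  return weightedSum / weightSum;
}
```

Under this assumption, a criterion scoring 100 with weight 3 and a criterion scoring 60 with weight 1 give a total of (300 + 60) / 4 = 90.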

Multi Word Damerau-Levenshtein Distance

The Multi Word Damerau-Levenshtein distance is equal to the Damerau-Levenshtein distance except that the transposition of two words does not count as an edit. For example, the distance between 'Paul Johnson' and 'Johnson Paul' is 0. This criterion is useful when working with names where first name and surname are in the same attribute value, yet the order differs between objects.
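The effect can be approximated by putting the words of each value into a fixed order before measuring the distance. The sketch below is an approximation for illustration, not the algorithm STEP uses: it sorts the words alphabetically and then applies a Damerau-Levenshtein distance (optimal string alignment variant). All function names are hypothetical.

```javascript
// Damerau-Levenshtein distance (optimal string alignment variant).
function damerauLevenshtein(a, b) {
  var d = [];
  var i, j;
  for (i = 0; i <= a.length; i++) d[i] = [i];
  for (j = 1; j <= b.length; j++) d[0][j] = j;
  for (i = 1; i <= a.length; i++) {
    for (j = 1; j <= b.length; j++) {
      var cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost);
      if (i > 1 && j > 1 &&
          a.charAt(i - 1) === b.charAt(j - 2) &&
          a.charAt(i - 2) === b.charAt(j - 1)) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

// Split on whitespace, drop empty tokens, sort the words, and rejoin them
// so that word order no longer influences the comparison.
function normalizeWordOrder(text) {
  var words = String(text).trim().split(/\s+/);
  var kept = [];
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== '') kept.push(words[i]);
  }
  return kept.sort().join(' ');
}

// Approximation: word order is ignored, character edits still count.
function multiWordDistance(a, b) {
  return damerauLevenshtein(normalizeWordOrder(a), normalizeWordOrder(b));
}
```

Here multiWordDistance('Paul Johnson', 'Johnson Paul') is 0, and a misspelling such as 'Jonhson Paul' adds a distance of 1. A limitation of sorting is that a typo in the first letter of a word can change the sort order and inflate the distance.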

Number Distance

The Number Distance criterion returns the relative distance between two numbers expressed as a percentage: lowest number / highest number * 100. This is a simple, purely relative measure. For example, the numbers 1 and 2 are considered as similar as 50 and 100 (50 percent in both cases).

Special cases:

  • If one or both strings are not numerical values, the criterion returns '0'.
  • If only one of the strings is '0', the criterion returns '0'.
  • If both strings are '0', the criterion returns '100'.
  • If both strings are negative, the calculation is highest number / lowest number * 100.
  • If one value is positive and the other negative, the criterion returns '0'.

Use STEP functions to generate the data that requires the number distance calculation.
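The rules above can be expressed as a short function. This is a sketch written from the description in this topic, not STEP's own code. The function names are hypothetical, and the input parsing (JavaScript number syntax with a period as the decimal separator) is an assumption.

```javascript
// Returns a finite number, or NaN when the text is not a numerical value.
function parseNumber(text) {
  var trimmed = String(text).trim();
  if (trimmed === '') return NaN; // Number('') would otherwise yield 0
  var value = Number(trimmed);
  return isFinite(value) ? value : NaN;
}

// Relative distance between two numeric strings as a percentage (0-100).
function numberDistance(first, second) {
  var a = parseNumber(first);
  var b = parseNumber(second);

  if (isNaN(a) || isNaN(b)) return 0;   // one or both are not numerical
  if (a === 0 && b === 0) return 100;   // both are 0
  if (a === 0 || b === 0) return 0;     // only one is 0
  if ((a < 0) !== (b < 0)) return 0;    // one positive, one negative

  var lowest = Math.min(a, b);
  var highest = Math.max(a, b);
  return a > 0
    ? lowest / highest * 100            // both positive
    : highest / lowest * 100;           // both negative
}
```

For example, numberDistance('1', '2') and numberDistance('50', '100') both return 50, as does numberDistance('-1', '-2').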

JavaScript

The JavaScript criterion allows you to define your own algorithm for comparing objects. The only requirement is that the result is a number between 0 and 100 to represent the percentage of equality.

Within the JavaScript criterion, you can use functions defined in business libraries in addition to the objects made available via bindings.
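As an illustration of the kind of logic such a criterion might contain, the sketch below compares two code values and always returns a number between 0 and 100. It is plain JavaScript with hypothetical function names. How the two values are read from the compared objects depends on the bindings configured for the criterion and is deliberately not shown.

```javascript
// Keeps a result inside the required 0-100 range; non-numbers become 0.
function clampPercent(value) {
  if (typeof value !== 'number' || isNaN(value)) return 0;
  return Math.max(0, Math.min(100, value));
}

// Upper-cases the value and removes everything except letters A-Z and digits.
function normalizeCode(value) {
  return String(value).toUpperCase().replace(/[^A-Z0-9]/g, '');
}

// Hypothetical comparison: the share of leading characters that two
// normalized codes have in common, relative to the longer code.
function compareCodes(first, second) {
  var a = normalizeCode(first);
  var b = normalizeCode(second);
  var longest = Math.max(a.length, b.length);
  if (longest === 0) return 0; // nothing to compare, treated as no match

  var prefix = 0;
  while (prefix < a.length && prefix < b.length &&
         a.charAt(prefix) === b.charAt(prefix)) {
    prefix++;
  }
  return clampPercent(prefix * 100 / longest);
}
```

With this sketch, compareCodes('AB-123', 'ab123') returns 100 and compareCodes('AB123', 'AB999') returns 40. Functions like these could be placed in a business library and called from the criterion.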

For more information, refer to the JavaScript Binds topic in the Resource Materials section of the online help.