Data Element: Organization Name Normalizer

An organization name normalizer can normalize organization name data for use in the corresponding organization name matcher.

Considerations

As needed, create the following:

Replacement String Lookup Table - This lookup table should account for inconsistencies in organization names by defining semantically equivalent strings (especially in the usage of apostrophes and quotation marks). It is often a good way to remove accent, quotation and apostrophe characters and normalize non-Latin characters. Refer to the topic Transformation Lookup Tables in the Resource Materials online help documentation. For example:

Note: Although Transformation Lookup Tables can be manually ordered in the workbench, regardless of the order of the rows, system processing transforms punctuation first, followed by alphabetic characters. For example, one row with ‘ and another row with ‘s processes the ‘ first, meaning that the ‘s entry is not processed because the ‘ has been removed already.

Replacement Word Lookup Table - This lookup table is used for replacing or removing parts of an organization name that forms full words, like 'Inc' or 'Co'. When the normalizer runs, it replaces entire word occurrences of a 'From' entry to the 'To' entry sequentially from the first row to the last row. Refer to the topic Transformation Lookup Tables in the Resource Materials online help documentation. For example:

Name Split Regex - The default (\s+) splits on any 'white space' character like space, tab or line change, but can be modified to split on comma, semicolons or even '<multisep/>', depending on the source data.

Input

When configuring the data element:

The Organization Name Attribute field defines an attribute to be used as input.
The Input Parameters field allows selection of:
1. 'Use Attribute on Object' – by default, this option is set to ‘True’ and indicates to read attributes on the object itself. Click the Value dropdown to manually set it to 'False' when using information from a Data Container or an Input Normalizer.
2. 'Data Container' – read attributes from the data container.
3. 'Input Normalizer’ – read outputs from the selected Match Expression, as defined in the topic Matching Algorithms and Match Expressions.

Output

The output of an organization name normalizer is a java.util.Set< com.stibo.partydatamatching.domain.organizationname.OrganizationName>.

Functionality

The organization name normalizer automatically makes the following modifications to the organization name in the order listed for comparison purposes only:

Lower-case text
Apply the selected Replacement String Lookup Table - which case-insensitively replaces every substring occurrence of a 'From' entry in the Replacement String Lookup Table with the 'To' entry. Replacement is performed in the order of the table. This allows removal of characters, accents, quotations, or apostrophes. It can also be used to Romanize non-Latin characters.
Apply the selected Replacement Word Lookup Table, respecting the Name Split Regex as a word divider - which only makes replacements when entire words match the replacement table 'From' entry. Word divisions are defined by the Name Split Regex, to handle separation in the input by comma, <multisep/> tags, space, tabs, line feed, etc.

For example, consider the setup illustrated in the Considerations section:

Replacement String Lookup Table	From ['s] To []
Replacement Word Lookup Table	From [Inc] To [] From [Co] To [] From [&] To [and]
Name Split RegEx	\s+

Replacement String Lookup Table

From ['s] To []

From [‘n’] To [and]

Replacement Word Lookup Table

From [Inc] To []

From [Co] To []

From [&] To [and]

Name Split RegEx

\s+

Organization name normalization with this setup results in the following:

Configuring an Organization Name Normalizer Data Element

After adding the organization name normalizer in the Data Elements flipper of the Decision Table dialog (defined in the topic Match Criteria), configure it as follows:

Click into the Data Elements column and click the ellipsis button () to access the configuration dialog.
On the Organization Name Normalizer dialog:
- For the Organization Name Attribute, click the ellipsis button () and select the organization name attribute.
- For the Input Parameters, define the source of the data to be normalized. Refer to the Input section above for details.
  
  Right-click the arrow in the first column of the Input Parameters table for additional display and edit options. Although it appears that the default 'Use Attribute On Object' parameter can be removed, after closing the dialog it will continue to display. Instead, if a different input parameter is used, click the Value dropdown and manually set 'Use Attribute On Object' option to 'False.'
  
  Click the Add Input Parameter link to add other input parameters.
- For the Replacement String Lookup Table, click the ellipsis button () and select the Transformation Lookup Table asset created as defined in the Considerations section above.
- For the Name Split Regex, click the ellipsis button () add regular expression to split the value of the organization name attribute into words. Leave the default (removes any whitespace character zero or more times, such as spaces, tabs, and new lines) or add your own RegEx. For more information, refer to the topic Regular Expression in the Resource Materials online help documentation.
- For the Replacement Word Lookup Table, click the ellipsis button ()) and select the Transformation Lookup Table asset created as defined in the Considerations section above.
To test the configuration, for the Select Nodes parameters:
- Click the ellipsis button () for each field and select two objects for comparison.
- Click the Evaluate button.
  
  An empty result field indicates the value is not available in the selected node. Adjust as indicated by the Evaluator results and repeat the evaluation.
Click OK to save and display the configuration in the Data Elements flipper.