Evaluating Loqate Address Verification Codes
Understanding the components of Loqate’s address verification code (AVC) is integral to understanding the quality of the standardized address output. To verify an address, it must be compared to another address that is known to be correct. The collection of confirmed addresses is called reference data. At Loqate, reference data is stored in the Global Knowledge Repository. Reference data varies from country to country, often with better coverage in urban areas and less in rural towns. The AVC tells you how well the address you have matches with the reference data. The AVC is comprised of four sections, defined below: the verification status section, the parsing status section, the postal code status section, and the match score section. There is also an optional geoaccuracy code section.
The aforementioned sections use the AVC image below as an example.
Verification Status Section
The first section of the AVC is the verification status section. The three characters represent the verification status (V), the post-processed verification match level (4), and the pre-processed verification match level (4), respectively. In the example shown above, the verification status section is V44.
Verification status
The first character displayed is the verification status, which contains one of the following values:
-
V – Verified: A complete match was made between the input data and a single record from the available reference data.
-
P – Partially Verified: A partial match was made between the input data and a single record from the available reference data. Better data is available in the reference data for that country, but the input does not match it.
-
U – Unverified: Unable to verify. The output fields will contain the input data.
-
A – Ambiguous: More than one close reference data match.
-
C – Conflict: The record could not be verified to the specified minimum acceptable level.
-
R – Reverted: The record could not be verified to the specified minimum acceptable level. The output fields will contain the input data.
Post-processed verification match level
The second character displayed is the post-processed verification match level. This status defines the level to which the input data matches with the available reference data, once all changes and additions performed during the verification process have been considered. The valid values are as follows:
-
5 – Delivery point (postbox or subbuilding)
-
4 – Premise (premise or building)
-
3 – Thoroughfare
-
2 – Locality
-
1 – Administrative area
-
0 – None
Pre-processed verification match level
The third character displayed is the pre-processed verification match level. This status defines the level to which the input data matches the available reference data prior to any changes or additions performed during the verification process. The valid values are the same as listed above for the post-process verification match level.
Parsing Status Section
The second section of the AVC is the parsing status section. The three characters of this section represent the parsing status (I), the lexicon identification match level (LIML) (4), and the context identification match level (CIML) (4). In the example shown above, the parsing status section is I44.
Parsing status
The first character displayed is the parsing status, which is represented by one of the following values:
-
I – Identified and Parsed: All input data has been identified and placed into components.
-
U – Unable to Parse: Not all input data has been identified and parsed.
Lexicon identification match level
The LIML defines the level to which the output data has some recognized form through the use of pattern matching (e.g., a numeric value could be a premise number) and lexicon matching (e.g., ‘rd’ could be a ThoroughfareType, ‘Road’; ‘London’ could be a locality). The valid values are as follows:
-
5 – Delivery Point (postbox or subbuilding)
-
4 – Premise (premise or building)
-
3 – Thoroughfare
-
2 – Locality
-
1 – Administrative Area
-
0 – None
Context identification match level
The CIML defines the level to which the output data can be recognized based on the context it appears in. This is the least accurate form of matching and is based on identifying a word as one value that is preceded by a higher value and followed by a lower value. For instance, in the case of a thoroughfare preceded by a Premise, and followed by a locality, the latter items are identified through a comparison against the reference data or the lexicon. The valid values are the same as listed above for the LIML.
Postal Code Status Section
Postcode status
The third section of the AVC is the postcode status. Though there are two characters in this section, they represent a single status. In the above example, the postcode status is P3. The valid values are as follows:
-
P8 – PostalCodePrimary and PostalCodeSecondary verified
-
P7 – PostalCodePrimary verified, PostalCodeSecondary added or changed
-
P6 – PostalCodePrimary verified
-
P5 – PostalCodePrimary verified with small change
-
P4 – PostalCodePrimary verified with large change
-
P3 – PostalCodePrimary added
-
P2 – PostalCodePrimary identified by lexicon
-
P1 – PostalCodePrimary identified by context
-
P0 – PostalCodePrimary empty
Match Score Section
The fourth and final section of the AVC is the match score. This score is a number representing the level of similarity between the input data and closest reference data match. The generated match score is a value between 0 and 100, where 100 represents a complete match and 0 represents no match. In the above example, the match score is 100.
Geoaccuracy Code (optional)
Loqate provides optional standardization to return geocode information in addition to the standardized address. When enabled, a fifth set of values will be applied to the AVC. The geoaccuracy code is comprised of two parts: the geocoding status and the geocoding level. With geocode enabled, the original example might display as V44-I44-P3-100-P4, where P4 is the geoaccuracy code.
Geocoding status
The valid values for the geocoding status are as follows:
-
P – Point: A single geocode matches the input address.
-
I – Interpolated: A geocode can be interpolated from the input addresses location in a range.
-
A – Average: Multiple candidate geocodes match the input address, and an average of these is returned.
-
U – Unable to geocode: A geocode cannot be generated for the input address.
Geocoding level
The valid values for the geocoding level are as follows:
-
5 – Delivery point (postbox or subbuilding)
-
4 – Premise (premise or building)
-
3 – Thoroughfare
-
2 – Locality
-
1 – Administrative area
-
0 – None
Understanding the AVC
To unpack the Address Verification Code and understand how it has measured the strength of the address information, it is useful to review the component sections of the AVC.
Review the AVC's verification status section. Determine if the verification level is sufficient for your application. If the verification status is adequate (V44 in the example), confirm the match score is high enough to meet your requirements. If the verification status is less than a V4, it may mean that the best level of data available for that country is less than premise level.
Evaluate the verification status against the match score. If the verification status is high (V4), and the match score is low, the output has changed significantly in order to match the reference data. The best available verification status is 5 (delivery point). A delivery point could be a post office box, apartment number, suite number, etc., though many addresses are deliverable even with this component missing. If the reference data does not contain the delivery point data, then the highest status level best it will show is V4.
Review the parsing status (I44 in the example) to determine if thoroughfare and premise components were identified. This will give a greater indication as to the quality and completeness of the address, even if it could not be fully verified due to reference data limitations.
Understanding ambiguous results
Ambiguity occurs when the address matches more than one result in the reference data. Ambiguity in the address does not imply that the address is undeliverable; it means two or more distinct addresses could match with the input address (e.g., an office building that has multiple suites, where the reference data contains suite numbers but the input address does not). Ambiguities can arise in different ways. The categories of ambiguity level are as follows:
-
Ambiguity at the thoroughfare level: Ambiguity can arise when the input data does not fully define the thoroughfare level. An example for this category is the input: Main St as the thoroughfare value. In the reference data, both South Main St and North Main Street match Main St. This ambiguity will result in an A2 verification level.
-
Ambiguity at the locality level: Ambiguity is not necessarily a facet of the input data; it could also be a facet of the reference data. For example, there are geographies that have multiple dependent localities, like a neighborhood, that correspond to the same input address. Locality address hierarchies or geopolitical boundaries can cause this ambiguity.
Important: This kind of ambiguity happens even after premise, thoroughfare, and locality are all verified with no change.
-
Ambiguity at the premise level: Ambiguity can arise when multiple matches are available at the premise level. For example, an office building has multiple suites while the input data does not provide the suite number. In this case, the premise information may or may not include subbuilding data, and may or may not include a building value (e.g., an apartment complex name). In such a case, it is quite possible that there is still a match on premise number, thoroughfare, and locality and is most likely usable.