How many false positives is your address tool spitting out?
An address false positive during the address validation process occurs when an entered address matches an incorrect or non-existent address without an error message or sufficient indication that the address is incomplete or wrong.
Address false positives are dangerous for businesses because they can lead to lost revenue, increased liability, flawed data intelligence, misrouting, and upset customers.
While no tool is immune to false positives, Smarty™ works hard to minimize them and tell you when a false positive is possible. Our address autocomplete APIs only suggest fully valid addresses, which also helps minimize false positive risks. Try them out here or read on to learn more about the types and dangers of false positives.
US & International Address Validation | US Address Autocomplete | International Address Autocomplete |
---|---|---|
In this article, we'll discuss:
- Address tools and the false positives they create
- How do address false positives cost you money?
- Ways that false positives infiltrate your data
- How Smarty prevents false positive address data
Address tools and the false positives they create
Address validation tools are built with a primary use case in mind, such as navigation or shipping. Each primary use case influences development decisions and can predict the most probable type of false positive the tool will return.
We'll cover some primary use cases that address tools are built around, along with potential false positives you're likely to encounter while using them.
Geocoding tools, risk of over-aggressive matching
Address geocoding is the process of identifying the latitude and longitude coordinates for an entered address and is useful for spatial analysis, risk assessment, and efficient routing. Geocoding is difficult and becomes more difficult with messy addresses, i.e., addresses that contain typos or misspellings or aren't standardized.
Messy addresses lead to low match rates. After all, if you can't find the address, you'll also fail to find the matching geocode or pinpoint the address accurately on a map.
Enter address validation. By cleaning up, standardizing, and validating the address first, you'll achieve a much higher match rate for geocodes. But it's possible to clean an address up too much, and over-cleaning creates false positives.
For example, let's validate this address with a tool designed with geocoding in mind:
A customer enters an address during checkout on your website. They live at 102 Suncrest BLVD but accidentally hit "0" twice and submitted 1002 Suncrest BLVD without noticing. But "1002" isn't a valid house number on Suncrest BLVD in the customer's city. An invalid response would tell the user that the address entered was incorrect and that they need to fix it.
But tools built for geocoding take a different approach. Instead, they try to find the closest geocode to the address entered and change the house or building number to find a match.
See the problem yet?
The closest real building number on Suncrest to "1002" is "1010". So, the geocoding address tool changes the address to "1010 Suncrest BLVD" and returns a "successful match" with no message indicating that it changed the building number to find the match. In this case, the "match" is over a mile from the customer's address.
Now we have a false positive resulting from an overly aggressive matching algorithm that's willing to change too much of the address. From there, it's easy to see how false positives quietly enter your system and wreak havoc whenever this address is used for analysis, risk assessment, shipping, mailing, or anything else.
The solution? Certain parts of the address shouldn't change during validation. The goal isn't to find ANY match but to find the CORRECT match.
Tell the user they made a mistake, and allow them to correct it or flag the address for manual review. Catching the error early on leads to fewer false positives.
If this scenario would cause problems for your organization, you should find out if your address validation tool will update the building number to find a match. Your high match rate could be signaling bad data.
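The check described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the `MatchResult` shape and the status strings are hypothetical, and a real tool's response would need to be mapped onto them.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical shape of a validation tool's response."""
    entered_primary: str   # building number the user typed
    matched_primary: str   # building number on the record the tool matched


def primary_number_changed(result: MatchResult) -> bool:
    """Return True when the tool swapped the building number to get a match."""
    return result.entered_primary.strip() != result.matched_primary.strip()


def review_status(result: MatchResult) -> str:
    """Flag silently altered building numbers instead of accepting the match."""
    if primary_number_changed(result):
        return "needs_review"
    return "accepted"


# The example above: the user typed 1002, the tool matched 1010.
print(review_status(MatchResult("1002", "1010")))  # needs_review
print(review_status(MatchResult("102", "102")))    # accepted
```

Running a check like this over a sample of your "successful" matches is one way to see how much of a high match rate comes from altered building numbers.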
Next, let's cover shipping validation tools.
Shipping tools, risk of overmatching building & secondary numbers
The most common false positives with shipping tools happen with how they treat building and secondary numbers (apartments, suites, etc.). Here's how they handle each.
Building numbers
When you're a shipping company with trained drivers visiting every street in the country daily, knowing the exact building number may not be necessary since the driver can figure out the rest once they get to the correct street. Or at least, that's the belief.
The United Parcel Service (UPS) is one such company that built an address validation tool and API to fit its own shipping purposes. UPS' tool and API completely validate the country, state, city, ZIP Code, and street name.
The false positives stream in because they only perform range validation on house/building numbers. Range validation is where they check that a building number is within the range of correct values found on a street.
For example, Champa St in Denver, Colorado, runs between 1100 St and 3300 St. UPS will flag the address as incorrect only if the building number entered is less than 1101 or more than 3299. This means there are 2,198 possible building numbers that get labeled as valid.
How many actual, valid building numbers are there on Champa St.?
There are only 352, which leaves 1,846 invalid entry combinations that get the green light from UPS and become false positives.
To decrease false positives, address validation should tell you if the street actually has a building number that exactly matches what the user enters. Without a match, the address should be flagged for review before going further.
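To make the difference between range validation and exact-match validation concrete, here is a small sketch. The street, its number range, and its list of real buildings are all made up for illustration; they are not the Champa St figures.

```python
def range_valid(number: int, low: int, high: int) -> bool:
    """Range validation: accept anything between the street's lowest and highest numbers."""
    return low <= number <= high


def exact_valid(number: int, known_numbers: set[int]) -> bool:
    """Exact validation: accept only numbers that are real delivery points."""
    return number in known_numbers


# Hypothetical street: numbers run from 100 to 198, but only six buildings exist.
LOW, HIGH = 100, 198
KNOWN = {100, 112, 124, 150, 176, 198}

accepted_by_range = [n for n in range(LOW, HIGH + 1) if range_valid(n, LOW, HIGH)]
false_positives = [n for n in accepted_by_range if not exact_valid(n, KNOWN)]

print(len(accepted_by_range))  # 99
print(len(false_positives))    # 93
```

On this toy street, range validation accepts 99 building numbers while only 6 exist, so 93 entries would pass as "valid" without being real.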
Apartments/suites and other secondary addresses
Secondary numbers are also largely ignored by address tools built for shipping. UPS wouldn't tell you important things about the secondary address information submitted, like if the secondary address is missing from the entry, incorrect, or unnecessary.
Address validation should be able to tell if an address of an apartment building should contain secondary address information.
When a tool validates neither the building number nor the secondary number, thousands of invalid address combinations are overmatched and slip past it. That's what happens with UPS and any other address validation tool that doesn't fully validate the building number AND the apartment number.
Don't feel bad for UPS for all the lost packages, though. UPS can wipe their tears away with the $20 bills from the $19.50 fee they charge whenever they update an incorrect address for a package in transit. You might even say it's a lucrative income stream.
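The secondary address checks described in this section (missing, incorrect, or unnecessary) can be sketched as a simple classification. The function name, the status strings, and the unit data below are hypothetical; a real tool would draw the known units from postal reference data.

```python
from typing import Optional


def secondary_status(entered_unit: Optional[str], known_units: set[str]) -> str:
    """Classify the secondary number against the units known for one building.

    known_units is empty for a building with no apartments or suites.
    """
    unit = (entered_unit or "").strip().upper()
    if not known_units:
        return "unnecessary" if unit else "ok"
    if not unit:
        return "missing"
    return "ok" if unit in known_units else "incorrect"


# Hypothetical data: one apartment building with three units, one single-family home.
apartment_units = {"1", "2", "3"}
house_units: set[str] = set()

print(secondary_status(None, apartment_units))  # missing
print(secondary_status("7", apartment_units))   # incorrect
print(secondary_status("3", apartment_units))   # ok
print(secondary_status("3", house_units))       # unnecessary
```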
Mapping & navigation tools, risk of infinite address overmatch
Google does mapping, so many assume that the Google Places Address Autocomplete API will help them prevent false positive address data from entering their system.
Unfortunately for those users, Google Places doesn't validate or standardize the addresses served in their autocomplete API. Google built the API as an aid for quick address entry and navigation.
A goal during development was avoiding a "no results found" response. Accordingly, Google Maps will give you a location for virtually any gibberish you enter.
We tried it out and successfully found geographic locations for "No Buts", "No Nuts", and "No Coconuts" in Google Maps.
Since much of Google's data comes from volunteers, Google users, and open-source projects, bad data can slip in easily.
The goal of avoiding "no results found" responses extends to the addresses served by the Google address autocomplete API.
Here's an easy example. Just meander over to Google Maps, and search for an obviously fake address like "123 Fake Street". Google's Places Autocomplete API has no problem serving up 5 entirely fictitious addresses. Not one of these addresses is mailable.
Examples like this abound. Functionally, it works for navigation but not so much for data quality. Services like this don't validate addresses; they're giving you a match for any entry regardless of what address components they change to get you there. But, if you're changing the city, ZIP Code, building number, and street name to find a match, the match probably isn't useful.
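One way to guard against this kind of overmatch is to compare the entry with the returned match component by component and reject matches that changed too much. The sketch below assumes both addresses are already parsed into components; the field names, the threshold, and the sample data are illustrative assumptions, not any provider's response format.

```python
COMPONENTS = ("primary_number", "street_name", "city", "state", "zip_code")


def changed_components(entered: dict[str, str], matched: dict[str, str]) -> list[str]:
    """List every component whose value differs between the entry and the match."""
    return [
        name
        for name in COMPONENTS
        if entered.get(name, "").strip().lower() != matched.get(name, "").strip().lower()
    ]


def is_trustworthy(entered: dict[str, str], matched: dict[str, str], max_changes: int = 1) -> bool:
    """Reject a match that changed the building number or too many other parts."""
    changes = changed_components(entered, matched)
    # A changed building number is never acceptable, whatever the threshold.
    if "primary_number" in changes:
        return False
    return len(changes) <= max_changes


# Hypothetical entry and the "match" a navigation-oriented tool might return.
entered = {"primary_number": "123", "street_name": "Fake Street",
           "city": "Springfield", "state": "IL", "zip_code": "62701"}
matched = {"primary_number": "123", "street_name": "Fake Road",
           "city": "Riverton", "state": "IL", "zip_code": "62561"}

print(changed_components(entered, matched))  # ['street_name', 'city', 'zip_code']
print(is_trustworthy(entered, matched))      # False
```

The threshold of one changed component is arbitrary; the point is that every change is listed rather than hidden, so a person or a rule can decide whether the match is still usable.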
You can read more about Google and how their address autocomplete works compared to other providers.
Google's Autocomplete and UPS Validation aren't the only address tools not fully validating addresses. There are many other address tools out there that behave similarly to UPS and Google. They work well for their designed function but not for other uses. Now, on to address quality tools and the false positives they create.
Address quality tools, risk of matching the wrong valid address
Of all the types of address validation tools in the market, tools designed for address quality are the least likely to produce false positives.
Why?
They're unlikely to skip steps in the address validation process because the quality would suffer, so they fully validate the country, state, city, ZIP Code, street name, house/building, and secondary unit information. Sure, it's more work to build and maintain than the other use cases mentioned, but if quality is the defining goal, false positives must be destroyed on sight.
Validating every part of the address decreases the probability of false positives, but they're still possible due to the nature of address validation. The validation process can only tell you whether an address is real; it can't know whether the address is correct for the intended use.
For example, a customer enters an address like this:
2707 Champa St, Denver, Colorado 80205
This address will pass any address validation tool's scrutiny because it's a real & valid delivery point. At the same time, this address is also a false positive.
Why?
Because the customer doesn't live at 2707 Champa St, they live at 2705 Champa St.
During entry, the user mistyped the "5" in the building number, yet still managed to enter a combination of numbers that matched a real address on their street. In this case, it was their neighbor's address. So a bad address can still enter the system when this happens.
The probability of false positives when the actual building number and secondary number are validated is much lower than with a shipping validation system that only validates down to a range of building numbers.
As mentioned, 352 valid postal delivery points exist on Champa St, including apartments/suites and other secondary addresses. For a false positive to be created, the typo would have to match one of those 352 valid number combinations, have all relevant secondary information correct, and not be noticed by the user to pass validation.
Going back to our UPS example above, UPS flags the address as incorrect only if the building number entered is less than 1101 or more than 3299. That range validation gives the green light to 2,198 possible building numbers, and the possible false positive combinations spiral out to 10,000+ when multiplied by the secondary address information, which UPS doesn't validate either.
False positives and other address errors can be costly, and now we'll count the ways.
How do address false positives cost you money?
How false positives cost you money depends on how you use address data. Here are a few examples:
- Shipping false positives may cause misdeliveries, lost goods, returned packages, increased customer service, shipping delays, damaged goods, reshipments, and reputation loss.
- Mailing false positives may cause duplicate mailings, lost mail, poor mail penetration, failure to qualify for bulk mail discounts, compromised customer information, and increased fraud liability.
- Address-based business intelligence false positives may lead to poor decisions based on incorrect or incomplete data.
- Address database false positives can cause duplicate customer records, customer datasets that don't merge appropriately, and unreliable data.
- Installer, delivery, and outside sales false positives can cause increased mileage, missed appointments and sales opportunities, inefficient routing, and higher demands on customer support.
Once false positive address data enters your system, it's challenging to identify and nearly impossible to correct. Why spend time fixing something when it can be prevented at the outset?
Next, let's cover some common entry points where false positives enter business systems.
Where false positive addresses infiltrate your data
Bad address data can be costly, and preventing it can be tricky since it enters your system in many ways. Here are a few ways false positive address data can enter your system.
- Customer-facing ecommerce and address entry forms - Humans make mistakes, especially as we rely more on touchscreen devices. Form entry errors are more common now than ever before. At the same time, customers have higher expectations of the companies they work with to identify and fix errors before they become problems.
- Employee data entry - Employees who take customer data over the phone or in person can easily mishear address data or fail to enter all components, which can lead to entering an address as:
314 Bear Street
When the address should've been:
340 Pear Street Apt 3
- Importing address data from 3rd party sources - When importing address data for analysis or enrichment, it's difficult to know how well the provider vets their data. Importing without cleansing enables bad data to enter your system en masse.
- Address updating processes - Sometimes addresses change. People move and get new addresses. Without adequate validation, each change may introduce new false positives into the system, whether the update is made by the customer directly, by a receptionist, or by a customer service representative.
Smarty's approach to address validation
Smarty's address validation was built with address quality in mind, which means:
- We validate that the building/house number is real.
- We validate that the secondary/apartment number entered is real.
- We also tell you when a unit number is missing or unnecessary in countries where the secondary data is available.
- When we change or edit other components, we'll explicitly tell you what we changed and why.
- We don't change the primary (building) number or apartment number to get a match. Instead, we return a response that the address entered is invalid to avoid false positives.
Smarty's address autocomplete API differs from many other autocomplete APIs in the same way. Our address autocomplete only serves valid address suggestions, including building and apartment numbers with NO fake addresses.
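As a rough sketch of how a caller might act on the validation signals listed above, the function below reads a DPV match code from a parsed response. It assumes a response shaped like the `analysis.dpv_match_code` field of Smarty's US Street API (Y = confirmed, S = confirmed by ignoring the secondary, D = confirmed but secondary missing, N = not confirmed); check the current API reference before relying on these meanings. The status strings and the sample responses are hand-written for illustration, not real API output.

```python
def classify(candidates: list[dict]) -> str:
    """Turn a list of validation candidates into a status the caller can act on."""
    if not candidates:
        # Assumption: an empty list means nothing matched the entry.
        return "invalid"
    code = candidates[0].get("analysis", {}).get("dpv_match_code", "")
    statuses = {
        "Y": "valid",
        "S": "secondary_not_confirmed",
        "D": "secondary_missing",
        "N": "invalid",
    }
    # Anything unexpected goes to a person rather than into the database.
    return statuses.get(code, "needs_review")


# Hand-written sample responses.
print(classify([{"analysis": {"dpv_match_code": "Y"}}]))  # valid
print(classify([{"analysis": {"dpv_match_code": "D"}}]))  # secondary_missing
print(classify([]))                                       # invalid
```

Only the "valid" status should flow straight into your system; the other statuses are prompts to ask the user for a correction or to queue the record for manual review.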
Questions to ask an address validation service
Here are some questions you can ask of your address validation service to identify and prevent possible false positives from creeping into your address database.
- What is the primary intended purpose or use case for your tool? Is it meant for shipping, geocoding, navigation, mapping, address quality, or another use case?
- What type of building/house numbering validation do you provide?
  - Complete - The actual number is a valid delivery point.
  - Range - The number is found within an accepted range of numbers.
  - Not validated - The building/house number is ignored. (Only street-level validation is offered.)
- What type of secondary/apartment numbering validation do you provide?
  - Complete - The secondary number entered is confirmed as a valid delivery point, and you're told whether a secondary number is missing, incorrect, or unnecessary.
  - Range - The number is found within an accepted range of numbers.
  - Not validated - The secondary number isn't evaluated during validation.
- Do you count it as an address match even when the secondary number isn't confirmed?
- Will you tell me if a secondary number is missing?
- Does your tool ever automatically change, delete, or add a building number or secondary number to find a match?
- When a component is changed from the initial entry (spelling, abbreviation, city, state, ZIP Code, street, building/apartment number, etc.), are all the changes explicitly stated in the output?
- Is a confidence score provided? How is the confidence score calculated?
Don't just take our word for how Smarty is the right choice for you and your business. Try our address validation tools at the top of the page to prevent false positives from entering your system and clean up false positives already hidden in your data.
Then, jump in the driver's seat and test our capabilities with a 42-day free trial.