Link to home
Create AccountLog in
Microsoft SQL Server

Microsoft SQL Server

--

Questions

--

Followers

Top Experts

Avatar of prabakaran preethi
prabakaran preethi

SQL: Comparing 2 Addresses(String) - Identifying Match Percentage
Hello Experts,

I have 2 Tables namely AddressTable1 and AddressTable2. Both of them have same number of Columns.

Columns:
-------------
ProviderID
AddressLine1 VARCHAR(55)
AddressLine2 VARCHAR(55)
City
ZIP
State

What I would like to find out is something different here.

A provide(ProviderID) will have entries in both the tables(Only 1 row) and the address of the Provider in both tables may or may not match.

I need a query/proc to find out the Providers those addresses are not matching.

When I say NOT MATCHING, I simply;

do not mean the direct comparison of the fields
do not mean converting them to UPPER case and compre

The addresses may logically be matching. But not physically(actual data in the fields)
E.g:User generated imageHere, we know that STE and SUITE are same but when you compare them, it will always fail. We can create a crosswalk as one of the functionality to complete this. But there are other scenarios to consider.

The texts in the address fields might be loaded vice-versa. There could be space in between. There could be # instead of No.

The main reason of these data discrepancies is "Manual intervention". The data is loaded into database in 2 different modes.
User Mode: Business Users manually enter the data from a front-end application.
Batch Mode: Inbound data is fed to DB through certain interfaces

We may be able to teach the Business Users to enter the data as per the US standard. We may be able to request the source to send the data in a proper mode.

But, I need to find out the discrepancy in the data that is already there in the DB. I somehow need to compare the addresses by taking each of the words from the fields to compare against other fields to find out the match.

What I need is, Provider ID and the Percentage of match. I don't how to explain or define the Percentage of Match. If you can suggest, it would be great too.

Note:
I do not worry about City/Zip/State. Because, I will simply ignore the providers missing ZIP/City/State match.
The data is in stored in different instances. It is stored in Sybase and Oracle.

Zero AI Policy

We believe in human intelligence. Our moderation policy strictly prohibits the use of LLM content in our Q&A threads.


ASKER CERTIFIED SOLUTION
Avatar of slightwv (䄆 Netminder)slightwv (䄆 Netminder)

Link to home
membership
Log in or create a free account to see answer.
Signing up is free and takes 30 seconds. No credit card required.
Create Account

SOLUTION
Avatar of Kent OlsenKent Olsen🇺🇸

Link to home
membership
Log in or create a free account to see answer.
Signing up is free and takes 30 seconds. No credit card required.

SOLUTION
Avatar of wilcoxonwilcoxon🇺🇸

Link to home
membership
Log in or create a free account to see answer.
Signing up is free and takes 30 seconds. No credit card required.

SOLUTION
Avatar of awking00awking00🇺🇸

Link to home
membership
Log in or create a free account to see answer.
Signing up is free and takes 30 seconds. No credit card required.

Avatar of prabakaran preethiprabakaran preethi

ASKER

Thank you, Experts, for the comments.

We can look at some of the data but there are huge.

There are more than 100 Business Users intervene to update the addresses manually. On an avg. 2k records are processed on a Daily basis.

The DB has 2,74,498 providers as of now and the providers may have more than 1 addresses(Home, Remittance, Billing, Service 1,2,3 etc.)

What I am planning to do is to format the addresses in certain method;
Change the addresses so that they have only Space in between the words in the address lines
Change the case to Upper
Identify the SUITE/STE/STREET/ST related addresses that also have numbers. If the number of numeric characters are more than 3 and they match, then change the STE to SUITE and ST to STREET
Take the words from address lines and compare it against the other and if there are 3 or more positive matches, then assume that the addresses match

Will it do good. I am pretty sure they may not be 100% but closest I can get to.

Is there an API/tool where I can pass the addresses and get their Latitude and Latitude so that I can compare against the other? How closer we can get? I will assume if the lang/long cannot be identified, I will have users looked at it manually.

??

>>Will it do good. I am pretty sure they may not be 100% but closest I can get to.

Sounds like a decent method to me.  I've used similar.  Mentioned above, directional words cause issues (N,S,E,W,etc...).  I try to keep city and state with the address when I've done it.  How many cities have a "1 Main Street" in them?  Add the city and state and they become unique.

>>Is there an API/tool where I can pass the addresses and get their Latitude and Latitude

Several.  None free that I'm aware of.  Look around for: Geo locating street address.

The suggestion of paying the USPS for their data is a good one.  If they don't know about the street address then it probably doesn't exist (or is so new they haven't updated the data yet).

Avatar of Kent OlsenKent Olsen🇺🇸

I suggest that you use this only for comparison, not as a "real" snail-mail address.  Too many things can go wrong that could result in mis-delivery.

Another gotcha is "STREET" vs. "ST", vs. "st".  It's not uncommon, particularly with people where English is their second language, to  not butt letters and digits.

"21st Ave" is normally meant to be "Avenue 21".  However, separate "21" and "st" and it reads "21 st Ave".  Depending on the rest of the address it could be ambiguous, or at least confusing.

Reward 1Reward 2Reward 3Reward 4Reward 5Reward 6

EARN REWARDS FOR ASKING, ANSWERING, AND MORE.

Earn free swag for participating on the platform.


I don't think this can be done directly.
Did you make any advances on this?
Microsoft SQL Server

Microsoft SQL Server

--

Questions

--

Followers

Top Experts

Microsoft SQL Server is a suite of relational database management system (RDBMS) products providing multi-user database access functionality.SQL Server is available in multiple versions, typically identified by release year, and versions are subdivided into editions to distinguish between product functionality. Component services include integration (SSIS), reporting (SSRS), analysis (SSAS), data quality, master data, T-SQL and performance tuning.