Solved

Compairing Data using T-SQL

Posted on 2008-06-18
16
321 Views
Last Modified: 2010-04-21
Hi,
i would like to be able to compair 2 tables against each other to find similar address matches.
I first need to standerdise my data in each table and insert this data into  temp tables.
Like :
(2)  change all data to same case (upper or lower, doesn't matter which).
(3)  strip out all punctuation and all extra spaces
(4)  parse the space-delimited elements and standardize all descriptive words such as "NORTH" to "N", "STREET" to "ST",     substitute soundex values for uncommon words, etc.  For this task I used the Postal Systems's standard abbreviations.

Can some one help me as to how i can achieve this?
I am using t-SQL on SQL SERVER 2005..
Thanks
0
Comment
Question by:Putoch
  • 6
  • 6
  • 4
16 Comments
 
LVL 13

Expert Comment

by:rickchild
ID: 21814223
To try another angle, depending on the size of your organisation you may be able to buy in address matching software, as matching addresses successfully can be a very specialised task.

I have used Capscan Matchcode Batch in the past when I was working in Direct Marketing (Junk Mail!), and I am fairly sure they aupport Irish EIRE data in addition to UK Royal Mail addresses.

Might be worth a trial, they are extremely good pieces of software for address standardisation and matching.

http://www.capscan.com/matchcode_batch.aspx
http://www.qas.co.uk/products/clean-and-suppress-address-data/batch.htm
0
 
LVL 13

Expert Comment

by:rickchild
ID: 21814301
To go into further detail those types of software can process the data and standardise the addresses to how they are stored in the postal service database. This is often used to make sure mail reaches the address successfully.  If that wasn't enough to help you match then they also assign a Unique ID to each address which you can match on.

You can also extract separate address elements, such as Town, County, House Name, House Number.

They will flag the data on how successful the match to the postal service database file was, either with 1/0, or sometimes a 0 and a percentage of how confident it is in the match.  (For example you could have 2 streets with the same name in a post town, and if the postcode is incomplete it will have to make assumptions.)
0
 

Author Comment

by:Putoch
ID: 21814309
Thank you rick, but i am afraid i don't think they will pay for that.

I was hoping there was some way of Formatting this, maybe by using Functions or SP

Has anyone every used anything like this before using t-sql?

thanks,
0
 
LVL 13

Expert Comment

by:rickchild
ID: 21814352
How is the data currently stored?  As one long address line, or separate columns for address lines?
If it is currently in one long line with say commas your first step is to separate this out into individual address lines, as you would see on an envelope.

Unfortunately I have found in the past when unable to use matching software, the best way to go about it is just match on Postcode and House Number/Name/Address Line 1, and forget those small percentage which fall through.
0
 
LVL 13

Expert Comment

by:rickchild
ID: 21814476
Sorry to get back to your question anyway!  I just feel strongly about these packages, as I know if you are serious about good address matching it can be weeks/months of work, and then the £2000 software looks a bit more reasonable...


(2)  change all data to same case (upper or lower, doesn't matter which).

SQL has functions UPPER() and LOWER() which you could use in a select or an update.
Note this is not required if your server collation is set to CI (Case-Insensitive)


(3)  strip out all punctuation and all extra spaces

REPLACE() function can be used here.  So for example a common one is to replace double spaces with single spaces.   REPLACE(Column1, '  ', ' ')  Again you use this in a SELECT or UPDATE

(4)  parse the space-delimited elements and standardize all descriptive words such as "NORTH" to "N", "STREET" to "ST",     substitute soundex values for uncommon words, etc.  For this task I used the Postal Systems's standard abbreviations.

You can again use REPLACE() so say REPLACE(Column1, 'STREET', 'ST')

For the soundex you can see some documentation here.
http://msdn.microsoft.com/en-us/library/ms187384.aspx
0
 

Author Comment

by:Putoch
ID: 21817060
Thanks Rick for your advice and help, i will give this a go and see how i fair out.
0
 
LVL 9

Expert Comment

by:Tone' Shelby
ID: 21818286
If You are running SQL 2005 with SMS Bids then you should have the SSIS tool ( SQL Server Integration Services). This  ETL feature of 2005 is amazing once you start using it. The data transformation part of this tool will do most everything you mention above. I think it's worth the time to learn it and the learning curve is pretty low. I created my first packages and was transforming data the same day ...

You might want to check out the info on it. Here's a first step tutorial link that can get you started.

http://msdn.microsoft.com/en-us/library/ms169917.aspx
0
 

Author Comment

by:Putoch
ID: 21818347
Hi tshel,
i do indeed have the ssis tool, i have created packages before for extracting data from differnet sources into tables into mssql, and for loading tables with data. but never to do this.
Have you used it for this specifically when you tried it out ( i mean to clean up the data )

thanks for you respnse also!!
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:Putoch
ID: 21818402
Did you use Fuzzy lookups?
0
 
LVL 9

Expert Comment

by:Tone' Shelby
ID: 21821135
Yes  Absolutely, In fact that's the main reason we use it and I have for some pretty major conversion projects.

When Using the Data Transformation portion of BIDS  in the Data Flow task when creating packages I have used many of the transformations to cleanse the data ie: Regular expressions for email matching, changing phone numbers to all numbers vs dashes and parenthesis, Grabbing like titles with Fuzzy lookup & dumping the errors to a seperate files etc etc...... It is a very powerful conversion tool once you get the knack of it...  Here is a pretty good starting step by step article on using the data flow in ssis...

http://dotnetslackers.com/articles/sql/UsingSQLServerIntegrationServicesDataFlowSQLServer2005.aspx

Enjoy!
Thanks
0
 

Author Comment

by:Putoch
ID: 21821736
Ok tshel, thanks for the link too, i will have a go at this and let u know how i get on.
0
 
LVL 9

Accepted Solution

by:
Tone' Shelby earned 300 total points
ID: 21827699
My pleasure -- PS the link above gets you atrted on a simple package but if you need How to: specifics MSDN can walk you through - example for how to on fuzzy grouping: http://msdn.microsoft.com/en-us/library/ms137786.aspx
when you get there you will see all kinds of how tos on the left.

and here are all the other available  Integration Services Transformations: http://msdn.microsoft.com/en-us/library/ms141713.aspx

Have Fun!!


0
 
LVL 9

Expert Comment

by:Tone' Shelby
ID: 21827723
woops my apologies: the how to on fuzzy grouping is ... http://msdn.microsoft.com/en-us/library/ms142155.aspx
again once you get there you will see all the other how to's on just about anything you may need to trnasform... take care.
0
 
LVL 9

Expert Comment

by:Tone' Shelby
ID: 22196230
Hey checking in on you. How are you getting along on this? ...need any more assistance...thanks
0
 
LVL 9

Expert Comment

by:Tone' Shelby
ID: 22465673
Just checking up on how you again are coming along & if you need any additional assistance on this or is your issue resolved? --- thanks!
0
 

Author Closing Comment

by:Putoch
ID: 31468414
thanks Tshel, i ended up using some nested queies and pulled back what ever results i could.
The business needed the results very quickly and i did not have the time to develop a technique to pulling and looking through the data. thanks so much for your advise along the way.
regards,
putoch
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Join & Write a Comment

In this article I will describe the Copy Database Wizard method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now