• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 32
  • Last Modified:

Stripping away excess address information using SQL

I have database that is full of demographic information including addresses. I need to extract just address information and remove any references to apartment and suite numbers and floors and the like. This is because I have table that full of latitude and longitude records that are base solely on the address portion of the data. Does anyone have any helpful tips on how to accomplish this? My thought right now is to scan what is in there right now and use the replace command to change them to blanks but I was wondering if there is a better way.
0
rwheeler23
Asked:
rwheeler23
  • 5
  • 2
1 Solution
 
Dean ChafeeIT/InfoSec ManagerCommented:
The best way is so send the database records through a USPS address correction service to properly correct address elements and return the parsed address in the USPS corrected form.  Short of a process like this, we would need to see examples of what the column data looks like, but address can vary a bunch and can be complex to there is no standard pattern matching to clean the data.
0
 
rwheeler23Author Commented:
17 CATHERINE ST   2ND FL is a typical example. There are three fields involved. Address, city and state. So in this case it would be like this:    17 CATHERINE ST   2ND FL, Smallville, WI. I need to remove the 2NF FL or Suite 203 or Room 222 or whatever.
0
 
rwheeler23Author Commented:
I also see instances where in the data table STREET is spelled STREET  and in the cross reference table it is abbreviated to ST. So it is more than just the apartment or suite numbers messing this up. I have looked on the USPS site and it not clear if there are files there I can use.
0
Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
rwheeler23Author Commented:
I forgot to mention that I need to build this routine into our nightly SSIS project that pulls data from a generic source  and publishes onto another website.
0
 
Dean ChafeeIT/InfoSec ManagerCommented:
Variations in free form address fields is very common and is the problem, like STREET or ST,  APT or #, and so on.   I notice in your example that there are multiple spaces between "ST   2ND", is this a common pattern? if so, use it to separate them because it seems that would be a delimiter between address line 1 and line 2.

I highly recommend an address hygiene product to process the file once extracted via SSIS. Here is one example:
https://smartystreets.com/products
1
 
rwheeler23Author Commented:
The closer I examine the data the more combinations I find. I believe this is the norm rather than the exceptions.  Even the same person will spell names differently from one day to another. The base table is stable, it the file I am being given that contains all the irregularities. I will check out your suggestion. Thanks.
0
 
rwheeler23Author Commented:
Thanks
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 5
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now