Detect Address String.

Does anybody know of a way to detect an address from a string entered in a random order?

For example, the user could simply type their address into a text field, and we would locate and extract the following components:

postcode
house number

I'll up the points for a detailed response :)
MoOTottle Asked:
mosphat Commented:
The first thing that comes to mind is regular expressions. However, there are a lot of different ways to write an address. Which kind of addresses (that is, which countries) do you want to support?
MoOTottle Author Commented:
It will be all UK addresses.

They are "normally" in the format:

Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB
mosphat Commented:
Most people will probably write it like:

Joe Blogs
55 Springfield Road
Springfield
Manchester
MJ15 4FB

But I'm sure there are (many) other ways. (Different people, different habits)

Is the user going to get some kind of confirm page in which he/she can make corrections, in case the address is entered in a way not foreseen by the parser you're about to write?

Which language do you intend to use? (Not that it matters to the problem, it just might be more convenient when giving examples)
MoOTottle Author Commented:
It will be a single-line input, as the user interface is very limited.

It's basically just like a command prompt.

And it's a one-way process, I'm afraid: once the data has been submitted, we have one shot to grab the info as best we can.

I am able to use either ColdFusion, Classic ASP or VB 6.0, although ASP is my preference.

I managed to hack this together for the postcode from reading around, but it doesn't seem to be working.
MoOTottle Author Commented:
<%
Sub GetAdd(strInput)

  Dim objRegExp, objMatch, colMatches
  Set objRegExp = New RegExp
 
  objRegExp.IgnoreCase = True
  objRegExp.Pattern = "^((([A-PR-UWYZ])([0-9][0-9A-HJKS-UW]?))|(([A-PR-UWYZ][A-HK-Y])([0-9][0-9ABEHMNPRV-Y]?))\s{0,2}(([0-9])([ABD-HJLNP-UW-Z])([ABD-HJLNP-UW-Z])))|(((GI)(R))\s{0,2}((0)(A)(A)))$"
  objRegExp.Global = True


  set colMatches = objRegExp.Execute(strInput)

  'Print the # of matches we found
  Response.Write colMatches.Count & " matches found...<P>"

  'Step through our matches
  For Each objMatch in colMatches
     Response.Write objMatch.Value & "<BR>"
  Next

  'Clean up
  Set colMatches = Nothing
  Set objRegExp = Nothing



End Sub


strAdd = "Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"
%>
<%GetAdd(strAdd)%>
mosphat Commented:
That's because the ^ at the beginning and the $ at the end tell the regex parser that the match should start at the beginning and end at the end. Obviously the postcode doesn't start at the beginning in your example.
Try removing the ^ and $.
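
To illustrate the effect of the anchors, here is a minimal sketch in Python (chosen only so it can be run standalone; the anchors behave the same way in the ASP code above). The simplified POSTCODE pattern is an assumption for illustration and is far looser than the one posted in the question.

```python
import re

# Simplified, hypothetical UK postcode shape (an assumption for illustration,
# much looser than the pattern posted above).
POSTCODE = r"[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}"

address = "Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"

# Anchored: the whole input must be a postcode, so the full address fails.
anchored = re.search("^" + POSTCODE + "$", address, re.IGNORECASE)

# Unanchored: the postcode may appear anywhere in the input.
unanchored = re.search(POSTCODE, address, re.IGNORECASE)

print(anchored)             # None
print(unanchored.group(0))  # MJ15 4FB
```

With the anchors in place the pattern only succeeds when the input is nothing but a postcode, which is why the full address line produced 0 matches.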
MoOTottle Author Commented:
Excellent, it's working a treat now. :)

So is there a better way to return one match, rather than an array of matches?

And how would I go about pulling out the house number?

(btw thanks for the help so far :] )
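
On returning a single match: in VBScript, leaving objRegExp.Global at False makes Execute stop after the first match, which can then be read as colMatches(0) once colMatches.Count has been checked. The same idea as a rough sketch in Python, where re.search returns only the first match. The helper name first_match and the simplified postcode pattern are assumptions for illustration, not part of the code above.

```python
import re

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group(0) if m else None

address = "Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"

# Simplified, hypothetical postcode shape, not the full pattern from the thread.
print(first_match(r"[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}", address))  # MJ15 4FB

# A pattern that cannot match shows the "nothing found" case.
print(first_match(r"[0-9]{10}", address))  # None
```

Returning None rather than an empty collection makes the "nothing found" case explicit, which matters when there is only one shot at the input.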
mosphat Commented:
Something along the lines of this?

objRegExp.Pattern = "([0-9\-])+\s+([A-Z\s])(?:,)"

The first group would contain the house number, the second one the street name.
Mind you, this is a very basic regex. I'm sure the UK has some exotic street names that won't be found this way. But it's a start.
MoOTottle Author Commented:
That one doesn't seem to be working.

It's returning 0 matches.

Sub GetAddress(strInput)

  Dim objRegExp, objMatch, colMatches
  Set objRegExp = New RegExp
 
  objRegExp.IgnoreCase = True
  objRegExp.Pattern = "([0-9\-])+\s+([A-Z\s])(?:,)"
  objRegExp.Global = True


  set colMatches = objRegExp.Execute(strInput)

  'Print the # of matches we found
  Response.Write colMatches.Count & " matches found...<P>"

  'Step through our matches
  For Each objMatch in colMatches
     Response.Write objMatch.Value & "<BR>"
  Next

  'Clean up
  Set colMatches = Nothing
  Set objRegExp = Nothing



End Sub
mosphat Commented:
Sorry, get rid of the (?:,)
MoOTottle Author Commented:
It now gives an output of "55 S"

when I feed
"Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"

into it?
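
The "55 S" result follows from where the quantifiers sit. ([A-Z\s]) has no + after the character class, so it consumes exactly one character, and in ([0-9\-])+ the + is outside the parentheses, so the group keeps only the last digit it matched. A small Python sketch of the difference (Python is used only to keep it runnable; the second pattern is an illustrative variant, not a tested ASP fix):

```python
import re

address = "Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"

# Quantifier outside the first group, and no quantifier on the street class.
loose = re.search(r"([0-9\-])+\s+([A-Z\s])", address, re.IGNORECASE)
print(loose.group(0))  # 55 S
print(loose.group(1))  # 5  (only the last repetition is kept)

# Quantifiers moved inside the groups so each one captures the whole run.
tight = re.search(r"([0-9\-]+)\s+([A-Z\s]+)", address, re.IGNORECASE)
print(tight.group(1))  # 55
print(tight.group(2))  # Springfield Road
```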
mosphat Commented:
Hmm, I took the liberty of actually testing what I'm claiming here :)

This really works: ([0-9\-]+)\s+([A-Z\s\-\.]+)
On my machine that is...
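
Putting the pieces of the thread together, here is a rough sketch of a one-shot parser for the single-line input. It is written in Python only to keep it self-contained. The function name parse_address and the simplified POSTCODE pattern are hypothetical, while STREET is the expression given just above. Components that cannot be found come back as None, since there is no confirmation page to fall back on.

```python
import re

# Expression from the thread: house number, then street name.
STREET = re.compile(r"([0-9\-]+)\s+([A-Z\s\-\.]+)", re.IGNORECASE)

# Simplified, hypothetical postcode shape (an assumption, not the full pattern).
POSTCODE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}", re.IGNORECASE)

def parse_address(text):
    """Best-effort extraction; any component that is not found is None."""
    street = STREET.search(text)
    postcode = POSTCODE.search(text)
    return {
        "house_number": street.group(1) if street else None,
        "street": street.group(2).strip() if street else None,
        "postcode": postcode.group(0).upper() if postcode else None,
    }

result = parse_address(
    "Joe Blogs, 55 Springfield Road, Springfield, Manchester, MJ15 4FB"
)
print(result["house_number"])  # 55
print(result["street"])        # Springfield Road
print(result["postcode"])      # MJ15 4FB
```

Because each component is searched for independently, the order in which the user types them does not matter, although unusual street names or flat numbers will still slip through.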
MoOTottle Author Commented:
Nice one, thanks for the help. Points upped to 200 ;)
mosphat Commented:
You're very welcome and thank you for the extra points.
Question has a verified solution.
