Link to home
Start Free TrialLog in
Avatar of Matt Chornomaz
Matt Chornomaz

asked on

VBA - Parse Address

I need to parse addresses into their individual parts (street, city, state, zip).  It's proving extremely difficult for me.  I've found a few suggestions online, but they're either too complicated to adapt to my situation, or the code just doesn't work.  I'm not a newbie to vba, but not an advanced expert either.

The addresses are located in an Excel column.  Each address is in only one cell.  There is only one delimiter...a comma between the city and state.  But some cities are one word, others are two.  It just seems impossible to do with normal string parsing.

Some examples (I used dummy street numbers, so as to keep confidentiality):
123 N PRESBYTERIAN AVE ATLANTIC CITY, NJ 084014702
123 POPLAR AVE APT A GALLOWAY, NJ 082054598
1 TEXAS CT ATLANTIC CITY, NJ 084016422
1 W LEE AVE ABSECON, NJ 082012608
PO BOX 123 ABSECON, NJ 082010688

Any help is greatly appreciated!
Avatar of Rob Henson
Rob Henson
Flag of United Kingdom of Great Britain and Northern Ireland image

State and ZIP can be extracted relatively simply using formula, with address in A2:

State:  =TRIM(MID(A2,FIND(",",A2,1)+1,4))
ZIP:   =RIGHT(A2,9)

PS: Added Excel to the Topics so will attract more relevant experts. Do you really need Word topic as well?
Avatar of Matt Chornomaz
Matt Chornomaz

ASKER

Thanks, but i'm good with state and zip.  It's the city that's the biggest problem, since it's not always just one word.  Judging by what i saw when i researched online, i think it might require interaction with a Google service or similar?? A lot of people insisted on outsourcing this, but unfortunately i can't spend a dime on this project.

As a note, I don't need to differentiate between address line 1 and address line 2 (apt #, etc.).  Once I figure out the city, I can just assume everything before it is the street.
working with addresses entered as a single field is one of the most frustrating data tasks I've ever dealt with, primarily because there are so many combinations of street addresses and apt number designations.  Additionally, you have to deal with misspellings, abbreviations, and inconsistent entries.

You will probably need to parse this up into segments and process each format separately.

For example, Search for "PO Box" and if that occurs, then capture the next value to the right of that as the number and assume the address portion ends at that number.

Search for " AVE " or " Avenue " and parse that plus everything to the left as an address.  This might leave Apt or Lot # between the street and the city, but you can also search for those value in what remains.  Then do this again for different types of streets (Road/rd, Court/CT, Street/ST, Boulevard/Blvd, ...).  Make sure those searches wrap the search term in spaces to make sure that you are not finding values embedded in other values.

Good luck.
Do you have a list of cities available against which to check? Use Wikipedia maybe to get a list.

Once the state and ZIP are identified, the last word or two words of the remaining string will be the city. You could do a check of the last word against a list and if that occurs then it is valid otherwise check the last two words.

It might be worth doing the two word check first to eliminate cities where there are similarities, don't know US cities as I am in UK but for example you could have Atlanta and New Atlanta, checking for Atlanta first in an address that is actually New Atlanta would then leave the New on the end of the street part.
Thanks for the suggestion.  Unfortunately I can't check against a list.  The address/city could be anywhere in the U.S.
It just seems impossible to do with normal string parsing.
If you don't control the input data then I don't think there's any practical way to do it.
failure is not an option :-)  How about an impractical method?
Where does the data come from?
It is exported from a jury management system to a .csv file, which we open with Excel, then clean up the data.  Unfortunately, I have no control over the input data if that's what you're getting at :(

I guess I could test for all the various types of address suffixes like someone already suggested (ave, avenue, rd, road, etc).  I was really hoping to avoid that as there's no way to fool-proof that.  But if it's too complicated otherwise...
This purports to solve the issue.  It uses the Google Maps API.  However when copied over the code and tried it, it froze up once it got to a certain line of code.  And unfortunately it's too complicated for me to troubleshoot.
https://msexcelvancouver.com/blog/parse-address-excel-macro-vba/
Can't check against a list or don't have a list to check against???

Here's a list, csv version is free to download

https://simplemaps.com/data/us-cities
"Don't have a list to check against"...until now that is :).  Thanks so much, this is great.  Any tips on how to incorporate this into my project?  Do I have to loop through the entire list each time I parse an address?...using instr?  If you have any tips on the best way to do this, once again...help is very appreciated.  Thanks again for the link!
You can make a call from your VBA code, via the msxml2 object, to a web site that will give you city name(s) for the zip code.  That would be the best way to parse your addresses.
I'm a fan of regular expression in VBA with
Dim oRE As Object
Set oRE = CreateObject("vbscript.regexp")

Open in new window

The pattern for the first parse is
([^,]+), (\w{2}) (\d+)

Open in new window

You should be able to plug your zip codes into this URL and get the city name on the returned page.
https://www.zip-info.com/cgi-local/zipsrch.exe?zip=08401-4702

Note you will need to supply the hyphen character between the zip and the +4 data.  You can also just supply the first five characters of the zip (without the +4) to this URL.
I had some free time this morning, and this was an interesting question, so I built a function that parses an address string, and returns a pipe delimited list.  It can handle 5 or 10 digit zip codes, and doesn't care how many separate words are in a city's name.  It requires an additional function ("BitsAfterLast"), which is down below the big function.

A list of cities/zip codes must be downloaded from the USPS website (URL in the function), imported, and saved as a table named Free-zipcode-database.

Addressed passed to this function must be in the standard StreetAddress/City/State/Zip format, and the city name MUST be spelled correctly (if it's not, the function will return "InvalidCityName").  
If a valid address is passed to this function, it will return a pipe delimited string:  

ParseAddress("P.O. Box 942873 Sacramento, CA 94273-0001")   returns P.O. Box 942873|SACRAMENTO|CA|94273-0001|




Public Function AddressParser(strWholeAddress As String) As String
'Author:  Paul Cook-Giles (paul@cookgiles.net)
'Created June 25, 2019
'permission to use is freely given, provided authorship information is preserved.
On Error GoTo HandleError

'parses addresses, returns pipe-delimited string
'this function assumes that elements are ordered StreetAddress|City|ST|Zip, and
'   elements are separated by spaces
'   State abbreviation is always two characters
'   city name in address will match a city name for the zip code in the PO database:
'http://federalgovernmentzipcodes.us/
'download free-zipcode-database.csv
'create [Free-zipcode-database] table
'if the city in the string cannot be matched to a city/zipcode in the database, the function returns "InvalidCityName"

Dim strStreetAddress As String, strCity As String, strST As String, strZip As String, strWorkingCity As String, strWorkingZip As String
Dim intCount As Integer

'strip out any commas and leading or trailing spaces:
strWholeAddress = trim(Replace(strWholeAddress, ",", ""))
'populate strZip with characters after last space:
strZip = BitsAfterLast(strWholeAddress, " ")
'remove Zip and trailing space from original string:
strWholeAddress = Trim(Replace(strWholeAddress, strZip, ""))
'populate strST with characters after last space:
strST = BitsAfterLast(strWholeAddress, " ")
'remove last three characters from original string:
strWholeAddress = Left(strWholeAddress, Len(strWholeAddress) - 3)

'extract City from string:
'get city from zip code:
strWorkingZip = Left(strZip, 5)
'count appearances of zipcode in PO data:
intCount = DCount("ZipCode", "[Free-zipcode-database]", "Zipcode = '" & strWorkingZip & "'")

If intCount = 1 Then
'there is only one city for this zipcode
    strWorkingCity = DLookup("City", "[Free-zipcode-database]", "Zipcode = '" & strWorkingZip & "'")
    If InStr(strWholeAddress, strWorkingCity) > 1 Then
        strCity = strWorkingCity
        GoTo RemoveCity
    Else
        GoTo InvalidCity
    End If
Else
'there are multiple appearances of zipcode
'loop through cities, starting with the longest name, looking for a match in string
strSQL = "Select City from [Free-zipcode-database] where ZipCode = '" & strWorkingZip & "' order by Len(City) desc"
Set db = CurrentDb
Set rs = db.OpenRecordset(strSQL)
Do Until rs.EOF = True

If InStr(strWholeAddress, rs.Fields("City")) > 0 Then
    strCity = rs.Fields("City")
    GoTo FoundCity
End If

rs.MoveNext
Loop

End If
    
InvalidCity:
'city name could not be matched to zip code:
AddressParser = "InvalidCityName"
GoTo ExitFunction


FoundCity:
'recordset is closed:
       rs.Close
    Set rs = Nothing
    Set db = Nothing

RemoveCity:
'remove City from original string:
strWholeAddress = Trim(Replace(strWholeAddress, strCity, ""))

PopulateStreetAddress:
'remaining string is street address
strStreetAddress = strWholeAddress

'concatenate elements into pipe-delimited string:
AddressParser = strStreetAddress & "|" & strCity & "|" & strST & "|" & strZip & "|"

ExitFunction:
   DoCmd.SetWarnings True
   DoCmd.Hourglass False
   Exit Function
HandleError:
   DoCmd.Hourglass False
   DoCmd.SetWarnings True
   AddressParser = "ParsingError"
'   MsgBox "Error " & Err.Number & ": " & Err.Description & vbCrLf & "Line:  " & Erl & vbCrLf & "Sub AddressParser in PublicCode" & vbCrLf & "Release " & DMax("UpdateDt", "UpdateTb"), vbOKOnly, strAppNa
   Resume ExitFunction
EndFunction:
End Function

Open in new window




Public Function BitsAfterLast(strNa As String, strDelimiter As String)
'this function returns the characters to the right of the first sought character
'SEEKING FROM THE RIGHT END OF THE STRING
   BitsAfterLast = Right(strNa, Len([strNa]) - InStrRev([strNa], strDelimiter))
End Function

Open in new window

I'm such a git.  I was so intrigued by the question I just assumed this was an Access topic.
Can anyone tell Matt (and me) how to drop this function into Excel, and then call it in a Workbook?
@Paul,

Here's my article on pushing data into Excel.
https://www.experts-exchange.com/articles/2253/Fast-Data-Push-to-Excel.html

I would expect your target range would be columns B:E
Aikimark, thanks-- the data, according to the OP, is already in Excel;  the user needs to split the address string into its component parts (presumably into separate columns, although that isn't explicitly stated).  
I assume that there's a way to drop the function I wrote into the Excel VBA environment, and then call it to walk down column A and drop the address elements into columns B thru E... but I don't know what it is (and yes, I read through your post, but if the solution is there, I'm not bright enough to see it).

Matt, if you're open to using Access to solve this problem, I can walk you through creating an Access db that will import a column of addresses from Excel, and spit out a worksheet with the addresses split into their elements over 4 columns.  :)
I copy/pasted your sample addresses into my worksheet in column A.  Running this code seems to correctly parse the results into columns B:E
Option Explicit

Function GetCitiesForZip(ByVal parmZip As String) As Object
    Dim strText As String
    Dim xmlhttp As Object
    Dim oRE As Object
    Dim oMatches As Object
    Dim oM
    
    Set oRE = CreateObject("vbscript.regexp")
    oRE.Global = True
    oRE.ignorecase = True
    oRE.Pattern = "(City Alias\(es\):|City Alias\(es\) To Avoid Using:)</span></td><td class=""info"">(.*?)</td"
    
    Set xmlhttp = CreateObject("MSXML2.serverXMLHTTP")

    strText = Left(parmZip, 5)
    xmlhttp.Open "GET", "https://www.zip-codes.com/zip-code/" & strText & "/zip-code-" & strText & ".asp", False
    xmlhttp.Send
    
    Do Until xmlhttp.ReadyState = 4
        DoEvents
    Loop
    
    If xmlhttp.Status = 200 Then
        strText = xmlhttp.responseText
        Set oMatches = oRE.Execute(strText)
        Set GetCitiesForZip = oMatches
    End If
End Function

Function ConvertMatchesIntoPattern(parmMatches) As String
    Dim oM As Object
    Dim vItem As Variant
    For Each oM In parmMatches
        ConvertMatchesIntoPattern = ConvertMatchesIntoPattern & "|" & _
                                    Replace(UCase(oM.submatches(1)), "<BR>", "|")
    Next
    ConvertMatchesIntoPattern = Mid(ConvertMatchesIntoPattern, 2)
End Function

Sub Q_29150165()
    Dim dicZips As Object
    Set dicZips = CreateObject("system.collections.hashtable")
    
    Dim rng As Range
    
    Dim strAddr As String
    Dim strCity As String
    Dim strST As String
    Dim strZip As String
    
    Dim vItem As Variant
    Dim oRE_Cell As Object
    Dim oRE_Addr As Object
    Dim oMatches As Object
    
    Set oRE_Cell = CreateObject("vbscript.regexp")
    oRE_Cell.Pattern = "^([^,]+), (\w{2}) (\d+)$"
    
    Set oRE_Addr = CreateObject("vbscript.regexp")
    
    Application.ScreenUpdating = False
    
    For Each rng In ActiveSheet.UsedRange
        Set oMatches = oRE_Cell.Execute(rng.Value)
        strAddr = oMatches(0).submatches(0)
        strST = oMatches(0).submatches(1)
        strZip = oMatches(0).submatches(2)
        
        Set oMatches = GetCitiesForZip(Left(strZip, 5))  'vItem
        If dicZips.containskey(Left(strZip, 5)) Then
            oRE_Addr.Pattern = "^(.+?) (" & dicZips(Left(strZip, 5)) & ")$"
        Else
            oRE_Addr.Pattern = "^(.+?) (" & ConvertMatchesIntoPattern(oMatches) & ")$"
            dicZips.Add Left(strZip, 5), ConvertMatchesIntoPattern(oMatches)
        End If
        
        Set oMatches = oRE_Addr.Execute(strAddr)
        
        strAddr = oMatches(0).submatches(0)
        strCity = oMatches(0).submatches(1)
        
        ActiveSheet.Range(rng.Offset(0, 1), rng.Offset(0, 4)).Value = _
                                    Array(strAddr, strCity, strST, strZip)
    Next
    Application.ScreenUpdating = True
End Sub

Open in new window

Thank you all so much for your efforts...especially to those who put all the time into sharing the actual code!

aikimark, thank you very much.  Unfortunately, it freezes, then times out at "xmlhttp.send".  I suspect that it may be related to our security.  I work for a State Courthouse, so everything is very sensitive, and thus locked down in many ways.  I will try when i get home on my own computer to confirm or not.  Does this make sense?  If anyone has an idea of what could block this, please lmk...that way i can approach one of the "higher-ups" to see if they can make an exception.

Or...

I hate to ask for anymore help at this point, but i was wondering if this code can be adapted to work with my own list of zip-codes.  Like say i have a separate worksheet that has two columns...one with the zip codes, and the next column will have their corresponding city name.  Is there a (fairly easy) way to adapt the code to work with something like this?  I hate to ask this...especially if this is working code (which you tested, so i assume it is).  So if not, we can close the thread.  I'm sure i can figure out something with all these suggestions.

Thanks again!
If you have an existing list of ZIP codes and Cities then it would be possible to use that list as a lookup.

Formulas would then:
Strip out State and ZIP from source text string
Use ZIP to identify City
Identify City within source string and use everything prior to that as Street.

Major assumption is that the source string has city spelt the same as your list of Cities.
Rob Henson, thank you for the quick reply.  You're right..."a major assumption"...I don't know why I didn't think of that.  But doesn't the other solution rely on the same assumption?  The vlookup solution might work just as accurately?

I guess as I run into issues, I can fix the list one city at a time.  This may be the best I can do.  Google Maps api would probably work, but that also relies on reaching out to an external url.
with your browser at work, can you open https://www.zip-codes.com  ?
The free zip code csv file that Paul is using in his solution is certainly an offline solution.  You would have to get past your firewall on this approach as well.

Keep us apprised of your testing.
Can you upload your list of ZIP codes and cities? I will see what I can do with the sample addresses and your list.
I've attached a simplified version of the zip code csv file.  It is sorted in zip code order.
free-zipcode-database_Simplified.csv
I don't understand how US ZIP Codes work, that list has 5 digit ZIP codes whereas the sample addresses had 9 digits. I've seen some mention of +4, I assume this is the difference between the 5 and 9 digits.

Also there are some 5 digit ZIP codes that are listed more than once with different city names.
Rob and Matt, if you'll look at the logic in the sub I posted, you'll see how I handled the fact that some zip codes have multiple forms of the same city name, and others have multiple cities for a single code.
Here is my humble, array based, contribution to solving your complex problem.
it is an offline solution but it does heavily rely upon arrays.
Assuming you data is in column A, the parsed data will  be placed in columns B-F.
Option Explicit

Private Type ParsedAddress
    StreetAddress As String
    POB As String
    City As String
    StateAbbr As String
    ZipCode As String
End Type

Dim tpParsedAddressed As ParsedAddress

Sub Process()
    Dim rng As Range, c As Range
    
    Set rng = Range("A1:A" & ActiveSheet.Columns("A").Cells.Find("*", SearchOrder:=xlByRows, LookIn:=xlValues, SearchDirection:=xlPrevious).Row)
    For Each c In rng.Cells
        ParseAddress c.Value
        c.Offset(0, 1).Value = tpParsedAddressed.StreetAddress
        c.Offset(0, 2).Value = tpParsedAddressed.POB
        c.Offset(0, 3).Value = tpParsedAddressed.City
        c.Offset(0, 4).Value = tpParsedAddressed.StateAbbr
        c.Offset(0, 5).Value = tpParsedAddressed.ZipCode
    Next
    
End Sub

Sub ParseAddress(strAddress As String)
    
    Dim arrEntireAddress() As String, arrCommaDelimited() As String, arrStreetAddress() As String
    Dim strPossibleZipcode As String, strPossibleStateAbbr As String, strPossibleStreetAddress As String, strPossibleCity As String, strPossiblePOB As String
    Dim arrStreetSuffixes() As String, arrUnits() As String
    Dim i As Long, x As Long, y As Long, z As Long
    
    arrEntireAddress = Split(strAddress)
    arrCommaDelimited = Split(strAddress, ",")
    arrStreetAddress = Split(arrCommaDelimited(0))
    
    arrStreetSuffixes = Split("AV,AVE,LN,BL,CT,ST,RD,DR,TER,PL,LN,ROAD,COURT,STREET,BOULVEVARD,AVENUE,LANE,DRIVE,TERRACE,PLACE,LANE", ",")
    arrUnits = Split("STE,APT,RM,SUITE,APPARTMENT,ROOM", ",")
    
    For i = 0 To UBound(arrStreetAddress)
        If UCase(Replace(arrStreetAddress(i), ".", "")) = "POB" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 1)
            strPossibleCity = arrStreetAddress(i + 1)
            Exit For
        ElseIf UCase(Replace(arrStreetAddress(i), ".", "")) = "PO" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 2)
            strPossibleCity = arrStreetAddress(i + 3)
            Exit For
        End If
        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i)
        For x = 0 To UBound(arrStreetSuffixes)
            If arrStreetAddress(i) = arrStreetSuffixes(x) Then
                For y = 0 To UBound(arrUnits)
                    If arrStreetAddress(i + 1) = arrUnits(y) Then
                        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i + 1) & " " & arrStreetAddress(i + 2)
                    End If
                Next
                GoTo EndLoop
            End If
        Next
    Next

EndLoop:
    
    strPossibleStreetAddress = Trim(strPossibleStreetAddress)
    strPossibleZipcode = arrEntireAddress(UBound(arrEntireAddress))
    strPossibleStateAbbr = arrEntireAddress(UBound(arrEntireAddress) - 1)
        
    With tpParsedAddressed
        .StreetAddress = IIf(Len(strPossibleStreetAddress) > 0, Trim(strPossibleStreetAddress), "NO STREET ADDRESS")
        .POB = IIf(Len(strPossiblePOB) > 0, Trim(strPossiblePOB), "NO P.O. BOX")
        .City = IIf(Len(strPossibleCity) > 0, strPossibleCity, Trim(Replace(arrCommaDelimited(0), strPossibleStreetAddress, "")))
        .StateAbbr = IIf(Not IsNumeric(strPossibleStateAbbr) And Len(strPossibleStateAbbr) = 2, strPossibleStateAbbr, "NO STATE ABBREVIATION")
        .ZipCode = IIf(IsValidZipCode(strPossibleZipcode), strPossibleZipcode, "NO ZIPCODE")
    
        Debug.Print .StreetAddress
        Debug.Print .POB
        Debug.Print .City
        Debug.Print .StateAbbr
        Debug.Print .ZipCode
    End With
    
End Sub

Private Function IsValidZipCode(ZipString As String) As Boolean
    
    Dim blnValid As Boolean

    Select Case Len(ZipString)
        Case 5
            blnValid = ZipString Like "#####"
        Case 9
            blnValid = ZipString Like "#########"
        Case 10
            blnValid = ZipString Like "#####-####"
    End Select
                
    IsValidZipCode = blnValid

End Function

Open in new window

I used a little Powershell scripting to create the attached file.  It can be used to populate the dictionary for the regex address patterns.

Content example:
"Zipcode","Cities"
"01013","CHICOPEE|WILLIMANSETT"
"01014","CHICOPEE"
"01020","CHICOPEE"
"01021","CHICOPEE"
"01022","CHICOPEE|WESTOVER AFB"
"01026","CUMMINGTON|WEST CUMMINGTON"
"01027","EASTHAMPTON|E HAMPTON|MOUNT TOM|WESTHAMPTON|LOUDVILLE"
"01028","EAST LONGMEADOW|E LONGMEADOW"
"01029","EAST OTIS|BIG POND|E OTIS"
"01030","FEEDING HILLS"
"01031","GILBERTVILLE|OLD FURNACE"

Open in new window

free-zipcode-CityMasks.csv
Joe Howard, I just gave your solution a go.  It's quite impressive.  Unfortunately it only worked on 233 out of the 266 I just tried it on.  But those 233 did come out perfectly.  Below is a list of the ones that it didn't work for.  All of the states and zips split out perfectly.  However with 25 of the 33 in question...the city stayed with the street address in column B, leaving column D blank.  The other 8 split out a city, but the city includes some of the street address.  I've separated them out below.  Note: I changed the street # to all 1's so to keep anonymity.

Also, when it's a PO Box, would there be an easy way to just put the PO Box in the street address column (col B), instead of putting, "No street address" in col B, and the actual PO Box in col C?  Eventually this will be used as a mail merge data source.

These are the ones where the city stayed with the street address, leaving the city column blank, but state and zip were fine.
111 BLACK HORSE PIKE APT 1-1 PLEASANTVILLE, NJ 082322358
11 HIGHLAND CIR EGG HARBOR TWP, NJ 082346626
111 SUN VALLEY CIR EGG HARBOR TOWNSHIP, NJ 082347746
11 WEEPING WILLOW CIR EGG HARBOR TWP, NJ 082345954
11 MALIBU WAY GALLOWAY, NJ 082053222
1111 BRIGANTINE BLVD APT A ATLANTIC CITY, NJ 084018127
111 BOARDWALK APT 111 ATLANTIC CITY, NJ 084017911
1111 S WHITE HORSE PIKE HAMMONTON, NJ 080371038
11 HEATHER CROFT EGG HARBOR TWP, NJ 082344612
11 BLACK HORSE PIKE TRLR 111 EGG HARBOR TWP, NJ 082341891
111 E MOURNING DOVE WAY GALLOWAY, NJ 082056202
11 HEALD CIR BRIGANTINE, NJ 082032920
111 E REVERE WAY GALLOWAY, NJ 082053224
111 BLACK HORSE PIKE APT 1-1 PLEASANTVILLE, NJ 082322356
11 MEADOW CIR MAYS LANDING, NJ 083304929
111 RAILROAD BLVD RICHLAND, NJ 083502103
111 S WHITE HORSE PIKE HAMMONTON, NJ 080371136
1111 ROUTE 50 MAYS LANDING, NJ 083302647
111 HARBOR BEACH BLVD APT A1 BRIGANTINE, NJ 082031248
111 HEATHER CROFT EGG HARBOR TOWNSHIP, NJ 082344634
111 BLUEBIRD CIR MAYS LANDING, NJ 083305611
1111 STARLIGHT CIR MAYS LANDING, NJ 083303461
1111 HARDING HWY MAYS LANDING, NJ 083301520
1111 ATLANTIC BRIGANTINE BLVD APT 1 BRIGANTINE, NJ 082033521
11 S CANARY WAY GALLOWAY, NJ 082056207

With the following, the opposite happened.  Some of the street address came over with the city.  Once again, the state and zip were good.
1527 MAGELLAN AVE # A ATLANTIC CITY, NJ 084012337
111 SO MAIN ST 2ND FLR PLEASANTVILLE, NJ 082322729
103 S TEXAS AVE FL 2 ATLANTIC CITY, NJ 084016519
443 N HARRISBURG AVE FL 2 ATLANTIC CITY, NJ 084011105
108 N WILLOW ST FL 2ND LANDISVILLE, NJ 083261025
101 11TH ST N UNIT B BRIGANTINE, NJ 082033210
2501 TILTON RD TRLR 505 EGG HARBOR TWP, NJ 082349517
409 W DELILAH RD UNIT 8 PLEASANTVILLE, NJ 082321288

The above split into:
street                                      city
1527 MAGELLAN AVE              # A ATLANTIC CITY
111 SO MAIN ST                      2ND FLR PLEASANTVILLE
103 S TEXAS AVE                      FL 2 ATLANTIC CITY
443 N HARRISBURG AVE        FL 2 ATLANTIC CITY
108 N WILLOW ST                      FL 2ND LANDISVILLE
101 11TH ST                              N UNIT B BRIGANTINE
2501 TILTON RD                      TRLR 505 EGG HARBOR TWP
409 W DELILAH RD              UNIT 8 PLEASANTVILLE
This improved version works for all but 2 addresses, can those be done manually or do they too have to be automated?:
Private Type ParsedAddress
    StreetAddress As String
    POB As String
    City As String
    StateAbbr As String
    ZipCode As String
End Type

Dim tpParsedAddressed As ParsedAddress

Sub Process()
    Dim rng As Range, c As Range
    
    Set rng = Range("A1:A" & ActiveSheet.Columns("A").Cells.Find("*", SearchOrder:=xlByRows, LookIn:=xlValues, SearchDirection:=xlPrevious).Row)
    For Each c In rng.Cells
        ParseAddress c.Value
        c.Offset(0, 1).Value = tpParsedAddressed.StreetAddress & " " & tpParsedAddressed.POB
        c.Offset(0, 2).Value = tpParsedAddressed.City
        c.Offset(0, 3).Value = tpParsedAddressed.StateAbbr
        c.Offset(0, 4).Value = tpParsedAddressed.ZipCode
    Next
    
End Sub

Sub ParseAddress(strAddress As String)
    
    Dim arrEntireAddress() As String, arrCommaDelimited() As String, arrStreetAddress() As String
    Dim strPossibleZipcode As String, strPossibleStateAbbr As String, strPossibleStreetAddress As String, strPossibleCity As String, strPossiblePOB As String
    Dim arrStreetSuffixes() As String, arrUnits() As String
    Dim i As Long, x As Long, y As Long, z As Long
    
    arrEntireAddress = Split(Trim(strAddress))
    arrCommaDelimited = Split(Trim(strAddress), ",")
    arrStreetAddress = Split(arrCommaDelimited(0))
    
    arrStreetSuffixes = Split("AV,AVE,LN,BL,CT,ST,BLVD,RD,DR,TER,PL,LN,TWP,WAY,HWY,CIR,ROAD,COURT,STREET,BOULVEVARD,AVENUE,LANE,DRIVE,TERRACE,PLACE,LANE,TOWNSHIP,HIGHWAY,CROFT,PIKE", ",")
    arrUnits = Split("STE,APT,RM,SUITE,APARTMENT,ROOM,TRLR,TRAILER,UNIT,#,FL,FLR", ",")
    
    For i = 0 To UBound(arrStreetAddress)
        If IsNumeric(arrStreetAddress(i)) And i > 0 Then
            strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i)
            Exit For
        End If
        If UCase(Replace(arrStreetAddress(i), ".", "")) = "POB" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 1)
            strPossibleCity = arrStreetAddress(i + 1)
            Exit For
        ElseIf UCase(Replace(arrStreetAddress(i), ".", "")) = "PO" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 2)
            strPossibleCity = arrStreetAddress(i + 3)
            Exit For
        End If
        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i)
        For x = 0 To UBound(arrStreetSuffixes)
            If arrStreetAddress(i) = arrStreetSuffixes(x) Then
                For y = 0 To UBound(arrUnits)
                    If arrStreetAddress(i + 1) = arrUnits(y) Then
                        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i + 1) & " " & arrStreetAddress(i + 2)
                    End If
                Next
                GoTo EndLoop
            End If
        Next
    Next

EndLoop:
    
    strPossibleStreetAddress = Trim(strPossibleStreetAddress)
    strPossibleZipcode = arrEntireAddress(UBound(arrEntireAddress))
    strPossibleStateAbbr = arrEntireAddress(UBound(arrEntireAddress) - 1)
        
    With tpParsedAddressed
        .StreetAddress = IIf(Len(strPossibleStreetAddress) > 0, Trim(strPossibleStreetAddress), "")
        .POB = IIf(Len(strPossiblePOB) > 0, Trim(strPossiblePOB), "")
        .City = IIf(Len(strPossibleCity) > 0, strPossibleCity, Trim(Replace(arrCommaDelimited(0), strPossibleStreetAddress, "")))
        .StateAbbr = IIf(Not IsNumeric(strPossibleStateAbbr) And Len(strPossibleStateAbbr) = 2, strPossibleStateAbbr, "NO STATE ABBREVIATION")
        .ZipCode = IIf(IsValidZipCode(strPossibleZipcode), strPossibleZipcode, "NO ZIPCODE")
    
        Debug.Print .StreetAddress
        Debug.Print .POB
        Debug.Print .City
        Debug.Print .StateAbbr
        Debug.Print .ZipCode
    End With
    
End Sub

Private Function IsValidZipCode(ZipString As String) As Boolean
    
    Dim blnValid As Boolean

    Select Case Len(ZipString)
        Case 5
            blnValid = ZipString Like "#####"
        Case 9
            blnValid = ZipString Like "#########"
        Case 10
            blnValid = ZipString Like "#####-####"
    End Select
                
    IsValidZipCode = blnValid

End Function

Open in new window

Joe,

Misspelled "APPARTMENT" on line 37
Joe Howard.  Thank you so much for your help.  Unfortunately it's still faltering on a number of entries.  However it's getting very close. While I came up with an alternative solution, I will definitely find use for your code.  Much appreciated!!!
SOLUTION
Avatar of Matt Chornomaz
Matt Chornomaz

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Matt

Why not use the regular expression engine?  It will save you some coding and will be faster than iterating the city candidates.
Just take into count when using an offline zip code database, changes are made to zip codes from time to time.
Joe Howard, that's a really good point you bring up.  Didn't think of that.  I found out that probably 90% of the addresses are always going to be just in New Jersey.  So it shouldn't be too much of a problem keeping track.  But thank you for pointing that out.  I will have to download a fresh copy of the zip db once in a while.  A pain, but necessary if i'm using the zip db solution.  I will go back to your algorithm to see if I can also get it to work closer to 100% of the time.
aikimark: Thank you for the suggestion.  Unfortunately I've never used Regex before.  Might you have an example of how I would use it for this project?  Thanks again.
Joe Howard: I'm so sorry...when I tried (or thought I did) your new code, I still had several it didn't handle correctly.  I gave your code another go this morning, and it handled all but 3.  Apparently I had ran the old version of your algorithm yesterday.

Here are the three that didn't work.  If there's just no way to handle these, it's not the biggest deal.  It's going to be an automated mail merge for several addresses at once.  So it's not ideal.  But I'm sure I'd eventually run into some that the zip database method doesn't work for as well.

and btw...thank you for moving PO Boxes to the street address instead of in their own column.

If you can get these 3 working, great.  If not, you've already spent a lot of time on this.  This is a question I've seen in many forums...that always remains unanswered due to its difficulty.  I'm sure many will get good use out of your solution.

I feel like the 3rd address in this list should've worked with your method.  Is it because it's a PO Box?
111 SO MAIN ST 2ND FLR PLEASANTVILLE, NJ 082322729
101 11TH ST N UNIT B BRIGANTINE, NJ 082033210
PO BOX 32 WEST MIDDLESEX, PA 161590032

street                                city
111 SO MAIN ST                  2ND FLR PLEASANTVILLE
101 11TH ST                          N UNIT B BRIGANTINE
 P.O. Box 32                         WEST
If I could, I'd like to mark two methods for the thread "answer".  I think Joe Howard's and my code (with the help of aimark and others in this thread) are both viable solutions.  Does anyone know if this is possible?
Hey Matt,
If you are considering using my solution, I will work on it further, just bear with me as I'll only get around to it tomorrow.
Hi Joe, I do really like your solution as it's more permanent than the zip db, and I have time to wait.  I have lots of other parts I can work on for now.  But if it can do the below, I think we're close enough.

Is there a way for the macro to figure out if it wasn't able to determine the city properly?  If I can capture those and just put some warnings in the address line, that would work too.  I believe the mail merge will generate letters where the address will show in a window envelope.  And a "live" person will be stuffing them.  So I can tell them to just keep an eye out.  If they see something like, "*******CHECK ADDRESS*****" in the address line, that should catch their eye enough to discard.  I'm told that would be acceptable.
I posted an example in this comment:
https://www.experts-exchange.com/questions/29150165/VBA-Parse-Address.html?anchorAnswerId=42891931#a42891931

You should be able to use that as your solution template.  The vbscript regexp object wraps the regex engine.  You only need to supply the pattern.  In the case of multiple cities, the downloaded city names are delimited by a vertical bar in the pattern.  The parenthesis causes a capture of the matching city name to be available as a submatch within the matches collection.  In this case, there should only be one match returned by the .Execute method.  The other submatch will be the address.

I don't know which version of the zip code file you are using, so I'm not sure how to tweak your code without breaking it.
@Matt
Try this:
Option Explicit

Private Type ParsedAddress
    StreetAddress As String
    POB As String
    City As String
    StateAbbr As String
    ZipCode As String
End Type

Dim tpParsedAddressed As ParsedAddress

Sub Process()
    Dim rng As Range, c As Range
    
    Set rng = Range("A1:A" & ActiveSheet.Columns("A").Cells.Find("*", SearchOrder:=xlByRows, LookIn:=xlValues, SearchDirection:=xlPrevious).Row)
    For Each c In rng.Cells
        ParseAddress c.Value
        c.Offset(0, 1).Value = tpParsedAddressed.StreetAddress & " " & tpParsedAddressed.POB
        c.Offset(0, 2).Value = tpParsedAddressed.City
        c.Offset(0, 3).Value = tpParsedAddressed.StateAbbr
        c.Offset(0, 4).Value = tpParsedAddressed.ZipCode
    Next
    
End Sub

Sub ParseAddress(strAddress As String)
    
    Dim arrEntireAddress() As String, arrCommaDelimited() As String, arrStreetAddress() As String
    Dim strPossibleZipcode As String, strPossibleStateAbbr As String, strPossibleStreetAddress As String, strPossibleCity As String, strPossiblePOB As String
    Dim arrStreetSuffixes() As String, arrUnits() As String
    Dim i As Long, w As Long, x As Long, y As Long, z As Long
    
    arrEntireAddress = Split(Trim(strAddress))
    arrCommaDelimited = Split(Trim(strAddress), ",")
    arrStreetAddress = Split(arrCommaDelimited(0))
    
    arrStreetSuffixes = Split("AV,AVE,LN,BL,CT,ST,BLVD,RD,DR,TER,PL,LN,TWP,WAY,HWY,CIR,ROAD,COURT,STREET,BOULVEVARD,AVENUE,LANE,DRIVE,TERRACE,PLACE,LANE,TOWNSHIP,HIGHWAY,CROFT,PIKE", ",")
    arrUnits = Split("STE,APT,RM,SUITE,APPARTMENT,ROOM,TRLR,TRAILER,UNIT,#,FL,FLR", ",")
    
    For i = 0 To UBound(arrStreetAddress)
        If IsNumeric(arrStreetAddress(i)) And i > 0 Then
            strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i)
            Exit For
        End If
        If UCase(Replace(arrStreetAddress(i), ".", "")) = "POB" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 1)
            strPossibleCity = IIf(Len(arrStreetAddress(i + 3)) = 2, arrStreetAddress(i + 2), arrStreetAddress(i + 2) & " " & arrStreetAddress(i + 3))
            Exit For
        ElseIf UCase(Replace(arrStreetAddress(i), ".", "")) = "PO" Then
            strPossiblePOB = "P.O. Box " & arrStreetAddress(i + 2)
            strPossibleCity = IIf(Len(arrStreetAddress(i + 4)) = 2, arrStreetAddress(i + 3), arrStreetAddress(i + 3) & " " & arrStreetAddress(i + 4))
            Exit For
        End If
        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i)
        For x = 0 To UBound(arrStreetSuffixes)
            If arrStreetAddress(i) = arrStreetSuffixes(x) Then
                For y = 0 To UBound(arrUnits)
                    If arrStreetAddress(i + 1) = arrUnits(y) Then
                        strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(i + 1) & " " & arrStreetAddress(i + 2)
                    End If
                Next
                w = i
                Do While w <= (UBound(arrStreetAddress) - 2)
                    strPossibleStreetAddress = strPossibleStreetAddress & " " & arrStreetAddress(w + 1)
                    Debug.Print arrStreetAddress(w + 1)
                    w = w + 1
                Loop
                GoTo EndLoop
            End If
        Next
    Next

EndLoop:
    
    strPossibleStreetAddress = Trim(strPossibleStreetAddress)
    strPossibleZipcode = arrEntireAddress(UBound(arrEntireAddress))
    strPossibleStateAbbr = arrEntireAddress(UBound(arrEntireAddress) - 1)
        
    With tpParsedAddressed
        .StreetAddress = IIf(Len(strPossibleStreetAddress) > 0, Trim(strPossibleStreetAddress), "")
        .POB = IIf(Len(strPossiblePOB) > 0, Trim(strPossiblePOB), "")
        .City = IIf(Len(strPossibleCity) > 0, strPossibleCity, Trim(Replace(arrCommaDelimited(0), strPossibleStreetAddress, "")))
        .StateAbbr = IIf(Not IsNumeric(strPossibleStateAbbr) And Len(strPossibleStateAbbr) = 2, strPossibleStateAbbr, "NO STATE ABBREVIATION")
        .ZipCode = IIf(IsValidZipCode(strPossibleZipcode), strPossibleZipcode, "NO ZIPCODE")
    
        Debug.Print .StreetAddress
        Debug.Print .POB
        Debug.Print .City
        Debug.Print .StateAbbr
        Debug.Print .ZipCode
    End With
    
End Sub

Private Function IsValidZipCode(ZipString As String) As Boolean
    
    Dim blnValid As Boolean

    Select Case Len(ZipString)
        Case 5
            blnValid = ZipString Like "#####"
        Case 9
            blnValid = ZipString Like "#########"
        Case 10
            blnValid = ZipString Like "#####-####"
    End Select
                
    IsValidZipCode = blnValid

End Function

Open in new window

Aimark, thank you for the code!
Joe Howard: Unfortunately the latest iteration of code didn't work on many of the addresses.  The previous code handled more entries.  And it threw an error before it completed the list.  Here are some examples.

138 N PRESBYTERIAN AVE ATLANTIC CITY, NJ 084014702
429 POPLAR AVE APT A GALLOWAY, NJ 082054598
1519 BALTIC AVE APT 210 ATLANTIC CITY, NJ 084014472
6050 WHEELING AVE MAYS LANDING, NJ 083304010
1421 PACIFIC AVE ATLANTIC CITY, NJ 084018004
257 N MANNHEIM AVE EGG HARBOR CITY, NJ 082153335
3423 JUNIPER CT MAYS LANDING, NJ 083302914
323 BEACH AVE ATLANTIC CITY, NJ 084015406
6101 MILL RD EGG HARBOR TWP, NJ 082349620
152 CLARKS LANDING RD PORT REPUBLIC, NJ 082419772
727 BLACK HORSE PIKE APT 3-6 PLEASANTVILLE, NJ 082322358
233 N NEW YORK AVE APT 506 ATLANTIC CITY, NJ 084014451

turned into...
STREET                                                                                               CITY
138 N PRESBYTERIAN AVE ATLANTIC                               CITY
429 POPLAR AVE APT A APT A                                           429 POPLAR AVE APT A GALLOWAY
1519 BALTIC AVE APT 210 APT 210 ATLANTIC               1519 BALTIC AVE APT 210 ATLANTIC CITY
6050 WHEELING AVE MAYS                                               LANDING
1421 PACIFIC AVE ATLANTIC                                               CITY
257 N MANNHEIM AVE EGG HARBOR                              CITY
3423 JUNIPER CT MAYS                                                       LANDING
323 BEACH AVE ATLANTIC                                                   CITY
6101 MILL RD EGG HARBOR                                               TWP
152 CLARKS LANDING RD PORT                                       REPUBLIC
727 BLACK HORSE PIKE APT 3-6 APT 3-6                          727 BLACK HORSE PIKE APT 3-6 PLEASANTVILLE
233 N NEW YORK AVE APT 506 APT 506 ATLANTIC       233 N NEW YORK AVE APT 506 ATLANTIC CITY

I got a Run-time #9 (Subscript out of range) when it got to the first PO Box.
PO BOX 688 ABSECON, NJ 082010688
Code line with error:
strPossibleCity = IIf(Len(arrStreetAddress(i + 4)) = 2, arrStreetAddress(i + 3), arrStreetAddress(i + 3) & " " & arrStreetAddress(i + 4))

Open in new window

ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Amazing!  Worked on 249 out of 250 of my addresses.  And it gave a warning to the end-user for the one it didn't work on, just like I hoped.  Now...how do I mark this as answered?
I see what you mean in your pic, but those items don't appear for me????

 User generated image
It may have to do with previously marking my post as helpful.

See if you can remove that and see if that makes a difference.
Hey Joe.  Not sure why, but when I logged off and back in again, that strip of options returned.  I will be marking your solution as the answer.  Thanks to you and everyone else for all the help.  It was a daunting problem that I wasn't sure i'd be able to solve.  So thanks again.
As you wrote, failure is not an option :-)
Glad to help.
Thank you all very much!!!