• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 304
  • Last Modified:

How to Convert Address List in Word to Excel

I'm using office 2007.

I have a list of 500ish addresses that someone typed up in word. They need each address to be on its own row in Excel.

This would be relatively simple if the addresses were all the same number of lines long; however, they are not.

Some are 3 lines. Some are 4. Some are 5, 6, even 7 lines long. All of the different length addresses are scattered randomly throughout the document.

Is there an easy way that I am missing to convert this giant list into an Excel document?

I would prefer not to have to separate the different length files into different documents and use "convert text to table" from there.
  • 5
  • 3
  • 3
  • +3
2 Solutions
yo_beeDirector of Information TechnologyCommented:
Off hand I do not think so.
You might need to do some serious editing.

Why would this have ever been created in word to begin with if it was needed in a true row Column format.

I am sorry  for this lack of a answer.
nkillAuthor Commented:
I didn't create it.

I'm just being asked to fix it.   >_<
yo_beeDirector of Information TechnologyCommented:

They should have asked your first.  now you look like some serious formatting.
Cloud Class® Course: Python 3 Fundamentals

This course will teach participants about installing and configuring Python, syntax, importing, statements, types, strings, booleans, files, lists, tuples, comprehensions, functions, and classes.

yo_beeDirector of Information TechnologyCommented:
can you post a sample DOC file?
nkillAuthor Commented:

names, addresses, and locations changed because the people in the file scare me.
nkillAuthor Commented:
There are 500 or so of those addresses and the length in lines varies from 3 to 7.
yo_beeDirector of Information TechnologyCommented:
You could do it if there was some conformity to the lines.
Unfortunately with one set of data have the address in line 2 and others in line 3.

I would recommend that a non-IT person recreate this in a excel sheet or some sort of template that would be easier to format.

You would need some serious logic to work this DOC file to get the desired results.
Saqib Husain, SyedEngineerCommented:
Try this macro from Word while excel is open.

Sub a()
 Debug.Print Len(Selection)
addr = Selection
addr = Replace(addr, vbCr & vbCr, "###")
addr = Replace(addr, vbCr, ",")
addr = Split(addr, "###")
Set xls = GetObject(, "excel.application")
Set ws = xls.activesheet
ws.Range(ws.Cells(1, 1), ws.Cells(UBound(addr) + 1, 1)) = xls.worksheetfunction.transpose(addr)
End Sub
It is hard to parse into columns so that, say column 1 = Name, column 2 = Building, Column3 = Road, but this Word macro will put each address in its own row, with each line of the address in a separate column.
Sub CopyToExcel()
    Dim xlApp As Excel.Application
    Dim xlWbk As Excel.Workbook
    Dim xlWks As Excel.Worksheet
    Dim r As Integer
    Dim c As Integer
    Dim para As Word.Paragraph
    Dim rng As Word.Range
    Dim strText As String
    Set xlApp = CreateObject("Excel.Application")
    Set xlWbk = xlApp.Workbooks.Add
    xlApp.Visible = True
    Set xlWks = xlWbk.Sheets(1)
    r = 1
    c = 1
    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        rng.MoveEnd wdCharacter, -1
        If Len(rng.Text) = 0 Then
            c = 1
            r = r + 1
            xlWks.Cells(r, c).Value = rng.Text
            c = c + 1
        End If
    Next para
    'xlWbk.SaveAS filename
End Sub

Open in new window

yo_beeDirector of Information TechnologyCommented:
Since the Word Doc does not have any set pattern there will still need a tremendous amount of manual formatting and manipulating to get the data in a logical table (DB) format.

The nice thing about getting this into an excel table at least each row will contain each grouping.
Patrick MatthewsCommented:
It is possible to improve a bit on the suggestions thus far by using Regular Expressions or some other text parsing techniques, but any method is going to require lots of manual cleanup.

And as for entries with 7 lines: Just.  Don't.  If these are US addresses, then there should not ever be more than 5 lines.
Patrick MatthewsCommented:
Assuming you simply copy your list into Excel first, this code seems to be working.

Sub ParseEntries()
    Dim LastR As Long
    Dim arr As Variant
    Dim Counter As Long
    Dim DestR As Long
    Dim TestLine As String
    Dim ItemLine As Long
    Dim CityState As String
    With ActiveSheet
        LastR = .Cells(.Rows.Count, 1).End(xlUp).Row
        arr = .Range("a1:a" & LastR).Value
    End With
    [a1:g1].Value = Array("Name", "C/O", "Address1", "Address2", "City", "State", "ZIP")
    DestR = 2
    For Counter = 1 To LastR
        TestLine = Application.Trim(Application.Clean(Replace(arr(Counter, 1), Chr(160), " ")))
        If Counter = 1 Then
            Cells(DestR, "a") = TestLine
            If RegExpFind(TestLine, "^[^,]+, [A-Z]{2} \d{5}(-\d{4})?$", 1) <> "" Then
                CityState = RegExpFind(TestLine, "^[^,]+, [A-Z]{2}", 1)
                Cells(DestR, "e") = Left(CityState, InStr(1, CityState, ",") - 1)
                Cells(DestR, "f") = Right(CityState, 2)
                Cells(DestR, "g") = RegExpFind(TestLine, "\d{5}(-\d{4})?$", 1)
                ItemLine = 5
            ElseIf TestLine Like "#*" Then
                If ItemLine < 3 Then
                    Cells(DestR, "c") = TestLine
                    ItemLine = 3
                    Cells(DestR, "d") = TestLine
                    ItemLine = 4
                End If
            ElseIf TestLine <> "" Then
                If ItemLine = 0 Then
                    Cells(DestR, "a") = TestLine
                    ItemLine = 1
                ElseIf ItemLine = 1 Then
                    Cells(DestR, "b") = TestLine
                    ItemLine = 2
                ElseIf ItemLine = 3 Then
                    Cells(DestR, "d") = TestLine
                    ItemLine = 4
                End If
                DestR = DestR + 1
                ItemLine = 0
            End If
        End If
    MsgBox "Done"
End Sub

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
            Pos = CLng(Pos)
        End If
    End If
    ' Evaluate ReturnType
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    ' Create instance of RegExp object if needed, and set properties
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
    ' Test to see if there are any matches
    If RegX.Test(LookIn) Then
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            RegExpFind = Answer
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    ' If there are no matches, return empty string
        RegExpFind = ""
    End If
    ' Release object variables
    Set TheMatches = Nothing
End Function

Open in new window

It relies in part on Regular Expressions to do its magic; for more about that (and for that RegExpFind function I included above), please see:

Helen FeddemaCommented:
matthews -- I have seen U.S. addresses with 7 lines.  Usually at universities, with lines for department, building, floor, etc.

Here is a suggestion for dealing with this mess:  Supposing that the document contains only addresses (no other type of info in between them), go through it and put a page break at the end of each address.  Then you could write VBA code to copy each page (address) to a cell in the Excel workbook.

The ideal solution would be to parse the addresses into separate fields and put them in an Access database, but that would take a *lot* of work.
Patrick MatthewsCommented:

Point taken.  It's unusual, but not unheard of.

That said, I despair of having to figure out the logic of where to place everything in a 7-line entry :)

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

  • 5
  • 3
  • 3
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now