Solved

How to Convert Address List in Word to Excel

Posted on 2011-09-20
14
277 Views
Last Modified: 2012-05-12
I'm using office 2007.

I have a list of 500ish addresses that someone typed up in word. They need each address to be on its own row in Excel.

This would be relatively simple if the addresses were all the same number of lines long; however, they are not.

Some are 3 lines. Some are 4. Some are 5, 6, even 7 lines long. All of the different length addresses are scattered randomly throughout the document.

Is there an easy way that I am missing to convert this giant list into an Excel document?

I would prefer not to have to separate the different length files into different documents and use "convert text to table" from there.
0
Comment
Question by:nkill
  • 5
  • 3
  • 3
  • +3
14 Comments
 
LVL 21

Expert Comment

by:yo_bee
ID: 36571082
Off hand I do not think so.
You might need to do some serious editing.

Why would this have ever been created in word to begin with if it was needed in a true row Column format.

I am sorry  for this lack of a answer.
0
 

Author Comment

by:nkill
ID: 36571086
I didn't create it.

I'm just being asked to fix it.   >_<
0
 
LVL 21

Expert Comment

by:yo_bee
ID: 36571093
:(

They should have asked your first.  now you look like some serious formatting.
0
 
LVL 21

Expert Comment

by:yo_bee
ID: 36571106
can you post a sample DOC file?
0
 

Author Comment

by:nkill
ID: 36571121
sure.

names, addresses, and locations changed because the people in the file scare me.
address-sample.docx
0
 

Author Comment

by:nkill
ID: 36571127
There are 500 or so of those addresses and the length in lines varies from 3 to 7.
0
 
LVL 21

Expert Comment

by:yo_bee
ID: 36571144
You could do it if there was some conformity to the lines.
Unfortunately with one set of data have the address in line 2 and others in line 3.

I would recommend that a non-IT person recreate this in a excel sheet or some sort of template that would be easier to format.

You would need some serious logic to work this DOC file to get the desired results.
:(
0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 
LVL 43

Expert Comment

by:Saqib Husain, Syed
ID: 36571403
Try this macro from Word while excel is open.

Sub a()
ThisDocument.Select
 Debug.Print Len(Selection)
addr = Selection
addr = Replace(addr, vbCr & vbCr, "###")
addr = Replace(addr, vbCr, ",")
addr = Split(addr, "###")
Set xls = GetObject(, "excel.application")
Set ws = xls.activesheet
ws.Range(ws.Cells(1, 1), ws.Cells(UBound(addr) + 1, 1)) = xls.worksheetfunction.transpose(addr)
End Sub
0
 
LVL 76

Expert Comment

by:GrahamSkan
ID: 36573032
It is hard to parse into columns so that, say column 1 = Name, column 2 = Building, Column3 = Road, but this Word macro will put each address in its own row, with each line of the address in a separate column.
Sub CopyToExcel()
    Dim xlApp As Excel.Application
    Dim xlWbk As Excel.Workbook
    Dim xlWks As Excel.Worksheet
    Dim r As Integer
    Dim c As Integer
    Dim para As Word.Paragraph
    Dim rng As Word.Range
    Dim strText As String
    Set xlApp = CreateObject("Excel.Application")
    Set xlWbk = xlApp.Workbooks.Add
    
    xlApp.Visible = True
    Set xlWks = xlWbk.Sheets(1)
    r = 1
    c = 1
    For Each para In ActiveDocument.Paragraphs
        Set rng = para.Range
        rng.MoveEnd wdCharacter, -1
        If Len(rng.Text) = 0 Then
            c = 1
            r = r + 1
        Else
            xlWks.Cells(r, c).Value = rng.Text
            c = c + 1
        End If
    Next para
        
    'xlWbk.SaveAS filename
    'xlWbk.Close
    'xlApp.Quit
End Sub

Open in new window

0
 
LVL 21

Expert Comment

by:yo_bee
ID: 36573107
Since the Word Doc does not have any set pattern there will still need a tremendous amount of manual formatting and manipulating to get the data in a logical table (DB) format.

The nice thing about getting this into an excel table at least each row will contain each grouping.
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 36573555
It is possible to improve a bit on the suggestions thus far by using Regular Expressions or some other text parsing techniques, but any method is going to require lots of manual cleanup.

And as for entries with 7 lines: Just.  Don't.  If these are US addresses, then there should not ever be more than 5 lines.
0
 
LVL 92

Accepted Solution

by:
Patrick Matthews earned 250 total points
ID: 36574115
Assuming you simply copy your list into Excel first, this code seems to be working.


Sub ParseEntries()
    
    Dim LastR As Long
    Dim arr As Variant
    Dim Counter As Long
    Dim DestR As Long
    Dim TestLine As String
    Dim ItemLine As Long
    Dim CityState As String
    
    With ActiveSheet
        LastR = .Cells(.Rows.Count, 1).End(xlUp).Row
        arr = .Range("a1:a" & LastR).Value
    End With
    
    Workbooks.Add
    [a1:g1].Value = Array("Name", "C/O", "Address1", "Address2", "City", "State", "ZIP")
    DestR = 2
    
    For Counter = 1 To LastR
        TestLine = Application.Trim(Application.Clean(Replace(arr(Counter, 1), Chr(160), " ")))
        If Counter = 1 Then
            Cells(DestR, "a") = TestLine
        Else
            If RegExpFind(TestLine, "^[^,]+, [A-Z]{2} \d{5}(-\d{4})?$", 1) <> "" Then
                CityState = RegExpFind(TestLine, "^[^,]+, [A-Z]{2}", 1)
                Cells(DestR, "e") = Left(CityState, InStr(1, CityState, ",") - 1)
                Cells(DestR, "f") = Right(CityState, 2)
                Cells(DestR, "g") = RegExpFind(TestLine, "\d{5}(-\d{4})?$", 1)
                ItemLine = 5
            ElseIf TestLine Like "#*" Then
                If ItemLine < 3 Then
                    Cells(DestR, "c") = TestLine
                    ItemLine = 3
                Else
                    Cells(DestR, "d") = TestLine
                    ItemLine = 4
                End If
            ElseIf TestLine <> "" Then
                If ItemLine = 0 Then
                    Cells(DestR, "a") = TestLine
                    ItemLine = 1
                ElseIf ItemLine = 1 Then
                    Cells(DestR, "b") = TestLine
                    ItemLine = 2
                ElseIf ItemLine = 3 Then
                    Cells(DestR, "d") = TestLine
                    ItemLine = 4
                End If
            Else
                DestR = DestR + 1
                ItemLine = 0
            End If
        End If
    Next
    
    Columns.AutoFit
    
    MsgBox "Done"
    
End Sub

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window



It relies in part on Regular Expressions to do its magic; for more about that (and for that RegExpFind function I included above), please see:

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/A_1336-Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
0
 
LVL 31

Assisted Solution

by:Helen_Feddema
Helen_Feddema earned 250 total points
ID: 36574193
matthews -- I have seen U.S. addresses with 7 lines.  Usually at universities, with lines for department, building, floor, etc.

Here is a suggestion for dealing with this mess:  Supposing that the document contains only addresses (no other type of info in between them), go through it and put a page break at the end of each address.  Then you could write VBA code to copy each page (address) to a cell in the Excel workbook.

The ideal solution would be to parse the addresses into separate fields and put them in an Access database, but that would take a *lot* of work.
0
 
LVL 92

Expert Comment

by:Patrick Matthews
ID: 36574635
Helen_Feddema,

Point taken.  It's unusual, but not unheard of.

That said, I despair of having to figure out the logic of where to place everything in a 7-line entry :)

Patrick
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

I'm writing to share my clumsy experience in using this elegant tool so you can avoid every stupid mistake I made. (I leave it to the authorities to decide if this deserves a place in the Knowledge archives.)  Now that I am on the other side of my l…
Shortcuts in Word Just the other day I had a training for Microsoft and they wanted me to show how well the new Windows and Office behaved on a touch device, which by the way is great, but it was only then that I realized that using keyboard shortc…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial will demonstrate on a Mac how to change the sort order for chart legend values and decrpyt the intimidating chart menu.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now