Parsing multi-line text file records into CSV format

Lloyd Charlier
Lloyd Charlier used Ask the Experts™
I have a text file containing company data for more than 25000 organizations, six lines per company, with repeated fields as shown below:

Principal's_Name, Principal's_Title
City ST Zip

I need to get this data into CSV format (to be imported as "Accounts" inside Outlook 2007 BCM), one record per line.  While I can see how to do this programmatically using VB or possibly some scripting language, I'm wondering if there isn't an even easier way to do what I want using either a standalone utility or perhaps some import capability possessed by one of the Office apps (Excel, Access, etc...).  Suggestions?

** NOTE **
Just to make things challenging, the fourth (4th) line of each record (City State ZIP) is not comma-delimited - the original author simply used spaces <grrr>.  I point this out as obviously there are city names (e.g. "New York City") having embedded spaces.  Conversely, many of the records contain street addresses (3rd line) having embedded commas <double grrr>.

Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Depending on how the file is formatted, you could copy and paste it into MS Word.  Then chose the Replace function from the menu.  At the bottom menu choose "More", then "Special" - then choose the top choice "paragraph mark".  Replace that with a comma.
Then copy and paste the whole file into Excel and clean up if needed.  Then save-as  CSV.
Top Expert 2008

If you don't have any blank rows b/n the companies, your data starts at row 1 without headers, you can use the following code to reformat in excel, then save as csv.

It's transposing the block of 6 lines to 6 columns in a first loop, then delete the empty rows.


Sub MacroDefault()
Dim lastRow As Long, i As Long, j As Integer 'define variables
application.ScreenUpdating = False 'disable ScreenUpdating to avoid screen flashes
lastRow = range("A" & Rows.Count).End(xlUp).Row 'get last row
i = 1
Do While i < lastRow 'loop from last Row to one
    For j = 1 To 5
        Cells(i, 1 + j) = Cells(i + j, 1)
        Cells(i + j, 1) = ""
    Next j
    i = i + 7
For i = lastRow To 1 Step -1 'loop from last Row to one
    If Cells(i, 1) = "" Then Rows(i).Delete
application.ScreenUpdating = True 'reenable ScreenUpdating
End Sub

Open in new window

Shawn CøadyTechonology and Information Systems

In the past I've used a product called "Monarch" that was a great tool for scraping data from old fashioned spool file (reports) into csv or excel files.
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

This little snippet I use relies on the postal state / province code being uppercase, and the city name to be propercase as in "New York".  If this is the case it will parse out the city, state, and zip and remove commas from the city field. It also saves leading zeros on the zipcode field.

You only need to find / replace to remove the commas from the address.

Sub macro_csz()
    Dim csz As Range, zcode As Range
    Dim letter As Integer
    Dim csz_col As String
    csz_col = InputBox("Enter the letter of the column containing City State Zip data.", "Where is your data?")
    Set csz = Range(csz_col & "2", Range(csz_col & "2").End(xlDown))
    Range(csz.Offset(0, 1), csz.Offset(0, 3)).EntireColumn.Insert
    csz.Offset(-1, 1).Value = "City"
    csz.Offset(-1, 2).Value = "State"
    csz.Offset(-1, 3).Value = "ZipCode"
    For Each zcode In csz
        For letter = 2 To Len(zcode.Value)
            If Asc(Mid(zcode, letter, 1)) >= 65 And Asc(Mid(zcode, letter, 1)) <= 90 And Asc(Mid(zcode, letter - 1, 1)) >= 65 And Asc(Mid(zcode, letter - 1, 1)) <= 90 Then
                zcode.Offset(0, 1).Value = Trim(Mid(zcode, 1, letter - 2))
                zcode.Offset(0, 1).Value = IIf(Right(zcode.Offset(0, 1).Value, 1) = ",", Left(zcode.Offset(0, 1).Value, Len(zcode.Offset(0, 1).Value) - 1), zcode.Offset(0, 1).Value)
                zcode.Offset(0, 2).Value = Mid(zcode, letter - 1, 2)
                zcode.Offset(0, 3).NumberFormat = "@"
                zcode.Offset(0, 3).Value = Trim(Mid(zcode, letter + 2, Len(zcode) - letter))
                Exit For
            End If
End Sub

Open in new window

Lloyd CharlierChief Executive Officer


As is usually the case, the input data I received was not perfectly consistent in format therefore no one answer completely solved my problem.  That said, I was ultimately successful by using a combination of offered solutions:

1) I reformatted the data inside Word using the replacement functions found under the Advanced button (never knew that option was there - very powerful stuff hidden there!).
2) From here I was able to input the data into Excel where I subsequently  performed additional parsing of the City, State and ZIP data using the provide VBA.

Thanks to all of you, I now have a CRM database having more than 30000 new customer profiles.

I didn't mention it at first because it seemed unnecessary, but for dirty text files, where the number of address lines, absence of company line, partial inclusion of country line, etc may occur.  I do the following hand-cleanup process.
I insert two columns on the left, the first column is record number.  Using a formula (which can vary according to the data) I increase the record number by one when something occurs, like a blank line or a zip.  The second column (B) is a fieldtype identifier.  This can be harder to generate with a formula, but I use 1-6, where 1 is name, 2 company, .... 6 country.  I then use Filter, Advanced, copy to new range, to separate all the data into a series of two column tables of "record number, field"  where each table has only one field.  Sort all the tables by record number and paste the columns into one table, and you are done.
The only good thing about this method is that it doesn't take more than a moment or two to estimate how long it will take me to complete, so if the estimate looks more like hours than minutes I forget it and move on to plan B.  It also affords some eyeball time, so I see things are screwy early rather than after a macro has trashed it to the point I need to start over.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial