Solved

Parsing a comma delimited text file to a string

Posted on 2008-06-25
9
1,774 Views
Last Modified: 2013-12-20
I have a routine text file that is supplied in comma delimited format.  Each field is set off with quotation "" marks; however, the total columns are all grouped together in quotation marks, but are seperated with commas inside the quotation marks.  I need to split these fields into an array so I can append to a table.  I have attached my code that I am using to split the data, but I need something that goes inside the quotation marks and splits the data inside the quote marks.  Keep in mind, I do have several circumstances where a last name and first name are together in quotation marks that need to have a comma between.  A sample of my data is:

Sample:
"Brown, John", "05/12/09", "111.32, 3432.11, 6325.21, 29383.22", "BP"

Code:

    Do Until objFile.AtEndOfStream
    StrFile = objFile.ReadLine
    StrFile = Replace(StrFile, Chr(34), "")
    ArrFile = Split(StrFile, ",")
0
Comment
Question by:OB1Canobie
  • 3
  • 3
  • 2
  • +1
9 Comments
 
LVL 7

Expert Comment

by:zhuba
ID: 21870802
You'll probably need a regex to make sure that it only parses commas that aren't already enclosed in quotation marks.

http://msdn.microsoft.com/en-us/library/ms974570.aspx and http://www.brettb.com/VBScriptRegularExpressions.asp for a brief intro

You'll want a regex along the lines of """[^""]+""," - which is "[^"]+", but with the quotations doubled so that they get converted to a string properly.

0
 

Author Comment

by:OB1Canobie
ID: 21870810
The problem is that I also need commas parsed inside quotation marks, however, all fields that are inside quotation marks that need to be parsed are numeric.
0
 
LVL 7

Expert Comment

by:zhuba
ID: 21870842
I'm assuming you want your data

"Brown, John", "05/12/09", "111.32, 3432.11, 6325.21, 29383.22", "BP"

to turn into a list of:

"Brown, John"
 "05/12/09"
"111.32, 3432.11, 6325.21, 29383.22"
"BP"

If you use the regex engine, it searches for the pattern "[^"]+", which breaks down as follows:

" - opening quotation mark
[^"]+ - [^"] means any character except " and the + means any number of characters
" - the closing quotation mark
, - the comma before the next entry.

This will iterate to the end but skip the last entry, which you can then find by searching "[^"]+" in the end bit.
0
Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

 

Author Comment

by:OB1Canobie
ID: 21870856
I'm trying to get to the following output:

Brown, John   > Field 1
05/12/09 > Field 2
111.32 > Field 3
3432.11 > Field 4
6325.21 > Field 5
29383.22 > Field 6
0
 
LVL 7

Expert Comment

by:zhuba
ID: 21870898
This is a little more complicated. Since you only need to split by commas within quotation marks when using non-numeric values, you're probably best to split by " first, then remove the commas between the quotes every second entry, then go through and check for numeric characters in the string and if there are then split again by comma.
0
 
LVL 65

Accepted Solution

by:
RobSampson earned 500 total points
ID: 21871473
Hi, from I see, things could be complicated, but could hopefully be simplified.

Basically, from this:
"Brown, John", "05/12/09", "111.32, 3432.11, 6325.21, 29383.22", "BP"

you have a definition of five fields, being:
"Brown, John"
"05/12/09"
"111.32, 3432.11, 6325.21, 29383.22"
"BP"

From your last "ideal" scenario, it will be too difficult to say that "Brown, John" is to remain together, but "111.32, 3432.11, 6325.21, 29383.22" is to be split up, UNLESS "Brown, John" was *always* the first field, and "111.32, 3432.11, 6325.21, 29383.22" was *always* the third field.

Here is some code:
'==============
strFile = objFile.ReadLine
If Left(strFile, 1) = """" Then strFile = Mid(strFile, 2)
If Right(strFile, 1) = """" Then strFile = Left(strFile, Len(strFile) - 1)
arrFields = Split(strFile, """, """)
MsgBox Join(arrFields, VbCrLf)
'==============

that will give you this, in an Array:
Brown, John
05/12/09
111.32, 3432.11, 6325.21, 29383.22
BP

So, Brown, John is arrFields(0) and 111.32, 3432.11, 6325.21, 29383.22 is arrFields(2)

So, you could then loop through the array, and say that arrFields(0) is going to be in cell 1, but arrFields(2) is going to be split by the comma, and placed accordingly.....

You could also do this:
'=================
strFile = objFile.ReadLine
If Left(strFile, 1) = """" Then strFile = Mid(strFile, 2)
If Right(strFile, 1) = """" Then strFile = Left(strFile, Len(strFile) - 1)
arrFields = Split(strFile, """, """)
arrAllFields = Split(Join(arrFields, ", "), ", ")
MsgBox Join(arrAllFields, VbCrLf)
'=================

where arrAllFields now contains the elements:
Brown
John
05/12/09
111.32
3432.11
6325.21
29383.22
BP

So you could decide what to do with it then as well....

Regards,

Rob.
0
 
LVL 15

Expert Comment

by:sr75
ID: 21874565
if the third field is the only field on each line that you want to parse inside the quotes, then you can also use this as you loop through the text file:

strFile = objFile.ReadLine
Cnt = 1
If Left(strFile, 1) = """" Then strFile = Mid(strFile, 2)
If Right(strFile, 1) = """" Then strFile = Left(strFile, Len(strFile) - 1)
arrFields = Split(strFile, """, """)

For i = 0 to ubound(arrFields)
      If i = 2 then
            arrNumFields = Split(arrFields(i), ", ")
            For x = 0 to ubound(arrNumFields)
                  List = List & "Field " & Cnt & vbtab & arrNumFields(x)
                        Cnt = Cnt +1
            Next
      Else
            List = List & "Field " & Cnt & vbtab & arrFields(i)
                Cnt = Cnt +1
      End If
Next

Wscript.echo  List
0
 

Author Closing Comment

by:OB1Canobie
ID: 31470801
Thanks Rob for your help.  This is what I was looking for.
0
 
LVL 65

Expert Comment

by:RobSampson
ID: 21888697
Thanks for the grade.

Regards,

Rob.
0

Featured Post

Live: Real-Time Solutions, Start Here

Receive instant 1:1 support from technology experts, using our real-time conversation and whiteboard interface. Your first 5 minutes are always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you have ever used Microsoft Word then you know that it has a good spell checker and it may have occurred to you that the ability to check spelling might be a nice piece of functionality to add to certain applications of yours. Well the code that…
Article by: Martin
Here are a few simple, working, games that you can use as-is or as the basis for your own games. Tic-Tac-Toe This is one of the simplest of all games.   The game allows for a choice of who goes first and keeps track of the number of wins for…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

813 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now