OB1Canobie

Parsing a comma delimited text file to a string

I receive a routine text file in comma-delimited format. Each field is enclosed in quotation marks; however, the total columns are all grouped together inside a single pair of quotation marks and are separated by commas within those marks. I need to split these fields into an array so I can append them to a table. I have attached the code I am using to split the data, but I need something that also goes inside the quotation marks and splits the data there. Keep in mind that in several cases a last name and first name sit together inside quotation marks and need to keep the comma between them. A sample of my data is:

Sample:
"Brown, John", "05/12/09", "111.32, 3432.11, 6325.21, 29383.22", "BP"

Code:

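    Do Until objFile.AtEndOfStream
        StrFile = objFile.ReadLine
        ' Stripping every quotation mark loses the field boundaries...
        StrFile = Replace(StrFile, Chr(34), "")
        ' ...so this split also breaks "Brown, John" into two elements
        ArrFile = Split(StrFile, ",")
    Loop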
zhuba

You'll probably need a regex to make sure that it only parses commas that aren't already enclosed in quotation marks.

http://msdn.microsoft.com/en-us/library/ms974570.aspx and http://www.brettb.com/VBScriptRegularExpressions.asp for a brief intro

You'll want a regex along the lines of "[^"]+", which in VBScript is written as """[^""]+"","  with the quotation marks doubled so that they are treated as literal quotes inside the string.

OB1Canobie

ASKER

The problem is that I also need the commas inside quotation marks parsed; however, all of the quoted fields that need to be split this way are numeric.
I'm assuming you want your data

"Brown, John", "05/12/09", "111.32, 3432.11, 6325.21, 29383.22", "BP"

to turn into a list of:

"Brown, John"
"05/12/09"
"111.32, 3432.11, 6325.21, 29383.22"
"BP"

If you use the regex engine, it searches for the pattern "[^"]+", which breaks down as follows:

" - opening quotation mark
[^"]+ - [^"] means any character except " and the + means one or more of those characters
" - the closing quotation mark
, - the comma before the next entry.

This will iterate to the end but skip the last entry, because no comma follows it. You can then find that entry by searching the remaining text for "[^"]+" (the same pattern without the trailing comma).
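
A minimal sketch of that regex idea in VBScript, offered as one possible reading rather than a definitive answer. It swaps in a slightly different pattern, "([^"]*)" with no trailing comma, so that the last entry is matched in the same pass. The function name QuotedFields and the variable names are made up for illustration, and the sample line is built with Replace only to avoid a wall of doubled quotation marks.

```vbscript
' Return the quoted fields of one line as an array, without the quotes.
Function QuotedFields(strLine)
    Dim objRegEx, colMatches, arrOut(), i
    Set objRegEx = New RegExp
    objRegEx.Global = True               ' find every match, not just the first
    objRegEx.Pattern = """([^""]*)"""    ' the regex "([^"]*)" with quotes doubled
    Set colMatches = objRegEx.Execute(strLine)
    If colMatches.Count = 0 Then
        QuotedFields = Array()           ' no quoted fields on this line
        Exit Function
    End If
    ReDim arrOut(colMatches.Count - 1)
    For i = 0 To colMatches.Count - 1
        arrOut(i) = colMatches(i).SubMatches(0)   ' text between the quotes
    Next
    QuotedFields = arrOut
End Function

' Usage: single quotes are swapped for Chr(34) to build the sample line.
Dim strSample1, arrQuoted, strItem
strSample1 = Replace("'Brown, John', '05/12/09', '111.32, 3432.11, 6325.21, 29383.22', 'BP'", "'", Chr(34))
arrQuoted = QuotedFields(strSample1)
For Each strItem In arrQuoted
    WScript.Echo strItem
Next
```

This keeps "Brown, John" and the group of numbers as single entries; it does not split the numbers apart.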
I'm trying to get to the following output:

Brown, John   > Field 1
05/12/09 > Field 2
111.32 > Field 3
3432.11 > Field 4
6325.21 > Field 5
29383.22 > Field 6
This is a little more complicated. Since you only need to split on the commas inside quotation marks when the values are numeric, you're probably best to split by " first, then discard every second entry (the commas that sat between the quoted fields), then check each remaining entry for numeric values and, if it has them, split it again by comma.
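
The steps described above could be sketched roughly as follows. This is an illustration under assumptions, not tested against the real file: the names IsNumericList and SplitLine are hypothetical, a field is treated as numeric only when every comma-separated piece passes IsNumeric, and IsNumeric is assumed to run under a locale where "." is the decimal separator.

```vbscript
' True when every comma-separated piece of strField is a number.
Function IsNumericList(strField)
    Dim strPart
    IsNumericList = False
    If Len(Trim(strField)) = 0 Then Exit Function   ' empty field: leave as-is
    For Each strPart In Split(strField, ",")
        If Not IsNumeric(Trim(strPart)) Then Exit Function
    Next
    IsNumericList = True
End Function

' Split one line into fields; numeric groups are split again by comma.
Function SplitLine(strLine)
    Dim dictOut, arrParts, i, strPiece
    Set dictOut = CreateObject("Scripting.Dictionary")
    arrParts = Split(strLine, Chr(34))
    ' Odd-numbered parts are the text between an opening and closing quote;
    ' even-numbered parts are the separators between fields and are skipped.
    For i = 1 To UBound(arrParts) Step 2
        If IsNumericList(arrParts(i)) Then
            For Each strPiece In Split(arrParts(i), ",")
                dictOut.Add dictOut.Count, Trim(strPiece)
            Next
        Else
            dictOut.Add dictOut.Count, arrParts(i)   ' keep "Brown, John" whole
        End If
    Next
    SplitLine = dictOut.Items   ' array of fields in the order they were added
End Function

' Usage: single quotes are swapped for Chr(34) to build the sample line.
Dim strSample2, arrSplit, k
strSample2 = Replace("'Brown, John', '05/12/09', '111.32, 3432.11, 6325.21, 29383.22', 'BP'", "'", Chr(34))
arrSplit = SplitLine(strSample2)
For k = 0 To UBound(arrSplit)
    WScript.Echo arrSplit(k) & " > Field " & (k + 1)
Next
```

With the sample line this keeps the name and the date as single fields and splits the four numbers into separate fields, so "BP" ends up as a seventh field. Because the check is on content rather than position, it would also split any other quoted field that happens to contain only numbers.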
ASKER CERTIFIED SOLUTION
RobSampson
This solution is only available to members.
If the third field is the only field on each line that you want to parse inside the quotes, then you can also use this as you loop through the text file:

strFile = objFile.ReadLine
Cnt = 1
' Strip the opening and closing quotation marks from the line
If Left(strFile, 1) = """" Then strFile = Mid(strFile, 2)
If Right(strFile, 1) = """" Then strFile = Left(strFile, Len(strFile) - 1)
' Split on the quote, comma, space, quote sequence that separates fields
arrFields = Split(strFile, """, """)

For i = 0 To UBound(arrFields)
      If i = 2 Then
            ' The third field holds the numeric values, so split it again
            arrNumFields = Split(arrFields(i), ", ")
            For x = 0 To UBound(arrNumFields)
                  List = List & "Field " & Cnt & vbTab & arrNumFields(x) & vbCrLf
                  Cnt = Cnt + 1
            Next
      Else
            List = List & "Field " & Cnt & vbTab & arrFields(i) & vbCrLf
            Cnt = Cnt + 1
      End If
Next

WScript.Echo List
Thanks Rob for your help.  This is what I was looking for.
Thanks for the grade.

Regards,

Rob.