Solved

Parse csv with double quotes in vb.net

Posted on 2012-04-13
13
1,874 Views
Last Modified: 2012-04-13
How can I extend the function below to escape double quotes found in field data, eg:
"some data", "some more data with a size eg 8" where the double quotes need escaping", "some more data"

    
Private Function ParseLine(ByVal oneLine As String) As String()
        ' Returns an array containing the values of the comma-separated fields.
        ' This pattern actually recognizes the correct commas.
        ' The Regex.Split() command later gets text between the commas.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Return r.Split(oneLine)
    End Function

Open in new window

0
Comment
Question by:ncw
13 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 37841813
Change your line 8 above to
Return r.Split(oneLine).Replace("""", String.Empty)

Open in new window

(also see article here.)
0
 
LVL 1

Author Comment

by:ncw
ID: 37841838
Vb IDE says: Replace is not a member of 'System.Array'
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37841855
Hmmm!
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Dim Result = r.Split(oneLine)
        Array.ForEach(Result, Function(x) x.Replace("""", String.Empty))
        Return Result
    End Function

Open in new window

0
 
LVL 5

Expert Comment

by:hafeezmca
ID: 37841857
Hi, you can do that without a function too.
As your concern is with " just pass it to the replace function like below:

dim str as string
str=yourfield.replace(""","")
0
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 37842065
Try this

Dim strLine as string = "your text here"
strLine = strLine.Replace(chr(34) & "," & chr(34), "|")

return strLine.Split("|")
0
 
LVL 1

Author Comment

by:ncw
ID: 37842096
@nepaluz:

if the line of data is:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE","PORSCHE","75.00","0.0000""

then after Result = r.Split(oneLine)
Result(0) gives:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE""

So the split has already  split in the wrong place as Result(0) should be ""9"" therefore Array.ForEach(Result, Function(x) x.Replace("""", String.Empty)) is too late.
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 17

Accepted Solution

by:
nepaluz earned 500 total points
ID: 37842175
You have some strange file structure.
OK, I take it that the line is enclosed in double quotes, so this will handle that.
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
        Return r.Split(oneLine.Replace(vbTab, ","))
    End Function

Open in new window

0
 
LVL 1

Author Comment

by:ncw
ID: 37842185
0
 
LVL 1

Author Comment

by:ncw
ID: 37842302
@nepaluz: Not quite sure what you're doing with

oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)

You seem to be using a tab as a temporary deliminator. Not sure what pattern you are replacing as there are probably some double quotes being used as escape characters. BUT IT WORKS!!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37842844
What that does is replacing delimiters with tabs and removing all double quotes remaining in the string after that (and of-course replace the tabs with commas when returning!).

On a side note, the proof of any pudding is in the eating, that the code suggestion works is the proof that it is correct. Having said that, I'd use the FileIO.TextFieldParser to parse any delimited file.
0
 
LVL 1

Author Closing Comment

by:ncw
ID: 37842909
Thanks very much you've been very helpful!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37843032
You are welcome!
Having had a little mull over the code suggestion, I realised that my work-aroundactually renders the Regex redundant! Consider this:
Private Function ParseLine(ByVal oneLine As String) As String()
    oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
    Return oneLine.Split(vbTab)
End Function

Open in new window

Would not shed any tears for regex though, sure there is adequate of it pre-loaded in the Split method!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37843051
Of-course, the above will only hold its own IF there are no fields with embeded commas. (ooops ... may have spoken too soon!)
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now