Solved

Parse csv with double quotes in vb.net

Posted on 2012-04-13
13
1,978 Views
Last Modified: 2012-04-13
How can I extend the function below to escape double quotes found in field data, eg:
"some data", "some more data with a size eg 8" where the double quotes need escaping", "some more data"

    
Private Function ParseLine(ByVal oneLine As String) As String()
        ' Returns an array containing the values of the comma-separated fields.
        ' This pattern actually recognizes the correct commas.
        ' The Regex.Split() command later gets text between the commas.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Return r.Split(oneLine)
    End Function

Open in new window

0
Comment
Question by:ncw
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
13 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 37841813
Change your line 8 above to
Return r.Split(oneLine).Replace("""", String.Empty)

Open in new window

(also see article here.)
0
 
LVL 1

Author Comment

by:ncw
ID: 37841838
Vb IDE says: Replace is not a member of 'System.Array'
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37841855
Hmmm!
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Dim Result = r.Split(oneLine)
        Array.ForEach(Result, Function(x) x.Replace("""", String.Empty))
        Return Result
    End Function

Open in new window

0
Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

 
LVL 5

Expert Comment

by:hafeezmca
ID: 37841857
Hi, you can do that without a function too.
As your concern is with " just pass it to the replace function like below:

dim str as string
str=yourfield.replace(""","")
0
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 37842065
Try this

Dim strLine as string = "your text here"
strLine = strLine.Replace(chr(34) & "," & chr(34), "|")

return strLine.Split("|")
0
 
LVL 1

Author Comment

by:ncw
ID: 37842096
@nepaluz:

if the line of data is:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE","PORSCHE","75.00","0.0000""

then after Result = r.Split(oneLine)
Result(0) gives:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE""

So the split has already  split in the wrong place as Result(0) should be ""9"" therefore Array.ForEach(Result, Function(x) x.Replace("""", String.Empty)) is too late.
0
 
LVL 17

Accepted Solution

by:
nepaluz earned 500 total points
ID: 37842175
You have some strange file structure.
OK, I take it that the line is enclosed in double quotes, so this will handle that.
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
        Return r.Split(oneLine.Replace(vbTab, ","))
    End Function

Open in new window

0
 
LVL 1

Author Comment

by:ncw
ID: 37842185
0
 
LVL 1

Author Comment

by:ncw
ID: 37842302
@nepaluz: Not quite sure what you're doing with

oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)

You seem to be using a tab as a temporary deliminator. Not sure what pattern you are replacing as there are probably some double quotes being used as escape characters. BUT IT WORKS!!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37842844
What that does is replacing delimiters with tabs and removing all double quotes remaining in the string after that (and of-course replace the tabs with commas when returning!).

On a side note, the proof of any pudding is in the eating, that the code suggestion works is the proof that it is correct. Having said that, I'd use the FileIO.TextFieldParser to parse any delimited file.
0
 
LVL 1

Author Closing Comment

by:ncw
ID: 37842909
Thanks very much you've been very helpful!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37843032
You are welcome!
Having had a little mull over the code suggestion, I realised that my work-aroundactually renders the Regex redundant! Consider this:
Private Function ParseLine(ByVal oneLine As String) As String()
    oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
    Return oneLine.Split(vbTab)
End Function

Open in new window

Would not shed any tears for regex though, sure there is adequate of it pre-loaded in the Split method!
0
 
LVL 17

Expert Comment

by:nepaluz
ID: 37843051
Of-course, the above will only hold its own IF there are no fields with embeded commas. (ooops ... may have spoken too soon!)
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

In my previous two articles we discussed Binary Serialization (http://www.experts-exchange.com/A_4362.html) and XML Serialization (http://www.experts-exchange.com/A_4425.html). In this article we will try to know more about SOAP (Simple Object Acces…
For those of you who don't follow the news, or just happen to live under rocks, Microsoft Research released a beta SDK (http://www.microsoft.com/en-us/download/details.aspx?id=27876) for the Xbox 360 Kinect. If you don't know what a Kinect is (http:…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

756 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question