Solved

Parse csv with double quotes in vb.net

Posted on 2012-04-13
13
1,827 Views
Last Modified: 2012-04-13
How can I extend the function below to escape double quotes found in field data, eg:
"some data", "some more data with a size eg 8" where the double quotes need escaping", "some more data"

    
Private Function ParseLine(ByVal oneLine As String) As String()
        ' Returns an array containing the values of the comma-separated fields.
        ' This pattern actually recognizes the correct commas.
        ' The Regex.Split() command later gets text between the commas.
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Return r.Split(oneLine)
    End Function

Open in new window

0
Comment
Question by:ncw
13 Comments
 
LVL 17

Expert Comment

by:nepaluz
Comment Utility
Change your line 8 above to
Return r.Split(oneLine).Replace("""", String.Empty)

Open in new window

(also see article here.)
0
 
LVL 1

Author Comment

by:ncw
Comment Utility
Vb IDE says: Replace is not a member of 'System.Array'
0
 
LVL 17

Expert Comment

by:nepaluz
Comment Utility
Hmmm!
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        Dim Result = r.Split(oneLine)
        Array.ForEach(Result, Function(x) x.Replace("""", String.Empty))
        Return Result
    End Function

Open in new window

0
 
LVL 5

Expert Comment

by:hafeezmca
Comment Utility
Hi, you can do that without a function too.
As your concern is with " just pass it to the replace function like below:

dim str as string
str=yourfield.replace(""","")
0
 
LVL 83

Expert Comment

by:CodeCruiser
Comment Utility
Try this

Dim strLine as string = "your text here"
strLine = strLine.Replace(chr(34) & "," & chr(34), "|")

return strLine.Split("|")
0
 
LVL 1

Author Comment

by:ncw
Comment Utility
@nepaluz:

if the line of data is:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE","PORSCHE","75.00","0.0000""

then after Result = r.Split(oneLine)
Result(0) gives:

""9","955.351.401.51","","LHF DISC 350MM FOR 18" WHEELS CAYE""

So the split has already  split in the wrong place as Result(0) should be ""9"" therefore Array.ForEach(Result, Function(x) x.Replace("""", String.Empty)) is too late.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 17

Accepted Solution

by:
nepaluz earned 500 total points
Comment Utility
You have some strange file structure.
OK, I take it that the line is enclosed in double quotes, so this will handle that.
    Private Function ParseLine(ByVal oneLine As String) As String()
        Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern)
        oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
        Return r.Split(oneLine.Replace(vbTab, ","))
    End Function

Open in new window

0
 
LVL 1

Author Comment

by:ncw
Comment Utility
0
 
LVL 1

Author Comment

by:ncw
Comment Utility
@nepaluz: Not quite sure what you're doing with

oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)

You seem to be using a tab as a temporary deliminator. Not sure what pattern you are replacing as there are probably some double quotes being used as escape characters. BUT IT WORKS!!
0
 
LVL 17

Expert Comment

by:nepaluz
Comment Utility
What that does is replacing delimiters with tabs and removing all double quotes remaining in the string after that (and of-course replace the tabs with commas when returning!).

On a side note, the proof of any pudding is in the eating, that the code suggestion works is the proof that it is correct. Having said that, I'd use the FileIO.TextFieldParser to parse any delimited file.
0
 
LVL 1

Author Closing Comment

by:ncw
Comment Utility
Thanks very much you've been very helpful!
0
 
LVL 17

Expert Comment

by:nepaluz
Comment Utility
You are welcome!
Having had a little mull over the code suggestion, I realised that my work-aroundactually renders the Regex redundant! Consider this:
Private Function ParseLine(ByVal oneLine As String) As String()
    oneLine = oneLine.Replace("""" & "," & """", vbTab).Replace("""", String.Empty)
    Return oneLine.Split(vbTab)
End Function

Open in new window

Would not shed any tears for regex though, sure there is adequate of it pre-loaded in the Split method!
0
 
LVL 17

Expert Comment

by:nepaluz
Comment Utility
Of-course, the above will only hold its own IF there are no fields with embeded commas. (ooops ... may have spoken too soon!)
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

This article describes relatively difficult and non-obvious issues that are likely to arise when creating COM class in Visual Studio and deploying it by professional MSI-authoring tools. It is assumed that the reader is already familiar with the cla…
Calculating holidays and working days is a function that is often needed yet it is not one found within the Framework. This article presents one approach to building a working-day calculator for use in .NET.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now