
Strip surrounding double quotes from array of strings from split using regular expression

Posted on 2004-10-12
Last Modified: 2012-06-27
Hello-

I am using the following function to parse a CSV file into an array of strings.  The function works fine, but every value in the resulting array of strings is surrounded by double quotes.  I want an efficient way to strip the surrounding quotes from the string values in the resulting array.  I'm pretty sure I can do this by modifying the regular expression, but I'm not positive.

    Private Shared Function ParseLine(ByVal oneLine As String) As String()
        Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")
        Return r.Split(oneLine)
    End Function

Right now, I am using the following code on the resultant array, which takes a long time and is obviously very inefficient:

    ' Drop the first and last character of each field (assumes every field is quoted)
    Dim i As Integer
    For i = 0 To parsedString.Length - 1
        dataStore(recordCount - 1, i) = parsedString(i).Substring(1, parsedString(i).Length - 2)
    Next


Thanks in advance.  Cheers to whoever can do this :-)

Dan
Question by:dsulli2000

4 Comments
 
LVL 44

Expert Comment

by:Arthur_Wood
ID: 12292716
Why don't you use the Replace method to replace all occurrences of " with a zero-length string?

AW
 

Author Comment

by:dsulli2000
ID: 12293042
Technically, the string between the outer double quotes could itself contain double quotes.
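A minimal VB.NET sketch of that concern (the module and function names are hypothetical, made up for this illustration): Replace deletes every double quote in the field, while dropping only the first and last character keeps the embedded quotes, though still in their doubled CSV form.

```vbnet
Imports System

Module ReplaceDemo
    ' Arthur_Wood's suggestion: remove every double quote in the field.
    Public Function StripAllQuotes(ByVal field As String) As String
        Return field.Replace("""", "")
    End Function

    ' The loop in the question: drop only the first and last character.
    Public Function StripOuterQuotes(ByVal field As String) As String
        Return field.Substring(1, field.Length - 2)
    End Function

    Sub Main()
        ' The CSV field "I said, ""Hello""!" written as a VB string literal
        Dim field As String = """I said, """"Hello""""!"""
        Console.WriteLine(StripAllQuotes(field))   ' I said, Hello!
        Console.WriteLine(StripOuterQuotes(field)) ' I said, ""Hello""!
    End Sub
End Module
```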
 
LVL 12

Accepted Solution

by:farsight (earned 250 total points)
ID: 12293200
dsulli2000:
  Your code assumes that each and every string has enclosing double-quotes.  Is that safe?
  If you're not concerned with your strings _containing_ double-quotes, or if you _know_ that _every_ string has enclosing double-quotes, you can certainly make the regex simpler.  It depends on whether you want a solution for a specific simple case or a solution that will handle (almost) every commonly found CSV file.

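The VB string is ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
The regular expression itself (with the VB doubled quotes undone) is:
  ,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
This part: [^"]*"[^"]*" means:
  [^"]*   any number of characters that are not a double-quote (typically nothing, or whitespace)
  "       the initial double-quote
  [^"]*   any number of characters that are not a double-quote (the content)
  "       the terminating double-quote
So the lookahead (?=...) lets a comma match only when it is followed by zero or more such pairs of double-quotes and then no further double-quote, (?![^"]*").  In other words, a comma is a split point only when an even number of double-quotes follows it, which means it is not inside a quoted field.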
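A small VB.NET sketch that runs the same pattern through Regex.Split on one sample line (the module and function names are hypothetical). It shows that the comma inside the quoted field is not treated as a separator, and that the enclosing quotes are still part of each resulting field, which is why the question needs a separate post-processing step.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    ' Same pattern as ParseLine: a comma is a separator only when an even
    ' number of double quotes follows it (i.e. it is outside a quoted field).
    Private ReadOnly FieldSeparator As New Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")

    Public Function SplitFields(ByVal line As String) As String()
        Return FieldSeparator.Split(line)
    End Function

    Sub Main()
        ' The line "a,a","b",c written as a VB string literal
        For Each field As String In SplitFields("""a,a"",""b"",c")
            Console.WriteLine(field) ' "a,a" then "b" then c (quotes kept)
        Next
    End Sub
End Module
```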

Regular Expression Reference
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp?frame=true

Grouping Constructs (Especially useful for this complicated regular expression)
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconRegularExpressionsLanguageElements.asp?frame=true

I recommend Expresso ( http://weblogs.asp.net/robsteel/archive/2004/05/06/127415.aspx ) or Regulator ( http://royo.is-a-geek.com/iserializable/regulator/ ) or a similar tool to work with and help understand regular expressions.

Also, check the regular expression library:
http://regexlib.com/

Arthur_Wood:
  That would replace ALL double-quotes, including any that are inside the string, not just the ones that "enclose" the string.

Of course, if it turns out that that's OK, and if one can be sure that no commas are inside the string either, then we could just split on the comma, and ignore all the complexity.
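For that simple case, a VB.NET sketch of the plain comma split (hypothetical module and function names) shows both when it suffices and how it breaks as soon as a quoted field contains a comma.

```vbnet
Imports System

Module PlainSplitDemo
    ' Only safe when no field contains a comma; quotes are left untouched.
    Public Function SplitOnComma(ByVal line As String) As String()
        Return line.Split(","c)
    End Function

    Sub Main()
        Console.WriteLine(SplitOnComma("a,b,c").Length)      ' 3
        ' "a,a",b is two CSV fields, but the plain split returns
        ' three pieces:  "a   a"   b
        Console.WriteLine(SplitOnComma("""a,a"",b").Length)  ' 3
    End Sub
End Module
```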
 
LVL 12

Expert Comment

by:farsight
ID: 12293465
dsulli2000
  Ah! I just saw your response.  I think the regex is intended to recognize VB-style strings in the CSV.

All these would contain three items:
  a, b, c
  "a,a", "b,b", "c,c"
  "I said, ""Hello""!", this, that
The first item in the last one would be:
  "I said, ""Hello""!"
You'd need to parse this intelligently, replacing each doubled double-quote ("") inside the enclosing double-quotes with a single double-quote (").  Note that this applies only to strings that are enclosed by double-quotes.  Strings that are NOT enclosed by double-quotes are "raw" and need no additional processing.
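As the question originally hoped, the quote handling can also be pushed into the regular expression by matching fields instead of splitting on separators. The following VB.NET sketch is an alternative technique, not the code from this thread; the pattern, the group names q and raw, and the module and function names are assumptions made for illustration. It captures the text between the enclosing quotes and then turns each doubled double-quote into a single one. Unlike the split approach, it also skips whitespace before a field, so treat it as a starting point rather than a complete CSV parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module MatchFieldsDemo
    ' A field starts at the beginning of the line or right after a comma.
    ' Leading whitespace is skipped; then either
    '   "...."  a quoted field whose content (group q) may contain "" pairs, or
    '   ....    a raw field (group raw) running up to the next comma.
    Private ReadOnly FieldPattern As New Regex( _
        "(?<=^|,)\s*(?:""(?<q>(?:[^""]|"""")*)""|(?<raw>[^,]*))")

    Public Function SplitCsvLine(ByVal line As String) As String()
        Dim fields As New List(Of String)()
        For Each m As Match In FieldPattern.Matches(line)
            If m.Groups("q").Success Then
                ' Quoted field: the outer quotes are not captured; undo "" escaping
                fields.Add(m.Groups("q").Value.Replace("""""", """"))
            Else
                fields.Add(m.Groups("raw").Value)
            End If
        Next
        Return fields.ToArray()
    End Function

    Sub Main()
        ' The line "I said, ""Hello""!", this, that as a VB string literal
        For Each field As String In SplitCsvLine("""I said, """"Hello""""!"", this, that")
            Console.WriteLine(field)
        Next
    End Sub
End Module
```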

So here is a correct solution (though it is even slower).

[VB.NET with .NET 1.1, compiled, partially tested]

                For i As Integer = 0 To parsedString.Length - 1
                    dataStore(recordCount - 1, i) = ParseItem(parsedString(i))
                Next

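    ' Removes the enclosing double-quotes from one field and turns each
    ' doubled double-quote ("") inside it back into a single one (").
    ' Fields that are not enclosed in double-quotes are returned unchanged.
    Function ParseItem(ByVal text As String) As String
        Dim oneDoubleQuote As String = """"        ' for readability of code below.
        Dim twoDoubleQuotes As String = oneDoubleQuote & oneDoubleQuote
        ' Look past spaces around the field (e.g. after ", " in the sample lines)
        Dim trimmed As String = text.Trim()
        If trimmed.Length > 1 _
        AndAlso trimmed.StartsWith(oneDoubleQuote) _
        AndAlso trimmed.EndsWith(oneDoubleQuote) Then
            Return trimmed.Substring(1, trimmed.Length - 2).Replace(twoDoubleQuotes, oneDoubleQuote)
        Else
            Return text        ' unchanged: raw (unquoted) field
        End If
    End Function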
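To try the split and ParseItem together on the three sample lines, here is a self-contained VB.NET console sketch. The module name and ParseCsvLine are hypothetical. This version of ParseItem trims surrounding spaces before looking for the enclosing quotes (an assumption added because the sample lines have a space after each comma, so the split yields fields like  "b,b"  with a leading space); raw fields keep their leading space.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CsvLineDemo
    ' Same pattern as ParseLine: split only on commas outside quoted fields.
    Private ReadOnly FieldSeparator As New Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")

    ' Same idea as ParseItem, plus Trim() so a field preceded by a space
    ' is still recognised as quoted.
    Public Function ParseItem(ByVal text As String) As String
        Dim trimmed As String = text.Trim()
        If trimmed.Length > 1 AndAlso trimmed.StartsWith("""") AndAlso trimmed.EndsWith("""") Then
            ' Strip the outer quotes, then collapse each "" to "
            Return trimmed.Substring(1, trimmed.Length - 2).Replace("""""", """")
        End If
        Return text ' raw (unquoted) field: left as-is
    End Function

    ' Split one line, then unquote every field in place.
    Public Function ParseCsvLine(ByVal line As String) As String()
        Dim fields As String() = FieldSeparator.Split(line)
        For i As Integer = 0 To fields.Length - 1
            fields(i) = ParseItem(fields(i))
        Next
        Return fields
    End Function

    Sub Main()
        ' The three sample lines from the comment, as VB string literals
        Dim lines As String() = { _
            "a, b, c", _
            """a,a"", ""b,b"", ""c,c""", _
            """I said, """"Hello""""!"", this, that"}
        For Each line As String In lines
            Console.WriteLine(String.Join(" | ", ParseCsvLine(line)))
        Next
    End Sub
End Module
```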
