dsulli2000
asked on
Strip surrounding double quotes from array of strings from split using regular expression
Hello-
I am using the following function to parse a CSV file into an array of strings. The function works fine, but all of the values in the resulting array are surrounded by double quotes. I want an efficient way to strip the surrounding quotes from the string values in that array. I'm pretty sure I can do this by modifying the regular expression, but I'm not positive.
Private Shared Function ParseLine(ByVal oneLine As String) As String()
Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")
Return r.Split(oneLine)
End Function
Right now, I am using the following code on the resultant array, which takes a long time and is obviously very inefficient:
Dim i As Integer
For i = 0 To parsedString.Length - 1
dataStore(recordCount - 1, i) = parsedString(i).Substring(1, parsedString(i).Length - 2)
Next
Thanks in advance. Cheers to whoever can do this :-)
Dan
ASKER
Technically, the string between the outer double quotes could itself contain double quotes.
dsulli2000
Ah! I just saw your response. I think the regex is intended to recognize VB-style strings in the CSV.
All these would contain three items:
a, b, c
"a,a", "b,b", "c,c"
"I said, ""Hello""!", this, that
The first item in the last one would be:
"I said, ""Hello""!"
You'd need to intelligently parse this, changing enclosed doubled-double-quotes to single-double-quotes. Note that this applies only to strings that are enclosed by double-quotes. Strings that are NOT enclosed by double-quotes are "raw", and need no additional processing.
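As a quick check of the three sample lines above, here is a minimal sketch that runs them through the same split pattern as in the question and prints the item count for each. The module name `SplitDemo` is made up for illustration; the pattern itself is the asker's.

```vb
Imports System
Imports System.Text.RegularExpressions

Module SplitDemo
    Sub Main()
        ' Same pattern as in the question: split on commas that are followed
        ' by an even number of double quotes, i.e. commas outside a quoted string.
        Dim r As New Regex(",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))")

        ' The three sample lines, written as VB string literals
        ' (each literal double quote is doubled).
        Dim samples() As String = { _
            "a, b, c", _
            """a,a"", ""b,b"", ""c,c""", _
            """I said, """"Hello""""!"", this, that"}

        For Each line As String In samples
            Dim items() As String = r.Split(line)
            Console.WriteLine("{0} items from: {1}", items.Length, line)
        Next
    End Sub
End Module
```

Each line should report 3 items. Note that the items after the first keep the leading space that follows each comma in these samples, so they may need a Trim before any check for a surrounding quote.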
So I have a correct solution (though even slower).
[VB.NET with .NET 1.1, compiled, partially tested]
For i As Integer = 0 To parsedString.Length - 1
dataStore(recordCount - 1, i) = ParseItem(parsedString(i))
Next
Function ParseItem(ByVal text As String) As String
Dim oneDoubleQuote As String = """" ' for readability of code below.
Dim twoDoubleQuotes As String = oneDoubleQuote & oneDoubleQuote
If text.Length > 1 _
AndAlso text.StartsWith(oneDoubleQuote) _
AndAlso text.EndsWith(oneDoubleQuote) Then
Return text.Substring(1, text.Length - 2).Replace(twoDoubleQuotes, oneDoubleQuote)
Else
Return text ' unchanged
End If
End Function
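If the extra pass over the array is the concern, one alternative is to drop the regex and do the splitting and the unquoting together in a single scan of the line. The following is only a sketch of that different technique, not the code from this thread: `ParseCsvLine` and `CsvLineParser` are hypothetical names, it assumes .NET 2.0 or later (for `List(Of String)`), and it assumes an opening quote comes immediately after the comma, with no space in between.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text

Module CsvLineParser
    ' Hypothetical helper (not from the thread): splits one CSV line and
    ' unquotes each field in a single pass, without a regular expression.
    Function ParseCsvLine(ByVal line As String) As String()
        Dim fields As New List(Of String)()
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim i As Integer = 0

        While i < line.Length
            Dim c As Char = line(i)
            If inQuotes Then
                If c = """"c Then
                    If i + 1 < line.Length AndAlso line(i + 1) = """"c Then
                        current.Append(""""c)   ' doubled quote becomes one literal quote
                        i += 1                  ' skip the second quote of the pair
                    Else
                        inQuotes = False        ' closing quote of the field
                    End If
                Else
                    current.Append(c)           ' commas inside quotes are kept as data
                End If
            ElseIf c = """"c AndAlso current.Length = 0 Then
                inQuotes = True                 ' opening quote at the start of a field
            ElseIf c = ","c Then
                fields.Add(current.ToString())  ' comma outside quotes ends the field
                current.Length = 0
            Else
                current.Append(c)               ' raw (unquoted) character
            End If
            i += 1
        End While

        fields.Add(current.ToString())          ' last field has no trailing comma
        Return fields.ToArray()
    End Function

    Sub Main()
        ' The third sample line, without spaces after the commas.
        For Each item As String In ParseCsvLine("""I said, """"Hello""""!"",this,that")
            Console.WriteLine("[" & item & "]")
        Next
    End Sub
End Module
```

For that input the loop prints [I said, "Hello"!], [this] and [that]. Whether this is actually faster than the regex split plus ParseItem would need to be measured on your data.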
AW