Link to home
Start Free TrialLog in
Avatar of endrec
endrec

asked on

Parsing txt/cvs values with a comma delimiter, but exclude items in quotes like "Company, Inc."

I am having an issue with parsing a csv file with a comma delimiter.  There are fields like “Name, Inc.” (they have the quotes in the actual file) which I do not want broken up with the line/file is parsed.  How would you do that?

The streamreader is all set up, here is my split statement.


                Dim strDelimiter As String = ","
                Dim splitout = Split(strLine, strDelimiter)


"Company, Inc.",value2,value3,"value,should,bevalid"

The array values I want would be:
"Company, Inc"
value2
value3"value,should,bevalid"

The array values I current get are:
"Company,
Inc"value2
value3
"value
should
bevalid"
Avatar of Bob Learned
Bob Learned
Flag of United States of America image

What version of .NET do you have?

Bob
Avatar of ZeonFlash
ZeonFlash

Given your example input string, the following code works....but it's probably not the best to use on a large csv file.

        Dim indxStart, indxEnd As Integer
        Dim strTemp As String
        Dim strReplace As String = "**"
        indxStart = strLine.IndexOf("""")
        indxEnd = strLine.IndexOf("""", indxStart + 1)

        While indxEnd > 0
            strTemp = strLine.Substring(indxStart + 1, indxEnd - indxStart - 1).Replace(",", strReplace)
            strLine = strLine.Remove(indxStart + 1, indxEnd - indxStart - 1)
            strLine = strLine.Insert(indxStart + 1, strTemp)

            indxStart = strLine.IndexOf("""", indxEnd + strReplace.Length)
            indxEnd = strLine.IndexOf("""", indxStart + 1)
        End While

        Dim strDelimiter As String = ","
        Dim splitout As Array = Split(strLine, strDelimiter)

        For i As Integer = 0 To splitout.Length - 1
            splitout(i) = CStr(splitout(i)).Replace(strReplace, ",")
        Next
If you control the input file (i.e. you are generating it from somewhere else), the easiest method is to use something other than a comma as the delimiter when you output the file. | is a good choice.

Then, your split becomes simple again.

But Bob probably knows of some cool 2.0 function that handles just this case for you....and that I've overlooked.
Avatar of endrec

ASKER

I have both versions 1.0 (1.1) and 2.0 of .NET

I do not have control over the input file, it must use commas to seperate the fields coming into the sub/function.

Fields can have values like "this with spaces and periods and other commas,,, and this should still only count as one field/part of the array since it is inside of quotes."
Avatar of endrec

ASKER

None of these solutions are working.
Avatar of endrec

ASKER

I increased the point value.
In what way are "none of these solutions" not working?  What does each of them produce and how does it differ from what you want?

Unless/until we know that, we're unlikely to be able to help further.

Roger
Avatar of endrec

ASKER

This is what the actual input line is:
Soft Agreement 1-1-01,,,License For Software,Outbound,Ok,Ok – 100%,,1-2-01,, Four,Six, Donaldw, Software, Company Inc.,1,Contact Ray,No,68 Item T+30,5300

The function seems to ignore blank values (not something I would like) and seperate some commas into their own index in the array.

This is the output of the array that the function creates from the line above.

Soft Agreement 1-1-01
,
License For Software
Ok
Ok – 100%
1-2-01
Four
Six
Donaldw
Software
Company Inc.
1
Contact Ray
No
68 Item T+30
5300


The function i'm using:

    Private Function ParseQuotedCSV2(ByVal strInput As String) As ArrayList
        Dim a As New ArrayList
        Dim s As String() = strInput.Split(",")
        Dim i As Integer
        For i = s.Length - 1 To 1 Step -1
            If s(i).TrimEnd.LastIndexOf(Chr(34)) = s(i).TrimEnd.Length - 1 Then
                If s(i).TrimStart.IndexOf(Chr(34)) <> 0 Then
                    s(i - 1) = s(i - 1) & "," & s(i)
                    s(i) = "|||" 'This is an 'ignore' marker which needs to be set to something that would never appear in the input string
                End If
            End If
        Next
        For i = 0 To s.Length - 1
            If s(i) <> "|||" Then 'See comment above
                If s(i).TrimStart.IndexOf(Chr(34)) = 0 Then
                    s(i) = s(i).Remove(s(i).IndexOf(Chr(34)), 1)
                    s(i) = s(i).Remove(s(i).LastIndexOf(Chr(34)), 1)
                End If
                a.Add(s(i).ToString)
            End If
        Next
        Return a
End Function
ASKER CERTIFIED SOLUTION
Avatar of Sancler
Sancler

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial