Avatar of endrec
endrec
 asked on

How To Find and Replace In Specific Column of CSV File

I wanted to know if someone has ever had to find and replace on a specific column of a .CSV and/or could help me find a way to either modify the code below or use completely new logic to perform a find and replace on individual columns of an entire file.  One of the issues is that none, one, many, or all of the field in the input string could be surrounded by quotes and it's still supposed to be considered valid.

In the function below strInput in the output of a streamreader, e.g.:
           Dim sr2 As New StreamReader(strFileName)
            Dim InOutput As String = sr2.ReadToEnd()
            sr2.Close()
            sr2.Dispose()


So to call the function it would be something like...fncReplaceInSpecificColumn(InOutput, "Company 2345", "Company Y", 4)


    Private Function fncReplaceInSpecificColumn(ByRef strInput As String, ByVal strFind As String, ByVal strReplace As String, ByVal intColumn As Integer) As String()
        Dim r As New Regex(ControlChars.CrLf)
        Dim indivlines() As String = r.Split(strInput)

        'Cycle through lines/records
        For i As Integer = 0 To indivlines.Length - 1
            'Split each line into individual fields by comma or quote enclosed comma delimiter
            Dim splitout() As String = ParseQuotedCSV3(indivlines(i))

            If indivlines(i) <> "" And Len(intColumn) > 0 Then

                If intColumn - 1 > UBound(splitout) Then
                    MsgBox("You cannot specify a column for find and replace that is greater than the number of columns in the input file.")
                    Exit Function
                End If

                splitout(intColumn - 1) = Replace(splitout(intColumn - 1), strFind, strReplace)

                '***IS THERE A WAY TO RECONSTRUCT THE .CSV IN IT'S ORIGINAL FORMAT (WHICH COULD BE WITH OR WITHOUT QUOTES SURROUNDING ONE OR MORE OF THE FIELDS)?

            End If
        Next
    End Function


The following function is one I used from Experts Exchange to split an incoming CSV up into fields regardless of if the fields were commas delimited or surrounded in quotes (part of the problem).  So it deconstructs the CSV instead of reconstructing it in it's original format.  I think I need a way to reconstruct it after finding and replacing on a specific column.

The .CSV could come in with variations like:
Field1,2,3,"4","5",6
"1","2","3","4,,,,","5","6"
or
1,2,3,4,5,6


    Private Function ParseQuotedCSV3(ByVal strInput As String) As String()
        If strInput <> "" Then
            Dim a As New ArrayList
            Dim s1 As String = strInput.Replace(Chr(4), "") '<<< NEW LINE
            s1 = s1.Replace(Chr(23), "")
            s1 = s1.Replace(Chr(24), "")
            s1 = s1.Replace(Chr(25), "")
            Dim s As String() = s1.Split(",") '<<< ALTERED LINE

            Dim i As Integer
            For i = s.Length - 1 To 1 Step -1
                If s(i).TrimEnd.Length > 0 Then
                    If s(i).TrimEnd.LastIndexOf(Chr(34)) = s(i).TrimEnd.Length - 1 Then
                        If s(i).TrimStart.IndexOf(Chr(34)) <> 0 Then
                            s(i - 1) = s(i - 1) & "," & s(i)
                            s(i) = "|||" 'This is an 'ignore' marker which needs to be set to something that would never appear in the input string
                        End If
                    End If
                End If
            Next
            For i = 0 To s.Length - 1
                If s(i) <> "|||" Then 'See comment above
                    If s(i).TrimStart.IndexOf(Chr(34)) = 0 Then
                        s(i) = s(i).Remove(s(i).IndexOf(Chr(34)), 1)
                        s(i) = s(i).Remove(s(i).LastIndexOf(Chr(34)), 1)
                    End If
                    a.Add(s(i).ToString)
                End If
            Next
            Dim ss(a.Count - 1) As String
            a.CopyTo(ss)
            Return ss

        End If

    End Function
Visual Basic.NET

Avatar of undefined
Last Comment
Sancler

8/22/2022 - Mon
Sancler

Do ANY of the fields in the .CSV file contain commas WITHIN them?  

This is quite often the case when some fields are enclosed in quotes, but the examples you give don't expressly suggest that that is the case here and, if it's not, you could simply identify the relevant field by counting the commas, do the necessary replace at that point, and write out the replacement file with those replacements.  

But, if there are commas inside quoted fields, it becomes a bit more difficult.

Roger
endrec

ASKER
Yes, some of the fields contain commas within them, if that is the case they are surrounded in quotes.  ParseQuotedCSV3 allows fields from the CSV to be read correctly into an array instead of just splitting on a comma.  So intead of reading "2,3,4,5" as "2 then 3 then 4 then 5" it reads it as one field 2,3,4,5.

"1","2","3","4,,,,","5","6"

"4,,,," = 4,,,, as the field value as an example

Sancler

I didn't think it would be that easy ;-(

I've got a couple of thoughts, but no time immediately to test them out.  If no one else has come up with anything by the time I'm more available, I'll post again later.

Roger
Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy
Sancler

Try this.  It's extensively commented.

    Private Function GetField(ByVal sLine As String, ByVal iField As Integer) As String()

        'NOTE the iField argument assumes that fields are numbered ...
        '... from 1, not from 0.  This was just to tie it in with the ...
        '... count argument in the Regex.Split function.

        'regex pattern to identify commas if, and only if, ...
        '... they are succeeded by an even number of quotes
        Dim spattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As New Regex(spattern)

        'split the input string, just as far as ...
        '... the field we want and one more ...
        '... so we can also return what follows it
        Dim s() = r.Split(sLine, iField + 1)

        'create an output array
        Dim result(2) As String

        'this is the field we want
        result(1) = s(iField - 1)

        'put all the preceding fields back together ...
        '... as one string in the first element
        If iField = 1 Then
            'there are no preceding fields
            result(0) = ""
        Else
            'join them with commas
            result(0) = String.Join(",", s, 0, iField - 1)
        End If

        'put all the succeeding fields in the last element
        If iField = s.Length Then
            'there are no succeeding fields
            result(2) = ""
        Else
            'its the last element (unsplit) of the array
            result(2) = s(s.Length - 1)
        End If

        Return result

    End Function

It won't do the whole job for you, but if you pass it a single line from your CSV file it will return it split into three: everything up to the field/column you want, that field/coumn, and everything after.  All the formatting (quotes, etc) will be preserved.  So you can just do your replace on the middle bit and then reconstitute the line with a .Join (like in the code above).

I'd better warn you that I'm not a Regex expert.  But I picked this pattern up from this thread

https://www.experts-exchange.com/Programming/Programming_Languages/Dot_Net/VB_DOT_NET/Q_21489283.html

and have thrown a number of tests at it and it seems to work.

Roger
endrec

ASKER
I am receiving an error that I hope you could help me fix.

Error      2      Value of type '1-dimensional array of String' cannot be converted to 'String'.      


I changed the name and inputs of the fuction to:
Private Function ReplaceInColumn(ByVal sLine As String, ByVal strFind As String, ByVal strReplace As String, ByVal iField As Integer) As String

I also tried

Private Function ReplaceInColumn(ByVal sLine As String, ByVal strFind As String, ByVal strReplace As String, ByVal iField As Integer) As String()


The only code inside of it that I changed is below...does not cause the error.
        'this is the field we want
        result(1) = s(iField - 1)

        result(1) = Replace(result(1), strFind, strReplace)



            '---FIND AND REPLACE SPECIFIC COLUMN: GROUP 1
            'Replace in one column
            If chkFindReplaceGroup1.Checked = False Then
                'Replace what with what
                InOutput = ReplaceInColumn(InOutput, strFind1, strReplace1, intColumnToReplaceGroup1).ToString

            End If


            '---FIND AND REPLACE SPECIFIC COLUMN: GROUP 2
            'Replace in one column
            If chkFindReplaceGroup2.Checked = False Then
                'Replace what with what
                InOutput = ReplaceInColumn(InOutput, strFind2, strReplace2, intColumnToReplaceGroup2).ToString

            End If



            '---WRITE THE ENTIRE CONTENTS USING THE SAME FILENAME
            Dim sw As New StreamWriter(strFileName, False) 'False = No Dot Append, Overwrite
            sw.Write(InOutput)
            sw.Flush()
            sw.Close()

            '---CLEAR CSV CONTENTS AND VALIDATION AREA, THEN REPOPOULATE CSVCONTENTS
            txtCSVContent.Clear()
            txtValidation.Clear()
            subReadFile(strFileName, "")

How would I convert the output of GetField or ReplaceInColumn to a normal string?
ASKER CERTIFIED SOLUTION
Sancler

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question