Solved

Looking for sample code to combine CSV files and skipping the header row of files 2 through *

Posted on 2011-09-15
8
485 Views
Last Modified: 2012-05-12
Friends,

I'm not sure where to begin.  I need to combine several (large sized > 10 Meg) CSV files.  I want to keep the header for the first, but I want to skip the header row after the first CSV file.  Does anyone have some sample code?  I need a starting place, but I haven't found anything this specific and efficient.

Looking for Syntax code please!

Thanks in advance!
0
Comment
Question by:indy500fan
8 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 36543561
you'll need to give us something to work on. But try this first ..
Dim File1 = File.ReadAllLines("YouFirstFile")
Dim File2 = File.ReadAllLines("YouSecondFile")
File.WriteAllLines("NewFile", File1.Union(File2).Distinct())

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543601
Interesting...

I see what you are doing here, and that is an interesting concept; however, I may have duplicate rows that I will need (other than the header row of the first file), so I'm concerned what that would do.

As a starting point:

  Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")

            For Each file In filesInDir
                System.IO.File.AppendAllText("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary\common2.csv", System.IO.File.ReadAllText(file))

            Next

This will put all the rows into a single CSV file; however, I need to not include the header.  So, I have considered, using stream reader to read each line of each file and write it to a master file, skipping any row that matches the very first row.  I am concerned that it won't be very efficient to do this, but...



0
 

Author Comment

by:indy500fan
ID: 36543812
This is another option, but i cannot yet get it to loop through each line:


        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterSummary.csv")
                For Each file In filesInDir
                    If file.Contains("summary") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)



                            ' Read first line.
                            line = reader.ReadLine
                            Do While (Not line Is Nothing)

                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

 
LVL 13

Accepted Solution

by:
Naman Goel earned 500 total points
ID: 36543839
so here is code for you,just pass the array containing file names as parameter to this method:
private static void CombineCSVFiles(string[] fileNames)
        {
            if (fileNames.Length > 1)
            {
                for (int i = 1; i < fileNames.Length; i++)
                {
                    using (StreamWriter streamWriter = File.AppendText(fileNames[0]))
                    {

                        using (StreamReader streamReader = new StreamReader(fileNames[i]))
                        {
                            while (streamReader.EndOfStream == false)
                            {
                                string line = streamReader.ReadLine();
                                if (line.Trim().Length != 0 && line.StartsWith("#") == false)
                                {
                                    if (line.Contains(","))
                                    {
                                        string[] words = line.Split(new char[] { ',' });
                                        if (words.Length > 0)
                                        {
                                            streamWriter.WriteLine(line);
                                        }
                                    }
                                    else // line is not seperated with comma
                                    {
                                        streamWriter.WriteLine(line);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543888
This worked and is really fast!

        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Detail")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterDetail.csv")
                For Each file In filesInDir
                    If file.Contains("detail") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)
                            ' Read first line.
                            Do While (reader.Peek() >= 0)

                                line = reader.ReadLine
                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
 
LVL 13

Expert Comment

by:Naman Goel
ID: 36544052
cool, the other method is whatever I had written and this is optimized too...
0
 

Author Comment

by:indy500fan
ID: 36544355
naman,

I hadn't seen your post before I added mine.  I will try it.  Thanks!
0
 

Expert Comment

by:haggard
ID: 36545009
Is this what you are looking for?

Assume that there are two source csv files, 1.csv and 2.csv.
1.csv looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33

2.csv looks like this:
C1,C2,C3
aa,ab,ac
ba,bb,bc
ca,cb,cc

After the sample code runs a third file, 3.csv is created. It looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33
aa,ab,ac
ba,bb,bc
ca,cb,cc

Here's the sample code that produced the above result.

Module Module1

    Sub Main()
        ' Hardcoding the names of the two input and one output file
        Dim strInput1 As String = "c:\1.csv"
        Dim strInput2 As String = "c:\2.csv"
        Dim strOutput As String = "c:\3.csv"

        ' Create a file writer which will be used to
        ' write text to the output file.
        ' This code does not explicitly define how it will handle an
        ' existing output file or a lot of other things it needs to.
        Dim objWriter As New System.IO.StreamWriter(strOutput, False)

        ' Open the first input file and write its entire
        ' contents to the output file one line at a time.
        Dim objReader As New System.IO.StreamReader(strInput1)
        Dim strLine As String = objReader.ReadLine
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' OK, now the entire contents of the first file have
        ' been written to the destination file.
        ' Close the first input file.
        objReader.Close()

        'Open the second input file.
        objReader = New System.IO.StreamReader(strInput2)

        'Read and discard the first line.
        strLine = objReader.ReadLine()

        ' Read the second line and start looping on
        ' transferring the contents of the source file to dest file.
        strLine = objReader.ReadLine()
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' Close all open files.
        objReader.Close()
        objWriter.Close()

    End Sub

End Module
0

Featured Post

Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Welcome my friends to the second instalment and follow-up to our Minify and Concatenate Your Scripts and Stylesheets (http://www.experts-exchange.com/Programming/Languages/.NET/ASP.NET/A_4334-Minify-and-Concatenate-Your-Scripts-and-Stylesheets.html)…
In my previous two articles we discussed Binary Serialization (http://www.experts-exchange.com/A_4362.html) and XML Serialization (http://www.experts-exchange.com/A_4425.html). In this article we will try to know more about SOAP (Simple Object Acces…

740 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question