Solved

Looking for sample code to combine CSV files and skipping the header row of files 2 through *

Posted on 2011-09-15
8
484 Views
Last Modified: 2012-05-12
Friends,

I'm not sure where to begin.  I need to combine several (large sized > 10 Meg) CSV files.  I want to keep the header for the first, but I want to skip the header row after the first CSV file.  Does anyone have some sample code?  I need a starting place, but I haven't found anything this specific and efficient.

Looking for Syntax code please!

Thanks in advance!
0
Comment
Question by:indy500fan
8 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 36543561
you'll need to give us something to work on. But try this first ..
Dim File1 = File.ReadAllLines("YouFirstFile")
Dim File2 = File.ReadAllLines("YouSecondFile")
File.WriteAllLines("NewFile", File1.Union(File2).Distinct())

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543601
Interesting...

I see what you are doing here, and that is an interesting concept; however, I may have duplicate rows that I will need (other than the header row of the first file), so I'm concerned what that would do.

As a starting point:

  Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")

            For Each file In filesInDir
                System.IO.File.AppendAllText("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary\common2.csv", System.IO.File.ReadAllText(file))

            Next

This will put all the rows into a single CSV file; however, I need to not include the header.  So, I have considered, using stream reader to read each line of each file and write it to a master file, skipping any row that matches the very first row.  I am concerned that it won't be very efficient to do this, but...



0
 

Author Comment

by:indy500fan
ID: 36543812
This is another option, but i cannot yet get it to loop through each line:


        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterSummary.csv")
                For Each file In filesInDir
                    If file.Contains("summary") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)



                            ' Read first line.
                            line = reader.ReadLine
                            Do While (Not line Is Nothing)

                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
Master Your Team's Linux and Cloud Stack

Come see why top tech companies like Mailchimp and Media Temple use Linux Academy to build their employee training programs.

 
LVL 13

Accepted Solution

by:
Naman Goel earned 500 total points
ID: 36543839
so here is code for you,just pass the array containing file names as parameter to this method:
private static void CombineCSVFiles(string[] fileNames)
        {
            if (fileNames.Length > 1)
            {
                for (int i = 1; i < fileNames.Length; i++)
                {
                    using (StreamWriter streamWriter = File.AppendText(fileNames[0]))
                    {

                        using (StreamReader streamReader = new StreamReader(fileNames[i]))
                        {
                            while (streamReader.EndOfStream == false)
                            {
                                string line = streamReader.ReadLine();
                                if (line.Trim().Length != 0 && line.StartsWith("#") == false)
                                {
                                    if (line.Contains(","))
                                    {
                                        string[] words = line.Split(new char[] { ',' });
                                        if (words.Length > 0)
                                        {
                                            streamWriter.WriteLine(line);
                                        }
                                    }
                                    else // line is not seperated with comma
                                    {
                                        streamWriter.WriteLine(line);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543888
This worked and is really fast!

        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Detail")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterDetail.csv")
                For Each file In filesInDir
                    If file.Contains("detail") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)
                            ' Read first line.
                            Do While (reader.Peek() >= 0)

                                line = reader.ReadLine
                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
 
LVL 13

Expert Comment

by:Naman Goel
ID: 36544052
cool, the other method is whatever I had written and this is optimized too...
0
 

Author Comment

by:indy500fan
ID: 36544355
naman,

I hadn't seen your post before I added mine.  I will try it.  Thanks!
0
 

Expert Comment

by:haggard
ID: 36545009
Is this what you are looking for?

Assume that there are two source csv files, 1.csv and 2.csv.
1.csv looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33

2.csv looks like this:
C1,C2,C3
aa,ab,ac
ba,bb,bc
ca,cb,cc

After the sample code runs a third file, 3.csv is created. It looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33
aa,ab,ac
ba,bb,bc
ca,cb,cc

Here's the sample code that produced the above result.

Module Module1

    Sub Main()
        ' Hardcoding the names of the two input and one output file
        Dim strInput1 As String = "c:\1.csv"
        Dim strInput2 As String = "c:\2.csv"
        Dim strOutput As String = "c:\3.csv"

        ' Create a file writer which will be used to
        ' write text to the output file.
        ' This code does not explicitly define how it will handle an
        ' existing output file or a lot of other things it needs to.
        Dim objWriter As New System.IO.StreamWriter(strOutput, False)

        ' Open the first input file and write its entire
        ' contents to the output file one line at a time.
        Dim objReader As New System.IO.StreamReader(strInput1)
        Dim strLine As String = objReader.ReadLine
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' OK, now the entire contents of the first file have
        ' been written to the destination file.
        ' Close the first input file.
        objReader.Close()

        'Open the second input file.
        objReader = New System.IO.StreamReader(strInput2)

        'Read and discard the first line.
        strLine = objReader.ReadLine()

        ' Read the second line and start looping on
        ' transferring the contents of the source file to dest file.
        strLine = objReader.ReadLine()
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' Close all open files.
        objReader.Close()
        objWriter.Close()

    End Sub

End Module
0

Featured Post

Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In my previous article (http://www.experts-exchange.com/Programming/Languages/.NET/.NET_Framework_3.x/A_4362-Serialization-in-NET-1.html) we saw the basics of serialization and how types/objects can be serialized to Binary format. In this blog we wi…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
Two types of users will appreciate AOMEI Backupper Pro: 1 - Those with PCIe drives (and haven't found cloning software that works on them). 2 - Those who want a fast clone of their boot drive (no re-boots needed) and it can clone your drive wh…
Established in 1997, Technology Architects has become one of the most reputable technology solutions companies in the country. TA have been providing businesses with cost effective state-of-the-art solutions and unparalleled service that is designed…

808 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question