Solved

Looking for sample code to combine CSV files and skipping the header row of files 2 through *

Posted on 2011-09-15
8
487 Views
Last Modified: 2012-05-12
Friends,

I'm not sure where to begin.  I need to combine several (large sized > 10 Meg) CSV files.  I want to keep the header for the first, but I want to skip the header row after the first CSV file.  Does anyone have some sample code?  I need a starting place, but I haven't found anything this specific and efficient.

Looking for Syntax code please!

Thanks in advance!
0
Comment
Question by:indy500fan
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
8 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 36543561
you'll need to give us something to work on. But try this first ..
Dim File1 = File.ReadAllLines("YouFirstFile")
Dim File2 = File.ReadAllLines("YouSecondFile")
File.WriteAllLines("NewFile", File1.Union(File2).Distinct())

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543601
Interesting...

I see what you are doing here, and that is an interesting concept; however, I may have duplicate rows that I will need (other than the header row of the first file), so I'm concerned what that would do.

As a starting point:

  Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")

            For Each file In filesInDir
                System.IO.File.AppendAllText("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary\common2.csv", System.IO.File.ReadAllText(file))

            Next

This will put all the rows into a single CSV file; however, I need to not include the header.  So, I have considered, using stream reader to read each line of each file and write it to a master file, skipping any row that matches the very first row.  I am concerned that it won't be very efficient to do this, but...



0
 

Author Comment

by:indy500fan
ID: 36543812
This is another option, but i cannot yet get it to loop through each line:


        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterSummary.csv")
                For Each file In filesInDir
                    If file.Contains("summary") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)



                            ' Read first line.
                            line = reader.ReadLine
                            Do While (Not line Is Nothing)

                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
SharePoint Admin?

Enable Your Employees To Focus On The Core With Intuitive Onscreen Guidance That is With You At The Moment of Need.

 
LVL 13

Accepted Solution

by:
Naman Goel earned 500 total points
ID: 36543839
so here is code for you,just pass the array containing file names as parameter to this method:
private static void CombineCSVFiles(string[] fileNames)
        {
            if (fileNames.Length > 1)
            {
                for (int i = 1; i < fileNames.Length; i++)
                {
                    using (StreamWriter streamWriter = File.AppendText(fileNames[0]))
                    {

                        using (StreamReader streamReader = new StreamReader(fileNames[i]))
                        {
                            while (streamReader.EndOfStream == false)
                            {
                                string line = streamReader.ReadLine();
                                if (line.Trim().Length != 0 && line.StartsWith("#") == false)
                                {
                                    if (line.Contains(","))
                                    {
                                        string[] words = line.Split(new char[] { ',' });
                                        if (words.Length > 0)
                                        {
                                            streamWriter.WriteLine(line);
                                        }
                                    }
                                    else // line is not seperated with comma
                                    {
                                        streamWriter.WriteLine(line);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543888
This worked and is really fast!

        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Detail")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterDetail.csv")
                For Each file In filesInDir
                    If file.Contains("detail") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)
                            ' Read first line.
                            Do While (reader.Peek() >= 0)

                                line = reader.ReadLine
                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
 
LVL 13

Expert Comment

by:Naman Goel
ID: 36544052
cool, the other method is whatever I had written and this is optimized too...
0
 

Author Comment

by:indy500fan
ID: 36544355
naman,

I hadn't seen your post before I added mine.  I will try it.  Thanks!
0
 

Expert Comment

by:haggard
ID: 36545009
Is this what you are looking for?

Assume that there are two source csv files, 1.csv and 2.csv.
1.csv looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33

2.csv looks like this:
C1,C2,C3
aa,ab,ac
ba,bb,bc
ca,cb,cc

After the sample code runs a third file, 3.csv is created. It looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33
aa,ab,ac
ba,bb,bc
ca,cb,cc

Here's the sample code that produced the above result.

Module Module1

    Sub Main()
        ' Hardcoding the names of the two input and one output file
        Dim strInput1 As String = "c:\1.csv"
        Dim strInput2 As String = "c:\2.csv"
        Dim strOutput As String = "c:\3.csv"

        ' Create a file writer which will be used to
        ' write text to the output file.
        ' This code does not explicitly define how it will handle an
        ' existing output file or a lot of other things it needs to.
        Dim objWriter As New System.IO.StreamWriter(strOutput, False)

        ' Open the first input file and write its entire
        ' contents to the output file one line at a time.
        Dim objReader As New System.IO.StreamReader(strInput1)
        Dim strLine As String = objReader.ReadLine
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' OK, now the entire contents of the first file have
        ' been written to the destination file.
        ' Close the first input file.
        objReader.Close()

        'Open the second input file.
        objReader = New System.IO.StreamReader(strInput2)

        'Read and discard the first line.
        strLine = objReader.ReadLine()

        ' Read the second line and start looping on
        ' transferring the contents of the source file to dest file.
        strLine = objReader.ReadLine()
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' Close all open files.
        objReader.Close()
        objWriter.Close()

    End Sub

End Module
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In my previous article (http://www.experts-exchange.com/Programming/Languages/.NET/.NET_Framework_3.x/A_4362-Serialization-in-NET-1.html) we saw the basics of serialization and how types/objects can be serialized to Binary format. In this blog we wi…
It was really hard time for me to get the understanding of Delegates in C#. I went through many websites and articles but I found them very clumsy. After going through those sites, I noted down the points in a easy way so here I am sharing that unde…
Are you ready to implement Active Directory best practices without reading 300+ pages? You're in luck. In this webinar hosted by Skyport Systems, you gain insight into Microsoft's latest comprehensive guide, with tips on the best and easiest way…
Finding and deleting duplicate (picture) files can be a time consuming task. My wife and I, our three kids and their families all share one dilemma: Managing our pictures. Between desktops, laptops, phones, tablets, and cameras; over the last decade…

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question