Solved

Looking for sample code to combine CSV files and skipping the header row of files 2 through *

Posted on 2011-09-15
8
479 Views
Last Modified: 2012-05-12
Friends,

I'm not sure where to begin.  I need to combine several (large sized > 10 Meg) CSV files.  I want to keep the header for the first, but I want to skip the header row after the first CSV file.  Does anyone have some sample code?  I need a starting place, but I haven't found anything this specific and efficient.

Looking for Syntax code please!

Thanks in advance!
0
Comment
Question by:indy500fan
8 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 36543561
you'll need to give us something to work on. But try this first ..
Dim File1 = File.ReadAllLines("YouFirstFile")
Dim File2 = File.ReadAllLines("YouSecondFile")
File.WriteAllLines("NewFile", File1.Union(File2).Distinct())

Open in new window

0
 

Author Comment

by:indy500fan
ID: 36543601
Interesting...

I see what you are doing here, and that is an interesting concept; however, I may have duplicate rows that I will need (other than the header row of the first file), so I'm concerned what that would do.

As a starting point:

  Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")

            For Each file In filesInDir
                System.IO.File.AppendAllText("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary\common2.csv", System.IO.File.ReadAllText(file))

            Next

This will put all the rows into a single CSV file; however, I need to not include the header.  So, I have considered, using stream reader to read each line of each file and write it to a master file, skipping any row that matches the very first row.  I am concerned that it won't be very efficient to do this, but...



0
 

Author Comment

by:indy500fan
ID: 36543812
This is another option, but i cannot yet get it to loop through each line:


        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Summary")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterSummary.csv")
                For Each file In filesInDir
                    If file.Contains("summary") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)



                            ' Read first line.
                            line = reader.ReadLine
                            Do While (Not line Is Nothing)

                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
 
LVL 13

Accepted Solution

by:
Naman Goel earned 500 total points
ID: 36543839
so here is code for you,just pass the array containing file names as parameter to this method:
private static void CombineCSVFiles(string[] fileNames)
        {
            if (fileNames.Length > 1)
            {
                for (int i = 1; i < fileNames.Length; i++)
                {
                    using (StreamWriter streamWriter = File.AppendText(fileNames[0]))
                    {

                        using (StreamReader streamReader = new StreamReader(fileNames[i]))
                        {
                            while (streamReader.EndOfStream == false)
                            {
                                string line = streamReader.ReadLine();
                                if (line.Trim().Length != 0 && line.StartsWith("#") == false)
                                {
                                    if (line.Contains(","))
                                    {
                                        string[] words = line.Split(new char[] { ',' });
                                        if (words.Length > 0)
                                        {
                                            streamWriter.WriteLine(line);
                                        }
                                    }
                                    else // line is not seperated with comma
                                    {
                                        streamWriter.WriteLine(line);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }

Open in new window

0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:indy500fan
ID: 36543888
This worked and is really fast!

        Try

            Dim filesInDir() As String = System.IO.Directory.GetFiles("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\Detail")
            Dim LineCounter As Int64 = 0
            Dim FirstHeader As String = Nothing
            Using writer As IO.StreamWriter = New IO.StreamWriter("C:\CSV_FILE_HOLDER\CSVInput\CSVInput\MasterDetail.csv")
                For Each file In filesInDir
                    If file.Contains("detail") Then
                        Dim line As String
                        Using reader As IO.StreamReader = New IO.StreamReader(file)
                            ' Read first line.
                            Do While (reader.Peek() >= 0)

                                line = reader.ReadLine
                                LineCounter += 1

                                If LineCounter = 1 Then
                                    FirstHeader = line
                                    writer.WriteLine(line)
                                ElseIf FirstHeader <> line Then
                                    writer.WriteLine(line)
                                End If
                            Loop
                        End Using
                    End If

                Next
            End Using
        Catch ex As Exception

        End Try
0
 
LVL 13

Expert Comment

by:Naman Goel
ID: 36544052
cool, the other method is whatever I had written and this is optimized too...
0
 

Author Comment

by:indy500fan
ID: 36544355
naman,

I hadn't seen your post before I added mine.  I will try it.  Thanks!
0
 

Expert Comment

by:haggard
ID: 36545009
Is this what you are looking for?

Assume that there are two source csv files, 1.csv and 2.csv.
1.csv looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33

2.csv looks like this:
C1,C2,C3
aa,ab,ac
ba,bb,bc
ca,cb,cc

After the sample code runs a third file, 3.csv is created. It looks like this:
Col1,Col2,Col3
11,12,13
21,22,23
31,32,33
aa,ab,ac
ba,bb,bc
ca,cb,cc

Here's the sample code that produced the above result.

Module Module1

    Sub Main()
        ' Hardcoding the names of the two input and one output file
        Dim strInput1 As String = "c:\1.csv"
        Dim strInput2 As String = "c:\2.csv"
        Dim strOutput As String = "c:\3.csv"

        ' Create a file writer which will be used to
        ' write text to the output file.
        ' This code does not explicitly define how it will handle an
        ' existing output file or a lot of other things it needs to.
        Dim objWriter As New System.IO.StreamWriter(strOutput, False)

        ' Open the first input file and write its entire
        ' contents to the output file one line at a time.
        Dim objReader As New System.IO.StreamReader(strInput1)
        Dim strLine As String = objReader.ReadLine
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' OK, now the entire contents of the first file have
        ' been written to the destination file.
        ' Close the first input file.
        objReader.Close()

        'Open the second input file.
        objReader = New System.IO.StreamReader(strInput2)

        'Read and discard the first line.
        strLine = objReader.ReadLine()

        ' Read the second line and start looping on
        ' transferring the contents of the source file to dest file.
        strLine = objReader.ReadLine()
        While (String.IsNullOrEmpty(strLine) <> True)
            objWriter.WriteLine(strLine)

            ' Read the next line, if any.
            strLine = objReader.ReadLine
        End While

        ' Close all open files.
        objReader.Close()
        objWriter.Close()

    End Sub

End Module
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Creating an analog clock UserControl seems fairly straight forward.  It is, after all, essentially just a circle with several lines in it!  Two common approaches for rendering an analog clock typically involve either manually calculating points with…
If you need to start windows update installation remotely or as a scheduled task you will find this very helpful.
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now