
help splitting text files

Posted on 2007-04-06
Last Modified: 2010-04-23
Good morning everyone. I am trying to create a small application (VB.NET 2003) for breaking large text files into smaller ones. Each output file must be exactly 60,000 rows long, except the last one, which gets the rest of the data. For example, if the original file is named tempfile and has 190,000 rows, I want four new files: the first three with 60,000 rows each and the last with the remaining 10,000 rows. If possible, I would also like the output files to keep the original file name with -part added, so in this example they would be named tempfile-part1, tempfile-part2, tempfile-part3 and tempfile-part4. Finally, I never know in advance how many rows a file will have, and I have seen files as large as 3 million rows.
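The row arithmetic in the question is a ceiling division: 190,000 rows at 60,000 per file gives 4 files, the last one holding 190,000 Mod 60,000 = 10,000 rows, and a 3,000,000-row file gives 50 files. A minimal VB.NET sketch of that arithmetic and of the -partN naming, assuming the parts are written next to the input file and always get a .txt extension (as the code below does); PartCount, LastPartRows and PartFileName are hypothetical helper names, not something from the thread.

```vbnet
Imports System
Imports System.IO

Module PartNaming
    ' Number of part files for totalRows lines at rowsPerPart lines per file
    ' (integer ceiling division; \ is VB's integer-division operator).
    Function PartCount(ByVal totalRows As Long, ByVal rowsPerPart As Long) As Long
        If totalRows <= 0 Then Return 0
        Return (totalRows + rowsPerPart - 1) \ rowsPerPart
    End Function

    ' Rows that land in the last part file.
    Function LastPartRows(ByVal totalRows As Long, ByVal rowsPerPart As Long) As Long
        If totalRows <= 0 Then Return 0
        Dim remainder As Long = totalRows Mod rowsPerPart
        If remainder = 0 Then Return rowsPerPart   ' exact multiple: last part is full
        Return remainder
    End Function

    ' "tempfile.txt", 2 -> "tempfile-part2.txt" in the same folder as the input.
    Function PartFileName(ByVal inFile As String, ByVal part As Integer) As String
        Dim partName As String = Path.GetFileNameWithoutExtension(inFile) & _
            "-part" & part.ToString() & ".txt"
        Return Path.Combine(Path.GetDirectoryName(inFile), partName)
    End Function

    Sub Main()
        Console.WriteLine(PartCount(190000, 60000))          ' 4
        Console.WriteLine(LastPartRows(190000, 60000))       ' 10000
        Console.WriteLine(PartFileName("tempfile.txt", 4))   ' tempfile-part4.txt
    End Sub
End Module
```

When the row count is an exact multiple of 60,000, the last part is a full 60,000 rows rather than an empty file, which is why LastPartRows returns rowsPerPart instead of 0 in that case.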

Please see the commented sections of my “attempted” source code below.

Thank you very much for taking the time to read this post and for any technical advice you can help me with,  
Gr8life

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim inFile As String
        Dim outFile As String
        Dim openFileDialog1 As New OpenFileDialog

        openFileDialog1.InitialDirectory = "c:\"
        openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"
        openFileDialog1.FilterIndex = 2
        openFileDialog1.RestoreDirectory = True

        If openFileDialog1.ShowDialog() = DialogResult.OK Then
            inFile = openFileDialog1.FileName
            outFile = Mid(inFile, 1, inFile.LastIndexOf("."))

            'Need help creating the process of naming the files the OriginalFileName-part1, OriginalFileName-part2, etc
            outFile += "-part1.txt"

        End If

        Button1.Enabled = False
        Dim startDT As DateTime = DateTime.Now

        If (inFile.Length > 0) Then
            If (outFile.Length > 0) Then
                ConvertFiles(inFile, outFile)
                Dim stopDT As DateTime = DateTime.Now
                Dim elapsedTS As TimeSpan = stopDT.Subtract(startDT)
                Dim msg As String = elapsedTS.Hours & "h" & elapsedTS.Minutes & "m" & elapsedTS.Seconds & "s"
                MessageBox.Show("File Complete! " & ControlChars.CrLf & ControlChars.CrLf & "Process Duration:" & ControlChars.CrLf & msg)
                Button1.Enabled = True
            Else
                MsgBox("No output file specified!")
            End If
        Else
            MsgBox("No input file specified!")
        End If
    End Sub

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
        Try
            If (IO.File.Exists(filein)) Then
                Dim sin As New IO.StreamReader(filein)
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                Dim items() As String
                Dim readline As String = sin.ReadLine
                While Not IsNothing(readline)
                    items = readline.Split(vbTab)
                    sout.WriteLine(String.Join(vbTab, items))
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 60000 = 0 Then
                        'sout.write I am having trouble creating the files based on the row count
                    Else
                        'sout.write the remaining data to the last file
                    End If
                    readline = sin.ReadLine
                End While
                sin.Close()
                sout.Flush()
                sout.Close()
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred: " & ex.Message)
        End Try
    End Sub
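The two commented gaps in ConvertFiles come down to one rule: count lines as they are written, and every 60,000 lines close the current writer and start the next -partN file. A minimal sketch of that rule as a standalone console module, assuming output files go in the input file's folder and always get a .txt extension; SplitFile and FileSplitter are hypothetical names. It reads and writes one line at a time, so even a 3-million-row file is never held in memory, and it avoids Using and IsNot, which should keep it compatible with VB.NET 2003.

```vbnet
Imports System
Imports System.IO

Module FileSplitter
    ' Splits inFile into <name>-part1.txt, <name>-part2.txt, ... in the same
    ' folder, rowsPerPart lines per file; the last part gets whatever is left.
    ' Returns the number of part files written.
    Function SplitFile(ByVal inFile As String, ByVal rowsPerPart As Integer) As Integer
        Dim template As String = Path.Combine(Path.GetDirectoryName(inFile), _
            Path.GetFileNameWithoutExtension(inFile) & "-part")
        Dim part As Integer = 0
        Dim linesInPart As Integer = 0
        Dim sin As New StreamReader(inFile)
        Dim sout As StreamWriter = Nothing
        Try
            Dim line As String = sin.ReadLine()
            While Not line Is Nothing          ' Nothing = end of the input file
                ' Open the next part only when a line is waiting for it, so a
                ' row count that is an exact multiple never leaves an empty file.
                If sout Is Nothing Then
                    part += 1
                    sout = New StreamWriter(template & part.ToString() & ".txt", False)
                    linesInPart = 0
                End If
                sout.WriteLine(line)           ' write the line as-is, no Split/Join
                linesInPart += 1
                If linesInPart = rowsPerPart Then
                    sout.Close()               ' Close also flushes the buffer
                    sout = Nothing
                End If
                line = sin.ReadLine()
            End While
        Finally
            ' Runs even if an exception is thrown, so no file handle stays open
            If Not sout Is Nothing Then sout.Close()
            sin.Close()
        End Try
        Return part
    End Function

    Sub Main(ByVal args() As String)
        If args.Length = 0 Then
            Console.WriteLine("usage: FileSplitter <file.txt>")
            Return
        End If
        Dim parts As Integer = SplitFile(args(0), 60000)
        Console.WriteLine(parts.ToString() & " part file(s) written")
    End Sub
End Module
```

Usage, assuming it is compiled as FileSplitter.exe: FileSplitter.exe c:\data\tempfile.txt. Opening each part only when a line is waiting for it is what prevents an empty trailing file when the row count is an exact multiple of 60,000; the accepted solution further down gets the same effect by checking sin.Peek before opening the next file.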

Question by:gr8life
LVL 62

Expert Comment

by:Fernando Soto
ID: 18864779
The file you want to break up into smaller files: is each line in the file the same length, or do the lines differ in length?

Author Comment

by:gr8life
ID: 18864882
Hi Fernando,
All the lines are the same length.

Thanks,
gr8life
LVL 62

Expert Comment

by:Fernando Soto
ID: 18864944
Hi gr8life;

Seeing as each line is the same length, how many characters are on one line?

Fernando

Author Comment

by:gr8life
ID: 18865539
When I said the lines are the same length, I should have said the lines have the same number of fields, but the fields have different lengths. Sorry I wasn't clear in my earlier post.
Gr8life
LVL 27

Expert Comment

by:VBRocks
ID: 18867762
What I would recommend is the following:

'In your ConvertFiles sub:
             
               Dim i as Integer = 1
               Dim sout As New IO.StreamWriter("c:\temp\"tempfile-part" & i & ".txt", False)  'where c:\temp is your path  (you may need to use i.ToString())

               'Further down
                readline = sin.ReadLine
                While sr.EndOfStream = False
                    items = readline.Split(vbTab)    
                            'why are you splitting the line and then joining it again?
                            'Just use your string:
                            'sout.WriteLine(readline)
                    sout.WriteLine(String.Join(vbTab, items))
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 60000 = 0 Then
                        'sout.write I am having trouble creating the files based on the row count
                        sout.Flush()
                        sout.Close()
 
                       'Create a new instance of your file
                        i = i + 1
                        sout = New IO.StreamWriter("c:\temp\"tempfile-part" & i & ".txt", False)

                    End If

                    readline = sin.ReadLine

                End While

                'Leave these at the bottom to close your last file.
                sin.Close()
                sout.Flush()
                sout.Close()

Let me know if you have any questions!

LVL 27

Expert Comment

by:VBRocks
ID: 18867769
I just noticed a mistake I made:

Dim sout As New IO.StreamWriter("c:\temp\tempfile-part" & i & ".txt", False)

LVL 27

Expert Comment

by:VBRocks
ID: 18867780
One more mistake (it's the end of the day :)

While sin.EndOfStream = False
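With the loop shape suggested above (one ReadLine before the loop, the next at the bottom, and While sin.EndOfStream = False as the test), the final line is read into readline and the loop then exits before writing it, so the last row of the input is silently dropped. StreamReader.EndOfStream was also only added in .NET Framework 2.0, so it is not available in VB.NET 2003. A minimal sketch of the usual alternative, which tests the string ReadLine returned for Nothing; CopyLines is a hypothetical name, and it takes a TextReader and TextWriter so the same loop works on files and on in-memory strings.

```vbnet
Imports System
Imports System.IO

Module ReadLoop
    ' Copies every line from reader to writer and returns how many were copied.
    ' The loop tests the string ReadLine returned (Nothing means end of input)
    ' rather than EndOfStream, so the last line is written before the loop ends.
    Function CopyLines(ByVal reader As TextReader, ByVal writer As TextWriter) As Integer
        Dim count As Integer = 0
        Dim line As String = reader.ReadLine()
        While Not line Is Nothing
            writer.WriteLine(line)
            count += 1
            line = reader.ReadLine()   ' read the next line only after writing this one
        End While
        Return count
    End Function

    Sub Main()
        Dim source As New StringReader("a" & Environment.NewLine & "b" & Environment.NewLine & "c")
        Dim target As New StringWriter()
        Console.WriteLine(CopyLines(source, target))   ' 3
    End Sub
End Module
```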

LVL 62

Accepted Solution

by:
Fernando Soto earned 500 total points
ID: 18867828
Hi gr8life;

This should work. Let me know if you need any help.

    ' Class-level variables, so they are available to the other functions
    ' in the class.
    Dim filePart As Integer = 1     ' Used to give outFile a unique name
    Dim outFileTemplate As String   ' The part of the outFile which is constant

    Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click

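        ' Start as empty strings (not Nothing) so the Length checks below
        ' do not throw if the user cancels the dialog.
        Dim inFile As String = ""
        Dim outFile As String = ""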
        Dim openFileDialog1 As New OpenFileDialog

        openFileDialog1.InitialDirectory = "c:\"
        openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"
        openFileDialog1.FilterIndex = 2
        openFileDialog1.RestoreDirectory = True

        If openFileDialog1.ShowDialog() = DialogResult.OK Then
            inFile = openFileDialog1.FileName
            outFileTemplate = IO.Path.GetDirectoryName(inFile) & "\" & _
                IO.Path.GetFileNameWithoutExtension(inFile) & "-part"

            ' Name the files OriginalFileName-part1, OriginalFileName-part2, etc.
            ' Reset the counter first so that a second run starts again at part1.
            filePart = 1
            outFile = outFileTemplate & filePart.ToString & ".txt"
            filePart += 1   ' Get ready for the next file if needed

        End If

        Button1.Enabled = False
        Dim startDT As DateTime = DateTime.Now

        If (inFile.Length > 0) Then
            If (outFile.Length > 0) Then
                ConvertFiles(inFile, outFile)
                Dim stopDT As DateTime = DateTime.Now
                Dim elapsedTS As TimeSpan = stopDT.Subtract(startDT)
                Dim msg As String = elapsedTS.Hours & "h" & elapsedTS.Minutes & "m" & elapsedTS.Seconds & "s"
                MessageBox.Show("File Complete! " & ControlChars.CrLf & ControlChars.CrLf & "Process Duration:" & ControlChars.CrLf & msg)
            Else
                MsgBox("No output file specified!")
            End If
        Else
            MsgBox("No input file specified!")
        End If
        ' Re-enable the button on every path, including a cancelled dialog
        Button1.Enabled = True
    End Sub

    Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)

        Try
            If (IO.File.Exists(filein)) Then
                Dim sin As New IO.StreamReader(filein)
                Dim sout As New IO.StreamWriter(fileout, False)
                sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
                Dim lineCounter As Integer
                ' Dim items() As String ' Not needed

                ' If you use a String object here, with the number of times you will
                ' be changing it (once per line), you will make the process even slower.
                ' This is because String objects are immutable, so a new String
                ' object needs to be created each time, and garbage collection will need
                ' to run more often, which is a heavy processing cost. Using StringBuilder
                ' will make it more efficient.
                Dim readline As New System.Text.StringBuilder
                While sin.Peek >= 0
                    ' Clear StringBuilder for the next line
                    readline.Remove(0, readline.Length)
                    readline.Append(sin.ReadLine())
                    sout.WriteLine(readline.ToString())
                    lineCounter = lineCounter + 1
                    If lineCounter Mod 60000 = 0 Then
                        sout.Flush()
                        sout.Close()
                        ' Check to see if there is any more data. If so then
                        ' create a new file otherwise do not.
                        If sin.Peek >= 0 Then
                            sout = New IO.StreamWriter(outFileTemplate & _
                                filePart.ToString & ".txt")
                            filePart += 1
                            lineCounter = 0
                        Else
                            sout = Nothing
                            Exit While
                        End If
                    End If
                End While
                sin.Close()
                If Not sout Is Nothing Then
                    sout.Flush()
                    sout.Close()
                End If
            End If
        Catch ex As Exception
            MsgBox("Problem Occurred: " & ex.Message)
        End Try

    End Sub

Fernando

Author Comment

by:gr8life
ID: 18884401
Thank you for the excellent solution.
gr8life
LVL 62

Expert Comment

by:Fernando Soto
ID: 18885167
No problem, always glad to help out. ;=)