gr8life
asked on
help splitting text files
Good morning everyone, I am trying to create a small application (VB.NET 2003) for breaking larger text files into smaller ones. Each output file needs to be exactly 60,000 rows long, with the last file holding whatever rows remain. For example, if the original file has 190,000 rows and is named tempfile, my desired result is 4 new files: the first three have 60,000 rows each and the last has the remaining 10,000 rows. Also, if possible, I would like the output files to keep the name of the original file with a -part suffix added, so in the above example they would be named tempfile-part1, tempfile-part2, tempfile-part3, and tempfile-part4. Another thing is that I never know how many rows are going to be in the files; they have been as large as 3 million rows.
Please see the commented sections of my “attempted” source code below.
Thank you very much for taking the time to read this post and for any technical advice you can help me with,
Gr8life
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim inFile As String
    Dim outFile As String
    Dim openFileDialog1 As New OpenFileDialog
    openFileDialog1.InitialDirectory = "c:\"
    openFileDialog1.Filter = "txt files (*.txt|*.txt|All files(*.*)|*.*"
    openFileDialog1.FilterIndex = 2
    openFileDialog1.RestoreDirectory = True
    If openFileDialog1.ShowDialog() = DialogResult.OK Then
        inFile = openFileDialog1.FileName
        outFile = Mid(inFile, 1, inFile.LastIndexOf("."))
        'Need help creating the process of naming the files the OriginalFileName-part1, OriginalFileName-part2, etc
        outFile += "-part1.txt"
    End If
    Button1.Enabled = False
    Dim startDT As DateTime = DateTime.Now
    If (inFile.Length > 0) Then
        If (outFile.Length > 0) Then
            ConvertFiles(inFile, outFile)
            Dim stopDT As DateTime = DateTime.Now
            Dim elapsedTS As TimeSpan = stopDT.Subtract(startDT)
            Dim msg As String = elapsedTS.Hours & "h" & elapsedTS.Minutes & "m" & elapsedTS.Seconds & "s"
            MessageBox.Show("File Complete! " & ControlChars.CrLf & ControlChars.CrLf & "Process Duration:" & ControlChars.CrLf & msg)
            Button1.Enabled = True
        Else
            MsgBox("No output file specified!")
        End If
    Else
        MsgBox("No input file specified!")
    End If
End Sub
Public Sub ConvertFiles(ByVal filein As String, ByVal fileout As String)
    Try
        If (IO.File.Exists(filein)) Then
            Dim sin As New IO.StreamReader(filein)
            Dim sout As New IO.StreamWriter(fileout, False)
            sout.AutoFlush = False ' output file is NOT updated after every WriteLine() call
            Dim lineCounter As Integer
            Dim items() As String
            Dim readline As String = sin.ReadLine
            While Not IsNothing(readline)
                items = readline.Split(vbTab)
                sout.WriteLine(String.Join(vbTab, items))
                lineCounter = lineCounter + 1
                If lineCounter Mod 60000 = 0 Then
                    'sout.write I am having trouble creating the files based on the row count
                Else
                    'sout.write the remaining data to the last file
                End If
                readline = sin.ReadLine
            End While
            sin.Close()
            sout.Flush()
            sout.Close()
        End If
    Catch ex As Exception
        MsgBox("Problem Occurred" & ex.Message)
    End Try
End Sub
In the file you want to break up into smaller files, is every line the same length, or do the lines differ in length?
ASKER
Hi Fernando,
All the lines are the same length.
Thanks,
gr8life
Hi gr8life;
Seeing that each line is the same length, how many characters are on one line?
Fernando
ASKER
When I stated the lines are the same length, I should have stated the lines have the same number of fields, but the fields have different lengths. Sorry I wasn’t clear in the earlier post.
Gr8life
What I would recommend would be the following:
'In your ConvertFiles sub:
Dim i as Integer = 1
Dim sout As New IO.StreamWriter("c:\temp\" tempfile-part" & i & ".txt", False) 'where c:\temp is your path (you may need to use i.ToString())
'Further down
readline = sin.ReadLine
While sr.EndOfStream = False
    items = readline.Split(vbTab)
    'why are you splitting the line and then joining it again?
    'Just use your string:
    'sout.WriteLine(readline)
    sout.WriteLine(String.Join(vbTab, items))
    lineCounter = lineCounter + 1
    If lineCounter Mod 60000 = 0 Then
        'sout.write I am having trouble creating the files based on the row count
        sout.Flush()
        sout.Close()
        'Create a new instance of your file
        i = i + 1
        sout = New IO.StreamWriter("c:\temp\" tempfile-part" & i & ".txt", False)
    End If
    readline = sin.ReadLine
End While
'Leave these at the bottom to close your last file.
sin.Close()
sout.Flush()
sout.Close()
Let me know if you have any questions!
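The hard-coded c:\temp\tempfile-part path above could instead be derived from whatever file was picked in the dialog. A minimal sketch, assuming the parts should land in the same folder as the original and keep its extension; BuildPartPath is a made-up helper name for illustration, not something from the posted code:

```vbnet
Imports System.IO

Module PartNames
    ' Hypothetical helper (not from the thread): builds
    ' "<folder>\<name>-part<N><ext>" from the original file's path.
    Function BuildPartPath(ByVal inFile As String, ByVal partNumber As Integer) As String
        Dim folder As String = Path.GetDirectoryName(inFile)
        Dim baseName As String = Path.GetFileNameWithoutExtension(inFile)
        ' GetExtension includes the leading dot, or returns "" when there is none.
        Dim ext As String = Path.GetExtension(inFile)
        Return Path.Combine(folder, baseName & "-part" & partNumber.ToString() & ext)
    End Function

    Sub Main()
        ' Example path is an assumption; substitute openFileDialog1.FileName.
        Console.WriteLine(BuildPartPath("c:\temp\tempfile.txt", 1))
    End Sub
End Module
```

On Windows, the call in Main should give c:\temp\tempfile-part1.txt. A file with no extension, like the tempfile in the question, would simply come out as tempfile-part1.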
I just noticed a mistake I made:
Dim sout As New IO.StreamWriter("c:\temp\tempfile-part" & i & ".txt", False)
One more mistake (it's the end of the day :)
While sin.EndOfStream = False
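With both corrections folded in, the whole split loop might look something like the sketch below. It rests on a few assumptions: SplitFile and its outPrefix parameter are names invented for this example, and the ".txt" suffix is hard-coded as in the snippets above. It loops until ReadLine returns Nothing rather than testing EndOfStream, partly because (as far as I know) EndOfStream only arrived with .NET 2.0 and so may not be available in VB.NET 2003, and partly because testing for end of stream after a line has already been read can exit the loop before the final line is written. Each writer is opened only when there is a line to put in it, so a row count that is an exact multiple of 60,000 should not leave an empty last file.

```vbnet
Imports System.IO

Module FileSplitter
    ' Sketch only. outPrefix is a hypothetical parameter,
    ' e.g. "c:\temp\tempfile-part". Returns the number of part files written.
    Function SplitFile(ByVal filein As String, ByVal outPrefix As String, ByVal linesPerFile As Integer) As Integer
        Dim sin As New StreamReader(filein)
        Dim sout As StreamWriter = Nothing
        Dim partNumber As Integer = 0
        Dim lineCounter As Integer = 0
        Try
            Dim line As String = sin.ReadLine()
            While Not line Is Nothing
                ' Open the next part only when there is a line waiting for it.
                If sout Is Nothing Then
                    partNumber += 1
                    sout = New StreamWriter(outPrefix & partNumber.ToString() & ".txt", False)
                End If
                sout.WriteLine(line)
                lineCounter += 1
                If lineCounter Mod linesPerFile = 0 Then
                    sout.Close()    ' Close() also flushes the buffer
                    sout = Nothing
                End If
                line = sin.ReadLine()
            End While
        Finally
            ' Runs on success and on error, so no file handle is left open.
            If Not sout Is Nothing Then sout.Close()
            sin.Close()
        End Try
        Return partNumber
    End Function

    Sub Main()
        ' Paths are assumptions for illustration.
        Dim parts As Integer = SplitFile("c:\temp\tempfile.txt", "c:\temp\tempfile-part", 60000)
        Console.WriteLine(parts.ToString() & " part file(s) written")
    End Sub
End Module
```

Because only one line is held in memory at a time, the 3 million row files mentioned in the question should not be a problem for this approach.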
ASKER
Thank you for the excellent solution.
gr8life
No problem, always glad to help out. ;=)