Asked by deadferret
Start reading a File from a Specific Row or Position
I know how to loop through a file with a StreamReader (sr.ReadLine), but what I want to do is start reading rows from a specific point in the file.
My routine opens the file with a StreamReader, loops through the leading lines to reach my starting point, and then reads the lines I want and writes them to another file.
Here is some example code:
' iStart (Int32) and sfileName are declared elsewhere in the class
' oWrite is a class that creates the new file
Private Sub openFile(ByVal fileName As String, ByVal getCount As Int32)
    Dim sLines As New System.Text.StringBuilder
    ' Using closes the file even if an exception is thrown
    Using sr As New System.IO.StreamReader(fileName)
        Dim sCount As Int32 = 0
        ' Skip to the starting line (stops early if the file is shorter)
        Do While sCount < iStart AndAlso sr.ReadLine() IsNot Nothing
            sCount += 1
        Loop
        ' Collect the next getCount lines
        sCount = 0
        Do While sCount < getCount
            Dim line As String = sr.ReadLine()
            If line Is Nothing Then Exit Do ' end of file
            sLines.Append(line & vbCrLf)
            sCount += 1
        Loop
    End Using
    oWrite.WriteParse(sLines.ToString(), sfileName)
End Sub
The worst case is when I load a file with 2 million rows and need the rows that start after row 500,000. It takes a long time to loop to the 500,000th line, and by the time it gets there, memory problems are slowing my computer down. Any suggestions on how to jump to that position without reading each line?
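Part of the memory pressure in a routine like this probably comes from holding every extracted line in a StringBuilder before it is written out. A minimal sketch, assuming the extracted lines can go straight to a file through a StreamWriter instead of through oWrite (CopyLines is a hypothetical name, not something from the thread). It still reads every skipped line, but only one line is in memory at a time:

```vbnet
Imports System.IO

Module LineCopier
    ' Hypothetical helper: skips linesToSkip lines of inputPath, then copies up to
    ' linesToCopy lines into outputPath. Returns how many lines were copied.
    Function CopyLines(ByVal inputPath As String, ByVal outputPath As String,
                       ByVal linesToSkip As Integer, ByVal linesToCopy As Integer) As Integer
        Dim copied As Integer = 0
        Using reader As New StreamReader(inputPath),
              writer As New StreamWriter(outputPath)
            Dim skipped As Integer = 0
            ' Each skipped line is read and immediately discarded
            Do While skipped < linesToSkip AndAlso reader.ReadLine() IsNot Nothing
                skipped += 1
            Loop
            ' Write each wanted line as soon as it is read instead of buffering it
            Do While copied < linesToCopy
                Dim line As String = reader.ReadLine()
                If line Is Nothing Then Exit Do ' reached end of file
                writer.WriteLine(line)
                copied += 1
            Loop
        End Using
        Return copied
    End Function
End Module
```

For example, CopyLines("big.txt", "part.txt", 500000, 1000) would skip 500,000 lines and write the following 1,000.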
ms-help://MS.MSDNQTR.2004JAN.1033/cpref/html/frlrfSystemIOStreamReaderClassReadTopic.htm
MSDN help link
ASKER
How would I go about doing this? "You could read a file fragment to a predefined buffer, say in 16kb chunks and analyze them manually to find CR/LFs. That would solve your problem with memory."
ASKER
I also keep getting this error when I try to read up to a CR/LF:
"Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
Do I have to give it the full length of the file?
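That message typically comes from the arguments passed to Read itself: Stream.Read(buffer, offset, count) throws an ArgumentException with this text when offset + count is larger than buffer.Length. So the file length is never passed in; count only has to fit in the buffer, and the return value tells you how many bytes were actually read (0 at end of file). A minimal sketch of a chunked read loop with valid arguments (ReadInChunks is a hypothetical name, and the CR/LF scan is left as a comment):

```vbnet
Imports System.IO

Module ChunkReading
    ' Hypothetical helper: reads the whole file through a fixed-size buffer and
    ' returns the total number of bytes read.
    Function ReadInChunks(ByVal filePath As String, ByVal chunkSize As Integer) As Long
        Dim buffer(chunkSize - 1) As Byte ' VB array bounds are inclusive
        Dim total As Long = 0
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            ' offset 0 + count buffer.Length always fits inside the buffer
            Dim readBytes As Integer = fs.Read(buffer, 0, buffer.Length)
            Do While readBytes > 0
                total += readBytes
                ' Only buffer(0) through buffer(readBytes - 1) hold data from this read
                readBytes = fs.Read(buffer, 0, buffer.Length)
            Loop
        End Using
        Return total
    End Function
End Module
```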
Imports System
Imports System.IO

Module LineCounter
    Sub Main(ByVal args As String())
        Dim fileName As String
        If args.Length = 1 Then
            fileName = args(0)
        Else
            Console.Write("Enter file name: ")
            fileName = Console.ReadLine()
        End If
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            ' 256 KB buffer; VB array bounds are inclusive, so the upper bound is 262143
            Dim buffer As Byte() = New Byte(262143) {}
            Dim offset As Long = 0
            Dim readBytes As Integer
            Dim lines As Integer = 0
            Do
                ' Never ask for more than the buffer holds; Read returns 0 at end of file
                readBytes = fs.Read(buffer, 0, buffer.Length)
                offset += readBytes
                Console.Write("Read {0} bytes... ", offset)
                ' Count LF (byte 10) characters in the part of the buffer that was filled
                For counter As Integer = 0 To readBytes - 1
                    If buffer(counter) = 10 Then lines += 1
                Next
                Console.WriteLine("{0} lines found", lines)
            Loop While readBytes > 0 AndAlso offset < fs.Length
        End Using
    End Sub
End Module
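The counting loop above can be taken one step further: instead of only counting LFs, stop at the Nth one and report the byte offset where the next line starts. A minimal sketch using 16 KB chunks (FindLineOffset is a hypothetical name). Like the loop above, it scans raw bytes for LF (10), so it assumes an encoding where byte 10 only ever means a line feed (ASCII, UTF-8, single-byte ANSI code pages; not UTF-16):

```vbnet
Imports System.IO

Module LineOffsetFinder
    ' Hypothetical helper: returns the byte offset at which line lineIndex (0-based)
    ' starts, or -1 if the file contains fewer than lineIndex line feeds.
    Function FindLineOffset(ByVal filePath As String, ByVal lineIndex As Long) As Long
        If lineIndex = 0 Then Return 0
        Dim buffer(16383) As Byte    ' 16 KB chunk
        Dim position As Long = 0     ' file offset of buffer(0)
        Dim linesSeen As Long = 0
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Dim readBytes As Integer = fs.Read(buffer, 0, buffer.Length)
            Do While readBytes > 0
                For i As Integer = 0 To readBytes - 1
                    If buffer(i) = 10 Then
                        linesSeen += 1
                        ' The requested line begins right after this LF (CR/LF pairs end in LF)
                        If linesSeen = lineIndex Then Return position + i + 1
                    End If
                Next
                position += readBytes
                readBytes = fs.Read(buffer, 0, buffer.Length)
            Loop
        End Using
        Return -1
    End Function
End Module
```

This still reads every byte before the target line, but it never builds strings for the skipped lines, which keeps memory use flat.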
Overloads Overrides Public Function Read(Char(), Integer, Integer) As Integer
You could read the file into a fixed-size buffer, say in 16 KB chunks, and scan each chunk yourself for CR/LFs. That would solve your memory problem.
HTH
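Scanning chunks for CR/LFs still touches every byte before the starting row; what it saves is memory. Because rows have variable length, the byte position of row 500,000 cannot be computed without such a scan. One option not discussed in the thread is to record that byte offset once (for example during an earlier scan) and on later runs jump straight to it with FileStream.Seek. A minimal sketch, assuming the offset is already known to be the start of a line and the file is ASCII or UTF-8 (ReadLinesFrom is a hypothetical name):

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module SeekAndRead
    ' Hypothetical helper: jumps to byteOffset and returns up to maxLines lines
    ' from there, without reading anything that comes before it.
    Function ReadLinesFrom(ByVal filePath As String, ByVal byteOffset As Long,
                           ByVal maxLines As Integer) As List(Of String)
        Dim result As New List(Of String)
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            fs.Seek(byteOffset, SeekOrigin.Begin)
            ' The reader starts decoding at the stream's current position
            Dim reader As New StreamReader(fs)
            Do While result.Count < maxLines
                Dim line As String = reader.ReadLine()
                If line Is Nothing Then Exit Do ' end of file
                result.Add(line)
            Loop
        End Using ' disposing fs also closes the file the reader was using
        Return result
    End Function
End Module
```

The offset only stays valid while the file is unchanged, so a saved offset would need to be recomputed whenever the file is rewritten.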