Reading last line of file to see if I should read more lines, does this make it faster? VB.NET

I've done some research, and this question has been answered before here and other forums I suppose, but there are some unanswered questions.

Ok, normally when I read a file line by line, I use the readline method. This is a bit time consuming. What I'm trying to determine is if I should read a whole "batch" file or just the beginning of it to get the necessary information. Let me give you a condensed example.
My file sizes vary from 1K to 19 megabytes in size. I'm using various techniques to avoid the larger files including this one.

Here is a sample text file:

BEGIN BATCH ID 777
REFERENCE NUMBER: 1234
......(Lots of lines of information I don't need here)....Up to 19M in size of this info
END OF BATCH 777

A given file can have more than one batch. If there is only one batch, I only need to read the first two lines to find the reference number that I'm searching for. Make sense? So in this first example, there's only one batch, and I only look at the first two lines.

But some files have multiple batches:

BEGIN BATCH ID 778
REFERENCE NUMBER: 6789
......(Lots of lines of information I don't need here)....Up to 19M of info.
END OF BATCH 778
BEGIN OF BATCH 779
REFERENCE NUMBER: 8910
....(More lines I don't need to read)
END OF BATCH 779

Ideally I would read the first two lines of every file; then, if the reference number is not found there, I would read the last line of the file to see if the batch number on the first line matches the batch number at the end. This would save me a lot of time reading lines in a file. I would only read all the lines in a file if there was more than one batch in it (usually there is only one batch, so it's pretty fast).

But if I have to read the whole file anyway what is the point of reading the last line? That's what I've seen in other solutions.

I'm thinking perhaps the read method instead of the readline method should be used to make this determination, then the readline method should be used to read the first lines of the file if it is found that the file contains only one batch.


 
Private Sub SearchFile()
    ' This is untested code, just to communicate what I mean
    Dim line As String
    Dim lastLine As String
    Dim batchID As String = ""
    Dim onlyReadThisMuch As Integer = 200
    Dim lengthReadSoFar As Integer = 0
    Dim searchtext As String = "REFERENCE NUMBER: 8910"
    Using sr As New System.IO.StreamReader("c:\myfile.txt")
        ' Peek returns -1 at end of stream, so test for >= 0
        Do While sr.Peek() >= 0
            line = sr.ReadLine()
            If InStr(line, searchtext) > 0 Then
                MessageBox.Show("Hey I found the line!")
                Exit Sub
            End If
            If InStr(line, "BEGIN OF BATCH") > 0 Then
                ' This is the beginning of the batch
                batchID = Mid(line, 16, 4)
            End If
            lengthReadSoFar = lengthReadSoFar + line.Length
            If lengthReadSoFar > onlyReadThisMuch Then
                '
                ' here's where I would read the last line
                '
                lastLine = ExpertsExchangeHelpMeHerePlease("c:\myfile.txt")
                If InStr(lastLine, batchID) > 0 Then
                    ' Same batch ID, I'm done searching this file
                    MessageBox.Show("Sorry I can't find your reference number in the file.")
                    Exit Sub
                Else
                    ' Different batch ID at the end, so search the rest of the file
                    Do While sr.Peek() >= 0
                        line = sr.ReadLine()
                        If InStr(line, searchtext) > 0 Then
                            MessageBox.Show("Hey I found the line!")
                            Exit Sub
                        End If
                    Loop
                End If
            End If
        Loop
    End Using
    MessageBox.Show("Sorry I can't find your reference number in the file.")
End Sub

Asked by: harmono

käµfm³d 👽 Commented:
I think you're asking something like this:
Sub SearchFile(ByVal path As String)
    Const BGN_LINE = "BEGIN BATCH ID"
    Const END_LINE = "END OF BATCH"

    Using reader As New System.IO.StreamReader(path)
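        ' Trim so the leading space left after removing the prefix does not break the comparison below
        Dim firstLine As String = reader.ReadLine().Replace(BGN_LINE, String.Empty).Trim()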
        Dim lastLine As String = String.Empty

        Using file As New System.IO.FileStream(path, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
            Dim accumulator As New System.Text.StringBuilder()
            Dim current As Byte
            Dim lower As Byte = Convert.ToByte("0"c)
            Dim upper As Byte = Convert.ToByte("9"c)

            file.Seek(-1, IO.SeekOrigin.End)

            Do
                current = file.ReadByte()

                If current >= lower AndAlso current <= upper Then
                    accumulator.Insert(0, Convert.ToChar(current))
                End If

                file.Seek(-2, IO.SeekOrigin.Current)    ' Jump back two places since we just read one
            Loop While current >= lower AndAlso current <= upper

            lastLine = accumulator.ToString().Replace(END_LINE, String.Empty)
        End Using   ' implicit close FileStream

        If firstLine <> lastLine Then
            ' Multiple batches, continue with "reader"
            While Not reader.EndOfStream
                ' process each line
            End While
        Else
            ' Single batch, read next line from reader
            Dim refnum As String = reader.ReadLine()
        End If
    End Using   ' implicit close StreamReader
End Sub

harmono (Author) Commented:
That looks like it may be it. I'll see if that works.

lenordiste Commented:
If at some point you need to read the whole file, do this:
System.IO.File.ReadAllText("myfile.txt")

It's very fast for large files.

Mike Tomlinson (Middle School Assistant Teacher) Commented:
"...Up to 19M in size of this info"

That could be a pretty slow and expensive operation using ReadAllText() with a 19MB+ file size!    =\

lenordiste Commented:
Ouch, you're right. StreamReader would probably be a better approach for a file that large, even if it makes the code slightly less readable.
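
To illustrate the StreamReader approach for the common single-batch case, here is a minimal sketch that reads only the first few lines and never touches the rest of the file. The module name `HeaderReader` and the function `ReadFirstLines` are made up for this example and do not come from anyone's code in this thread.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module HeaderReader
    ' Hypothetical helper: reads at most lineCount lines from the start of the file.
    ' The reader buffers a small block internally, so a 19 MB file is not loaded in full.
    Function ReadFirstLines(ByVal filePath As String, ByVal lineCount As Integer) As List(Of String)
        Dim lines As New List(Of String)()
        Using reader As New StreamReader(filePath)
            While lines.Count < lineCount
                Dim line As String = reader.ReadLine()
                If line Is Nothing Then Exit While   ' file ended before lineCount lines
                lines.Add(line)
            End While
        End Using
        Return lines
    End Function
End Module
```

Calling `ReadFirstLines("c:\myfile.txt", 2)` on the sample file from the question should give the BEGIN BATCH line and the REFERENCE NUMBER line, which is all that is needed when the file holds a single batch.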

harmono (Author) Commented:
OK, I have created a function similar to what kaufmed posted, but my search functionality stopped working, so I need to find out what happened. This is the function I created from kaufmed's design. I probably messed up some logic when I plugged in this function, because the search is not just a text search but has all kinds of filtering features. So I'll see if it works. It didn't crash though, so it must not be causing a problem.
Private Function IsMultiISA(ByVal myFile As String, ByVal myISA As String) As Boolean
        ' This determines if there are multiple ISA numbers in a file so that the whole file is searched
        ' in case the reference number can be found in another ISA envelope instead of the first few lines of the file
        '
        IsMultiISA = True
        Dim accumulator As New StringBuilder()
        Dim current As Byte
        Dim lastline As String
        Dim lower As Byte = Convert.ToByte("0"c)
        Dim upper As Byte = Convert.ToByte("9"c)
        Using file As New System.IO.FileStream(myFile, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
            file.Seek(-1, IO.SeekOrigin.End)

            Do
                current = file.ReadByte()

                If current >= lower AndAlso current <= upper Then
                    accumulator.Insert(0, Convert.ToChar(current))
                End If

                file.Seek(-2, IO.SeekOrigin.Current)    ' Jump back two places since we just read one
            Loop While current >= lower AndAlso current <= upper
            lastline = accumulator.ToString()

        End Using   ' implicit close FileStream

        If InStr(myISA, lastline) > 0 Then
            ' the last line has the same ISA control number, do not read the rest of the file
            IsMultiISA = False
        End If

    End Function

käµfm³d 👽 Commented:
Should we assume the value of "myISA" is taken from the first line of the file, but at some other portion of code? Also, are the reference numbers always the same number of digits? I would hate to be searching for "6999" and have "99" as the value of lastline   ; )
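
A small sketch of the concern above: with `InStr`, a short trailing value such as "99" is found inside "6999", and as far as I know `InStr` also reports a match when the second string is empty. Comparing the two values exactly avoids both cases. `SameControlNumber` is a hypothetical helper name, and the assumption is that both values are plain digit strings.

```vbnet
Imports System

Module IsaMatch
    ' Hypothetical helper: True only when the number read from the end of the file
    ' is non-empty and equals the expected number exactly (ignoring surrounding spaces).
    Function SameControlNumber(ByVal expected As String, ByVal found As String) As Boolean
        Dim a As String = expected.Trim()
        Dim b As String = found.Trim()
        Return b.Length > 0 AndAlso String.Equals(a, b, StringComparison.Ordinal)
    End Function
End Module
```

If the control number really is always the same length, the substring check and the exact check agree; the exact check only matters when the backward read comes up short or empty.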

harmono (Author) Commented:
I'm not very well learned on Byte level operations, but it seems that the detection of numbers is not working.

This logic never returns true.

 current = file.ReadByte()

                If current >= lower AndAlso current <= upper Then
                    accumulator.Insert(0, Convert.ToChar(current))
                End If
It seems like the last line that is read is always empty, so when I compare the strings the string is never found.
I pasted my function above, so if you could take a look I would appreciate the help. I think I understand the code, but I'm not sure why it doesn't work. It looks for characters in the range 0 to 9, accumulates those digits into a StringBuilder object, then converts that to a string; my modification is to search for my string within that string.
But I put a breakpoint on the accumulator.Insert line and that breakpoint is never reached, so
the If current >= lower AndAlso current <= upper Then condition is never true. lower would be "0" and upper would be "9".
I don't know what to change. I will monitor the file.ReadByte() call there to see if bytes are being read; maybe it does not read properly, or it's not accumulating bytes(?).

harmono (Author) Commented:
As I suspected, it's not reading properly. It only reads blanks, probably because the last characters of the file are not numeric, so it's not looping(?). Something is wrong with this line of code:

Loop While current >= lower AndAlso current <= upper

The test file I'm using ends like this:

1>GE~1~980>IEA~2~000001559>                                                     <CR><LF>

It has spaces at the end. So I think the loop is expecting numbers at the end, and that is not possible.
I'm not sure how to fix this.

harmono (Author) Commented:
Ok I got it to work.
I had to change one line of code in the function.

Loop While accumulator.Length < 80

It would not loop because the last characters were not numeric. It would work in my hypothetical scenario that I gave for the question but the actual file had spaces and delimiters at the end of it causing it to not loop. So now it works. This is great because not many files have multiple batches.

Any ideas for a better way of doing this loop?  
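
One possible alternative to stepping backwards a byte at a time, offered as a sketch rather than a definitive answer: read a small block from the end of the file in one go, then pick out the last run of digits from that text. The names `TailReader`, `ReadTail` and `LastDigitRun` are made up for this example, and it assumes the tail of the file is plain ASCII, as in the sample ending shown above.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module TailReader
    ' Hypothetical helper: returns up to maxBytes from the end of the file, decoded as ASCII.
    Function ReadTail(ByVal filePath As String, ByVal maxBytes As Integer) As String
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
            Dim count As Integer = CInt(Math.Min(CLng(maxBytes), fs.Length))
            Dim buffer(count - 1) As Byte
            fs.Seek(-count, SeekOrigin.End)
            Dim total As Integer = 0
            ' Read may return fewer bytes than requested, so loop until the block is full
            While total < count
                Dim n As Integer = fs.Read(buffer, total, count - total)
                If n = 0 Then Exit While
                total += n
            End While
            Return Encoding.ASCII.GetString(buffer, 0, total)
        End Using
    End Function

    ' Hypothetical helper: returns the last run of consecutive digits in text, or "" if there is none.
    Function LastDigitRun(ByVal text As String) As String
        Dim i As Integer = text.Length - 1
        ' Skip trailing non-digits (spaces, delimiters, CR/LF)
        While i >= 0 AndAlso Not Char.IsDigit(text(i))
            i -= 1
        End While
        Dim lastDigit As Integer = i
        ' Walk back over the digits themselves
        While i >= 0 AndAlso Char.IsDigit(text(i))
            i -= 1
        End While
        Return text.Substring(i + 1, lastDigit - i)
    End Function
End Module
```

Something like `LastDigitRun(ReadTail(myFile, 200))` would then stand in for the Do loop. The 200 is an arbitrary guess at how much trailing padding a file might have, so it would need adjusting to the real data.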

harmono (Author) Commented:
kaufmed's solution once tailored to my needs worked perfectly. This is not a crucial piece of code so this is more than good enough.

lenordiste Commented:
You could add an additional flag and set it once you find your first number. For example:

Dim WaitForFirstDigit As Boolean = True
Do
    current = file.ReadByte()

    If current >= lower AndAlso current <= upper Then
        WaitForFirstDigit = False
        accumulator.Insert(0, Convert.ToChar(current))
    End If

    file.Seek(-2, IO.SeekOrigin.Current)    ' Jump back two places since we just read one
Loop While WaitForFirstDigit OrElse (current >= lower AndAlso current <= upper)

harmono (Author) Commented:
lenordiste that's a good idea

harmono (Author) Commented:
kaufmed,

The isa number is always the same number of digits so an InStr function works perfect.
I didn't give you exact details on my question because I thought it would make it easier, but I should have explained that the last line would have had junk at the end, and that there was a delimiter. I was basically looking for how to read from the end of a file backwards, and your solution worked perfectly once I tweaked it to fit my needs.

lenordiste Commented:
A better approach could be to add this condition to the Do loop:
if the accumulator's length is 0, then we have not found a number yet and need to continue;
if the current char is a number, we continue until we find a non-digit.

Do
    current = file.ReadByte()

    If current >= lower AndAlso current <= upper Then
        accumulator.Insert(0, Convert.ToChar(current))
    End If

    file.Seek(-2, IO.SeekOrigin.Current)    ' Jump back two places since we just read one
Loop While accumulator.Length = 0 OrElse (current >= lower AndAlso current <= upper)

harmono (Author) Commented:
Yeah that would work.

lenordiste Commented:
If it's always the same number of digits, then you can also wait for the accumulator to reach the size of your number. Good luck with your project.

          Do
                current = file.ReadByte()

                If current >= lower AndAlso current <= upper Then
                    accumulator.Insert(0, Convert.ToChar(current))
                End If

                file.Seek(-2, IO.SeekOrigin.Current)    ' Jump back two places since we just read one
            Loop While accumulator.length<IsaLength

harmono (Author) Commented:
Thanks. I'm still tinkering with it. I think it's going to work.

harmono (Author) Commented:
Yes I used the length comparison for the loop and it worked.
Question has a verified solution.