mondintator

VB.net - What is the fastest way to read a text file?

I need to read a text file line by line and pull different pieces of text from each line.

At the moment I'm using a StreamReader (objreader) and reading line by line.  Is this the fastest way to read specific information from each line of the entire file?

        Dim FileName As String = [myfilename]
        Dim objreader As New System.IO.StreamReader(FileName)

        Do While objreader.Peek() <> -1
            TextLine = objreader.ReadLine() & vbNewLine
            Line_Number = Line_Number + 1

        Loop
        objreader.Close()
Karrtik Iyer

There is the File.ReadAllLines method, which reads all the lines in one call.
There is also the TextFieldParser class, which helps if your file is comma-separated, semicolon-separated, or uses any other separator.
You can even perform the read asynchronously, on a separate thread, so the main thread can carry on with its normal UI work in the meantime. But if other operations in your app depend on the file having been read, you can do the read at startup.
So the question is: what do you want to do after reading all the lines? If you need to parse each string further using some separator, you can use TextFieldParser. If this work can happen in parallel, it can be done on a separate thread.
Otherwise you can use File.ReadAllLines.
Here is an example of File.ReadAllLines:
' Open the file to read from (requires Imports System.IO).
Dim path As String = "C:\sample.txt"
' ReadAllLines loads the whole file into memory as an array of lines.
Dim readText() As String = File.ReadAllLines(path)
For Each s As String In readText
    Console.WriteLine(s)
Next

How big is your text file? How much data will it contain in the worst case?
mondintator

ASKER

I use ReadAllLines first so I can see the progress of the file being read.  But when I read each line of the file, I need to pick up specific strings within each line and put them into a table.
How do you pick the strings from each line to put into the table? Is there a separator in each line that you split on?
No, I pick up the text by its position, using Mid(textline, starting position, length).
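For position-based fields like these, TextFieldParser also has a fixed-width mode, so it is not limited to separated files. Below is a minimal sketch, not a benchmarked recommendation. The module name, the function name and the file path are hypothetical; the widths are derived from the Mid(…, 46, 11), Mid(…, 62, 6) and Mid(…, 73, 2) calls used later in this thread. As far as I know the parser reports lines shorter than the declared widths as malformed, whereas Mid just returns a shorter string.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Module FixedWidthDemo
    ' Returns the three wanted fields of every line as a String array.
    Function ReadFixedWidth(ByVal fileName As String) As List(Of String())
        Dim rows As New List(Of String())()
        Using parser As New TextFieldParser(fileName)
            parser.TextFieldType = FieldType.FixedWidth
            ' Columns 1-45 skipped, 46-56 kept, 57-61 skipped, 62-67 kept,
            ' 68-72 skipped, 73-74 kept; -1 takes the rest of the line.
            parser.SetFieldWidths(45, 11, 5, 6, 5, 2, -1)
            ' Keep padding spaces, as Mid does (the parser trims by default).
            parser.TrimWhiteSpace = False
            Do While Not parser.EndOfData
                Dim fields() As String = parser.ReadFields()
                rows.Add(New String() {fields(1), fields(3), fields(5)})
            Loop
        End Using
        Return rows
    End Function

    Sub Main()
        ' Hypothetical path, replace with your own file.
        For Each row As String() In ReadFixedWidth("C:\sample.txt")
            Console.WriteLine("{0}|{1}|{2}", row(0), row(1), row(2))
        Next
    End Sub
End Module
```

Whether this is faster than plain Mid or Substring calls on each line is something to measure on your own data; the parser does extra validation work per line.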
How big is your text file, and how much data will it contain in the worst case? Do you think it is currently taking a lot of time?
Text files can be huge: usually under 250,000 KB, but they could be bigger.
OK, that's around 250 MB.
So, coming back to your previous comment, can you please explain what you observed when you used File.ReadAllLines?
You can use a BackgroundWorker thread to read all the lines and extract parts of each string by position, using Mid, into a table.
Until that operation finishes, you can show the progress in your UI.
Do you see any issues with this approach?
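To make the BackgroundWorker suggestion concrete, here is a rough sketch written as a console module so it can be run on its own. It is only one way to do it, and it swaps in a different progress measure: the percentage comes from the position of the underlying stream relative to the file length, so the file does not have to be read once just to count lines. The module name, the PercentDone helper and the file path are hypothetical. Because StreamReader reads ahead in buffered blocks, the reported percentage is approximate and runs slightly ahead of the line being processed.

```vbnet
Imports System
Imports System.ComponentModel
Imports System.IO
Imports System.Threading

Module ProgressDemo
    Private WithEvents worker As New BackgroundWorker()
    Private ReadOnly finished As New ManualResetEvent(False)

    ' Percent of the file consumed so far; an empty file counts as done.
    Function PercentDone(ByVal position As Long, ByVal total As Long) As Integer
        If total <= 0 Then Return 100
        Return CInt(position * 100 \ total)
    End Function

    ' Runs on the worker thread: reads line by line and reports progress.
    Private Sub worker_DoWork(ByVal sender As Object, ByVal e As DoWorkEventArgs) Handles worker.DoWork
        Dim lineCount As Integer = 0
        Dim lastPercent As Integer = -1
        Using reader As New StreamReader(CStr(e.Argument))
            Dim total As Long = reader.BaseStream.Length
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                lineCount += 1
                ' ... extract the fields from line here ...
                Dim percent As Integer = PercentDone(reader.BaseStream.Position, total)
                If percent <> lastPercent Then
                    ' Report only when the whole-number percentage changes.
                    lastPercent = percent
                    worker.ReportProgress(percent)
                End If
                line = reader.ReadLine()
            Loop
        End Using
        e.Result = lineCount
    End Sub

    Private Sub worker_ProgressChanged(ByVal sender As Object, ByVal e As ProgressChangedEventArgs) Handles worker.ProgressChanged
        Console.WriteLine("{0}% read", e.ProgressPercentage)
    End Sub

    Private Sub worker_RunWorkerCompleted(ByVal sender As Object, ByVal e As RunWorkerCompletedEventArgs) Handles worker.RunWorkerCompleted
        If e.Error IsNot Nothing Then
            Console.WriteLine("Failed: {0}", e.Error.Message)
        Else
            Console.WriteLine("Read {0} lines", e.Result)
        End If
        finished.Set()
    End Sub

    Sub Main()
        worker.WorkerReportsProgress = True
        worker.RunWorkerAsync("C:\sample.txt") ' hypothetical path
        finished.WaitOne() ' a console app has to wait; a form just keeps running
    End Sub
End Module
```

In a Windows Forms project the same three handlers would live in the form, ProgressChanged would update a ProgressBar instead of writing to the console, and the ManualResetEvent would not be needed.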
I only use ReadAllLines to get the total number of lines in the file, purely so I can track progress as I read line by line.
Can I use ReadAllLines to read parts of a string using Mid?  Is this faster than ReadLine?
ReadAllLines returns an array of strings, and the size of the array is the number of lines. Yes, I would tend to think that reading all the lines with ReadAllLines and then extracting each part with Mid (in a For Each over the string array) would be faster than reading the file line by line. How much faster depends on the file and the machine.
Would it be possible for you to provide a sample text file and the logic you use to split each line into the parts that go into the table?
I normally do this exercise myself (logging the start time and end time) to determine which is faster.
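The start-time and end-time comparison can be sketched with the Stopwatch class. This is an illustration only: the module name, the two function names and the file path are made up, and the Mid call simply stands in for whatever parsing you do on each line.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.IO
Imports Microsoft.VisualBasic

Module ReadTiming
    ' Times one pass over the file using File.ReadAllLines.
    Function TimeReadAllLines(ByVal fileName As String) As TimeSpan
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim nonEmpty As Integer = 0
        For Each line As String In File.ReadAllLines(fileName)
            If Mid(line, 46, 11).Length > 0 Then nonEmpty += 1
        Next
        sw.Stop()
        Return sw.Elapsed
    End Function

    ' Times one pass over the file using StreamReader.ReadLine.
    Function TimeStreamReader(ByVal fileName As String) As TimeSpan
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim nonEmpty As Integer = 0
        Using reader As New StreamReader(fileName)
            Dim line As String = reader.ReadLine()
            Do While line IsNot Nothing
                If Mid(line, 46, 11).Length > 0 Then nonEmpty += 1
                line = reader.ReadLine()
            Loop
        End Using
        sw.Stop()
        Return sw.Elapsed
    End Function

    Sub Main()
        Dim fileName As String = "C:\sample.txt" ' hypothetical path
        Console.WriteLine("ReadAllLines: {0} ms", TimeReadAllLines(fileName).TotalMilliseconds)
        Console.WriteLine("StreamReader: {0} ms", TimeStreamReader(fileName).TotalMilliseconds)
    End Sub
End Module
```

The first pass over a file usually pays for the disk read while later passes hit the operating system's file cache, so it is worth running each method several times and in both orders before drawing conclusions.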
Public Function Read_File() As Boolean

        Read_Ahead()

        Line_Number = 0
        Dim FileName As String = Myfilename
        Dim objreader As New System.IO.StreamReader(FileName)

        Do While objreader.Peek() <> -1
            TextLine = objreader.ReadLine() & vbNewLine
            Line_Number = Line_Number + 1

            MyCroppedField1 = Mid(TextLine, 46, 11)
            MyCroppedField2 = Mid(TextLine, 62, 6)
            MyCroppedField3 = Mid(TextLine, 73, 2)

        Loop
        objreader.Close()

        Return True

    End Function

    Private Sub Read_Ahead()

        Dim FileName As String = Myfilename
        Dim Total_Recs_ReadAhead As Integer = System.IO.File.ReadAllLines(FileName).Length

        Total_recs = Total_Recs_ReadAhead

    End Sub
Thanks for sharing. Give me some time; I shall get back to you with my findings.
Can you also please clarify the following?

1) Which .NET Framework version are you targeting? Will your process (exe) be 64-bit or 32-bit?
2) What is your target deployment environment? I'm looking for information such as which processor, how many cores, how much RAM and which OS.
VB.NET, 64-bit.  Not sure about the target deployment environment (processors, RAM, etc.).  Windows 7 or 10.
OK, thanks. Would it be .NET Framework 3.5, 4.0 or 4.5?


FYI: you can see which .NET Framework version the current project targets by right-clicking the project and choosing Properties, then Compile, then Advanced Compile Options.
3.5
it_saige
I would use something like:
Sub Read_File()
	Dim lines = File.ReadAllLines(MyfileName)
	Total_recs = lines.Count
	Dim parsed = (From line In lines Select New With {.Field1 = Mid(line, 46, 11), .Field2 = Mid(line, 62, 6), .Field3 = Mid(line, 73, 2)})
End Sub


The second read of the file is not needed, because you already know the line count from the first ReadAllLines call.  You simply need to combine both methods.  The LINQ query then parses your lines into an easily readable Enumerable. Note that the query is only evaluated when you enumerate it.

Proof of concept:
Imports System.IO

Module Module1
	Sub Main()
		Dim lines = File.ReadAllLines("EE_Q28908699.txt")
		Console.WriteLine("File has {0} lines.", lines.Count)
		Dim parsed = (From line In lines Select New With {.Field1 = Mid(line, 46, 11), .Field2 = Mid(line, 62, 6), .Field3 = Mid(line, 73, 2)})
		For Each line In parsed
			Console.WriteLine(line)
		Next
		Console.ReadLine()
	End Sub
End Module


-saige-
Simply read the whole file and use the Split method on the result to split it into an array of strings, using the line break as the separator. Each element of the array will be a line of the file.

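		Dim lines() As String

		' Using closes the reader even if an exception is thrown (requires Imports System.IO).
		Using reader As New StreamReader("C:\Users\Jacques\Desktop\Test.txt")
			' Split on the whole newline sequence. On .NET Framework, Split(Environment.NewLine)
			' treats CR and LF as separate delimiters and leaves an empty entry between lines.
			lines = reader.ReadToEnd().Split(New String() {Environment.NewLine}, StringSplitOptions.None)
		End Using

		For Each line As String In lines
			'Do what you want with the line
			Debug.WriteLine(line)
		Next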


Hi mondintator;

As stated above, reading through the file more than once will lengthen processing time, so read it once and cache the data. The sample code below does that and also lets you track which line the processing is up to.
Imports System.IO
Imports System.Linq

Public Class Form1
    '' update file location on the next line
    Dim Myfilename As String = "C:\PathToFile\FilenameHere.txt"

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click

        Dim Line_Number = 0
        Dim dataFields As New List(Of MyData)
        '' Lambda Expression to process the input file, this line of code is not executed
        '' until it is called below.
        Dim processFile As Action(Of String) =
            Sub(inputFile)
                '' Get the file name from the input parameter
                Dim FileName As String = inputFile
                '' Used to hold all the input lines from the file as a list of strings
                Dim Lines As List(Of String)
                '' The Using statement opens the file, then closes and disposes it when finished
                Using objReader As New StreamReader(FileName)
                    '' Splitting on both CR and LF and dropping empty entries removes carriage returns and empty lines
                    Lines = objReader.ReadToEnd().Split(New Char() {ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries).ToList()
                End Using
                '' Parses the line as needed
                For Each line In Lines
                    '' You can track and update progress in the For Each loop.
                    '' I used a custom class but you can use your table object in its place
                    Dim lineData As New MyData
                    '' Substring is zero-based and throws if the line is shorter than 74 characters
                    lineData.Field1 = line.Substring(45, 11)
                    lineData.Field2 = line.Substring(61, 6)
                    lineData.Field3 = line.Substring(72, 2)
                    '' Add the parsed data to the list
                    dataFields.Add(lineData)
                Next
            End Sub

        '' Runs the above lambda expression 
        processFile(Myfilename)
    End Sub

End Class

'' Holds the fields parsed from one line of the file.
Public Class MyData
    Public Property Field1 As String
    Public Property Field2 As String
    Public Property Field3 As String
End Class


ASKER CERTIFIED SOLUTION
Karrtik Iyer
Top class once again