paulsidebottom
asked on
Buffered Stream Read for a CSV file
Hi,
I've written some code to use a buffered filestream to read a file. However, I was wondering, whats the best way of processing a CSV file with the buffered reader? Traditionally I used the ReadLine method of the streamreader object (splitting on line feeds/commas), however this method is obviously slower since its not a buffered read.
The (simplified, therefore no error handling shown) code I have thus far is :
Sub ReadBuffered(ByVal buffersize As Integer)
Dim f As IO.FileStream = System.IO.File.OpenRead(fi le)
Dim buf(buffersize) As Byte
Dim intread As Integer
Dim x As Integer
intread = f.Read(buf, 0, buffersize)
While (intread) > 0
intread = f.Read(buf, 0, buffersize)
End While
f.Close()
end if
I guess I could look at the byte array to detect carriage returns but I'm not sure.
As an example, how would I search a column within the CSV file for a particular value and return the relevant line. e.g. a name and address entry searched by a unique id?
Points will be awarded to whoever offers a fairly concise answer i.e. no links to web pages please (unless they directly answer the question in hand).
Thanks for any input.
I've written some code to use a buffered filestream to read a file. However, I was wondering, whats the best way of processing a CSV file with the buffered reader? Traditionally I used the ReadLine method of the streamreader object (splitting on line feeds/commas), however this method is obviously slower since its not a buffered read.
The (simplified, therefore no error handling shown) code I have thus far is :
Sub ReadBuffered(ByVal buffersize As Integer)
Dim f As IO.FileStream = System.IO.File.OpenRead(fi
Dim buf(buffersize) As Byte
Dim intread As Integer
Dim x As Integer
intread = f.Read(buf, 0, buffersize)
While (intread) > 0
intread = f.Read(buf, 0, buffersize)
End While
f.Close()
end if
I guess I could look at the byte array to detect carriage returns but I'm not sure.
As an example, how would I search a column within the CSV file for a particular value and return the relevant line. e.g. a name and address entry searched by a unique id?
Points will be awarded to whoever offers a fairly concise answer i.e. no links to web pages please (unless they directly answer the question in hand).
Thanks for any input.
ASKER
Many thanks, I'll give the code a try and report back - on a semi side note: when using filestreams, is it best to create the StreamReader/FileStream within the procedure or at a global level within the class?
I figured if it was created on a global level then it would improve performance since it wouldnt have to recreate the stream object each time the procedure was called, however, I guess this wouldnt necessarily be the most robust/structured option?
Thanks again.
I figured if it was created on a global level then it would improve performance since it wouldnt have to recreate the stream object each time the procedure was called, however, I guess this wouldnt necessarily be the most robust/structured option?
Thanks again.
ASKER
One quick question regarding the code you have supplied: I see your using the string split method to split the buffer text on line feeds, however, one of the concerns I had when implementing this was dealing with broken lines ie. since the line length in the CSV file will vary and I'm reading a fixed block of bytes upon each read, how will it know that it has read a complete line? if the line wasnt complete, then I couldnt reliably load it into a datatable for later use (for example).
Thanks.
Thanks.
ASKER
Sorry for the repeated posts but looking at the code it doesnt seem to offer the solution I'm after.
The reason I was only retreiving a block of bytes at a time was to make searching a large file as fast as possible, the code you have read appears to function in a similar fashion to the StreamReaders readtoend method, ie. you are reading the entire file first. This will not work very well when dealing with files of circa 100mb.
The reason I was only retreiving a block of bytes at a time was to make searching a large file as fast as possible, the code you have read appears to function in a similar fashion to the StreamReaders readtoend method, ie. you are reading the entire file first. This will not work very well when dealing with files of circa 100mb.
Yes, is NOT recommended to read the whole file when the size is 100 Mb.
In this case, you should use an OleDbConnection to the file.
'*** Change the file property to your file location
Dim file As String = "..."
Dim conn_string As String = String.Format("Provider=Mi crosoft.Je t.OLEDB.4. 0;Data Source={0}; Extended Properties=""Text; HDR=No;FMT=Delimited""", System.IO.Path.GetDirector yName(file ))
Dim conn As OleDb.OleDbConnection
Try
conn = New OleDb.OleDbConnection(conn _string)
conn.Open()
Catch ex As Exception
System.Windows.Forms.Messa geBox.Show (ex.ToStri ng, ex.Source, MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try
Dim statement As String = String.Format("Select * From {0}", System.IO.Path.GetFileName (file))
Dim DA As New OleDb.OleDbDataAdapter(sta tement, conn)
Dim DT As DataTable = New DataTable
DA.Fill(DT)
'*** At this point, the DataTable contains all the data, continue reading each row to find your ID
'*** The connection string contains the expression HDR=No
'*** When HDR=Yes, the first line of the file contains the column headers, otherwise, the file doesn't contain column headers
I hope that helps.
In this case, you should use an OleDbConnection to the file.
'*** Change the file property to your file location
Dim file As String = "..."
Dim conn_string As String = String.Format("Provider=Mi
Dim conn As OleDb.OleDbConnection
Try
conn = New OleDb.OleDbConnection(conn
conn.Open()
Catch ex As Exception
System.Windows.Forms.Messa
End Try
Dim statement As String = String.Format("Select * From {0}", System.IO.Path.GetFileName
Dim DA As New OleDb.OleDbDataAdapter(sta
Dim DT As DataTable = New DataTable
DA.Fill(DT)
'*** At this point, the DataTable contains all the data, continue reading each row to find your ID
'*** The connection string contains the expression HDR=No
'*** When HDR=Yes, the first line of the file contains the column headers, otherwise, the file doesn't contain column headers
I hope that helps.
ASKER
Apologies, I probably should have made my question slightly clearer.
The only problem with the solution you have proposed is that ideally I was planning on the code being ported to the .NET compact framework which unfortunately doesnt seem to have oledb support. I have obtained good results using the buffering previously described (I can read a file of 15-20mb) in 80ms or so.
thanks
The only problem with the solution you have proposed is that ideally I was planning on the code being ported to the .NET compact framework which unfortunately doesnt seem to have oledb support. I have obtained good results using the buffering previously described (I can read a file of 15-20mb) in 80ms or so.
thanks
ASKER
Upped to the point value to stir interest...
ASKER
Upped the points again - no takers?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Apologies, after vacation, I forgot I posted this! - yes I'm happy with the resolution. Thanks for your answer.
First, read all the data, evaluate the information you are reading and write that information to a DataTable.
Dim i As Integer
Dim lines() As String, line As String, data() As String
Dim fs As System.IO.FileStream
Dim Buffer As String
Try
fs = New System.IO.FileStream(Promp
Dim b(fs.Length - 1) As Byte
fs.Read(b, 0, fs.Length)
Buffer = System.Text.Encoding.Defau
lines = Buffer.Split(ControlChars.
For i = 0 To lines.Length - 1
line = lines(i).TrimStart(Control
data = line.Split(","c)
' * * data array contains all the columns, you should check the information to be correct.
' * * option 1. check the id at the specified column index.
' * * option 2. add a new row to the datatable to insert the information read.
Next
Catch ex As Exception
System.Windows.Forms.Messa
Finally
If Not fs Is Nothing Then fs.Close()
End Try
I hope this information is helpfull
Victor