Link to home
Start Free TrialLog in
Avatar of paulsidebottom
paulsidebottom

asked on

Buffered Stream Read for a CSV file

Hi,

I've written some code to use a buffered filestream to read a file. However, I was wondering, whats the best way of processing a CSV file with the buffered reader? Traditionally I used the ReadLine method of the streamreader object (splitting on line feeds/commas), however this method is obviously slower since its not a buffered read.

The (simplified, therefore no error handling shown) code I have thus far is :

Sub ReadBuffered(ByVal buffersize As Integer)
        Dim f As IO.FileStream = System.IO.File.OpenRead(file)
        Dim buf(buffersize) As Byte
        Dim intread As Integer
        Dim x As Integer
        intread = f.Read(buf, 0, buffersize)
        While (intread) > 0
            intread = f.Read(buf, 0, buffersize)
        End While
        f.Close()
end if

I guess I could look at the byte array to detect carriage returns but I'm not sure.

As an example, how would I search a column within the CSV file for a particular value and return the relevant line. e.g. a name and address entry searched by a unique id?

Points will be awarded to whoever offers a fairly concise answer i.e. no links to web pages please (unless they directly answer the question in hand).

Thanks for any input.

Avatar of Pigtor
Pigtor
Flag of Mexico image

To find primary key values, sort columns and other functions, you should export the information to a DataTable.
First, read all the data, evaluate the information you are reading and write that information to a DataTable.

Dim i As Integer
Dim lines() As String, line As String, data() As String
Dim fs As System.IO.FileStream
Dim Buffer As String
Try
      fs = New System.IO.FileStream(Prompt.FileName, IO.FileMode.Open)
      Dim b(fs.Length - 1) As Byte
      fs.Read(b, 0, fs.Length)
      Buffer = System.Text.Encoding.Default.GetString(b)
      lines = Buffer.Split(ControlChars.Cr)
      For i = 0 To lines.Length - 1
            line = lines(i).TrimStart(ControlChars.Lf)
            data = line.Split(","c)
                                ' * * data array contains all the columns, you should check the information to be correct.
                                ' * * option 1.  check the id at the specified column index.
                                ' * * option 2.  add a new row to the datatable to insert the information read.
      Next
Catch ex As Exception
      System.Windows.Forms.MessageBox.Show(ex.Message, ex.Source, MessageBoxButtons.OK, MessageBoxIcon.Error)
Finally
      If Not fs Is Nothing Then fs.Close()
End Try


I hope this information is helpfull

Victor

Avatar of paulsidebottom
paulsidebottom

ASKER

Many thanks, I'll give the code a try and report back - on a semi side note: when using filestreams, is it best to create the StreamReader/FileStream within the procedure or at a global level within the class?

I figured if it was created on a global level then it would improve performance since it wouldnt have to recreate the stream object each time the procedure was called, however, I guess this wouldnt necessarily be the most robust/structured option?

Thanks again.
One quick question regarding the code you have supplied: I see your using the string split method to split the buffer text on line feeds, however, one of the concerns I had when implementing this was dealing with broken lines ie. since the line length in the CSV file will vary and I'm reading a fixed block of bytes upon each read, how will it know that it has read a complete line? if the line wasnt complete, then I couldnt reliably load it into a datatable for later use (for example).

Thanks.
Sorry for the repeated posts but looking at the code it doesnt seem to offer the solution I'm after.

The reason I was only retreiving a block of bytes at a time was to make searching a large file as fast as possible, the code you have read appears to function in a similar fashion to the StreamReaders readtoend method, ie. you are reading the entire file first. This will not work very well when dealing with files of circa 100mb.
Yes, is NOT recommended to read the whole file when the size is 100 Mb.

In this case, you should use an OleDbConnection to the file.

'*** Change the file property to your file location
Dim file As String = "..."
Dim conn_string As String = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}; Extended Properties=""Text; HDR=No;FMT=Delimited""", System.IO.Path.GetDirectoryName(file))

Dim conn As OleDb.OleDbConnection
Try
     conn = New OleDb.OleDbConnection(conn_string)
     conn.Open()
Catch ex As Exception
     System.Windows.Forms.MessageBox.Show(ex.ToString, ex.Source, MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try

Dim statement As String = String.Format("Select * From {0}", System.IO.Path.GetFileName(file))
Dim DA As New OleDb.OleDbDataAdapter(statement, conn)
Dim DT As DataTable = New DataTable
DA.Fill(DT)

'*** At this point, the DataTable contains all the data, continue reading each row to find your ID

'*** The connection string contains the expression HDR=No
'*** When HDR=Yes, the first line of the file contains the column headers, otherwise, the file doesn't contain column headers


I hope that helps.

Apologies, I probably should have made my question slightly clearer.

The only problem with the solution you have proposed is that ideally I was planning on the code being ported to the .NET compact framework which unfortunately doesnt seem to have oledb support.  I have obtained good results using the buffering previously described (I can read a file of 15-20mb) in 80ms or so.

thanks
Upped to the point value to stir interest...
Upped the points again - no takers?
ASKER CERTIFIED SOLUTION
Avatar of Pigtor
Pigtor
Flag of Mexico image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Apologies, after vacation, I forgot I posted this! - yes I'm happy with the resolution. Thanks for your answer.