• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 455
  • Last Modified:

Buffered Stream Read for a CSV file

Hi,

I've written some code to use a buffered filestream to read a file. However, I was wondering, whats the best way of processing a CSV file with the buffered reader? Traditionally I used the ReadLine method of the streamreader object (splitting on line feeds/commas), however this method is obviously slower since its not a buffered read.

The (simplified, therefore no error handling shown) code I have thus far is :

Sub ReadBuffered(ByVal buffersize As Integer)
        Dim f As IO.FileStream = System.IO.File.OpenRead(file)
        Dim buf(buffersize) As Byte
        Dim intread As Integer
        Dim x As Integer
        intread = f.Read(buf, 0, buffersize)
        While (intread) > 0
            intread = f.Read(buf, 0, buffersize)
        End While
        f.Close()
end if

I guess I could look at the byte array to detect carriage returns but I'm not sure.

As an example, how would I search a column within the CSV file for a particular value and return the relevant line. e.g. a name and address entry searched by a unique id?

Points will be awarded to whoever offers a fairly concise answer i.e. no links to web pages please (unless they directly answer the question in hand).

Thanks for any input.

0
paulsidebottom
Asked:
paulsidebottom
  • 7
  • 3
1 Solution
 
PigtorCommented:
To find primary key values, sort columns and other functions, you should export the information to a DataTable.
First, read all the data, evaluate the information you are reading and write that information to a DataTable.

Dim i As Integer
Dim lines() As String, line As String, data() As String
Dim fs As System.IO.FileStream
Dim Buffer As String
Try
      fs = New System.IO.FileStream(Prompt.FileName, IO.FileMode.Open)
      Dim b(fs.Length - 1) As Byte
      fs.Read(b, 0, fs.Length)
      Buffer = System.Text.Encoding.Default.GetString(b)
      lines = Buffer.Split(ControlChars.Cr)
      For i = 0 To lines.Length - 1
            line = lines(i).TrimStart(ControlChars.Lf)
            data = line.Split(","c)
                                ' * * data array contains all the columns, you should check the information to be correct.
                                ' * * option 1.  check the id at the specified column index.
                                ' * * option 2.  add a new row to the datatable to insert the information read.
      Next
Catch ex As Exception
      System.Windows.Forms.MessageBox.Show(ex.Message, ex.Source, MessageBoxButtons.OK, MessageBoxIcon.Error)
Finally
      If Not fs Is Nothing Then fs.Close()
End Try


I hope this information is helpfull

Victor

0
 
paulsidebottomAuthor Commented:
Many thanks, I'll give the code a try and report back - on a semi side note: when using filestreams, is it best to create the StreamReader/FileStream within the procedure or at a global level within the class?

I figured if it was created on a global level then it would improve performance since it wouldnt have to recreate the stream object each time the procedure was called, however, I guess this wouldnt necessarily be the most robust/structured option?

Thanks again.
0
 
paulsidebottomAuthor Commented:
One quick question regarding the code you have supplied: I see your using the string split method to split the buffer text on line feeds, however, one of the concerns I had when implementing this was dealing with broken lines ie. since the line length in the CSV file will vary and I'm reading a fixed block of bytes upon each read, how will it know that it has read a complete line? if the line wasnt complete, then I couldnt reliably load it into a datatable for later use (for example).

Thanks.
0
Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

 
paulsidebottomAuthor Commented:
Sorry for the repeated posts but looking at the code it doesnt seem to offer the solution I'm after.

The reason I was only retreiving a block of bytes at a time was to make searching a large file as fast as possible, the code you have read appears to function in a similar fashion to the StreamReaders readtoend method, ie. you are reading the entire file first. This will not work very well when dealing with files of circa 100mb.
0
 
PigtorCommented:
Yes, is NOT recommended to read the whole file when the size is 100 Mb.

In this case, you should use an OleDbConnection to the file.

'*** Change the file property to your file location
Dim file As String = "..."
Dim conn_string As String = String.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0}; Extended Properties=""Text; HDR=No;FMT=Delimited""", System.IO.Path.GetDirectoryName(file))

Dim conn As OleDb.OleDbConnection
Try
     conn = New OleDb.OleDbConnection(conn_string)
     conn.Open()
Catch ex As Exception
     System.Windows.Forms.MessageBox.Show(ex.ToString, ex.Source, MessageBoxButtons.OK, MessageBoxIcon.Error)
End Try

Dim statement As String = String.Format("Select * From {0}", System.IO.Path.GetFileName(file))
Dim DA As New OleDb.OleDbDataAdapter(statement, conn)
Dim DT As DataTable = New DataTable
DA.Fill(DT)

'*** At this point, the DataTable contains all the data, continue reading each row to find your ID

'*** The connection string contains the expression HDR=No
'*** When HDR=Yes, the first line of the file contains the column headers, otherwise, the file doesn't contain column headers


I hope that helps.

0
 
paulsidebottomAuthor Commented:
Apologies, I probably should have made my question slightly clearer.

The only problem with the solution you have proposed is that ideally I was planning on the code being ported to the .NET compact framework which unfortunately doesnt seem to have oledb support.  I have obtained good results using the buffering previously described (I can read a file of 15-20mb) in 80ms or so.

thanks
0
 
paulsidebottomAuthor Commented:
Upped to the point value to stir interest...
0
 
paulsidebottomAuthor Commented:
Upped the points again - no takers?
0
 
PigtorCommented:
Try using the same example that I posted in my first comment, but instead of reading the complete file, read blocks of 1024 bytes.
Copy and paste the following code, and set a breakpoint at the line "data = line.Split(","c)", and check the result.
--------------------------------------------------------------------------------------------------------------------------------

Public Sub ReadMyFile(byval filename as string)
Dim fs As New System.IO.FileStream(fileName, IO.FileMode.Open)
Dim Index As Integer, NextIndex As Integer, MaxLength As Integer, i As Integer
Dim b(1023) As Byte
Dim Buffer As New System.Text.StringBuilder(1024)
Dim lines() As String, line As String
Dim data() As String
Dim LF As Byte = 10
For Index = 0 To fs.Length - 1
      'Read a block of 1024 bytes
      MaxLength = b.Length
      'If the remaining part of the file is less than 1024, change to the correct size
      If MaxLength > (fs.Length - Index) Then MaxLength = (fs.Length - Index)
      'Read that block
      fs.Read(b, 0, MaxLength)
      'Find the last end of line found (CRLF = CHR(13) + CHR(10))
      NextIndex = b.LastIndexOf(b, LF)

      Buffer.Remove(0, Buffer.Length)
      Buffer.Append(System.Text.Encoding.Default.GetString(b))
      lines = Buffer.ToString.Split(ControlChars.Cr)
      For i = 0 To lines.Length - 1
            line = lines(i).TrimStart(ControlChars.Lf)
            data = line.Split(","c)
            ' * * data array contains all the columns, you should check the information to be correct.
            ' * * option 1.  check the id at the specified column index.
            ' * * option 2.  add a new row to the datatable to insert the information read.
      Next


      'Move the cursor to that point
      If NextIndex = -1 Then Exit For
      Index += (NextIndex + 1)
      fs.Seek(Index, IO.SeekOrigin.Begin)
Next
fs.Close()
End Sub

--------------------------------------------------------------------------------------------------------------------------------

About StringBuilder class:
Using a StringBuilder variable is better than using strings, check the msdn documentation:

A String is called immutable because its value cannot be modified once it has been created. Methods that appear to modify a String actually return a new String containing the modification.   (using a 100 Mb file, more memory is used when working with 100,000 strings of 1024 bytes)

The StringBuilder class represents a string-like object whose value is a mutable sequence of characters. The value is said to be mutable because it can be modified once it has been created by appending, removing, replacing, or inserting characters. (the memory used for the 100,000 strings of 1024 bytes, is 1024 bytes at all, because you are overwriting the existing bytes)
0
 
paulsidebottomAuthor Commented:
Apologies, after vacation, I forgot I posted this! - yes I'm happy with the resolution. Thanks for your answer.
0

Featured Post

Upgrade your Question Security!

Add Premium security features to your question to ensure its privacy or anonymity. Learn more about your ability to control Question Security today.

  • 7
  • 3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now