File Reading

Let me preface my question by stating up front that I could be going about this in entirely the wrong way.

I am trying to read a very large file in chunks. It is large enough, anyway, that I would rather open it, read a chunk of data, and close it until I need access to it again.

I am trying to accomplish this like so:
// long rather than int, because FileStream.Position is a long
private long CurrentByte = 0;
private long LastByte = 0;

public void GetNextaFilePage()
{
    string input = null;
    FileStream aFile =
        new FileStream("C:\\projects\\test.txt",
            FileMode.Open, FileAccess.Read, FileShare.Read);
    StreamReader Line = new StreamReader(aFile);

    aFilePage.Text = "";

    aFile.Seek(CurrentByte, SeekOrigin.Begin);
    input = Line.ReadLine() + "\r";

    while(CurrentLine < 25)
    {
        input += Line.ReadLine() + "\r";
        CurrentLine += 1;
    }

    LastByte = CurrentByte;
    CurrentByte = aFile.Position;

    // begin test code
    // aFile.Seek(CurrentByte, SeekOrigin.Begin);
    // input += "\r" + Line.ReadLine();
    // end test code

    aFilePage.Text = input;
    input = null;

    aFile.Close();
}

If I leave in the two lines I have commented out, I get exactly what I expect: a carriage return followed by the next line of the file.

However, if I close the file and then try to re-open it and Seek to the CurrentByte value from the Beginning of the file, I get another line altogether (several lines further down in the file).

Any ideas?
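
A likely cause, offered as a guess rather than a confirmed diagnosis: StreamReader reads ahead into its own internal buffer, so aFile.Position reports how far the reader has pulled from the file, not where the last returned line ended. Seeking back to that value after reopening would then land several lines further down. The sketch below sidesteps this by reading bytes straight from the FileStream and counting lines itself. PagedReader and ReadPage are hypothetical names, and the code assumes single-byte (ASCII) text with "\n" line endings.

```csharp
using System;
using System.IO;
using System.Text;

public static class PagedReader
{
    // Reads up to maxLines lines starting at byte offset 'start'.
    // 'next' receives the offset of the first byte that was not consumed,
    // so it can be passed back in as 'start' for the following page.
    public static string ReadPage(string path, long start, int maxLines, out long next)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            fs.Seek(start, SeekOrigin.Begin);
            var sb = new StringBuilder();
            int lines = 0;
            int b;
            while (lines < maxLines && (b = fs.ReadByte()) != -1)
            {
                sb.Append((char)b);      // assumption: one byte per character
                if (b == '\n') lines++;  // a line ends at each newline byte
            }
            // No StreamReader sits in between, so Position matches what was consumed.
            next = fs.Position;
            return sb.ToString();
        }
    }
}
```

Calling ReadPage(path, 0, 25, out next) and then ReadPage(path, next, 25, out next) should return consecutive pages, with the file closed between calls.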

ASKER CERTIFIED SOLUTION

_TAD_

_TAD_

Here's a quick snippet I have of using a buffer to read a file.

public void someFunction()
{
    string myStr;

    // read the first line through a buffered stream
    Stream rdStream = new FileStream(@"C:\temp\test.txt", FileMode.Open);
    BufferedStream rdBuff = new BufferedStream(rdStream);
    StreamReader rdRead = new StreamReader(rdBuff, Encoding.ASCII, false, 1000);

    myStr = rdRead.ReadLine();

    Console.WriteLine(myStr);

    rdRead.Close();
    rdBuff.Close();
    rdStream.Close();

    // append a line to the end of the same file
    myStr = "Adding this Line to the end of the File";

    Stream wtStream = new FileStream(@"C:\temp\test.txt", FileMode.Append);
    BufferedStream wtBuff = new BufferedStream(wtStream);
    StreamWriter wtWrite = new StreamWriter(wtBuff, Encoding.ASCII, 1000);

    wtWrite.AutoFlush = true;

    wtWrite.WriteLine("\n");
    wtWrite.WriteLine(myStr);

    wtWrite.Close();
    wtBuff.Close();
    wtStream.Close();
}

Oh yes, and one more thing... if the file is very large and you know you are completely done with the filestreams you may want to explicitly call the garbage collector.

GC.Collect();

This is because the garbage collector only runs when it deems it necessary, but your file stream and buffered stream objects can hold on to a lot of memory. You will take a performance hit when you call the GC, but it may be worth it if you can reclaim the memory held by out-of-scope objects.
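
An alternative worth considering, sketched here as a suggestion rather than a rule: the file handle itself is released when the stream is disposed, and a using block does that at a known point without forcing a collection. FirstLine and Read are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class FirstLine
{
    // Hypothetical helper: returns the first line of a text file.
    public static string Read(string path)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            return reader.ReadLine();
        } // reader and stream are disposed here, which closes the file handle
    }
}
```

After Read returns, the file is no longer held open, so it can be reopened, appended to, or deleted right away. The managed buffers are still reclaimed whenever the garbage collector next runs.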

_TAD_

One last thing (for now): if you put an @ symbol in front of a string literal such as "C:\\projects\\test.txt", you don't need to escape each backslash with a second '\'.

"C:\\projects\\test.txt"
is the same as
@"C:\projects\test.txt"

This makes little difference when the filename is hardcoded in your project. Note that the @ prefix only applies to string literals in source code. If you use the Windows file dialog component and load the destination path into a string variable, the backslashes are already stored as-is, so the variable can be passed straight to a function with no prefix.

//assume myFiles.Pathway holds C:\temp\myText.txt
string myStr = myFiles.Pathway;

Stream rdStream = new FileStream(myStr, FileMode.Open);

//or to skip the intermediate variable
//Stream rdStream = new FileStream(myFiles.Pathway, FileMode.Open);

Personally I prefer to do the first option (loading a string, and then pass the string around). I think the code is a little easier to follow.
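
To make the literal-versus-variable point concrete, here is a minimal sketch. PathLiterals and its two methods are hypothetical names. It shows that the escaped and verbatim literals produce the same string, and that a string held in a variable keeps its backslashes without any prefix.

```csharp
using System;

public static class PathLiterals
{
    // True when the escaped and the verbatim literal are the same string.
    public static bool SameLiteral()
    {
        string escaped = "C:\\projects\\test.txt";
        string verbatim = @"C:\projects\test.txt";
        return escaped == verbatim;
    }

    // Counts backslashes in a path that arrives in a variable.
    // Escaping is a source-code concern only; the string itself
    // contains single backslashes either way.
    public static int BackslashCount(string path)
    {
        int count = 0;
        foreach (char c in path)
        {
            if (c == '\\') count++;
        }
        return count;
    }
}
```

Both literals are 20 characters long and contain two backslashes, which is what BackslashCount reports for either one.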

ignatiusst

ASKER

Thanks for the advice!

Your method looks like it has huge advantages over what I was trying. I've just started really looking at the whole concept of file I/O on a non-trivial scale. I didn't imagine it would be much different from the books/tutorials I've read elsewhere. I guess I was wrong!

_TAD_

I hate posting large blocks of code because it really clutters things up... but I think this might help.

Here is my read class (you may want to change the namespace):

using System;
using System.IO;
using System.Text;

namespace JSI.TAD.ReadingFiles
{
    /// <summary>
    /// Reads a file in fixed-size chunks through a BufferedStream.
    /// </summary>
    public class fileRead
    {
        #region -- Variables --
        private Stream readStream;
        private BufferedStream buffStream;

        private string fileText;
        private int buffSize; // must be an integer, cannot be a long
        private long dataToRead;

        private bool isReady;
        private bool isRead;

        private byte[] myBuffer;
        private int bytesRead;

        #endregion

        #region -- Constructors --
        public fileRead(string filePathway)
        {
            //variable defaults
            buffSize = 4096; //4K bytes

            //defaults that should not be changed
            fileText = null;
            isReady = true;
            isRead = true;
            myBuffer = new byte[buffSize];
            bytesRead = 1; // default, if 0, then is EOF

            readStream = new FileStream(filePathway, FileMode.Open);
            buffStream = new BufferedStream(readStream, buffSize);

            dataToRead = readStream.Length;
        }

        ~fileRead()
        {
            // close the wrapper first; the streams are null if the constructor threw
            if (buffStream != null) buffStream.Close();
            if (readStream != null) readStream.Close();
        }

        #endregion

        #region -- Properties --
        public string FileText
        {
            get
            {
                string tempText;
                // set tempText = fileText
                tempText = fileText;
                //clear File Text
                fileText = null;
                // allow for more data
                isReady = true;
                // has the data been read already?
                isRead = true;
                //return what was in fileText
                return tempText;
            }
        }

        //number of bytes read
        public int BytesRead
        {
            get { return bytesRead; }
        }

        // size of buffer
        public int BufferSize
        {
            get { return buffSize; }
            // reallocate so the byte array always matches the requested size
            set { buffSize = value; myBuffer = new byte[buffSize]; }
        }

        #endregion

        #region -- Methods --
        public void LoadData()
        {
            if (isReady)
            {
                //resets the buffer size for efficiency
                if (buffSize > dataToRead)
                    buffSize = (int)dataToRead;
                //Read some data
                bytesRead = buffStream.Read(myBuffer, 0, buffSize);
                //sets data left to read (Read may return fewer bytes than requested)
                dataToRead -= bytesRead;
                //sets flags for multithreading
                //is ready for loading
                isReady = false;
                //has been read
                isRead = false;
            }
        }

        #endregion

        #region -- Functions --
        public string ReadData()
        {
            // if data has not been read yet
            if (!isRead)
            {
                // get data from buffer, only the bytes actually read
                for (int x = 0; x < bytesRead; x++)
                    // convert to ascii characters
                    fileText += Convert.ToChar(myBuffer[x]);
            }
            // return value
            return FileText;
        }

        #endregion
    }
}

_TAD_

I then have a class that creates a 'reading' variable by calling this class.

//this is a class level variable
fileRead myReader = new fileRead(@"C:\temp\test.txt");

I then have a thread whose sole purpose is to go into a while loop and run the LoadData() method.

private void Loading()
{
    while(myReader.BytesRead>0)
    {
        myReader.LoadData();
        Thread.Sleep(1);
    }
}

I have a second thread that does the exact same thing, only it reads the data.

private void Reading()
{
    while(myReader.BytesRead>0)
    {
        Console.Write(myReader.ReadData());
        Thread.Sleep(1);
    }
}

I put in the Thread.Sleep(1) so that each thread pauses for 1 ms. This is short enough that the user doesn't notice a slowdown, but long enough that the two threads swap control of the CPU back and forth.
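
The two-thread handoff above relies on plain bool flags and short sleeps. A different technique, shown here only as a sketch and not as the class above, is a bounded producer/consumer queue: the loader blocks when the queue is full and the reader blocks when it is empty, so no sleeping is needed. This assumes .NET 4 or later for BlockingCollection. ChunkPipeline and Run are hypothetical names, and the text is assumed to be ASCII.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;
using System.Threading;

public static class ChunkPipeline
{
    // Loads 'path' in chunks on one thread and decodes them on another.
    // Returns the whole text here; a real reader would use each chunk as it arrives.
    public static string Run(string path, int bufferSize)
    {
        var queue = new BlockingCollection<byte[]>(4); // at most 4 chunks in flight
        var result = new StringBuilder();

        var loader = new Thread(() =>
        {
            try
            {
                using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
                {
                    var buffer = new byte[bufferSize];
                    int n;
                    while ((n = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        var chunk = new byte[n];     // copy only the bytes read
                        Array.Copy(buffer, chunk, n);
                        queue.Add(chunk);            // blocks while the queue is full
                    }
                }
            }
            finally
            {
                queue.CompleteAdding();              // tells the reader no more data is coming
            }
        });

        var reader = new Thread(() =>
        {
            // blocks while the queue is empty, ends after CompleteAdding
            foreach (byte[] chunk in queue.GetConsumingEnumerable())
                result.Append(Encoding.ASCII.GetString(chunk));
        });

        loader.Start();
        reader.Start();
        loader.Join();
        reader.Join();
        return result.ToString();
    }
}
```

The bound of 4 chunks is arbitrary. It only keeps the loader from running far ahead of the reader and holding the whole file in memory.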

_TAD_

The only real downside to multi-threading is that the total overall time is longer. What I mean is, if I were to do a single thread straight read and dump of data I could do a 100 MB file in 30 to 45 seconds. Of course the program is locked for that time and the entire CPU and program memory space is devoted to that process.

On the flip side, if I multi-thread, that same 100 MB file might take up to 3 minutes to load in its entirety. However, portions of that file are available on demand and I can go about my job while the rest of the file loads in the background.

So obviously there is some give and take. Another factor in the loading process is the size of your buffer. With a 1K buffer, data access feels instantaneous; at 5K there is a half-second delay; at 10K there is a noticeable (but not bad) delay. Depending on how much data is needed before work can start, a larger buffer size may be more beneficial.

Just remember, the larger the buffer size the faster the total file loads, but the longer it takes before the user sees any results at all.

Oh, and the largest buffer size can only be 2,147,483,647 bytes in size (roughly 2 GB), but then again why would you want a buffer that size?

Pretty much any buffer larger than 65K isn't going to improve performance anyway, because then you start getting into virtual memory and paging and... well a whole lot of things.

One last thing: if you are going to do performance testing and timing to determine the best buffer size, etc., make sure you compile and run a release build, not a debug build. Debug adds a whole lot of overhead you won't always notice, but when timing large file reads I see a huge difference between the debug and release versions.
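
For that kind of timing, a minimal sketch using Stopwatch might look like the following. BufferTiming and TimeRead are hypothetical names. It reads the whole file with a given buffer size and reports the elapsed time, so the same file can be tried with several sizes.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class BufferTiming
{
    // Reads the whole file using 'bufferSize' bytes per read.
    // Returns the number of bytes read; 'elapsed' receives the time taken.
    public static long TimeRead(string path, int bufferSize, out TimeSpan elapsed)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        Stopwatch sw = Stopwatch.StartNew();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize))
        {
            int n;
            while ((n = fs.Read(buffer, 0, buffer.Length)) > 0)
                total += n; // count what was actually read
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return total;
    }
}
```

Something like foreach (int size in new[] { 1024, 4096, 65536 }) with a Console.WriteLine of elapsed.TotalMilliseconds would compare sizes. Keep in mind that the operating system caches file contents, so the first pass over a file is usually slower than later ones regardless of buffer size.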

Ok, I'm done now. If you have any other questions feel free to ask.