coopey247 asked:

StreamReader slow on large files

I'm using a StreamReader to parse through RADIUS logs and extract info for billing. The logs' stop events come in various forms and I've managed to work out the logic to grab what I need, no problem. When I run it on a small file (1 MB) it runs very fast, processing over 25 events per second. However, when I run it on a month's worth of data (30 MB), it is extremely slow, processing 3 events per second. This leads me to believe the bottleneck is not in my logic to process the files, but rather a problem with StreamReader on large files.

I don't think it can be threaded because I need to move line by line, grabbing what I need. Would it be better to use FileStream, which from my research supposedly runs slower than StreamReader? Or is there a way to consistently break a large file into smaller pieces for processing? Or am I completely wrong and there is a better way to do this?

Thanks.

(Coming soon, www.ITBrainPower.com, earn computer gear for answering questions)
DaniPro

It's difficult to help you without more details, but if you have enough memory you can try the BufferedStream class to reduce the number of accesses to the OS.

Example:

string fileName = @"c:\miofile.txt";
System.IO.FileInfo fi = new System.IO.FileInfo(fileName);
System.IO.FileStream fs = new System.IO.FileStream(fileName, System.IO.FileMode.Open);
System.IO.BufferedStream bs = new System.IO.BufferedStream(fs);

// Read the whole file into memory in one buffered pass
byte[] bytes = new byte[fi.Length];
bs.Read(bytes, 0, (int)fi.Length);
string s1 = System.Text.Encoding.ASCII.GetString(bytes);
bs.Close();
_TAD_

OK... first, StreamReader reads a stream. FileStream is a stream. Typically you use a StreamReader to read from a FileStream, a MemoryStream, or any other stream.

Reading line by line is always going to be very slow. I would suggest reading 5K to 20K into a byte array (a buffer) and then parsing the buffer to find where your lines end; a rough sketch is below.
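
Something along these lines (the path and the 16K buffer size are just placeholders; the parse comment marks where your existing logic would go):

using System;
using System.IO;
using System.Text;

class BufferedParse
{
    static void Main()
    {
        FileStream fs = new FileStream(@"C:\AVOC\032904detail", FileMode.Open);
        byte[] buffer = new byte[16384];
        string carry = "";       // partial line left over from the previous chunk
        int bytesRead;

        while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            string chunk = carry + Encoding.ASCII.GetString(buffer, 0, bytesRead);
            int pos;
            while ((pos = chunk.IndexOf('\n')) != -1)
            {
                string line = chunk.Substring(0, pos).TrimEnd('\r');
                // ...parse the line here, same logic as before...
                chunk = chunk.Substring(pos + 1);
            }
            carry = chunk;       // incomplete last line waits for the next chunk
        }
        fs.Close();
    }
}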


With performance issues like this it would be very helpful if you could post the code in question; we may find something wrong that is completely unrelated to what you are focusing on.


coopey247 (Asker)

The handling of the StreamReader is very simple.


string fileName = "C:\\AVOC\\032904detail";
StreamReader sr =  File.OpenText(fileName);

while ((line=sr.ReadLine())!=null)
    {
     //this determines where a unique event starts
     if ((line.StartsWith("Sun")) || line.StartsWith("Mon")  || (line.StatsWith...... )
       {
        //A bunch of processing, most of which looks like this, basically moving line by line
       //looking for the value I want.  I wanted to use regex for this, but the stop events
       //all differ in the # of lines and values of these lines

        while (statusLine.IndexOf("Acct-Status-Type")==-1)
               {                                                statusLine = sr.ReadLine();
      }

       //I then get all of the values and pass them to an SQL stored procedure
       sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50).Value = user;
      try
       {
       this.sqlDataAdapter1.InsertCommand.ExecuteNonQuery();
       }
     catch (Exception er)
   }  //Close the If
  }  //Close the while
sr.close;


Here is an example of an event in the file I'm reading:

Wed Jan 28 09:19:26 2004
      User-Name = "userid"
      NAS-Identifier = 1.1.1.1
      NAS-Port = 10205
      NAS-Port-Type = Sync
      Acct-Status-Type = Stop
      Acct-Delay-Time = 0
      Acct-Session-Id = "344998566"
      Acct-Authentic = RADIUS
      Acct-Session-Time = 244
      Acct-Input-Octets = 3988
      Acct-Output-Octets = 27952
      Acct-Input-Packets = 197
      Acct-Output-Packets = 211
      Ascend-Disconnect-Cause = 45
      Ascend-Connect-Progress = 60
      Ascend-Xmit-Rate = 64000
      Ascend-Data-Rate = 64000
      Ascend-PreSession-Time = 8
      Ascend-Pre-Input-Octets = 252
      Ascend-Pre-Output-Octets = 227
      Ascend-Pre-Input-Packets = 11
      Ascend-Pre-Output-Packets = 11
      Ascend-First-Dest = 1.1.1.1
      Ascend-Multilink-ID = 572
      Ascend-Num-In-Multilink = 0
      Caller-Id = "5555555555"
      Client-Port-DNIS = "68414"
      Framed-Protocol = MPP
      Framed-Address = 1.1.1.1

Hope this helps
It's a bit tricky to tell, but you have this line inside a loop:

 sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50).Value = user;

where you are adding a new parameter every time you find a user, instead of just setting the value of the existing parameter? You may have clipped the code that sorts all this out. See the sketch below.
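
Something like this, as a sketch (names taken from your snippet):

// Create the parameter once, before the read loop...
SqlParameter userParam =
    sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50);

// ...then inside the loop just set its value for each event:
userParam.Value = user;
sqlDataAdapter1.InsertCommand.ExecuteNonQuery();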
Okay, well I moved the definition of the parameters outside of the loop and now just set the value inside the loop. The problem definitely lies within the SQL stored procedure. When I take the SQL call out it runs great. But with the SQL insert command it runs fast at first, then when it gets to about 2000 records it just crashes.

I tried threading the SQL insert command and it runs well for a little while, but then starts throwing all kinds of errors, again around 2000 records. I'm 99% sure that I'm not taxing the server. I'm not sure where to go from here.
You should use an SqlConnection from only one thread. You may also get a problem when you open a transaction and insert many records without committing them, since the rollback segment will overflow.
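
One way to avoid that is to commit in batches; a sketch (the connection/command names and the batch size of 500 are placeholders):

SqlTransaction tx = connection.BeginTransaction();
insertCommand.Transaction = tx;
int pending = 0;

// ...inside the event loop, after the parameter values are set:
insertCommand.ExecuteNonQuery();
if (++pending == 500)
{
    tx.Commit();                          // flush this batch
    tx = connection.BeginTransaction();   // start the next one
    insertCommand.Transaction = tx;
    pending = 0;
}

// ...after the loop, commit whatever is left:
tx.Commit();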



Also, one thing to consider is that constructs like cursors are very slow. If at all possible you may want to batch your processes together: perhaps load a dataset in memory and then transfer the whole dataset over to the SQL server, as in the sketch below.
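
Roughly like this (sketch only; the column list is abbreviated and it assumes the adapter's InsertCommand is already configured):

// Buffer parsed events in a DataTable and send them in one Update() call
// instead of one round-trip per event.
DataTable events = new DataTable("tbl_AVOC_Details");
events.Columns.Add("UserID", typeof(string));
events.Columns.Add("SessionID", typeof(long));
// ...add the remaining columns...

// inside the parsing loop:
DataRow row = events.NewRow();
row["UserID"] = user;
row["SessionID"] = sessionId;
events.Rows.Add(row);

// after the whole file has been parsed:
sqlDataAdapter1.Update(events);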
For your original question: this code will make it faster and you can still keep all your functionality. The difference for large files is enormous:

StreamReader sr =  new StreamReader (new BufferedStream (new FileStream (fileName, FileMode.Open)));

For the SQL Server problem: can you post the stored procedure code so that we can review it?
Stored Procedure:
I had to do the IF NOT EXISTS because the detail files get merged from 4 servers and there are duplicate records.


CREATE PROCEDURE [ASICUSER].[AVOC_UpdateDetails]
(@UserID varchar(50), @Weekday varchar(50), @Month varchar(50),
@MonthNum int, @Day int, @Year int, @Time varchar(50),
@Type varchar(50), @SessionTime varchar(50),
@SessionID bigint, @NASPort bigint, @CallerID bigint) AS

IF NOT EXISTS (select UserID, Weekday, _Month, MonthNum,  _Day,
_Year, _Time, Type, SessionTime, SessionID, NASPort, CallerID
from asic.asicuser.tbl_AVOC_Details
where Weekday = @Weekday and
_Month = @Month and MonthNum = @MonthNum and _Day = @Day and
_Year = @Year and _Time = @Time and
Type = @Type and SessionTime = @SessionTime and
SessionID = @SessionID and NASPort = @NASPort
and CallerID = @CallerID)

Insert Into asic.asicuser.tbl_AVOC_Details
(UserID, Weekday, _Month, MonthNum,  _Day, _Year, _Time, Type, SessionTime,
SessionID, NASPort, CallerID)
Values (@UserID, @Weekday, @Month, @MonthNum, @Day, @Year, @Time, @Type,
@SessionTime, @SessionID, @NASPort, @CallerID)
GO
OK, this seems fine. Do you use transactions?
Nope, that's it. I think populating the dataset and then merging like TAD said might be the way to go. Any arguments against that, Avenger?
SOLUTION by TheAvenger
[solution text available to members only]
Can you define the IF NOT EXISTS with fewer checks, i.e. just check SessionID and maybe time (preferably indexed columns)? Or drop the IF NOT EXISTS, just insert the info, and let a unique index handle the duplication, as in the sketch below.
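
For example (just a sketch; the column pair is an assumption, use whatever really identifies an event):

-- Let a unique index reject duplicates instead of checking first
CREATE UNIQUE INDEX IX_AVOC_Details_Event
ON asic.asicuser.tbl_AVOC_Details (SessionID, _Time)

The proc then shrinks to a plain INSERT; a duplicate raises a duplicate-key error (2601), which the C# code can catch and ignore.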
Merging datasets is quite slow and gets extremely slow with lots of data.
I think coopey247 meant updating the database by "merge".
I did mean update; however, populating a dataset still slows down the process significantly. I think my next (and maybe last) option may be to write an XML file on the fly and then dump that file up to the SQL box when it finishes.

Thanks for all the help folks, very much appreciated.
ASKER CERTIFIED SOLUTION
[solution text available to members only]
Why are you checking all the fields in IF NOT EXISTS? If you have a key you only need to check the key.
Yup, it's around 40K inserts a month. The reason I have to do an IF NOT EXISTS is that these files are pulled from several RADIUS servers, some of which I believe are mirrored, because the merged file I get has at least two identical entries for every event (i.e. Start and Stop events). The key is an identity column that I have in there for later processing.

Thanks.
Maybe you could speed it up by passing the file to the stored procedure and doing all the processing in the stored procedure, as sketched below. However, it would be hard to debug and monitor.
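
One possible shape for that (sketch only; the staging table and file path are placeholders): bulk-load the raw log lines, then parse them server-side.

-- Staging table holds one raw log line per row
CREATE TABLE asic.asicuser.tbl_AVOC_Staging (RawLine varchar(1000))

BULK INSERT asic.asicuser.tbl_AVOC_Staging
FROM 'C:\AVOC\032904detail'
WITH (ROWTERMINATOR = '\n')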
Keen eye on the stored procedure. I moved the table structure around a little, did away with the identity column, and made a primary key out of two columns. The IF NOT EXISTS still works, but runs much faster since I'm only scanning two columns instead of 10. And it's now fast enough to be acceptable.

You guys (maybe gals) are great, thanks for the help.

Now I've got to figure out how to divvy up the points...
Now the lookup is done via an index (similar to a hashtable lookup) instead of a full table scan.
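
For reference, the change coopey247 describes would look something like this (sketch; the two key columns are an assumption):

-- Composite primary key replaces the old identity column
ALTER TABLE asic.asicuser.tbl_AVOC_Details
ADD CONSTRAINT PK_AVOC_Details PRIMARY KEY (SessionID, _Time)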