Solved

StreamReader slow on large files

Posted on 2004-04-20
2,973 Views
Last Modified: 2012-08-13
I'm using a StreamReader to parse through RADIUS logs and extract info for billing.  The log's stop events come in various forms and I've managed to work out the logic to grab what I need, no problem.  When I run it on a small file (1 MB) it runs very fast, processing over 25 events per second.  However, when I run it on a month's worth of data (30 MB), it is extremely slow, processing 3 events per second.  This leads me to believe it is not a bottleneck in my logic to process the files, but rather a problem with StreamReader on large files.

I don't think it can be threaded because I need to move line by line, grabbing what I need.  Would it be better to use FileStream, which I've researched and which supposedly runs slower than StreamReader?  Or is there any way to consistently break a large file up into smaller pieces for processing?  Or am I completely wrong, and is there still a better way to do this?

Thanks.

Question by:coopey247
22 Comments
 
LVL 6

Expert Comment

by:DaniPro
ID: 10868793
It's very difficult to help you without more details, but if you have enough memory you can try the BufferedStream class to reduce the number of calls to the OS.

For example:
string fileName = @"c:\miofile.txt";
System.IO.FileInfo fi = new System.IO.FileInfo(fileName);
System.IO.FileStream fs = new System.IO.FileStream(fileName, System.IO.FileMode.Open);
System.IO.BufferedStream bs = new System.IO.BufferedStream(fs);
byte[] bytes = new byte[fi.Length];
int bytesRead = bs.Read(bytes, 0, (int)fi.Length);  // Read can return fewer bytes than requested; check bytesRead in real code
string s1 = System.Text.Encoding.ASCII.GetString(bytes, 0, bytesRead);
bs.Close();
 
LVL 22

Expert Comment

by:_TAD_
ID: 10868824


OK... First, StreamReader reads a stream.  FileStream is a stream.  Typically you use a StreamReader to read from a FileStream or a MemoryStream.

Reading line by line is always going to be very slow.  I would suggest reading 5K to 20K into a byte array (a buffer) and then parsing the buffer to find where each line ends; see the sketch below.
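A rough sketch of that chunked approach, assuming the logs are plain ASCII (the file path and the 16K buffer size are placeholders, not tuned values):

FileStream fs = new FileStream(@"c:\logs\detail.log", FileMode.Open, FileAccess.Read);
byte[] buffer = new byte[16384];   // within the 5K-20K range suggested above
string carry = "";                 // partial line left over from the previous chunk
int read;

while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
{
    string chunk = carry + System.Text.Encoding.ASCII.GetString(buffer, 0, read);
    string[] lines = chunk.Split('\n');

    // every element except the last is a complete line
    for (int i = 0; i < lines.Length - 1; i++)
    {
        string logLine = lines[i].TrimEnd('\r');
        // ... process one complete log line here ...
    }

    carry = lines[lines.Length - 1];   // may be an incomplete line; prepended to the next chunk
}
// carry now holds any final line that had no trailing newline
fs.Close();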


With performance issues like this it would be very helpful if you could post the code in question; we may find something wrong that is completely unrelated to what you are focusing on.


 
LVL 2

Author Comment

by:coopey247
ID: 10869087
The handling of the StreamReader is very simple.


string fileName = "C:\\AVOC\\032904detail";
StreamReader sr = File.OpenText(fileName);
string line;
string statusLine;

while ((line = sr.ReadLine()) != null)
{
    // this determines where a unique event starts
    if (line.StartsWith("Sun") || line.StartsWith("Mon") || line.StartsWith(...))
    {
        // A bunch of processing, most of which looks like this, basically moving
        // line by line looking for the value I want. I wanted to use regex for this,
        // but the stop events all differ in the # of lines and the values of those lines.

        statusLine = sr.ReadLine();
        while (statusLine.IndexOf("Acct-Status-Type") == -1)
        {
            statusLine = sr.ReadLine();
        }

        // I then get all of the values and pass them to an SQL stored procedure
        sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50).Value = user;
        try
        {
            this.sqlDataAdapter1.InsertCommand.ExecuteNonQuery();
        }
        catch (Exception er)
        {
        }
    }  // close the if
}  // close the while
sr.Close();


Here is an example of an event in the file I'm reading:

Wed Jan 28 09:19:26 2004
      User-Name = "userid"
      NAS-Identifier = 1.1.1.1
      NAS-Port = 10205
      NAS-Port-Type = Sync
      Acct-Status-Type = Stop
      Acct-Delay-Time = 0
      Acct-Session-Id = "344998566"
      Acct-Authentic = RADIUS
      Acct-Session-Time = 244
      Acct-Input-Octets = 3988
      Acct-Output-Octets = 27952
      Acct-Input-Packets = 197
      Acct-Output-Packets = 211
      Ascend-Disconnect-Cause = 45
      Ascend-Connect-Progress = 60
      Ascend-Xmit-Rate = 64000
      Ascend-Data-Rate = 64000
      Ascend-PreSession-Time = 8
      Ascend-Pre-Input-Octets = 252
      Ascend-Pre-Output-Octets = 227
      Ascend-Pre-Input-Packets = 11
      Ascend-Pre-Output-Packets = 11
      Ascend-First-Dest = 1.1.1.1
      Ascend-Multilink-ID = 572
      Ascend-Num-In-Multilink = 0
      Caller-Id = "5555555555"
      Client-Port-DNIS = "68414"
      Framed-Protocol = MPP
      Framed-Address = 1.1.1.1

Hope this helps
 
LVL 5

Expert Comment

by:AdrianJMartin
ID: 10869511
It's a bit tricky to tell, but... you have this line inside a loop:

 sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50).Value = user;

where you are adding a new parameter every time you find a user, instead of just setting the value of the existing parameter. You may have clipped the code that sorts all this out; if not, the sketch below shows what I mean.
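A sketch of the fix, reusing the names from the snippet above (so sqlDataAdapter1, sr, line and user are assumed to exist as posted):

// create the parameter once, before the read loop
SqlParameter userParam =
    sqlDataAdapter1.InsertCommand.Parameters.Add("@UserID", SqlDbType.VarChar, 50);

while ((line = sr.ReadLine()) != null)
{
    // ... locate and parse the event as before ...
    userParam.Value = user;   // just assign the value; don't Add() again
    sqlDataAdapter1.InsertCommand.ExecuteNonQuery();
}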
 
LVL 2

Author Comment

by:coopey247
ID: 10871533
Okay, well I moved the definition of the parameters outside of the loop and now just set the value inside the loop.  The problem definitely lies within the SQL stored procedure.  When I take the SQL call out it runs great.  But with the SQL insert command it runs fast at first, then when it gets to about 2,000 records it just crashes.

I tried threading the SQL insert command and it runs well for a little bit, but then starts going crazy with all kinds of errors, again around 2,000 records.  I'm 99% sure that I'm not taxing the server.  I'm not sure where to go from here...
 
LVL 10

Expert Comment

by:ptmcomp
ID: 10873032
You should use an SqlConnection from one thread only. You may also run into a problem if you open a transaction and insert many records without committing them, since the rollback segment will overflow. Committing in batches avoids that; see the sketch below.
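A sketch of batched commits (conn is an assumed open SqlConnection; the batch size of 500 is arbitrary):

SqlTransaction tx = conn.BeginTransaction();
sqlDataAdapter1.InsertCommand.Transaction = tx;
int pending = 0;

// inside the parsing loop, after each ExecuteNonQuery():
if (++pending >= 500)
{
    tx.Commit();                    // flush this batch to the database
    tx = conn.BeginTransaction();   // start the next batch
    sqlDataAdapter1.InsertCommand.Transaction = tx;
    pending = 0;
}

// after the loop, commit whatever is left:
tx.Commit();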
 
LVL 22

Expert Comment

by:_TAD_
ID: 10873849



Also, one thing to consider is that things like cursors are very slow.  If at all possible you may want to try to batch your processes together.  Perhaps load a DataSet in memory and then transfer the whole DataSet over to the SQL server; see the sketch below.
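A sketch of that batching idea (column list shortened; for Update() to map row values to parameters, each InsertCommand parameter would need its SourceColumn set to the matching DataTable column):

DataTable details = new DataTable("Details");
details.Columns.Add("UserID", typeof(string));
// ... one column per extracted field ...

// while parsing, instead of calling ExecuteNonQuery():
// details.Rows.Add(user /*, ...remaining values... */);

// after parsing, push everything across in one call;
// this fires InsertCommand once per added row:
sqlDataAdapter1.Update(details);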
 
LVL 20

Expert Comment

by:TheAvenger
ID: 10875773
For your original question: this code will make it faster and you can still keep all your functionality. The difference for large files is enormous:

StreamReader sr = new StreamReader(new BufferedStream(new FileStream(fileName, FileMode.Open)));
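If you want control over the buffer size as well, there is a StreamReader constructor that takes it directly (the 64 KB figure here is just an illustration, not a tuned value):

StreamReader sr = new StreamReader(
    new FileStream(fileName, FileMode.Open, FileAccess.Read),
    System.Text.Encoding.ASCII,   // the logs appear to be plain ASCII
    false,                        // no byte-order-mark detection needed
    65536);                       // 64 KB read buffer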

For the SQL Server problem: can you post the stored procedure code so that we can review it?
 
LVL 2

Author Comment

by:coopey247
ID: 10877132
Stored Procedure:
I had to do the IF NOT EXISTS because the detail files get merged from 4 servers and there are duplicate records.


CREATE PROCEDURE [ASICUSER].[AVOC_UpdateDetails]
(@UserID varchar(50), @Weekday varchar(50), @Month varchar(50),
@MonthNum int, @Day int, @Year int, @Time varchar(50),
@Type varchar(50), @SessionTime varchar(50),
@SessionID bigint, @NASPort bigint, @CallerID bigint) AS

IF NOT EXISTS (select UserID, Weekday, _Month, MonthNum,  _Day,
_Year, _Time, Type, SessionTime, SessionID, NASPort, CallerID
from asic.asicuser.tbl_AVOC_Details
where Weekday = @Weekday and
_Month = @Month and MonthNum = @MonthNum and _Day = @Day and
_Year = @Year and _Time = @Time and
Type = @Type and SessionTime = @SessionTime and
SessionID = @SessionID and NASPort = @NASPort
and CallerID = @CallerID)

Insert Into asic.asicuser.tbl_AVOC_Details
(UserID, Weekday, _Month, MonthNum,  _Day, _Year, _Time, Type, SessionTime,
SessionID, NASPort, CallerID)
Values (@UserID, @Weekday, @Month, @MonthNum, @Day, @Year, @Time, @Type,
@SessionTime, @SessionID, @NASPort, @CallerID)
GO
 
LVL 20

Expert Comment

by:TheAvenger
ID: 10877140
OK, this seems fine. Do you use transactions?
 
LVL 2

Author Comment

by:coopey247
ID: 10877195
Nope, that's it.  I think populating the dataset and then merging like TAD said might be the way to go.  Any arguments against that, Avenger?
 
LVL 20

Assisted Solution

by:TheAvenger
TheAvenger earned 100 total points
ID: 10877211
You can give it a try. It will make your life easier (it's simpler for development), but the DataSet actually does the same thing - it calls the insert for every new record.
 
LVL 5

Expert Comment

by:AdrianJMartin
ID: 10877261
Can you define the IF NOT EXISTS with fewer checks, i.e. just check SessionID and maybe the time (preferably indexed columns)? Or drop the IF NOT EXISTS, just insert the row, and let a unique index handle the duplication.
 
LVL 10

Expert Comment

by:ptmcomp
ID: 10877846
Merging datasets is quite slow and gets extremely slow with lots of data.
 
LVL 20

Expert Comment

by:TheAvenger
ID: 10877927
I think coopey247 meant update the database by "merge"
 
LVL 2

Author Comment

by:coopey247
ID: 10878655
I did mean update; however, populating a dataset still slows the process down significantly.  I think my next (and maybe last) option may be to write an XML file on the fly and then dump that file up to the SQL box when it finishes.

Thanks for all the help folks, very much appreciated.
 
LVL 5

Accepted Solution

by:
AdrianJMartin earned 300 total points
ID: 10879006
How many inserts are you trying to do? I'm guessing from your data example around 30K-40K inserts, which will take a while...

The IF NOT EXISTS will cause a table scan every time the SP is called, and each call will take a little longer than the last. It all adds up... which is the behaviour you are seeing.

If the table is set up with a unique index (on a single column or on multiple columns) then the duplicate insert will simply fail; ignore the error and carry on. Something like the sketch below.
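For illustration, assuming SessionID plus the timestamp identify an event uniquely (a guess from the sample data, not something stated in the thread):

CREATE UNIQUE INDEX IX_AVOC_Details_Session
ON asic.asicuser.tbl_AVOC_Details (SessionID, _Time)

The procedure then shrinks to a plain INSERT; a duplicate row fails with SqlException number 2601 (duplicate key on a unique index), which the C# loop can catch and ignore before moving on.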
 
LVL 10

Expert Comment

by:ptmcomp
ID: 10880117
Why are you checking all the fields in IF NOT EXISTS? If you have a key you only need to check the key.
 
LVL 2

Author Comment

by:coopey247
ID: 10881439
Yup, it's around 40K inserts a month.  The reason I have to do the IF NOT EXISTS is that these files are pulled from several RADIUS servers, some of which I believe are mirrored, because the merged file I get has at least two identical entries for every event (i.e. the Start and Stop events).  The key is an identity column that I have in there for later processing.

Thanks.
 
LVL 10

Expert Comment

by:ptmcomp
ID: 10881569
Maybe you could speed it up by passing the file to the stored procedure and doing all the processing in the stored procedure. However, that would be hard to debug and monitor.
 
LVL 2

Author Comment

by:coopey247
ID: 10881804
Keen eye on the stored procedure.  I moved the table structure around a little, did away with the identity column, and made a primary key out of two columns (sketched below).  The IF NOT EXISTS still works, but runs much faster since I'm only scanning two indexed columns instead of 10.  And it's now fast enough to be acceptable.
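Roughly what that change looks like, assuming the two key columns are SessionID and _Time (the post doesn't say which columns were chosen):

ALTER TABLE asic.asicuser.tbl_AVOC_Details
ADD CONSTRAINT PK_AVOC_Details PRIMARY KEY (SessionID, _Time)

IF NOT EXISTS then only has to probe the primary key's index:

IF NOT EXISTS (SELECT 1 FROM asic.asicuser.tbl_AVOC_Details
WHERE SessionID = @SessionID AND _Time = @Time)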

You guys (maybe gals) are great, thanks for the help.

Now I've got to figure out how to divvy up the points...
 
LVL 10

Expert Comment

by:ptmcomp
ID: 10882475
Now the lookup is done against an index (similar to a Hashtable lookup) instead of a full table scan.
