Fastest way to read files

Posted on 2008-01-31
Medium Priority
Last Modified: 2012-05-05

I have to read a lot of files in a directory (500 or more, all text files, each one small, about 8 to 10 KB).
For each file I have to open it and read chars 15 to 19. If they equal "F24A0", I need to compute the hash of the file and store it in a database; otherwise the file is skipped. What is the fastest way to do this? Any suggestions? I need it to be very fast, because there are a lot of files to read.
I made this (pseudo code):
foreach (FileInfo file in dir.GetFiles())   // dir: a DirectoryInfo for the folder
{
    using (System.IO.FileStream fs = file.OpenRead())
    {
        StreamReader sr = new StreamReader(fs);
        char[] buff = new char[20];
        fs.Position = 15;                   // skip to the marker
        sr.ReadBlock(buff, 0, 5);
        fs.Position = 0;                    // rewind so the whole file gets hashed
        string t = new String(buff, 0, 5);  // only the 5 chars read, not the whole buffer
        if (t == "F24A0")
        {
            System.Security.Cryptography.MD5 sscMD5 = System.Security.Cryptography.MD5.Create();
            byte[] mHash = sscMD5.ComputeHash(fs);
            retValue = Convert.ToBase64String(mHash); // becomes a 24-character string
        }
    }
}
Question by:puckkk
LVL 11

Expert Comment

ID: 20787751
I doubt you can improve much on your approach short of rewriting it in VC++, which could improve performance. But I suspect your bottleneck is much more on the file I/O side and less on code execution speed.
LVL 22

Accepted Solution

JimBrandley earned 2000 total points
ID: 20789538
It will be somewhat faster if you move these lines outside the loop.
System.Security.Cryptography.MD5 sscMD5 = System.Security.Cryptography.MD5.Create();
char[] buff = new char[20];
byte[] mHash = null; // initialize to null, then set it from the hash inside the loop

You will generate less garbage to be collected that way.
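
As a rough sketch of what hoisting those allocations could look like: the class and method names below (MarkerHasher, HashMarkedFiles) are made up for illustration, and the code assumes the marker is plain ASCII and reads raw bytes instead of going through a StreamReader. It is one possible arrangement, not a measured improvement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class MarkerHasher
{
    // Hypothetical helper: maps file path -> Base64 MD5 for every file in
    // `directory` whose bytes 15..19 spell "F24A0" (assumed to be ASCII).
    public static Dictionary<string, string> HashMarkedFiles(string directory)
    {
        var results = new Dictionary<string, string>();
        byte[] marker = Encoding.ASCII.GetBytes("F24A0");
        byte[] buff = new byte[marker.Length];   // allocated once, reused for every file

        using (MD5 md5 = MD5.Create())           // created once, reused for every file
        {
            foreach (string path in Directory.GetFiles(directory))
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    if (fs.Length < 15 + marker.Length) continue; // too short for the marker
                    fs.Position = 15;
                    if (!ReadFully(fs, buff) || !SameBytes(buff, marker)) continue;
                    fs.Position = 0;             // rewind so the whole file is hashed
                    results[path] = Convert.ToBase64String(md5.ComputeHash(fs));
                }
            }
        }
        return results;
    }

    // Stream.Read may return fewer bytes than requested, so loop until full.
    private static bool ReadFully(Stream s, byte[] target)
    {
        int read = 0;
        while (read < target.Length)
        {
            int n = s.Read(target, read, target.Length - read);
            if (n == 0) return false;
            read += n;
        }
        return true;
    }

    private static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

The using blocks also take care of closing each stream as soon as the file has been handled.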

Also, it's critical to close the StreamReader as soon as you are finished with it.

Next, buff is awfully small. I would experiment with a block size of 8K. Or, since each file is only 8 to 10 KB, you can read the whole file in one go with sr.ReadToEnd(). I would set up two or three different algorithms and code them. Then use System.Diagnostics.Stopwatch to time them.
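
One way to try the 8K block size is to ask FileStream for an 8192-byte internal buffer when opening the file. The sketch below is only an assumption about how that experiment might be coded: BufferedHasher and HashIfMarked are hypothetical names, the marker is assumed to be ASCII, and FileOptions.SequentialScan is merely a hint to the OS that may or may not help.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class BufferedHasher
{
    // Hypothetical helper: returns the Base64 MD5 of the file if bytes 15..19
    // are "F24A0", otherwise null. bufferSize is the value to experiment with.
    public static string HashIfMarked(string path, int bufferSize = 8192)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize,
                                       FileOptions.SequentialScan))
        {
            byte[] head = new byte[20];
            int read = 0;
            while (read < head.Length)
            {
                int n = fs.Read(head, read, head.Length - read);
                if (n == 0) break;               // end of file reached early
                read += n;
            }
            if (read < head.Length) return null; // too short to hold the marker
            if (Encoding.ASCII.GetString(head, 15, 5) != "F24A0") return null;

            fs.Position = 0;                     // rewind so the whole file is hashed
            using (MD5 md5 = MD5.Create())
            {
                return Convert.ToBase64String(md5.ComputeHash(fs));
            }
        }
    }
}
```

Calling it with a few different bufferSize values (for example 4096, 8192 and 16384) gives the variants to time against each other.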

When timing, vary the order in which you time them, and run as many iterations as you have patience for, then compute averages for each algorithm. Garbage collection and other threads can generate spurious results in the times.
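
A minimal timing harness along those lines might look like the following. Bench and Compare are invented names, and rotating the starting candidate each round is just one simple way to vary the order; treat the averages as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Bench
{
    // Hypothetical harness: runs every candidate once per round, starting with
    // a different candidate each round, and returns average milliseconds per name.
    public static Dictionary<string, double> Compare(
        IList<KeyValuePair<string, Action>> candidates, int rounds)
    {
        var totals = new Dictionary<string, double>();
        foreach (var c in candidates) totals[c.Key] = 0.0;

        for (int r = 0; r < rounds; r++)
        {
            for (int i = 0; i < candidates.Count; i++)
            {
                // Rotate the order so no candidate always runs first (or last).
                var c = candidates[(i + r) % candidates.Count];
                Stopwatch sw = Stopwatch.StartNew();
                c.Value();
                sw.Stop();
                totals[c.Key] += sw.Elapsed.TotalMilliseconds;
            }
        }

        var averages = new Dictionary<string, double>();
        foreach (var kv in totals) averages[kv.Key] = kv.Value / rounds;
        return averages;
    }
}
```

Each candidate would be an Action that processes the whole directory with one of the algorithms being compared.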


LVL 22

Expert Comment

ID: 20789703
One more thought: The expected ratio of those you need to hash versus those you do not may help determine the best way to attack the problem. Reading the entire file into memory then checking the 5 bytes will be, I think, much faster for those you hash, and a bit slower for those you do not. It will depend partly on how the OS decides to cache buffers for the files you are reading. You want to avoid waiting for the disk to make another revolution before loading the next block into a buffer as much as possible.
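
The read-everything-first variant could be sketched like this. WholeFileHasher and HashIfMarked are hypothetical names, and the marker is again assumed to be ASCII. Whether it beats reading only the first 20 bytes depends on the hash/skip ratio described above, so it is worth timing both.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class WholeFileHasher
{
    // Hypothetical helper: loads the whole file into memory, checks bytes
    // 15..19 for "F24A0", and hashes the in-memory bytes if they match.
    // Returns null for files that are skipped.
    public static string HashIfMarked(string path)
    {
        byte[] data = File.ReadAllBytes(path);   // a single read for an 8-10 KB file
        if (data.Length < 20) return null;       // too short to hold the marker
        if (Encoding.ASCII.GetString(data, 15, 5) != "F24A0") return null;

        using (MD5 md5 = MD5.Create())
        {
            // No second pass over the disk: the hash runs on the bytes in memory.
            return Convert.ToBase64String(md5.ComputeHash(data));
        }
    }
}
```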



Question has a verified solution.
