Fastest way to read files


I have to read a lot of files in a directory (500 or more; they are all small text files of about 8 to 10 KB each).
For each file I have to open it and read chars 15 to 19. If they are "F24A0", I need to compute the hash of the file and store it in a database; otherwise the file is skipped. What is the fastest way to do this? Any suggestions? I need something very fast, because there are a lot of files to read.
Here is what I have so far:
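// Needs: using System; using System.IO; using System.Security.Cryptography;
// dirPath is the folder being scanned (placeholder name).
foreach (FileInfo file in new DirectoryInfo(dirPath).GetFiles())
{
    using (FileStream fs = file.OpenRead())
    {
        StreamReader sr = new StreamReader(fs);  // closed together with fs
        char[] buff = new char[20];
        fs.Position = 15;                        // chars 15..19, assuming single-byte text
        sr.ReadBlock(buff, 0, 5);
        fs.Position = 0;                         // rewind so the whole file gets hashed
        string t = new string(buff, 0, 5);       // only the 5 chars actually read
        if (t == "F24A0")
        {
            using (MD5 sscMD5 = MD5.Create())
            {
                byte[] mHash = sscMD5.ComputeHash(fs);
                string retValue = Convert.ToBase64String(mHash); // becomes a 24-character string
                // ... store retValue in the database
            }
        }
    }
}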
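If the files are plain ASCII (an assumption; the question only says "text files"), a character offset equals a byte offset, so the check can read the five bytes straight from the FileStream without a StreamReader or decoder. A minimal sketch of just the check step; MarkerCheck and HasMarker are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: checks whether bytes 15..19 (zero-based) spell "F24A0".
// Assumes ASCII/single-byte text, so char offsets and byte offsets coincide.
static class MarkerCheck
{
    static readonly byte[] Marker = Encoding.ASCII.GetBytes("F24A0");

    public static bool HasMarker(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            if (fs.Length < 15 + Marker.Length) return false; // too short to hold the marker
            fs.Position = 15;
            var buff = new byte[Marker.Length];
            int read = 0;
            // Stream.Read may return fewer bytes than requested, so loop until done.
            while (read < buff.Length)
            {
                int n = fs.Read(buff, read, buff.Length - read);
                if (n == 0) return false;
                read += n;
            }
            for (int i = 0; i < buff.Length; i++)
                if (buff[i] != Marker[i]) return false;
            return true;
        }
    }
}
```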

I doubt you can improve much on the way you're doing it without rewriting it in VC++, which might improve performance somewhat. But I suspect your bottleneck will be file I/O far more than code execution speed.
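One rough way to check the I/O-versus-CPU hunch is to time reading and hashing separately over the same files. A sketch, not a rigorous benchmark; IoVsCpu and Measure are hypothetical names. The OS file cache matters a lot here: a second run over the same files will mostly be served from memory, so the read time from a warm run understates the real disk cost.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: splits the run time into "reading" and "hashing"
// so you can see which one dominates for your files and disk.
static class IoVsCpu
{
    public static void Measure(string dirPath, out double readMs, out double hashMs)
    {
        var readTimer = new Stopwatch();
        var hashTimer = new Stopwatch();
        using (MD5 md5 = MD5.Create())
        {
            foreach (string path in Directory.GetFiles(dirPath))
            {
                readTimer.Start();                     // Start() resumes, so time accumulates
                byte[] data = File.ReadAllBytes(path); // mostly I/O
                readTimer.Stop();

                hashTimer.Start();
                md5.ComputeHash(data);                 // CPU only: data is already in memory
                hashTimer.Stop();
            }
        }
        readMs = readTimer.Elapsed.TotalMilliseconds;
        hashMs = hashTimer.Elapsed.TotalMilliseconds;
    }
}
```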
It will be somewhat faster if you move these lines outside the loop.
System.Security.Cryptography.MD5 sscMD5 = System.Security.Cryptography.MD5.Create();
char[] buff = new char[20];
byte[] mHash = null; // initialize to null, then assign from the hash

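You will generate less garbage to be collected that way, mainly from reusing the MD5 object and the buffer (ComputeHash still returns a new array on every call, so hoisting mHash saves little).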
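A sketch of the loop with the MD5 object and the char buffer hoisted out, assuming single-byte text as in the question. HoistedLoop and HashMatching are hypothetical names, and the returned dictionary stands in for the database insert. Reusing one MD5 instance is safe because ComputeHash reinitializes the algorithm after each call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class HoistedLoop
{
    // Returns path -> Base64 MD5 for every file whose chars 15..19 are "F24A0".
    public static Dictionary<string, string> HashMatching(string dirPath)
    {
        var results = new Dictionary<string, string>();
        char[] buff = new char[5];               // allocated once, reused for every file
        using (MD5 md5 = MD5.Create())           // created once for the whole directory
        {
            foreach (string path in Directory.GetFiles(dirPath))
            {
                using (FileStream fs = File.OpenRead(path))
                {
                    var sr = new StreamReader(fs, Encoding.ASCII); // closed along with fs
                    fs.Position = 15;
                    int n = sr.ReadBlock(buff, 0, 5);
                    if (n != 5 || new string(buff, 0, 5) != "F24A0") continue;
                    fs.Position = 0;             // rewind so the whole file is hashed
                    results[path] = Convert.ToBase64String(md5.ComputeHash(fs));
                }
            }
        }
        return results;
    }
}
```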

Also, it's critical to issue … as soon as you are finished with the StreamReader.
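One way to release the StreamReader promptly while still hashing the underlying stream afterwards is the StreamReader constructor overload with a leaveOpen argument (available since .NET Framework 4.5). A sketch under the same single-byte-text assumption; EarlyDispose and HashIfMarked are hypothetical names:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class EarlyDispose
{
    // Returns the Base64 MD5 of the file if chars 15..19 are "F24A0", else null.
    // The reader is disposed right after the marker is read; leaveOpen: true
    // keeps the FileStream open so it can still be hashed.
    public static string HashIfMarked(string path, MD5 md5)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            string marker;
            fs.Position = 15;
            using (var sr = new StreamReader(fs, Encoding.ASCII, false, 1024, leaveOpen: true))
            {
                char[] buff = new char[5];
                int n = sr.ReadBlock(buff, 0, 5);
                marker = new string(buff, 0, n);
            } // reader released here; fs stays open
            if (marker != "F24A0") return null;
            fs.Position = 0;
            return Convert.ToBase64String(md5.ComputeHash(fs));
        }
    }
}
```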

Next, buff is awfully small. I would experiment with a block size of 8 KB. Alternatively, declare a StringBuilder with an initial capacity of 10 to 12 KB and fill it from sr.ReadToEnd(). I would code up two or three different approaches and time them with System.Diagnostics.Stopwatch.
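A sketch of the larger-block idea: read the first block of each file into one reusable byte buffer (8 KB, 12 KB, whatever you are testing), check bytes 15..19, and hash straight from the buffer when the whole file fit in it. It assumes ASCII text; BlockRead and HashIfMarked are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class BlockRead
{
    // Reads up to buffer.Length bytes into a caller-supplied, reusable buffer,
    // checks bytes 15..19 for ASCII "F24A0", and returns the Base64 MD5 of the
    // whole file, or null if there is no marker.
    public static string HashIfMarked(string path, MD5 md5, byte[] buffer)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            int total = 0, n;
            // Stream.Read can return fewer bytes than asked for, so keep reading.
            while (total < buffer.Length && (n = fs.Read(buffer, total, buffer.Length - total)) > 0)
                total += n;

            if (total < 20 || buffer[15] != 'F' || buffer[16] != '2' || buffer[17] != '4'
                || buffer[18] != 'A' || buffer[19] != '0')
                return null;

            byte[] hash;
            if (total == fs.Length)
            {
                hash = md5.ComputeHash(buffer, 0, total);  // whole file is already in memory
            }
            else
            {
                fs.Position = 0;                           // larger than the buffer: hash the stream
                hash = md5.ComputeHash(fs);
            }
            return Convert.ToBase64String(hash);
        }
    }
}
```

Allocate the buffer once outside the loop, e.g. `var buffer = new byte[12 * 1024];`, so files of 8 to 10 KB take the in-memory path.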

When timing, vary the order in which you run them, run as many iterations as you have patience for, then compute averages for each algorithm. Garbage collection and other threads can skew individual timings.
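A small harness for that procedure: each round starts with a different candidate, so no method always runs first (cold cache) or last, and the result is the average per candidate. Forcing a collection before each timed run is an extra precaution added here, not something from the thread; Bench and Compare are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class Bench
{
    // Runs every candidate 'iterations' times, rotating the starting candidate
    // each round. Returns average milliseconds per run, keyed by name.
    public static Dictionary<string, double> Compare(
        IList<KeyValuePair<string, Action>> candidates, int iterations)
    {
        var totals = new Dictionary<string, double>();
        foreach (var c in candidates) totals[c.Key] = 0;

        var sw = new Stopwatch();
        for (int round = 0; round < iterations; round++)
        {
            for (int k = 0; k < candidates.Count; k++)
            {
                var c = candidates[(round + k) % candidates.Count];
                GC.Collect();                  // make a GC inside the timed region less likely
                GC.WaitForPendingFinalizers();
                sw.Restart();
                c.Value();
                sw.Stop();
                totals[c.Key] += sw.Elapsed.TotalMilliseconds;
            }
        }

        var averages = new Dictionary<string, double>();
        foreach (var kv in totals) averages[kv.Key] = kv.Value / iterations;
        return averages;
    }
}
```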



One more thought: the expected ratio of files you need to hash to files you can skip may help determine the best approach. Reading the entire file into memory and then checking the 5 bytes will, I think, be much faster for the files you hash and a bit slower for those you skip. It will also depend partly on how the OS caches buffers for the files you are reading. As much as possible, you want to avoid waiting for the disk to make another revolution before the next block is loaded into a buffer.
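The read-everything-first variant is short: one File.ReadAllBytes call, a check of bytes 15..19, and a hash of the in-memory copy, so a matching file is read from disk only once. Non-matching files still pay for the full read, which is the trade-off described above. A sketch assuming ASCII text; WholeFile and HashIfMarked are hypothetical names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class WholeFile
{
    static readonly byte[] Marker = { (byte)'F', (byte)'2', (byte)'4', (byte)'A', (byte)'0' };

    // Returns the Base64 MD5 of the file if bytes 15..19 are "F24A0", else null.
    public static string HashIfMarked(string path, MD5 md5)
    {
        byte[] data = File.ReadAllBytes(path);   // one read per file, matched or not
        if (data.Length < 15 + Marker.Length) return null;
        for (int i = 0; i < Marker.Length; i++)
            if (data[15 + i] != Marker[i]) return null;
        return Convert.ToBase64String(md5.ComputeHash(data)); // hash the copy in memory
    }
}
```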
