Solved

Multi-threaded file / directory search

Posted on 2012-04-03
8
599 Views
Last Modified: 2012-04-04
I want to design / develop a multi-threaded file search process that will search a drived for files that match a name token.  The search should begin in folder c:\xxxxxx\.... and search all sub-folders.  However, I want to be able to start a thread to search at each sub folder that is one child below the begining folder.  So:
C:\xxxxx\a\...
C:\xxxxx\b\...
C:\xxxxx\c\...
C:\xxxxx\d\.....
Searching c:\xxxx above will start 4 threads, one for a, b, c, d.
Then the results from each thread should be aggregated for presentation.
My problem is that I've tried this in the past and I believe that I had a problem with the file system, using multi threaded access.  In other words, can the file system - be searched in a multi-threaded manner - does it support this type of search.  I seem to remember finding that it did not... Since the file system is a hardware device; can this type of search - if it can be done - be expected to be slower than a single thread?  I'm thinking of the drive being accessed by multiple thread search different location and the seek time being increased... Any way, I need some way to speed up file searching in a large cdn environment - where it is possible for users to have thousands/millions of files.
Any ideas?
0
Comment
Question by:jparlato
8 Comments
 
LVL 23

Accepted Solution

by:
wdosanjos earned 200 total points
ID: 37802481
You can use a Parallel.ForEach to handle the threading.  The file system does support multi threaded access.  Here is some sample code:
var folder = @"C:\temp\";
var subfolders = Directory.GetDirectories(folder);
var allfiles = new List<string>();

Parallel.ForEach(subfolders, subfolder => 
    {
        var files = Directory.GetFiles(subfolder);
        
        lock(allfiles)
        {
            allfiles.AddRange(files);
        }
    }
);

foreach (var file in allfiles)
{
    Console.WriteLine(file);
}

Open in new window


I hope this helps.
0
 
LVL 16

Assisted Solution

by:HooKooDooKu
HooKooDooKu earned 200 total points
ID: 37802668
The general rule is that making an application multi-threaded doesn't make it run faster UNLESS you've got the hardware to support multiple threads.

In other words, back in the days of single core CPU's, making an application multi-threaded did NOT make it run faster, because only one of the threads could ever run at one time.  But what it did do was to make the application more responsive, because it allowed the application to interact with the user while some background task was executing.

In today's world of multi-core CPU's, from what I understand, multiple threads DO run at the same time now.  But if you only have one physical hard-drive, the two threads can't access the drive at the same time.

Now where multi-threading CAN speed things up for even access to a single disk is if you have a bunch of things that have to be done between accessing the drive.  In that situation, you hopefully get one thread accessing the disk while the other thread is processing the data.

The thing I would think might make the application run the fastest might be to set up a situation where only one thread accesses the disk while another thread does any processing.  Set yourself of some "hand-shaking" between the two threads and have the 1st one search for files, and if files need to be read, have the 1st thread load a file into a memory buffer.  Then while the secondary thread(s) process the file(s) from the memory buffer, the 1st thread can do additional file searching and file reading.
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37804116
Definitely agree with HooKooDooKu on this:  the disk drive is the bottleneck. If you were searching multiple drives, then maybe you could get some benefit from threading.
0
 
LVL 23

Expert Comment

by:wdosanjos
ID: 37804364
I think the comments around limitations on multi-threaded access to disk devices does not take into account high-end RAID-5 and RAID-10 devices that do support concurrent access.

I would avoid early optimization of the code.  Try to make it work multi-threaded, and only if the performance is not the expected add thread synchronization around the directory/file operations as recommended by HooKooDooKu.  It's possible that due to buffering and your hardware configuration multi-threaded directory/file services will work just fine.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37804380
I think the comments around limitations on multi-threaded access to disk devices does not take into account high-end RAID-5 and RAID-10 devices that do support concurrent access.
...and you know the author is using a "high-end RAID-5 [or] RAID-10" device how?
0
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 37804402
Besides...

From http://en.wikipedia.org/wiki/RAID :
RAID 5 requires at least three disks.
RAID 1+0: (a.k.a. RAID 10) mirrored sets in a striped set (minimum four drives; even number of drives)...

Both of which would corroborate my claim of:

If you were searching multiple drives, then maybe you could get some benefit from threading.

= )
0
 
LVL 23

Expert Comment

by:wdosanjos
ID: 37804429
Hi @kaufmed,

I was talking in generic terms as the first comments give the impression that all disk devices cannot support simultaneous access, which is not the case.

I think the author should try without synchronization between the threads, and there is a chance it will work with good performance.  And synchronizing the directory/file operations may actually slow down the process depending on the particular hw configuration.  If possible the author should try both to determine which one works better for his/her case.
0
 

Author Closing Comment

by:jparlato
ID: 37805435
Both answers were very helpful... as well as the comments from others.  I got exactly what I needed and am testing the code now.  I will post back as soon as I know if I was able to improve performance.  Thanks to everyone that contributed.
0

Featured Post

Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

this article is a guided solution for most of the common server issues in server hardware tasks we are facing in our routine job works. the topics in the following article covered are, 1) dell hardware raidlevel (Perc) 2) adding HDD 3) how t…
Real-time is more about the business, not the technology. In day-to-day life, to make real-time decisions like buying or investing, business needs the latest information(e.g. Gold Rate/Stock Rate). Unlike traditional days, you need not wait for a fe…
Internet Business Fax to Email Made Easy - With  eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, f…
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…

912 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now