
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 726

Multi-threaded file / directory search

I want to design and develop a multi-threaded file search process that will search a drive for files that match a name token.  The search should begin in folder c:\xxxxxx\.... and search all sub-folders.  However, I want to be able to start one thread for each sub-folder that is one level below the beginning folder.  So:
C:\xxxxx\a\...
C:\xxxxx\b\...
C:\xxxxx\c\...
C:\xxxxx\d\.....
Searching c:\xxxx above will start 4 threads, one for a, b, c, d.
Then the results from each thread should be aggregated for presentation.
My problem is that I tried this in the past and I believe I ran into a problem with multi-threaded access to the file system.  In other words, can the file system be searched in a multi-threaded manner; does it support this type of search?  I seem to remember finding that it did not...  Since the file system sits on a hardware device, can this type of search, if it can be done at all, be expected to be slower than a single thread?  I'm thinking of the drive being accessed by multiple threads searching different locations, and the seek time increasing as a result...  Anyway, I need some way to speed up file searching in a large CDN environment, where users can have thousands or millions of files.
Any ideas?
Asked by: jparlato
2 Solutions
 
wdosanjos Commented:
You can use Parallel.ForEach to handle the threading.  The file system does support multi-threaded access.  Here is some sample code:
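using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

var folder = @"C:\temp\";
var pattern = "*token*";   // placeholder wildcard for the name token
var subfolders = Directory.GetDirectories(folder);
var allfiles = new List<string>();

// One parallel work item per first-level subfolder.
// Note: files sitting directly in "folder" itself are not searched here.
Parallel.ForEach(subfolders, subfolder => 
    {
        // Search this subfolder and everything beneath it for matching names.
        var files = Directory.GetFiles(subfolder, pattern, SearchOption.AllDirectories);
        
        // List<T> is not thread-safe, so merge the results under a lock.
        lock(allfiles)
        {
            allfiles.AddRange(files);
        }
    }
);

foreach (var file in allfiles)
{
    Console.WriteLine(file);
}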



I hope this helps.
 
HooKooDooKu Commented:
The general rule is that making an application multi-threaded doesn't make it run faster UNLESS you've got the hardware to support multiple threads.

In other words, back in the days of single-core CPUs, making an application multi-threaded did NOT make it run faster, because only one thread could ever run at a time.  What it did do was make the application more responsive, because the application could keep interacting with the user while a background task was executing.

In today's world of multi-core CPUs, from what I understand, multiple threads DO run at the same time.  But if you only have one physical hard drive, two threads can't access the drive at the same time.

Where multi-threading CAN speed things up, even with access to a single disk, is when you have a bunch of work to do between accesses to the drive.  In that situation, you hopefully get one thread accessing the disk while another thread is processing the data.

The approach I would expect to run fastest is one where only one thread accesses the disk while another thread does the processing.  Set up some "hand-shaking" between the two threads: have the 1st one search for files and, if files need to be read, load each file into a memory buffer.  Then, while the secondary thread(s) process the file(s) from the memory buffer, the 1st thread can do additional file searching and file reading.
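The "hand-shaking" described above can be sketched as a producer/consumer pair around a bounded queue. This is only a minimal sketch under stated assumptions, not a tested implementation of the poster's design: the class name PipelinedSearch, the method Find, the queue capacity of 10000, and the case-insensitive substring match on the file name are all made up for illustration, and the "processing" step here is just the name comparison rather than reading file contents.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class PipelinedSearch
{
    // Hypothetical helper: one producer walks the disk, one consumer filters names.
    public static List<string> Find(string root, string token)
    {
        var matches = new List<string>();

        // Bounded queue (capacity is an arbitrary assumption): the producer
        // blocks when the consumer falls too far behind.
        using (var queue = new BlockingCollection<string>(10000))
        {
            // Producer: the only thread that touches the file system.
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
                        queue.Add(path);
                }
                finally
                {
                    // Always release the consumer, even if enumeration throws.
                    queue.CompleteAdding();
                }
            });

            // Consumer: does the in-memory work (here, a simple name match).
            var consumer = Task.Run(() =>
            {
                foreach (var path in queue.GetConsumingEnumerable())
                {
                    var name = Path.GetFileName(path);
                    if (name.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
                        matches.Add(path); // only this thread writes to the list
                }
            });

            // Rethrows (as AggregateException) if either side failed.
            Task.WaitAll(producer, consumer);
        }

        return matches;
    }
}
```

A call such as PipelinedSearch.Find(@"C:\temp\", "token") would return the matching paths. Whether this beats a plain single-threaded search depends on how much work the consumer actually does per file; with nothing but a name comparison, the disk walk is likely to dominate.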
 
käµfm³d 👽 Commented:
Definitely agree with HooKooDooKu on this:  the disk drive is the bottleneck. If you were searching multiple drives, then maybe you could get some benefit from threading.

 
wdosanjos Commented:
I think the comments around limitations on multi-threaded access to disk devices do not take into account high-end RAID-5 and RAID-10 devices that do support concurrent access.

I would avoid early optimization of the code.  Try to make it work multi-threaded first, and only if the performance is not what you expect, add thread synchronization around the directory/file operations as recommended by HooKooDooKu.  It's possible that, thanks to buffering and your hardware configuration, multi-threaded directory/file access will work just fine.
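If synchronization around the directory/file operations does turn out to be needed, one way it could look is a semaphore that gates only the disk call. This is a sketch under assumptions, not something posted in the thread: GatedSearch, Find, and the maxDiskReaders parameter are hypothetical names, and a value of 1 means only one thread reads the disk at a time while larger values loosen the gate.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class GatedSearch
{
    // maxDiskReaders is a hypothetical knob: how many threads may
    // run a directory search against the disk at the same moment.
    public static List<string> Find(string root, string pattern, int maxDiskReaders)
    {
        var all = new List<string>();

        using (var diskGate = new SemaphoreSlim(maxDiskReaders))
        {
            Parallel.ForEach(Directory.GetDirectories(root), sub =>
            {
                string[] files;

                diskGate.Wait();          // wait for a free disk slot
                try
                {
                    files = Directory.GetFiles(sub, pattern, SearchOption.AllDirectories);
                }
                finally
                {
                    diskGate.Release();   // free the slot for the next thread
                }

                // Merging happens outside the gate; List<T> still needs a lock.
                lock (all)
                {
                    all.AddRange(files);
                }
            });
        }

        return all;
    }
}
```

With maxDiskReaders set to 1 the disk access is fully serialized; raising it lets you probe how much concurrency a particular device tolerates without changing the rest of the code.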
 
käµfm³d 👽 Commented:
"I think the comments around limitations on multi-threaded access to disk devices do not take into account high-end RAID-5 and RAID-10 devices that do support concurrent access."
...and you know the author is using a "high-end RAID-5 [or] RAID-10" device how?
 
käµfm³d 👽 Commented:
Besides...

From http://en.wikipedia.org/wiki/RAID :
RAID 5 requires at least three disks.
RAID 1+0: (a.k.a. RAID 10) mirrored sets in a striped set (minimum four drives; even number of drives)...

Both of which would corroborate my claim of:

"If you were searching multiple drives, then maybe you could get some benefit from threading."

= )
 
wdosanjos Commented:
Hi @kaufmed,

I was talking in generic terms, as the first comments give the impression that no disk device can support simultaneous access, which is not the case.

I think the author should first try without synchronization between the threads; there is a chance it will work with good performance.  Synchronizing the directory/file operations may actually slow down the process, depending on the particular hardware configuration.  If possible, the author should try both to determine which one works better for his/her case.
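To try both and see which works better, a small timing harness is enough. The sketch below is illustrative only: SearchTiming, Sequential, InParallel, and Time are hypothetical names, and the two search methods simply mirror the one-task-per-subfolder idea from earlier in this thread next to a plain loop over the same subfolders.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

public static class SearchTiming
{
    // Single-threaded baseline: search each first-level subfolder in turn.
    public static List<string> Sequential(string root, string pattern)
    {
        var all = new List<string>();
        foreach (var sub in Directory.GetDirectories(root))
            all.AddRange(Directory.GetFiles(sub, pattern, SearchOption.AllDirectories));
        return all;
    }

    // Parallel variant: one work item per first-level subfolder.
    public static List<string> InParallel(string root, string pattern)
    {
        var all = new List<string>();
        Parallel.ForEach(Directory.GetDirectories(root), sub =>
        {
            var files = Directory.GetFiles(sub, pattern, SearchOption.AllDirectories);
            lock (all) { all.AddRange(files); } // List<T> is not thread-safe
        });
        return all;
    }

    // Runs a search once and reports how long it took.
    public static TimeSpan Time(Func<List<string>> search)
    {
        var sw = Stopwatch.StartNew();
        search();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

For example, SearchTiming.Time(() => SearchTiming.Sequential(@"C:\temp\", "*token*")) and the same call with InParallel give two elapsed times to compare. Keep in mind that the operating system caches directory data, so whichever variant runs second may look faster than it really is; repeating the runs and alternating the order gives a fairer picture.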
 
jparlato (Author) Commented:
Both answers were very helpful... as well as the comments from others.  I got exactly what I needed and am testing the code now.  I will post back as soon as I know if I was able to improve performance.  Thanks to everyone that contributed.