Solved

FAST file search by Modified Date range

Posted on 2014-08-06
Last Modified: 2014-08-07
One of our clients has a folder on a server, containing a lovely, branchy tree of sub-folders and a total of about 5 million documents dispersed through said folder tree.

So the scenario is that I know that the path to all these documents is something like \\TheServer\TheRootFolder

Now I want to retrieve all the documents in the whole tree that were modified in the last week.

I went into Windows Explorer and ran a search filtered on "Date Modified: last week". It has now been running for half an hour and has found 8.5 thousand documents that were modified in the last week... so far.

Still, though. We have a database that lists all the documents in that folder tree. I wrote a bit of code that took 10,000 entries out of that database and then used System.IO.File.Exists and System.IO.File.GetLastWriteTime to check a) whether the file still existed and b) when it was last modified. It took about 2 minutes to get that information for 10,000 entries. For 5 million files... you do the maths (at that rate, roughly 1,000 minutes, or nearly 17 hours). It's not really workable. So even if the aforementioned Windows Explorer "Date Modified" search took a whole hour (which it might), it would still be preferable to my other solution.

My problem is that I have no idea how to create an EFFICIENT and FAST algorithm to achieve this in VB.NET.

So effectively I need a fast algorithm, in VB.NET, that will give me a list of the files under a UNC path and all its subfolders that were last modified within a certain date range.
Question by:WernerVonBraun
4 Comments
 
LVL 36

Expert Comment

by:ste5an
ID: 40243625
There's no free lunch here. Either you can use an index for this operation or you can't. Since you can use e.g. datemodified:>=07/01/12 in Explorer, that kind of query is served from the search index for folders that are part of the index. So you could experiment with adding your folder hierarchy to the search index. But building the index the first time will also be a slow process.

Caveat: this can be a resource-consuming process.
 
LVL 25

Accepted Solution

by:
Luis Pérez earned 2000 total points
ID: 40243628
Well, you've proposed a really difficult thing here, because direct reads from the file system will always be time-expensive, and with a huge number of files that becomes a painful issue.

The ideal solution (if you can implement it, because I don't know if you can modify the database) would be:
- Add a field to the database to store the modified date of each file, and do an initial load of this value for all files. This would solve the time problem, because you could run all your searches directly against the database rather than the file system (which is time-expensive).
- Create a Windows Service or background application that uses the .NET FileSystemWatcher component to monitor changes to the files, updating the database with the modified date of each changed file.
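
The second step above could look roughly like the following. This is only a minimal sketch, written in C# to match the other code in this thread, and it assumes the process is allowed to watch the share. RecordModification is a hypothetical placeholder for whatever database update you would actually run; here it only prints the path and timestamp.

```csharp
using System;
using System.IO;

class ModifiedDateTracker
{
    static void Main()
    {
        // UNC path taken from the question; replace with the real share.
        using (var watcher = new FileSystemWatcher(@"\\TheServer\TheRootFolder"))
        {
            watcher.IncludeSubdirectories = true;
            watcher.NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName;
            // A larger buffer reduces the chance of dropped events on a busy tree.
            watcher.InternalBufferSize = 64 * 1024;

            watcher.Changed += (s, e) => RecordModification(e.FullPath);
            watcher.Created += (s, e) => RecordModification(e.FullPath);
            watcher.Renamed += (s, e) => RecordModification(e.FullPath);
            // Raised, for example, when the internal buffer overflows and events are lost.
            watcher.Error += (s, e) =>
                Console.Error.WriteLine("Watcher error: " + e.GetException().Message);

            watcher.EnableRaisingEvents = true;
            Console.WriteLine("Watching. Press Enter to stop.");
            Console.ReadLine();
        }
    }

    // Hypothetical placeholder: a real service would write the path and the
    // modified date to the database instead of printing them.
    static void RecordModification(string path)
    {
        DateTime modifiedUtc = File.GetLastWriteTimeUtc(path);
        Console.WriteLine("{0:u}  {1}", modifiedUtc, path);
    }
}
```

FileSystemWatcher can miss events when many files change at once (the Error handler above is where that shows up), so it is probably wise to pair it with an occasional full rescan to bring the database back in sync.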

Hope that helps.
 
LVL 11

Expert Comment

by:LordWabbit
ID: 40244123
You could speed things up by using the Parallel class, for example...

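        // Requires: using System; using System.IO; using System.Threading; using System.Threading.Tasks;
        private static long totalFiles;

        private static void ScanDirectory(string location)
        {
            // Print the name of the folder currently being scanned.
            Console.WriteLine(location.Substring(location.LastIndexOf('\\') + 1));
            string[] files = Directory.GetFiles(location);
            // ScanDirectory runs on several threads at once, so update the counter atomically.
            Interlocked.Add(ref totalFiles, files.Length);
            DateTime cutoff = DateTime.Now.AddDays(-7);
            Parallel.ForEach(files, file =>
            {
                // The question is about modified dates, so check the last write time
                // rather than the creation time.
                if (File.GetLastWriteTime(file) > cutoff)
                {
                    Console.WriteLine(file + " was modified in the last 7 days");
                }
            });
            string[] folders = Directory.GetDirectories(location);
            // Recurse into the sub-folders in parallel.
            Parallel.ForEach(folders, folder =>
            {
                ScanDirectory(folder);
            });
        }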


However this will probably put some serious strain on the machine it runs on.
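
A gentler variation, offered only as a sketch and not benchmarked against the 5 million file share from the question: walk the tree once with DirectoryInfo.EnumerateFileSystemInfos (.NET 4 or later), so that each entry's timestamp comes back with the directory listing instead of needing a separate File.GetLastWriteTime call per file. The names ModifiedFileScanner and FindModified are made up for this example, and skipping reparse points is an assumption made here to avoid looping through junctions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ModifiedFileScanner
{
    // Yields files under root whose last write time (UTC) lies in [fromUtc, toUtc).
    public static IEnumerable<FileInfo> FindModified(string root, DateTime fromUtc, DateTime toUtc)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();
            List<FileSystemInfo> entries;
            try
            {
                // One listing per folder returns files and sub-folders together.
                entries = new List<FileSystemInfo>(dir.EnumerateFileSystemInfos());
            }
            catch (UnauthorizedAccessException) { continue; }  // no permission: skip folder
            catch (DirectoryNotFoundException) { continue; }   // removed while scanning
            catch (IOException) { continue; }                  // e.g. a network hiccup

            foreach (FileSystemInfo entry in entries)
            {
                // Assumption: skip junctions and symlinks so the walk cannot loop.
                if ((entry.Attributes & FileAttributes.ReparsePoint) != 0) continue;

                var subDir = entry as DirectoryInfo;
                if (subDir != null)
                {
                    pending.Push(subDir);
                    continue;
                }

                DateTime written = entry.LastWriteTimeUtc;
                if (written >= fromUtc && written < toUtc)
                    yield return (FileInfo)entry;
            }
        }
    }
}

class Program
{
    static void Main(string[] args)
    {
        // Defaults to the path from the question; pass another root as the first argument.
        string root = args.Length > 0 ? args[0] : @"\\TheServer\TheRootFolder";
        DateTime now = DateTime.UtcNow;
        foreach (FileInfo file in ModifiedFileScanner.FindModified(root, now.AddDays(-7), now))
        {
            Console.WriteLine("{0:u}  {1}", file.LastWriteTimeUtc, file.FullName);
        }
    }
}
```

Because results are yielded as they are found, the caller can start processing matches straight away rather than waiting for the whole tree to be read. The scan is still bounded by how fast the server can list its folders.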
 
LVL 4

Author Closing Comment

by:WernerVonBraun
ID: 40247597
FileSystemWatcher

Thanks for the tip