FAST file search by Modified Date range

One of our clients has a folder on a server, containing a lovely, branchy tree of sub-folders and a total of about 5 million documents dispersed through said folder tree.

So the scenario is that I know that the path to all these documents is something like \\TheServer\TheRootFolder

Now I want to retrieve all the documents in the whole tree that were modified in the last week.

I went into Windows Explorer and ran a search filtered on "Date Modified: last week". It has now been running for half an hour and has found 8.5 thousand documents that were modified in the last week ... so far.

Still, though, there is an alternative. We have a database that lists all the documents in that folder tree. I wrote a bit of code that took 10,000 entries out of that database, and then used System.IO.File.Exists and System.IO.File.GetLastWriteTime to check a) whether the file still existed and b) when it was last modified. It took about 2 minutes to get that information for 10,000 entries. At that rate, 5 million files would take about 1,000 minutes, or nearly 17 hours. It's not really workable. So even if the aforementioned Windows Explorer "Date Modified" search took a whole hour (which it might), it would still be preferable to my other solution.

My problem is that I have no idea how to create an EFFICIENT and FAST algorithm to achieve this in VB.NET.

So effectively I need a fast algorithm, in VB.NET, that will give me a list of the files within a UNC path and all its subfolders that were last modified within a certain date range.
WernerVonBraun Asked:
ste5an (Senior Developer) Commented:
There's no free lunch here. Either you can use an index for this operation or you can't. Since Explorer accepts a query such as datemodified:>=07/01/12, the modified date is indexed for any folder that is part of the search index. So I guess you could experiment with adding your folder hierarchy to the search index. But building the index for the first time will also be a slow process.

Caveat: this can be a resource-consuming process.
Luis Pérez (Software Architect in .Net) Commented:
Well, you have proposed a really difficult thing here, because reading directly from the file system will always be expensive in time, and with a huge number of files it can be painful.

The ideal solution (if you can implement it, because I don't know whether you can modify the database) would be:
- Add a field to the database to store the modified date of each file, and do an initial load of this value for all files. This would solve the time problem, because you could run all your searches against the database instead of the file system (which is time-expensive).
- Create a Windows Service or background application that uses the .NET FileSystemWatcher component to monitor changes to the files, updating the database with the modified date of each changed file.

Hope that helps.
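The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Luis's implementation: the class name `ModifiedDateIndex` and its methods are hypothetical, and an in-memory dictionary stands in for the database table so the example stays self-contained. The `FileSystemWatcher` members used (`IncludeSubdirectories`, `NotifyFilter`, and the `Created`, `Changed`, `Renamed` and `Deleted` events) are the standard ones.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical stand-in for the database table: file path -> last modified time (UTC).
public sealed class ModifiedDateIndex : IDisposable
{
    // Case-insensitive keys, assuming a Windows file share.
    private readonly ConcurrentDictionary<string, DateTime> _modified =
        new ConcurrentDictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);
    private readonly FileSystemWatcher _watcher;

    public ModifiedDateIndex(string root)
    {
        _watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
        };
        _watcher.Created += (s, e) => Record(e.FullPath);
        _watcher.Changed += (s, e) => Record(e.FullPath);
        _watcher.Renamed += (s, e) => { Forget(e.OldFullPath); Record(e.FullPath); };
        _watcher.Deleted += (s, e) => Forget(e.FullPath);
        _watcher.EnableRaisingEvents = true;
    }

    // Store the current modified date of one file (directories are ignored).
    public void Record(string path)
    {
        if (File.Exists(path))
        {
            _modified[path] = File.GetLastWriteTimeUtc(path);
        }
    }

    // Drop a file that no longer exists under this path.
    public void Forget(string path)
    {
        _modified.TryRemove(path, out _);
    }

    // The date-range search now touches only the index, never the file system.
    public List<string> ModifiedBetween(DateTime fromUtc, DateTime toUtc)
    {
        return _modified
            .Where(kv => kv.Value >= fromUtc && kv.Value < toUtc)
            .Select(kv => kv.Key)
            .ToList();
    }

    public void Dispose()
    {
        _watcher.Dispose();
    }
}
```

In a real service, `Record` and `Forget` would issue database updates, and the initial load would call `Record` once per file. `FileSystemWatcher` can drop events when its internal buffer overflows under heavy load, so an occasional full rescan is probably still needed to keep the stored dates honest. The class can be called from VB.NET as-is, or translated line by line.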
LordWabbit Commented:
You could speed things up by using the Parallel class, for example...

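        // Requires: using System; using System.IO; using System.Threading; using System.Threading.Tasks;
        private static int totalFiles;

        private static void ScanDirectory(string location)
        {
            // Print the name of the folder being scanned.
            Console.WriteLine(location.Substring(location.LastIndexOf('\\') + 1));
            string[] files = Directory.GetFiles(location);
            // ScanDirectory runs on several threads at once, so update the counter atomically.
            Interlocked.Add(ref totalFiles, files.Length);
            Parallel.ForEach(files, file =>
            {
                // The question is about the modified date, so read the last write time.
                DateTime modifiedTime = File.GetLastWriteTime(file);
                TimeSpan ts = DateTime.Now.Subtract(modifiedTime);
                if (ts.TotalDays < 7)
                {
                    Console.WriteLine(file + " was modified in the last week");
                }
            });
            string[] folders = Directory.GetDirectories(location);
            // Recurse into the sub-folders in parallel.
            Parallel.ForEach(folders, folder =>
            {
                ScanDirectory(folder);
            });
        }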


However, this will probably put some serious strain on the machine it runs on.
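A single-threaded variation on the scan above, offered only as a sketch, is to walk the tree with `DirectoryInfo.EnumerateFileSystemInfos` and read `LastWriteTimeUtc` from the objects it returns. As far as I know, those objects are filled in from the directory listing itself, so this should avoid a separate `File.Exists` / `File.GetLastWriteTime` round trip per file, but I have not measured it against a share of this size. The names `ModifiedFileScanner` and `FindModified` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ModifiedFileScanner
{
    // Lazily yields the full path of every file under root whose last write
    // time falls in [fromUtc, toUtc). Folders that cannot be read are skipped.
    public static IEnumerable<string> FindModified(string root, DateTime fromUtc, DateTime toUtc)
    {
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();
            List<FileSystemInfo> entries;
            try
            {
                // One listing per folder returns names and timestamps together.
                entries = new List<FileSystemInfo>(dir.EnumerateFileSystemInfos());
            }
            catch (UnauthorizedAccessException) { continue; }
            catch (DirectoryNotFoundException) { continue; }

            foreach (FileSystemInfo entry in entries)
            {
                if (entry is DirectoryInfo sub)
                {
                    // Do not follow junctions or symlinked folders, to avoid cycles.
                    if ((sub.Attributes & FileAttributes.ReparsePoint) == 0)
                    {
                        pending.Push(sub);
                    }
                }
                else if (entry.LastWriteTimeUtc >= fromUtc && entry.LastWriteTimeUtc < toUtc)
                {
                    yield return entry.FullName;
                }
            }
        }
    }
}
```

A call such as `ModifiedFileScanner.FindModified(@"\\TheServer\TheRootFolder", DateTime.UtcNow.AddDays(-7), DateTime.UtcNow)` would cover the "last week" case from the question. Because the results are yielded as they are found, the caller can start processing before the whole tree has been walked.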
WernerVonBraun (Author) Commented:
FileSystemWatcher

Thanks for the tip
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.