FAST file search by Modified Date range

Posted on 2014-08-06
Last Modified: 2014-08-07
One of our clients has a folder on a server, containing a lovely, branchy tree of sub-folders and a total of about 5 million documents dispersed through said folder tree.

So the scenario is that I know that the path to all these documents is something like \\TheServer\TheRootFolder

Now I want to retrieve all the documents in the whole tree that were modified in the last week.

I went into Windows Explorer and did a search filtering on "Date Modified: last week". This has now been running for half an hour and it has found 8,500 documents that were modified in the last week... so far.

Still, that is faster than the alternative I tried. We have a database that lists all the documents in that folder tree. I wrote a bit of code that took 10,000 entries out of that database and then used System.IO.File.Exists and System.IO.File.GetLastWriteTime to check a) whether the file still existed and b) when it was last modified. It took about 2 minutes to get that information for 10,000 entries. At that rate, 5 million files would take roughly 1,000 minutes, or nearly 17 hours, which is not really workable. So even if the Windows Explorer "Date Modified" search took a whole hour (which it might), it would still be preferable to my other solution.

My problem is that I have no idea how to create an EFFICIENT and FAST algorithm to achieve this in VB.NET.

So effectively I need a fast algorithm, in VB.NET, that will give me a list of files within a UNC path and all its subfolders that were last modified within a certain date range.
Question by: WernerVonBraun
    LVL 32

    Expert Comment

    There's no free lunch here: either you can use an index for this operation or you can't. Since a query such as datemodified:>=07/01/12 works in Explorer, the modified date is evidently indexed for folders that are part of the search index. So you could experiment with adding your folder hierarchy to the search index, but building the index the first time will also be a slow process.

    Caveat: indexing can be a resource-consuming process.
    LVL 25

    Accepted Solution

    Well, you have proposed a really difficult thing here, because reading directly from the file system will always be time-expensive, and with a huge number of files it can be a painful issue.

    The ideal solution (if you can implement it, because I don't know if you can modify the database) would be:
    - Add a field in the database to store the modified date for each file, and do an initial load of this value into the database for all files. This would solve the time problem, because you could run all your searches directly against the database, not the file system (which is time-expensive).
    - Create a Windows Service or background application that uses a .NET FileSystemWatcher component to monitor changes to the files, updating the database with the modified date for each modified file.
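
    The second step could be sketched roughly as follows, in C# to match the other snippet in this thread. This is only an illustration under stated assumptions: the class name ModifiedDateTracker and its Store dictionary are hypothetical, and the dictionary merely stands in for the real database table, whose schema is not described here. FileSystemWatcher can also drop events when its buffer overflows, so a periodic full rescan would probably still be needed as a safety net.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical helper: keeps a path -> last-modified map up to date.
public sealed class ModifiedDateTracker : IDisposable
{
    // Stand-in for the database table (assumption: the real schema is unknown).
    public ConcurrentDictionary<string, DateTime> Store { get; } =
        new ConcurrentDictionary<string, DateTime>(StringComparer.OrdinalIgnoreCase);

    private readonly FileSystemWatcher watcher;

    public ModifiedDateTracker(string root)
    {
        watcher = new FileSystemWatcher(root)
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName,
            InternalBufferSize = 64 * 1024 // larger buffer; bursts can still overflow it
        };
        watcher.Changed += (s, e) => Record(e.FullPath);
        watcher.Created += (s, e) => Record(e.FullPath);
        watcher.Renamed += (s, e) => { Remove(e.OldFullPath); Record(e.FullPath); };
        watcher.Deleted += (s, e) => Remove(e.FullPath);
        watcher.Error += (s, e) =>
            Console.Error.WriteLine("Watcher error: " + e.GetException().Message);
        watcher.EnableRaisingEvents = true;
    }

    // Stores the current last-write time for a file; directories and
    // paths that no longer exist are ignored.
    public void Record(string path)
    {
        if (File.Exists(path))
            Store[path] = File.GetLastWriteTime(path);
    }

    // Drops a path that was deleted or renamed away.
    public void Remove(string path)
    {
        DateTime ignored;
        Store.TryRemove(path, out ignored);
    }

    public void Dispose() { watcher.Dispose(); }
}
```

    In a real service, Record and Remove would issue UPDATE and DELETE statements against the database instead of touching a dictionary, and the date-range search would then become a single indexed query on the new column.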

    Hope that helps.
    LVL 11

    Expert Comment

    You could speed things up by using the Parallel class, for example...

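            // Requires: using System; using System.Threading; using System.Threading.Tasks;
            // Write, WriteLine and totalFiles (a static int) are defined elsewhere in the program.
            private static void ScanDirectory(string location)
            {
                Write(location.Substring(location.LastIndexOf('\\') + 1), ConsoleColor.White);
                string[] files = System.IO.Directory.GetFiles(location);
                // ScanDirectory runs on several threads at once, so update the counter atomically.
                Interlocked.Add(ref totalFiles, files.Length);
                Parallel.ForEach(files, file =>
                {
                    // Use GetLastWriteTime here instead to test the modified date.
                    DateTime createTime = System.IO.File.GetCreationTime(file);
                    TimeSpan ts = DateTime.Now.Subtract(createTime);
                    if (ts.TotalDays < 7)
                        WriteLine(file + " is new");
                });
                string[] folders = System.IO.Directory.GetDirectories(location);
                // Recurse into each sub-folder in parallel.
                Parallel.ForEach(folders, folder => ScanDirectory(folder));
            }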


    However, this will probably put some serious strain on the machine it runs on.
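
    A lighter-weight variation on the scan above, sketched here as an assumption rather than a tested fix for this particular server, is to walk the tree once with DirectoryInfo.EnumerateFiles and filter on the FileInfo objects it returns. Those objects are typically populated from the directory listing itself, which should avoid a separate Exists and GetLastWriteTime round trip per file. The names ModifiedFileFinder and FindModified are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ModifiedFileFinder
{
    // Returns the full paths of files under root whose last write time
    // falls in the half-open range [from, to).
    public static List<string> FindModified(string root, DateTime from, DateTime to)
    {
        var results = new List<string>();
        var pending = new Stack<DirectoryInfo>();
        pending.Push(new DirectoryInfo(root));

        while (pending.Count > 0)
        {
            DirectoryInfo dir = pending.Pop();
            try
            {
                // Filter on the timestamp carried by each enumerated FileInfo.
                foreach (FileInfo file in dir.EnumerateFiles())
                {
                    if (file.LastWriteTime >= from && file.LastWriteTime < to)
                        results.Add(file.FullName);
                }
                // Queue sub-folders instead of recursing, so deep trees
                // cannot exhaust the call stack.
                foreach (DirectoryInfo sub in dir.EnumerateDirectories())
                    pending.Push(sub);
            }
            catch (UnauthorizedAccessException) { /* skip folders we cannot read */ }
            catch (DirectoryNotFoundException) { /* folder vanished during the scan */ }
        }
        return results;
    }
}
```

    For the case in the question, the call would look something like FindModified(@"\\TheServer\TheRootFolder", DateTime.Now.AddDays(-7), DateTime.Now). The tree is walked by hand rather than with SearchOption.AllDirectories so that one unreadable folder is skipped instead of aborting the whole scan.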
    LVL 4

    Author Closing Comment


    Thanks for the tip
