FAST file search by Modified Date range
Posted on 2014-08-06
One of our clients has a folder on a server, containing a lovely, branchy tree of sub-folders and a total of about 5 million documents dispersed through said folder tree.
So the scenario is that I know that the path to all these documents is something like \\TheServer\TheRootFolder
Now I want to retrieve all the documents in the whole tree that were modified in the last week.
I went into Windows Explorer and did a search filtering on "Date Modified: last week". This has now been running for half an hour and so far it has found 8.5 thousand documents that were modified in the last week.
Still, though, we have a database that lists all the documents in that folder tree. I wrote a bit of code that took 10,000 entries out of that database, and then used System.IO.File.Exists and System.IO.File.GetLastWriteTime to check a) whether the file still existed and b) when it was modified. It took about 2 minutes to get that information for 10,000 entries. For 5 million files, that works out to about 1,000 minutes, or nearly 17 hours. It's not really workable. So even if the aforementioned Windows Explorer "Date Modified" search took a whole hour (which it might), it would still be preferable to my other solution.
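For reference, the per-file check described above looks roughly like this. It is only a sketch of the slow approach: the module name, the function name and the example path in Main are made up for illustration, and the list of paths stands in for the entries read from the database.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module PerFileCheck
    ' Returns the paths that still exist and were last modified within
    ' [fromDate, toDate] (inclusive). "paths" stands in for the database entries.
    Function FilterByModifiedDate(paths As IEnumerable(Of String),
                                  fromDate As DateTime,
                                  toDate As DateTime) As List(Of String)
        Dim matches As New List(Of String)
        For Each filePath As String In paths
            ' Two separate file system calls per entry: one to check existence,
            ' one to read the last write time.
            If File.Exists(filePath) Then
                Dim modified As DateTime = File.GetLastWriteTime(filePath)
                If modified >= fromDate AndAlso modified <= toDate Then
                    matches.Add(filePath)
                End If
            End If
        Next
        Return matches
    End Function

    Sub Main()
        ' Hypothetical entry; in practice this list comes from the database.
        Dim paths As New List(Of String) From {"\\TheServer\TheRootFolder\example.doc"}
        Dim recent = FilterByModifiedDate(paths, DateTime.Now.AddDays(-7), DateTime.Now)
        Console.WriteLine("{0} file(s) modified in the last week", recent.Count)
    End Sub
End Module
```

At the measured rate of 10,000 entries in about 2 minutes, this comes to roughly 12 ms per file, which is where the 17-hour estimate for 5 million files comes from.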
My problem is that I have no idea how to create an EFFICIENT and FAST algorithm to achieve this in VB.NET.
So effectively I need a fast algorithm in VB.NET that will give me a list of files within a UNC path and all its subfolders that were last modified within a certain date range.
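To make the requirement concrete, here is a minimal sketch of the shape such a routine could take, assuming .NET Framework 4 or later. It walks the tree with DirectoryInfo.EnumerateFileSystemInfos and reads LastWriteTime from each returned entry instead of calling File.Exists and File.GetLastWriteTime once per path. Whether this is actually fast enough over a UNC share with 5 million files is untested; the module and function names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ModifiedDateSearch
    ' Returns the full paths of all files under "root" (including subfolders)
    ' whose last write time falls within [fromDate, toDate] (inclusive).
    Function FindModifiedFiles(root As String,
                               fromDate As DateTime,
                               toDate As DateTime) As List(Of String)
        Dim matches As New List(Of String)
        ' Explicit stack instead of recursion, so a deep tree cannot overflow the call stack.
        Dim pending As New Stack(Of DirectoryInfo)
        pending.Push(New DirectoryInfo(root))

        While pending.Count > 0
            Dim current As DirectoryInfo = pending.Pop()
            Try
                ' One lazy listing per folder yields both files and subfolders.
                For Each entry As FileSystemInfo In current.EnumerateFileSystemInfos()
                    Dim subfolder As DirectoryInfo = TryCast(entry, DirectoryInfo)
                    If subfolder IsNot Nothing Then
                        ' Skip junctions and symbolic links to avoid walking in circles.
                        Dim isLink As Boolean =
                            (subfolder.Attributes And FileAttributes.ReparsePoint) = FileAttributes.ReparsePoint
                        If Not isLink Then pending.Push(subfolder)
                    ElseIf entry.LastWriteTime >= fromDate AndAlso entry.LastWriteTime <= toDate Then
                        matches.Add(entry.FullName)
                    End If
                Next
            Catch ex As UnauthorizedAccessException
                ' No permission to list this folder: skip it and carry on.
            Catch ex As DirectoryNotFoundException
                ' Folder was removed while the search was running: skip it.
            End Try
        End While

        Return matches
    End Function

    Sub Main()
        ' Root path taken from the question; the date range is "the last week".
        Dim recent = FindModifiedFiles("\\TheServer\TheRootFolder",
                                       DateTime.Now.AddDays(-7), DateTime.Now)
        Console.WriteLine("{0} file(s) modified in the last week", recent.Count)
    End Sub
End Module
```

The Try block wraps the whole listing of a folder, so a folder that cannot be read is skipped without aborting the rest of the search. The result is a plain list of paths; with millions of files it may be preferable to process matches as they are found rather than collecting them all in memory.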