Order of Magnitude of SELECT query.
Posted on 2013-02-01
I have a good algorithm question for you.
First off, I'm trying to implement incremental updating differently than before. At the moment, the code would walk over every file in a folder using Depth First Search, and track where I was and what version of the file I already documented using a hierarchy based data structure (folder and file representations as objects).
I'll call the hierarchy based data structure - I.E. - The object based folder and file representations the "Directory Representation".
So, as the code stepped through the filesystem, it would also step through my Directory Representation as well, comparing files as I went - and picking out differences. this would allow me to determine what has changed, been added, or deleted since the last sync when we were all done. Cool stuff.
a) So, in a worst case scenario, the directory is 1 level deep only with n files in it. This would mean that for each file, I would have to search my Directory Representation's +-n items, making it ~O(n^2) search to compare every file.
b) This doesn't usually happen because usually there is a lot of folders, and I'm in the correct folder in the Directory Representation so I only need to to search that specific Directory's files in the Directory Representation to compare. I think this is what makes it MUCH more efficient. (It is very quick)
In reality, the Directory Representation uses a Sorted List, so the search is more like log(n), so searching it n times would be more like nLog(n) for each directory's files. (nice)
SO: Now to my question.
We are thinking about replacing this hierarchy based Directory Representation with a Database.
This is troublesome because I think of a database as a 1-D structure, so every time the code needs to check the previous state of a file, a SELECT query would be used, and the ENTIRE database of files would need be searched. In my view, this would make the incremental updater much slower when there are large amounts of files invloved - and it would make the problem more like the worst case scenario (a).
So, can anyone suggest ways to search a database n times when the database is of size n (database represents a folder and file hierarchy) so that the overall compare is an Order of Magnitude or two faster than O(n^2)? (n*n=n^2)?
So I guess this is a database sorting/optimization problem...
We're using SQL lite for now...
Thanks for any help or opinions!