Excel VBA Efficient Method to Preform Multiple Quick Searches
Posted on 2012-09-03
Hi Experts, I have kind of a challenging problem that I'm trying to figure out. I need an efficient way to perform a series of fast searches through a list of about 100,000 items in an Excel workbook. I'm writing a program that will return a set of the closest matches to a given string of characters within a list; it will eventually iterate through all 100,000 items and return the top five closest matches based on a similar matching algorithm.
The issue that I'm having is getting the program to run efficiently. If I have a search string of 11 characters (for example 'TEST STRING') I need to scan my entire list searching for each possible sub string within the search string (limiting it to a minimum five consecutive characters). (e.g., 'TEST STRING', 'TEST STRIN', 'EST STRING', 'TEST STRI', 'EST STRIN', 'ST STRING'... all the way down to 5 consecutive characters... 'TEST ', 'EST S', 'ST ST', 'T STR', ' STRI', 'STRIN', 'TRING'). And return a set of items that contains each of those sub strings.
As you can see this requires A LOT of searching, and the number of searches grows exponentially depending on the length of the search string. Initially I did this with brute force by looping through my entire list for each search (which actually worked OK when I only had about 2,000 items). I've tried creating a database of the items in access then executing SQL statements with a LIKE clause, but these are still pretty inefficient. I've had some luck using the .find method, but this still takes nearly 10 hours to process the entire list (and I've been seeing some buggy results).
I'm looking for suggestions on different methods of preforming the searches. I think that there should be some way that I can create a hash which will help me quickly return a set of all items that contain specific subset of characters. But I haven't worked much with hashes and I haven't been able to figure something out.
-Apologies for the long request; I've actually been asking smaller chunks of this question along the way, but I just keep hitting roadblocks. Thanks in advance