Better Approach for Full-Text Ranking
Posted on 2007-09-30
I have a program that builds a database of keywords related to particular email threads. Users select sets of keywords that uniquely identify an email thread, and the program then runs every email against this database of "keyword sets" to produce a ranking of the threads the email is most likely related to.
Now I notice that my searches are getting slower and slower. My algorithm is very simple: for each keyword, I run an InStr function against the email text and count the number of times a set of keywords matches. Because the number of defined keyword sets keeps increasing, this takes longer and longer.
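For reference, the approach described above can be sketched roughly as follows. This is a minimal Python sketch, not the original Visual Studio code. It assumes each keyword set is a list of strings keyed by a hypothetical thread id, and it interprets "matches" as the number of keywords in a set that occur at least once in the email. The sketch makes the cost visible: every email triggers one substring scan per keyword in every set, so the work grows linearly with the number of defined sets.

```python
def rank_threads(email_text, keyword_sets):
    """Naive ranking: one substring scan per keyword, per keyword set.

    keyword_sets maps a (hypothetical) thread id to its list of keywords.
    Cost grows with (number of sets) x (keywords per set) x (email length).
    """
    text = email_text.lower()
    scores = []
    for thread_id, keywords in keyword_sets.items():
        # Count how many keywords in this set occur in the email.
        # str.find() plays the role of InStr here; it returns -1 when
        # the keyword is not found (InStr returns 0 instead).
        hits = sum(1 for kw in keywords if text.find(kw.lower()) != -1)
        if hits:
            scores.append((thread_id, hits))
    # Highest score first; ties broken by thread id for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


sets = {
    "budget-review": ["budget", "q3", "forecast"],
    "server-outage": ["outage", "server", "downtime"],
    "team-lunch": ["lunch", "friday"],
}
email = "The server outage on Friday delayed the Q3 budget forecast."
print(rank_threads(email, sets))
# [('budget-review', 3), ('server-outage', 2), ('team-lunch', 1)]
```

Adding a fourth keyword set adds another full pass of scans for every email processed, which matches the slowdown described above.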
Even if I spread this across multiple threads or "mothball" old keyword sets, it's clear to me that I need a different approach if I expect performance to stay acceptable in the long term. I marvel at the searches Google does: quick and thorough, with ranking, all from a few keywords. How can I bring this kind of functionality to my application? I am writing in Visual Studio, and my keyword sets are stored in a disconnected ADO.NET DataSet, so I don't have a real database backend with full-text capabilities.