I am going to be building a system soon that will manage large quantities of documents. It will rely on a MySQL database to store basic information about the documents, and users will then run keyword searches to find documents matching their queries.
The problem is that there may be more than 700,000 documents active at any given time, and each document can have many keywords assigned, upwards of 1,000.
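To put a rough number on that, here is a quick back-of-the-envelope sketch in Python. It treats 700,000 documents and 1,000 keywords per document as round figures, and it assumes (my guess, not a settled design) that each keyword-to-document link is stored as two 4-byte integer IDs. The names are just illustrative.

```python
# Rough size of a keyword-to-document mapping, using the round figures above.
MAX_DOCUMENTS = 700_000          # active documents (figure from this post)
MAX_KEYWORDS_PER_DOC = 1_000     # keywords per document (figure from this post)
BYTES_PER_PAIR = 8               # assumption: two 4-byte integer IDs per link


def worst_case_pairs(documents: int, keywords_per_doc: int) -> int:
    """Number of (keyword, document) pairs if every document hits the cap."""
    return documents * keywords_per_doc


pairs = worst_case_pairs(MAX_DOCUMENTS, MAX_KEYWORDS_PER_DOC)
raw_gib = pairs * BYTES_PER_PAIR / 2**30  # raw data only, no index overhead

print(f"{pairs:,} pairs, roughly {raw_gib:.1f} GiB before index overhead")
```

So under those assumptions the mapping alone could run to hundreds of millions of rows, which is why I am worried about how to index it.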
I'll need to index the documents somehow so that searches are fast and accurate. I don't expect anyone here to just hand me the solution, but can anyone suggest where I can read up on the theory? What kinds of indexing techniques exist? Which ones are scalable to such large numbers? Can anyone recommend a website or book that goes into detail on this topic?