I have to create a text indexing/searching engine for a document management application.
I remember reading quite a while ago about an engine that used inverted indexing (I think that is what it was called) with dictionary coding to implement keyword retrieval and compression in a sort of "double whammy". As I recall, it assigned every distinct string a dictionary 'key', indexed all occurrences of each string under that key, and then shrank the source by storing the dictionary keys rather than the associated strings. This had the advantage of compressing the original text while still allowing the compressed text to be searched, by looking up the dictionary key(s) rather than the associated strings.
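To make the idea concrete, here is a rough Python sketch of how I understand the scheme. It is only my reading of it, not the engine I remember: the class name `DictionaryCodedIndex` and all method names are made up for illustration, the tokenizer is a naive regex, matching is case-sensitive, and keys are kept as plain Python integers, so no space is actually saved until the keys are packed into a compact encoding.

```python
import re

TOKEN_RE = re.compile(r"\w+|\W+")  # alternating runs of word / non-word characters
WORD_RE = re.compile(r"\w+")


class DictionaryCodedIndex:
    """Hypothetical sketch: dictionary coding plus an inverted index."""

    def __init__(self):
        self.key_of = {}    # string -> integer dictionary key
        self.strings = []   # integer dictionary key -> string
        self.postings = {}  # key -> list of (doc_id, token position)
        self.docs = {}      # doc_id -> document stored as a list of keys

    def _key(self, token):
        # Return the existing key for this string, or assign the next free one.
        key = self.key_of.get(token)
        if key is None:
            key = len(self.strings)
            self.key_of[token] = key
            self.strings.append(token)
        return key

    def add(self, doc_id, text):
        encoded = []
        for pos, token in enumerate(TOKEN_RE.findall(text)):
            key = self._key(token)
            encoded.append(key)
            # Only words go into the inverted index; separators are
            # dictionary-coded too so the text can be rebuilt exactly.
            if WORD_RE.fullmatch(token):
                self.postings.setdefault(key, []).append((doc_id, pos))
        self.docs[doc_id] = encoded

    def search(self, word):
        # Look up the key, then the postings for that key; the stored
        # documents are never decoded during a search.
        key = self.key_of.get(word)
        if key is None:
            return []
        return list(self.postings.get(key, []))

    def decode(self, doc_id):
        # Rebuild the original text from the stored keys.
        return "".join(self.strings[k] for k in self.docs[doc_id])


index = DictionaryCodedIndex()
index.add(1, "the cat sat on the mat")
index.add(2, "The dog sat.")
print(index.search("sat"))
print(index.decode(1))
```

Positions here are token positions (separators count as tokens), which keeps decoding trivial but is just one possible choice.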
I could probably get that up and running, but I wanted to get some comments before I start what is going to be quite a long project. If anybody has done anything similar, or implemented a different solution, I would be very glad if they could point me in the right direction or share their experiences.