Posted on 1998-12-22
Actually I've asked this question before, but I was pretty much stumbling in the dark, so I didn't get the terms right, and my question was therefore more or less impossible to answer (though BigRat did give me some ideas, thanks!). Anyway, this time I'm being more specific, and I'm hoping that reposting will make some more people notice it. Here it comes:
I want to retrieve sorted documents from a _static_ database, which I will build myself. Simplified example: Imagine a lot of rather small documents (maybe 50-800 words each). I want to place them all in one big file and build an index of all unique words. When this is done, I can look up any word in the index, and it will show me which documents that word occurs in. So if I look up, say, 'exchange' in the index, I have stored the information that this word occurs in documents 1, 4, 6, and so on.
This can be done in several ways, and is NOT the problem.
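Just to make the setup concrete, here is a rough sketch of the kind of word index I mean. It is Python, the names (`build_index`, `docs`) are made up for illustration, and the word splitting is deliberately naive; it is only one of the several ways mentioned above.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each unique lowercased word to a sorted list of document ids.

    Document ids are assumed to start at 1, in storage order.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(documents, start=1):
        # Naive tokenizer: runs of letters, digits and apostrophes.
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = [
    "The exchange rate fell sharply.",
    "A quiet year for publishers.",
    "Stock exchange rules changed.",
]
index = build_index(docs)
exchange_hits = index.get("exchange", [])  # ids of documents containing the word
```

The hits come back in document-id order, i.e. the order the documents were stored in.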
Now imagine that my documents all have some fields in common (let's pretend the documents are book reviews), e.g. writer, publisher, and maybe the year it was written. I would make an index for each field, making it possible to limit the search to writer or publisher. This isn't the problem either.
The problem is that I NEED to be able to display the result sorted in different ways. The above will retrieve the documents in the order they were stored (at least with the implementation methods I can think of). So how do I sort the retrieved result? If I get 500,000 hits, I really don't want to do a sort that requires a lookup in each document! I would probably need some presorted lookup tables, which should be entirely feasible since the database is 100% static once it's built.
What I need is information on how to do this. I have already bought two books, and despite promising titles they were of absolutely no use.
This is a very specific use of information retrieval, but my hope is that some of you out there have knowledge about this, or know where to get it! Any comments, links or book titles appreciated.
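Here is a rough sketch of what I imagine by "presorted lookup tables". It is a guess at one possible approach, not a known-good solution. For every sort key I would compute, once at build time, the position of each document in the sorted order, and then order the hits by that stored position without opening any document. The function names and the sample data are made up, and document ids are assumed to start at 0 here so they can be used directly as list positions.

```python
def build_rank_table(field_values):
    """field_values[doc_id] is the sort key of that document.

    Returns rank, where rank[doc_id] is the document's position in
    sorted order. Computed once, when the static database is built.
    """
    order = sorted(range(len(field_values)), key=lambda d: field_values[d])
    rank = [0] * len(field_values)
    for position, doc_id in enumerate(order):
        rank[doc_id] = position
    return rank

def sort_hits(hits, rank):
    """Order the hit list by precomputed rank; no document is read."""
    return sorted(hits, key=lambda d: rank[d])

# Hypothetical field data for four book reviews (doc ids 0..3).
writers = ["Orwell", "Austen", "Zola", "Dickens"]
years = [1949, 1813, 1885, 1861]

rank_by_writer = build_rank_table(writers)
rank_by_year = build_rank_table(years)

hits = [0, 2, 3]  # e.g. the result of a word lookup
by_writer = sort_hits(hits, rank_by_writer)
by_year = sort_hits(hits, rank_by_year)
```

This still sorts the hit list, but only on small integers held in one array per sort key. With a very large number of hits, another option might be to walk the presorted order and keep only the documents that are in the hit set.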