Posted on 1999-01-21
I got a lot of words (millions!), which I'm going to index to gain fast access into a large file. Which data structure could I use? The problem is that I need fast access to single words, AND I need to be able to extract sequentially (sorted alfabetically) starting with a random word.
I was thinking of a B-tree of some sort, but running through the tree to get 100 index words (in-order), would mean reading A LOT of diskblocks, since the next word will usually be in a new block. The B-tree is very good for the single search, though.
Both space used and seek time is very important. The indexwords will probably have to be compressed (I was thinking of Huffman coding), which complicates things further.