I am trying to create a parser that parses documents such as .html, .asp, or .doc (Word) files.
So if a document had words such as: Today we will be going over the new APPLE technology. EXPERT is a new technology coming out in 06.
It would find words like Apple and Expert.
I have looked into the IFilter technology, and it seems like it would work.
But I was wondering: would it be faster to use a technology such as IFilter, or to just parse the text word by word (that is, take each group of letters between spaces, e.g. in "Hi APPLE bye" the word APPLE sits between two spaces) and check whether each word exists in my list of known words?
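
To make the word-by-word idea concrete, here is a rough sketch of what I have in mind. It is in Python purely for illustration, and it assumes the plain text has already been pulled out of the document somehow. The names KNOWN_WORDS and find_known_words are made up for this example, and I am assuming the match should ignore case and surrounding punctuation.

```python
import string

# Hypothetical list of known words, stored lowercase in a set
# so that each membership check is a constant-time hash lookup.
KNOWN_WORDS = {"apple", "expert"}


def find_known_words(text, known_words=KNOWN_WORDS):
    """Return the known words found in text, in order of first appearance."""
    found = []
    seen = set()
    # split() with no arguments breaks the text on any run of whitespace.
    for token in text.split():
        # Drop leading/trailing punctuation and ignore case (assumption).
        word = token.strip(string.punctuation).lower()
        if word in known_words and word not in seen:
            seen.add(word)
            found.append(word)
    return found


sample = ("Today we will be going over the new APPLE technology. "
          "EXPERT is a new technology coming out in 06.")
print(find_known_words(sample))  # ['apple', 'expert']
```

Only whole words match here, so a word like "pineapple" would not count as a hit for "apple".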