Is IFilter or Parsing Faster?


I am trying to create a parser that parses documents such as .html, .asp, or .doc (Word).

So if a document contained text such as: "Today we will be going over the new APPLE technology. EXPERT is a new technology coming out in 06."
it would find words like Apple and Expert.

I have looked into the IFilter technology, and it seems like it would work.  

But I was wondering: would it be faster to use a technology such as IFilter, or to just parse the text word by word (that is, take each group of letters between spaces, e.g. in "Hi APPLE bye" the word APPLE sits between spaces), look at that word, and check whether it exists in my list of known words?
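To make the word-by-word approach concrete, here is a minimal sketch in Python. The word list and the function name are hypothetical, and it picks out runs of letters rather than splitting strictly on spaces, so that punctuation such as the period in "technology." does not stick to a word.

```python
import re

# Hypothetical list of known words, stored lowercase in a set for fast lookup.
KNOWN_WORDS = {"apple", "expert"}

def find_known_words(text, known=KNOWN_WORDS):
    """Return the known words that occur in text, in order of first appearance."""
    found = []
    # Each run of ASCII letters is treated as one word.
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word in known and word not in found:
            found.append(word)
    return found

sample = ("Today we will be going over the new APPLE technology.  "
          "EXPERT is a new technology coming out in 06.")
print(find_known_words(sample))  # ['apple', 'expert']
```

This only works on text that is already plain; pulling the plain text out of a .doc or .html file is the part that IFilter would otherwise handle.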


IFilter has some overhead that you should account for.
- Calling a COM component is slower than calling your own method, but if you use in-process components, this performance loss is negligible.
- The component that implements IFilter may check the validity of the document and do some extra tasks that you don't need. I think this takes the most time.

If high performance is a very important factor, HTML parsing can be done faster by native code designed explicitly to extract words from an HTML document. But a Word document is very complex, so IFilter is the only economical solution.
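As an illustration of what a purpose-built HTML word extractor does, here is a rough sketch using Python's standard `html.parser` module. It is not native code and makes no claim about speed compared with IFilter; the class and function names are made up for this example, and it simply collects words from text nodes while skipping script and style content.

```python
from html.parser import HTMLParser
import re

class WordExtractor(HTMLParser):
    """Collects words from text nodes, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(w.lower() for w in re.findall(r"[A-Za-z]+", data))

def extract_words(html):
    """Return the lowercase words found in the visible text of an HTML string."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.words

print(extract_words("<p>Hi <b>APPLE</b> bye</p><script>var x = 1;</script>"))
# ['hi', 'apple', 'bye']
```

Nothing comparable is practical for the binary .doc format, which is why IFilter remains the economical choice there.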

CPU time spent in particular methods can be measured with CLR Profiler. Try this first, before you start implementing custom parsers.
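The same measure-first idea can be sketched without a profiler by timing each candidate directly. The helper below is a hypothetical example in Python, not part of any IFilter tooling; it reports wall-clock time rather than CPU time, and `split_words` is only a stand-in for a custom parser.

```python
import time

def time_call(func, *args, repeat=5):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def split_words(text):
    """Hypothetical stand-in for a custom word-by-word parser."""
    return text.split()

text = "Hi APPLE bye " * 10000
print(f"split_words: {time_call(split_words, text):.6f} s")
```

Timing the IFilter-based extraction and the custom parser on the same set of real documents shows whether the difference is large enough to justify writing a custom parser.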
alexthecodepoet (Author) Commented:
Thank you.

Is IFilter the norm?
IFilter is a widespread standard interface. The vendors of all popular document types offer COM components with that interface.
Naturally, there are dedicated solutions for certain document types, especially for HTML, but being able to handle HTML and .doc files in the same way is a bigger advantage than the small performance gain from using third-party components.
alexthecodepoet (Author) Commented:
Thank you, Pallosp.
Question has a verified solution.
