Is IFilter or Parsing Faster?

Posted on 2006-05-05
Last Modified: 2008-02-26

I am trying to create a parser that parses documents such as .html, .asp, or .doc (Word) files.

So if a document contained text such as: "Today we will be going over the new APPLE technology. EXPERT is a new technology coming out in 06."
it would find words like Apple and Expert.

I have looked into the IFilter technology, and it seems like it would work.  

But I was wondering: would it be faster to use a technology such as IFilter, or to just parse the text word by word (that is, take each group of letters between spaces, e.g. in "Hi APPLE bye" the word APPLE sits between two spaces) and check whether each word exists in my list of known words?
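
The word-by-word approach described above could look roughly like the following. This is only a sketch in Python: the `KNOWN_WORDS` set and the `find_known_words` helper are made-up names for illustration, and it assumes the document text has already been extracted as a plain string.

```python
import re

# Hypothetical list of known words, stored lowercase in a set for fast lookups.
KNOWN_WORDS = {"apple", "expert"}

def find_known_words(text, known_words=KNOWN_WORDS):
    """Return the known words found in text, in order of first appearance."""
    found = []
    seen = set()
    # Match runs of letters instead of splitting on spaces, so that
    # punctuation attached to a word ("technology.") is dropped.
    for token in re.findall(r"[A-Za-z]+", text):
        word = token.lower()
        if word in known_words and word not in seen:
            seen.add(word)
            found.append(word)
    return found

sample = ("Today we will be going over the new APPLE technology.  "
          "EXPERT is a new technology coming out in 06.")
print(find_known_words(sample))  # ['apple', 'expert']
```

Keeping the known words in a set means each lookup takes roughly constant time, so the cost is dominated by reading the text itself.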


Question by:alexthecodepoet
    LVL 9

    Accepted Solution

    IFilter has some overhead that you should account for.
    - Calling a COM component is slower than calling your own method, but if you use in-process components, this performance loss is negligible.
    - The component that implements IFilter may check the validity of the document and do some extra tasks that you don't need. I think this takes the most time.

    If high performance is a very important factor, HTML parsing can be done faster by native code designed explicitly to extract words from an HTML document. But a Word document is very complex, so for that format IFilter is the only economical solution.
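
As a rough illustration of a parser written only to pull words out of HTML, here is a minimal sketch in Python using the standard library's `html.parser`. It is not native code and not an IFilter replacement; the `WordExtractor` class and `extract_words` function are hypothetical names, and skipping `<script>` and `<style>` content is an assumption about what counts as document text.

```python
from html.parser import HTMLParser
import re

class WordExtractor(HTMLParser):
    """Collects words from text nodes, skipping <script> and <style> content."""
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only visible text contributes words; markup is never seen here.
        if self._skip_depth == 0:
            self.words.extend(re.findall(r"[A-Za-z]+", data))

def extract_words(html):
    """Return the words found in the text content of an HTML string."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    return parser.words
```

For example, `extract_words("<p>Hi <b>APPLE</b> bye</p>")` yields the three words of the sentence without any tag names.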

    CPU time spent in particular methods can be measured with CLR Profiler. Try this first, before you start implementing custom parsers.
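
The same measure-first idea can be sketched without a profiler. The Python snippet below is an assumption-laden illustration, not CLR Profiler: `time_call` is a made-up helper, and it reports wall-clock time for a whole call rather than a per-method CPU breakdown.

```python
import time

def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def split_words(text):
    """Stand-in for the extraction step being measured (hypothetical)."""
    return text.split()

document = "Hi APPLE bye " * 10000
elapsed = time_call(split_words, document)
print(f"best of 5 runs: {elapsed:.6f} s")
```

Taking the best of several runs reduces noise from other processes. Timing each candidate approach on the same real documents gives a like-for-like comparison before any custom parser is written.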

    Author Comment

    Thank you.

    Is IFilter the norm?
    LVL 9

    Expert Comment

    IFilter is a widely adopted standard interface. The manufacturers of all popular document types offer COM components with that interface.
    Naturally, there are individual solutions for certain document types, especially for HTML, but being able to handle HTML and .doc files in the same way is a bigger advantage than the small performance gain from using third-party components.

    Author Comment

    Thank you, Pallosp.
