Link to home
Start Free TrialLog in
Avatar of chenwei
chenwei

asked on

Looking for HTML Parser for C/C++

Who knows where can I get a good HTML-Parser for C/C++ for free?
Avatar of dorward
dorward

Avatar of chenwei

ASKER

Thanks for the info. I forget to say what I am looking for is an HTML Parser for C/C++ for MS Visual Sutdio. That means it could be compiled with MS Visual Studio.

The dillo seems for GNU-C

http://www.thefreecountry.com/sourcecode/cpp.shtml

try c++ class library

"html parser" c++ in a google search brings up all sort of parsers
There's a GPL piece of software called HTMLDOC. It's made for converting HTML to PDF files (or RTF) but has a pretty good HTML parser in it and can handle all the basic tags. While it's not exactly made for what you're trying to do, once you suck in the HTML it's all strung up in a object tree that you can do whatever you want with.

You can find the source here:
http://www.easysw.com/htmldoc/software.php

An alternative is to first run your HTML through HTMLTidy to create XHMTL then use a regular XML parser (like Xerces) to parse out what you want.

You can find HTML Tidy here:
http://tidy.sourceforge.net/

and Xerces here:
http://xml.apache.org/xerces-c/

Everything is open source and can probably give you what you want.

-Bil
Avatar of chenwei

ASKER

Thanks to all sites. I've found out an HTML Parser, libxml2.

Please don't answer my question any more.
ASKER CERTIFIED SOLUTION
Avatar of Computer101
Computer101
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial