Hi, I have actually already made a version of this, but I'm not completely satisfied with it, and I'm thinking of rewriting it from scratch.
Given an HTML file, I need to extract all visible text and all links. I'm using this to do some web crawling.
Does anyone know of a good way to do this? I will most likely have to use an HTML parser (I use TLegHTMLParser now), but I have to do a LOT of string parsing myself when using that one. I would like something much simpler, like getting all links in a TStringList, or something like that. And that is _qualified_ links, btw. A link like '/main.asp' should be 'www.somewebsite.com/main.asp'. The parser could have a source property or something like that, so it can complete the links.
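To make it concrete, here is roughly the behaviour I mean, sketched in Python rather than Delphi just to keep it short. The names PageExtractor, source and extract are made up for illustration and are not part of TLegHTMLParser or any existing component. It also assumes the source is given with its scheme (http://...), since that is what the standard urljoin function needs to complete a relative link.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and qualified links."""

    # Content of these tags is never shown to the user, so it is skipped.
    HIDDEN_TAGS = {"script", "style"}

    def __init__(self, source):
        super().__init__()
        self.source = source      # URL the page came from (assumed to include the scheme)
        self.links = []           # fully qualified link targets, in document order
        self.text_parts = []      # visible text fragments, in document order
        self._hidden_depth = 0    # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Complete relative links such as '/main.asp' against the source URL.
                self.links.append(urljoin(self.source, href))

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if self._hidden_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, source):
    """Return (visible_text, links) for one HTML document."""
    parser = PageExtractor(source)
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.links


if __name__ == "__main__":
    page = '<p>Hello <a href="/main.asp">main page</a></p><script>var x = 1;</script>'
    text, links = extract(page, "http://www.somewebsite.com/index.asp")
    print(text)
    print(links)
```

So the links come back as a plain list (the TStringList in my case), already completed, and the text comes back without any of the script or style content. This sketch only looks at <a href> and does not handle frames, image maps or a <base> tag.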
I hope you understand what I'm looking for ;)