<div>
Hey, stop listening and find me a parser :-p<br />
<br />
Anyway, the function CombineURL in unit UrlMon will do the combining (what I call qualifying above), so I just need to get all links (and text, but all parsers do that). I can't imagine there isn't a component for this, but I haven't found one!
</div>



<div>
Hi Hinnack, that one looks promising - thanks!<br />

</div>




<div>
<div class="content wysiwyg-content">
Hi, I have actually already made a version of this, but I'm not completely satisfied, and I'm thinking of rewriting it from scratch.<br />
<br />
Given an HTML file, I need to extract all visible text and all links. I'm using this to do some web crawling.<br />
<br />
Does anyone know of a good way to do this? I will most likely have to use an HTML parser (I use TLegHTMLParser now), but I have to do a LOT of string parsing myself when using that one. I would like something much simpler, like getting all links in a TStringList, or something like that. And that is _qualified_ links, btw. A link like '/main.asp' should become '<a href="http://www.somewebsite.com/main.asp" rel="ugc">www.somewebsite.com/main.asp</a>'. The parser could have a source property or something like that, so it can complete the links.<br />
<br />
I hope you understand what I'm looking for ;)<br />

</div>
</div>
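To make the request above concrete, here is a minimal sketch of the behaviour being asked for: one pass over the HTML that collects visible text plus all links and frame sources, each qualified against the page's own address. It is written in Python with only the standard library, not in Delphi, and it is not a drop-in component. The names `LinkTextExtractor`, `extract` and `source_url` are made up for illustration, and treating only `script` and `style` content as invisible is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkTextExtractor(HTMLParser):
    """Collects qualified links and visible text from one HTML document."""

    # Assumption: only script and style content counts as invisible.
    HIDDEN_TAGS = {"script", "style"}
    # Which attribute carries the URL for each tag we care about.
    URL_ATTRS = {"a": "href", "frame": "src", "iframe": "src"}

    def __init__(self, source_url):
        super().__init__(convert_charrefs=True)
        self.source_url = source_url  # plays the role of the "source property"
        self.links = []
        self.text_parts = []
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1
        attr_name = self.URL_ATTRS.get(tag)
        if attr_name:
            for name, value in attrs:
                if name == attr_name and value:
                    # urljoin resolves relative links against the page URL.
                    self.links.append(urljoin(self.source_url, value))

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a hidden element.
        if self._hidden_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html, source_url):
    """Return (list of qualified links, visible text) for one page."""
    parser = LinkTextExtractor(source_url)
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)
```

Calling `extract(page_html, "http://www.somewebsite.com/dir/page.asp")` on a page containing a link to '/main.asp' would put 'http://www.somewebsite.com/main.asp' in the returned list. A Delphi version would presumably keep the same shape, with the parser filling a TStringList and the UrlMon combining function doing the job of `urljoin`.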




Extracting links, frames and text from HTML
