
Extracting links, frames and text from HTML

Asked by pede in Delphi
Hi, I have actually already made a version of this, but I'm not completely satisfied with it, and I'm thinking of rewriting it from scratch.

Given an HTML file, I need to extract all visible text and all links. I'm using this for web crawling.
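As a rough illustration of the extraction described above, here is a minimal sketch using Python's standard `html.parser` module rather than TLegHTMLParser, whose API isn't shown here. It assumes that "visible text" means every text node outside `<script>` and `<style>`, and it also collects `<frame>`/`<iframe>` `src` values, since the title mentions frames. The class name `LinkTextExtractor` is hypothetical, and links are returned exactly as written in the page, not yet qualified.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, <a href> targets and frame sources."""

    # Assumption: only script/style content counts as "not visible".
    HIDDEN = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []    # raw href values of <a>, in document order
        self.frames = []   # raw src values of <frame> and <iframe>
        self.text = []     # visible text fragments, whitespace-trimmed
        self._hidden = 0   # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag in self.HIDDEN:
            self._hidden += 1
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag in self.HIDDEN and self._hidden > 0:
            self._hidden -= 1

    def handle_data(self, data):
        # Skip script/style bodies and whitespace-only text nodes.
        if not self._hidden and data.strip():
            self.text.append(data.strip())


html = """<html><head><style>p {color: red}</style></head>
<body><p>Hello <a href="/main.asp">main</a></p>
<script>var x = 1;</script>
<iframe src="ads.html"></iframe></body></html>"""

p = LinkTextExtractor()
p.feed(html)
p.close()
print(p.links)           # ['/main.asp']
print(p.frames)          # ['ads.html']
print(" ".join(p.text))  # Hello main
```

Keeping links, frame sources and text in separate lists mirrors the "everything in a TStringList" idea: each list could map onto its own TStringList in a Delphi port.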

Does anyone know of a good way to do this? I will most likely have to use an HTML parser (I use TLegHTMLParser now), but with that one I have to do a LOT of string parsing myself. I would like something much simpler, like getting all links in a TStringList, or something like that. And I mean _qualified_ links, btw: a link like '/main.asp' should become 'www.somewebsite.com/main.asp'. The parser could have a source property or something like that, so it can complete the links.
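To qualify links, one approach is to resolve each raw `href` against the URL of the page it came from (the "source property" idea), following the standard relative-URL resolution rules. The sketch below, again in Python with a hypothetical helper name `qualify_links`, uses `urllib.parse.urljoin`. It also strips `#fragment` parts, drops non-HTTP(S) links such as `mailto:` and `javascript:`, and removes duplicates; these are assumptions about what a crawler wants, not requirements. Note that the base must include a scheme (`http://www.somewebsite.com/...`), not just the host name, for `urljoin` to resolve against it.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def qualify_links(page_url, hrefs):
    """Hypothetical helper: resolve hrefs against page_url, keep unique http(s) URLs."""
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative to the page, then drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, ftp:, etc.
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


page = "http://www.somewebsite.com/news/index.asp"
raw = ["/main.asp", "story.asp?id=7", "../about.html#team",
       "mailto:info@somewebsite.com", "http://example.org/", "/main.asp"]
print(qualify_links(page, raw))
# ['http://www.somewebsite.com/main.asp',
#  'http://www.somewebsite.com/news/story.asp?id=7',
#  'http://www.somewebsite.com/about.html',
#  'http://example.org/']
```

If the page contains a `<base href="...">` element, that URL (itself resolved against the page URL) should be used as the base instead of the page URL. In Delphi, the same loop could fill a TStringList, using an `IndexOf` check to skip duplicates.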

I hope you understand what I'm looking for ;)
ASKER CERTIFIED SOLUTION
hinnack
