Esopo
asked on
Strip HTML tags - reliably
Hi,
I am making a little freeware app that needs to work with the plain text of an HTML page. I've found some code that does simple tag stripping (the same approach I had in mind: just remove everything between '<' and '>'), but I fear this may not be very reliable.
So my question is:
Should I just take out everything between tag openers and closers, or is there a more intelligent solution for stripping all code from an HTML page and leaving only the readable text?
Any ideas are welcome. Thank you.
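To make the concern concrete, here is a rough sketch (not taken from any answer in this thread) comparing the "remove everything between '<' and '>'" idea with a parser-based approach. It is written in Python with only the standard library, purely as an illustration; the names strip_tags_naive, TextExtractor and strip_tags_parsed are made up for this example, and the choice to skip script and style content is an assumption about what "readable text" should mean.

```python
import re
from html.parser import HTMLParser


def strip_tags_naive(html):
    """Naive approach: delete everything between '<' and the next '>'."""
    return re.sub(r"<[^>]*>", "", html)


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}  # assumption: these never hold readable text

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into '&'
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self.skip_depth == 0:
            self.parts.append(data)


def strip_tags_parsed(html):
    """Parser-based approach: feed the page to a parser and keep the text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result reads as plain text.
    return " ".join("".join(parser.parts).split())


sample = "<p>Tom &amp; Jerry</p><script>if (a < b) { x(); }</script><!-- note -->"
print(repr(strip_tags_naive(sample)))   # entity left encoded, script fragment leaks
print(repr(strip_tags_parsed(sample)))
```

On this sample the naive version leaves the entity undecoded and lets part of the script leak into the output, because a '<' inside the script is treated as the start of a tag. The parser version avoids both problems, though real pages can still be malformed enough to need a more tolerant parser.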
ASKER
I posted a follow-up question about the Jedi parser:
https://www.experts-exchange.com/questions/21419805/Jedi-JvHTMLParser-Parsing-META-tags.html