jay28lee asked:
How do I remove all HTML tags from a string using a regular expression?
$mystring contains:
<div align="center"><a href="http://www.domain.com/">DOMAIN NAME</a></div>Some random text.<br><a href="http://www.domain2.com/">ANOTHER DOMAIN NAME</a>
I want to use a Perl regular expression to strip all HTML elements and hyperlinks (including the link text) from $mystring, so that it contains only "Some random text."
How can this be done?
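Here is a minimal Perl sketch for this sample string. It assumes links are never nested and every <a> has a closing </a>. It deletes each link together with its text first, then strips whatever tags are left. Regexes can't handle arbitrary HTML reliably (comments, <script> blocks, '>' inside quoted attributes), so this is only meant for simple, known input.

```perl
use strict;
use warnings;

my $mystring = '<div align="center"><a href="http://www.domain.com/">DOMAIN NAME</a></div>'
             . 'Some random text.<br><a href="http://www.domain2.com/">ANOTHER DOMAIN NAME</a>';

# Step 1: remove every <a ...>...</a> element, link text included.
# Assumes links are not nested; /s lets .*? cross newlines, /i ignores case.
$mystring =~ s{<a\b[^>]*>.*?</a\s*>}{}gis;

# Step 2: remove all remaining tags such as <div ...>, </div> and <br>.
$mystring =~ s/<[^>]*>//g;

print "$mystring\n";    # prints: Some random text.
```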
What are the values of $C and $X?
ASKER
There's also the following code before the regular expression:
$X=qr/[\x81-\xFE][\x40-\x7E\x80-\xFE]/;
$C=qr/$X|[^\x81-\xFE]/;
@s=(' - ',' ');
The code above is commented as handling the Chinese Big5 character set.
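Based on the patterns themselves (the original comment only says Big5), $X matches one double-byte character: a lead byte in 0x81-0xFE followed by a trail byte in 0x40-0x7E or 0x80-0xFE. $C matches one whole character, which is either such a pair or a single byte that cannot start one. The 0x40-0x7E trail range overlaps printable ASCII (0x5C, for example, is a backslash), which is why a plain byte-level regex can misread these strings. The minimal sketch below uses made-up bytes and shows $C splitting a string into characters rather than bytes.

```perl
use strict;
use warnings;

# Patterns from the script, declared with my so the sketch runs under strict.
my $X = qr/[\x81-\xFE][\x40-\x7E\x80-\xFE]/;   # one double-byte character
my $C = qr/$X|[^\x81-\xFE]/;                   # any one character

# Made-up data: "A", a two-byte pair, "-", and a pair whose trail byte is 0x5C.
my $bytes = "A\xA4\xA4-\xB3\x5C";

# Each /g match consumes exactly one character, so pairs stay together.
my @chars = $bytes =~ /($C)/g;

printf "%d bytes, %d characters\n", length $bytes, scalar @chars;   # 6 bytes, 4 characters
```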
ASKER
BTW, ozo, could you help me with another question of mine (link below)? It's related to one you answered back in 2005.
https://www.experts-exchange.com/questions/27024825/string-manupulation-big5-characters-now-needs-HTML-Entity-support.html
BTW, the solution works for me using: s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
I'll simply ignore what the original programmer of my script wrote.
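Why the quote handling in that pattern matters: its inner group alternates between runs of characters that are neither '>' nor a quote and complete quoted strings. A '>' inside a quoted attribute value therefore does not end the tag early. The small comparison below uses a made-up <img> tag (the variable names are just for illustration). Note that the pattern removes tags only. Text between tags, such as link text, is kept.

```perl
use strict;
use warnings;

# Hypothetical input: a '>' inside a quoted attribute value.
my $html = '<img alt="1 > 0" src="x.gif">Text';

# Naive version: the tag ends at the first '>', even inside quotes.
(my $naive = $html) =~ s/<[^>]*>//g;

# Quote-aware version from the thread: quoted strings are consumed whole.
(my $aware = $html) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "naive: $naive\n";    # naive:  0" src="x.gif">Text
print "aware: $aware\n";    # aware: Text
```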
ASKER
s/\G($C*?)(?: +|($X)(-)|(-)(?=$X)|($X)(?
Can you tell me whether there's anything wrong with the code above, and what it does?
Should I replace it with what you mentioned?
s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
Thanks.
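The first fragment in this post is cut off, so exactly what it does can't be recovered from this page. Its visible prefix, \G($C*?), looks like the usual multibyte alignment trick. \G pins the match to the current position, and $C*? skips whole characters only. The rest of the pattern can therefore match only on a character boundary, never in the middle of a double-byte Big5 character. The sketch below shows the difference between a plain byte search and the aligned one, using made-up bytes and a hypothetical target pair.

```perl
use strict;
use warnings;

# Big5 patterns from the script.
my $X = qr/[\x81-\xFE][\x40-\x7E\x80-\xFE]/;
my $C = qr/$X|[^\x81-\xFE]/;

# Made-up data: two double-byte characters, \xB3\xA4 and \xE5\x40.
my $bytes  = "\xB3\xA4\xE5\x40";
my $target = "\xA4\xE5";    # hypothetical character to search for

# Plain byte search: finds the target straddling the two characters.
my $naive_hit = $bytes =~ /\Q$target\E/ ? 1 : 0;

# Aligned search: \G anchors at the start of the string and $C*? skips
# only whole characters, so the target must begin on a boundary.
my $aligned_hit = $bytes =~ /\G$C*?\Q$target\E/ ? 1 : 0;

print "naive: $naive_hit, aligned: $aligned_hit\n";    # naive: 1, aligned: 0
```

The same aligned pattern still finds the target when it really starts a character, for example in "A\xA4\xE5".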