mzehner
asked on
Pattern Matching
I am writing a Perl script that should strip HTML tags from a large string and substitute a space for them. Currently I am using the following command to do this:
$respstr =~ s/(<.*>)/ /g;
This strips anything between < and >. Unfortunately, it strips most of the HTML, since the match runs from the first < on the page to the last >. I need to strip only the text between a < and the nearest following >, so that the stripped content itself contains no < or >. How can I modify the above code to do this?
Thanks for your help.
$respstr =~ s/(<.*?>)/ /g;
$respstr =~ s/(<.*?>)/ /gi;
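To show why the added ? matters, here is a minimal sketch comparing the greedy and non-greedy forms. The sample string and the variable names $html, $greedy and $lazy are made up for illustration; they are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the asker's page.
my $html = '<p>Hello <b>world</b></p>';

# Greedy: .* runs to the LAST > in the string, so a single match
# swallows everything from the first < to the final >.
(my $greedy = $html) =~ s/<.*>/ /g;

# Non-greedy: .*? stops at the FIRST > after each <, so each tag
# is matched and replaced on its own.
(my $lazy = $html) =~ s/<.*?>/ /g;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The capturing parentheses in the original command are not needed for the substitution, so the sketch leaves them out.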
But this also fails if you have HTML like:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
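As a rough illustration of those failures, the sketch below runs the non-greedy substitution over the three samples and prints what is left. The array names @samples and @results are only for this example.

```perl
use strict;
use warnings;

# The three problem cases quoted above.
my @samples = (
    '<IMG SRC = "foo.gif" ALT = "A > B">',
    '<!-- <A comment> -->',
    '<script>if (a<b && a>c)</script>',
);

my @results;
for my $html (@samples) {
    # Same non-greedy pattern as suggested earlier in the thread.
    (my $text = $html) =~ s/<.*?>/ /gs;
    push @results, $text;
    print "[$html] => [$text]\n";
}
```

In each case the match ends at the first > it meets, so part of the attribute value, the comment, or the script body is left behind in the output.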
mzehner,
Could you give us a sample of the html you want to substitute?
Sorry, I meant to say
$respstr =~ s/(<.*?>)/ /gs;
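The /s modifier lets . match a newline, which matters when a tag is split across lines. A small sketch, using a made-up two-line tag and illustrative variable names:

```perl
use strict;
use warnings;

# Hypothetical input: the opening tag contains a line break.
my $html = "<a\n href=\"x.html\">link</a>";

# Without /s, . cannot cross the newline, so the opening tag never matches.
(my $without_s = $html) =~ s/<.*?>/ /g;

# With /s, . also matches "\n", so the two-line tag is removed as well.
(my $with_s = $html) =~ s/<.*?>/ /gs;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```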
$respstr=~ s/<[^>]+>//g;
This will get all the normal tags. You may need to strip comments first if there is a risk of nested comments or, worse, comments wrapped around tags.
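One way to read the "comments first" advice is as a series of passes. The sketch below is an assumption about how that could look, not the accepted solution: strip_tags is a hypothetical helper name, the script/style pass and the quote-aware tag pattern are additions that go beyond the one-liner above, and it substitutes a space as the original question asked.

```perl
use strict;
use warnings;

# Hypothetical helper: strip markup in three passes, most specific first.
sub strip_tags {
    my ($html) = @_;

    # Pass 1: comments, so tags inside them are not treated as real tags.
    $html =~ s/<!--.*?-->/ /gs;

    # Pass 2: script and style elements together with their contents,
    # since code such as "a<b && a>c" is not markup.
    $html =~ s/<(script|style)\b.*?<\/\1\s*>/ /gis;

    # Pass 3: remaining tags. Quoted attribute values are skipped as a
    # unit, so a > inside quotes does not end the tag early.
    $html =~ s/<(?:[^>"']|"[^"]*"|'[^']*')*>/ /g;

    return $html;
}

my $page = q{<p>A<!-- <b>hidden</b> --><IMG SRC="foo.gif" ALT="A > B">B<script>if (a<b && a>c) x();</script>C</p>};
print '[', strip_tags($page), "]\n";
```

This is still pattern matching rather than parsing. A bare < in ordinary text or an unbalanced quote inside a tag can still confuse it, so a real HTML parser is the safer choice for arbitrary pages.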
ASKER
Thanks! All the answers worked quite well. The one I selected seemed to work the best. Thanks for the help.