mzehner

Pattern Matching

I am writing a Perl script that should strip HTML tags from a large string and substitute a space for them.  Currently I am using the following command to do this:

$respstr =~ s/(<.*>)/ /g;

This strips anything between < and >.  Unfortunately, it strips most of the HTML, since the entire page is enclosed between the first < and the last >.  I need to strip only what lies between a < and the next >, with no other < or > in between.  How can I modify the above code to do this?

Thanks for your help.
ozo

$respstr =~ s/(<.*?>)/ /g;
$respstr =~ s/(<.*?>)/ /gi;
But this also fails if you have HTML like:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
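To make the difference concrete, here is a small sketch comparing the greedy and non-greedy forms, plus the quoted-attribute case from the list above. The sample strings and variable names are only for illustration.

```perl
use strict;
use warnings;

my $html = '<p>Hello <b>world</b></p>';

# Greedy: .* runs to the LAST '>' in the string, so everything is one match.
(my $greedy = $html) =~ s/<.*>/ /g;

# Non-greedy: .*? stops at the FIRST '>' after each '<'.
(my $lazy = $html) =~ s/<.*?>/ /g;

print "greedy: [$greedy]\n";   # greedy: [ ]
print "lazy:   [$lazy]\n";     # lazy:   [ Hello  world  ]

# A '>' inside a quoted attribute value ends the match too early.
my $img = '<IMG SRC = "foo.gif" ALT = "A > B">';
(my $broken = $img) =~ s/<.*?>/ /g;
print "broken: [$broken]\n";   # broken: [  B">]
```

The last line shows why the simple pattern is not enough for arbitrary HTML: the tail of the ALT attribute is left behind in the text.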
rj2

mzehner,
Could you give us a sample of the html you want to substitute?
Sorry, I meant to say
$respstr =~ s/(<.*?>)/ /gs;
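The /s modifier matters because, without it, . does not match a newline, so a tag that wraps across lines is missed. A minimal sketch, using a made-up two-line tag:

```perl
use strict;
use warnings;

# A tag whose attributes wrap onto a second line (illustrative input).
my $html = "<a\n  href=\"x.html\">link</a>";

# Without /s the opening tag is skipped, because . stops at the newline.
(my $without_s = $html) =~ s/<.*?>/ /g;

# With /s the . also matches newlines, so both tags are replaced.
(my $with_s = $html) =~ s/<.*?>/ /gs;

print "without /s: [$without_s]\n";   # opening tag is still there
print "with /s:    [$with_s]\n";      # with /s:    [ link ]
```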
$respstr=~ s/<[^>]+>//g;

This will get all the normal tags.  You may need to strip comments first if there is a risk of nested comments or, worse, comments wrapped around tags.
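One way the "comments first" idea could look is sketched below. The helper name strip_tags and the order of the three passes are my own assumptions, not something posted in this thread, and this is still regex-based, so it will trip on malformed markup or on a bare < in ordinary text. For anything important, a real parser such as the HTML::Parser module from CPAN is the safer route.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name used only for this sketch.
sub strip_tags {
    my ($html) = @_;

    # 1. Comments first, so tags inside them cannot confuse the later passes.
    $html =~ s/<!--.*?-->/ /gs;

    # 2. Script and style elements, whose bodies may contain bare < and >.
    $html =~ s/<(script|style)\b.*?<\/\1\s*>/ /gis;

    # 3. Remaining tags. Quoted attribute values are consumed whole,
    #    so a '>' inside quotes does not end the tag early.
    $html =~ s/<(?:[^>"']|"[^"]*"|'[^']*')*>/ /g;

    return $html;
}

my $in = '<IMG SRC = "foo.gif" ALT = "A > B">x'
       . '<!-- <A comment> -->y'
       . '<script>if (a<b && a>c)</script>z';

print '[', strip_tags($in), "]\n";   # [ x y z]
```

Each pass substitutes a single space, matching the original requirement of replacing tags with a space rather than deleting them.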
ASKER CERTIFIED SOLUTION
rj2

mzehner

ASKER

Thanks!  All the answers worked quite well.  The one I selected seemed to work the best.  Thanks for the help.