Solved

Pattern Matching

Posted on 2002-06-14
Last Modified: 2012-05-04
I am writing a Perl script that should strip HTML tags from a large string and substitute a space for them.  Currently I am using the following command to do this:

$respstr =~ s/(<.*>)/ /g;

This strips everything between the first < and the last >.  Unfortunately, that removes most of the HTML, since the entire page begins with < and ends with >.  I need to strip only the spans between a < and the next >, that is, spans that do not themselves contain a < or >.  How can I modify the above code to do this?
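
For anyone reproducing the problem, here is a minimal sketch of the greedy behaviour. The sample string is made up for illustration and is not the actual page from the question.

```perl
use strict;
use warnings;

# Hypothetical one-line document (an assumption, not the poster's data).
my $respstr = '<html><body><p>Hello <b>world</b></p></body></html>';

# Greedy .* extends from the first < to the last > on the line,
# so the whole document is replaced by a single space.
(my $greedy = $respstr) =~ s/(<.*>)/ /g;

print "[$greedy]\n";   # prints "[ ]"
```

Because `.*` takes as much as it can, the text between the tags is lost along with the tags.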

Thanks for your help.
Question by:mzehner
7 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 7079106
$respstr =~ s/(<.*?>)/ /g;
 
LVL 84

Expert Comment

by:ozo
ID: 7079115
$respstr =~ s/(<.*?>)/ /gi;
But this also fails if you have HTML like:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
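
A short sketch of what the non-greedy pattern does to the three inputs above. Only the loop and the variable names are added for illustration. The /i flag is left off because the pattern contains no letters, so it should make no difference.

```perl
use strict;
use warnings;

# The three problem inputs listed above.
my @samples = (
    '<IMG SRC = "foo.gif" ALT = "A > B">',
    '<!-- <A comment> -->',
    '<script>if (a<b && a>c)</script>',
);

my @stripped;
for my $html (@samples) {
    # Non-greedy .*? stops at the first > it meets, even when that >
    # sits inside a quoted attribute, a comment, or script code.
    (my $out = $html) =~ s/(<.*?>)/ /g;
    push @stripped, $out;
    print "[$out]\n";
}
# prints:
# [  B">]
# [  -->]
# [ if (a c) ]
```

In each case part of the markup survives or part of the content is eaten, which is the failure described above.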
 
LVL 10

Expert Comment

by:rj2
ID: 7079241
mzehner,
Could you give us a sample of the html you want to substitute?
LVL 84

Expert Comment

by:ozo
ID: 7079317
Sorry, I meant to say
$respstr =~ s/(<.*?>)/ /gs;
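
The /s modifier matters when a tag is split across lines, because without it `.` does not match a newline. A minimal sketch, using a made-up two-line tag as the sample:

```perl
use strict;
use warnings;

# Hypothetical input: an opening tag that spans two lines.
my $respstr = "<a\n  href=\"x.html\">link</a>";

# Without /s the . cannot cross the newline, so the opening tag is missed.
(my $without_s = $respstr) =~ s/(<.*?>)/ /g;

# With /s the . also matches newlines, so both tags are replaced.
(my $with_s = $respstr) =~ s/(<.*?>)/ /gs;

print "[$without_s]\n";
print "[$with_s]\n";     # prints "[ link ]"
```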
 
LVL 8

Expert Comment

by:jhurst
ID: 7081006
$respstr=~ s/<[^>]+>//g;

This will get all the normal tags.  You may need to strip comments first if there is a risk of nested comments or, worse, comments wrapped around tags.
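
A sketch of the "comments first" idea, under two assumptions that are not from the thread: comments are well formed and not nested, and a space is used as the replacement (as the question asks) rather than the empty string. The sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical input with a comment wrapped around a tag.
my $html = '<p>one</p><!-- <b>hidden</b> --><p>two</p>';

# Tags only: the comment is cut at its first >, leaving "hidden" and "-->".
(my $tags_only = $html) =~ s/<[^>]+>/ /g;

# Comments first (assumes well-formed, non-nested comments), then tags.
(my $comments_first = $html) =~ s/<!--.*?-->/ /gs;
$comments_first =~ s/<[^>]+>/ /g;

print "[$tags_only]\n";       # prints "[ one  hidden  --> two ]"
print "[$comments_first]\n";  # prints "[ one   two ]"
```

The comment pattern here is deliberately simple. It will still be fooled by a "-->" inside a quoted attribute or inside script code.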
 
LVL 10

Accepted Solution

by:
rj2 earned 50 total points
ID: 7081331
The regexp below will also handle tags that have a > inside a quoted attribute value, such as
<img alt="-->" src="arrow.gif">

$respstr =~ s/<("[^"]*"|'[^']*'|[^"'>])*>/ /g;
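
A quick check of this pattern against two quoted-attribute cases from this thread. The surrounding text in the first sample and the helper name `strip_tags` are added for illustration only.

```perl
use strict;
use warnings;

# strip_tags is a hypothetical helper name wrapping the regexp above.
sub strip_tags {
    my ($text) = @_;
    # Inside a tag, consume whole quoted strings first, so a > within
    # "..." or '...' does not end the tag early.
    $text =~ s/<("[^"]*"|'[^']*'|[^"'>])*>/ /g;
    return $text;
}

my $with_arrow = strip_tags('<p>See <img alt="-->" src="arrow.gif"> here</p>');
my $with_gt    = strip_tags('<IMG SRC = "foo.gif" ALT = "A > B">');

print "[$with_arrow]\n";  # prints "[ See   here ]"
print "[$with_gt]\n";     # prints "[ ]"
```

Note that comments such as <!-- <A comment> --> and the contents of script blocks are still not handled by this pattern alone, so they would need separate treatment.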
 
LVL 2

Author Comment

by:mzehner
ID: 7084714
Thanks!  All the answers worked quite well.  The one I selected seemed to work the best.  Thanks for the help.