[Last Call] Learn about multicloud storage options and how to improve your company's cloud strategy. Register Now

x
?
Solved

How to Filter on unwanted text using RE

Posted on 2002-05-22
6
Medium Priority
?
203 Views
Last Modified: 2010-03-05
Im trying to filter out unwanted html start and end tags that do not have text or other empty tags inside them. For example, if the following html string contained:

FONT FACE="Arial Black" SIZE="+1">img src="../gd.gif">/FONT>Text 1P>
FONT FACE="Arial Narrow" SIZE="+1">Text 2/FONT>
FONT FACE="Bembo" SIZE="+1">/FONT>
FONT FACE="Bembo" SIZE="+1">
Text 3/FONT>
FONT FACE="Bembo" SIZE="+1">
/FONT>

Then I would expect the following returned:

FONT FACE="Arial Black" SIZE="+1">img src="../gd.gif">/FONT>Text 1P>
FONT FACE="Arial Narrow" SIZE="+1">Text 2/FONT>
FONT FACE="Bembo" SIZE="+1">
Text 3/FONT>

Does anyone have any ideas?

Thanks rj2 and ozo, your solution was bang on! :-)

I should have been more clearer in the question. What I really meant was the html text could be any valid html tags that need filtering i.e. table tags with no content,bold tags with no content,etc. (assuming that the document is fully html 4.0 compliant). The above was mearly an example of what may need filtering.
0
Comment
Question by:pdistant
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
6 Comments
 
LVL 12

Expert Comment

by:lexxwern
ID: 7026586
which editor made this crappy peice of html...
0
 
LVL 1

Expert Comment

by:GorGor1
ID: 7028064
The perl script is simple.  Coming up with your filter criteria is the hard part.  It's almost pointless to show the script without the proper filter (i.e. search/rejection criteria).  I'll keep thinking about this one...
0
 
LVL 10

Expert Comment

by:rj2
ID: 7028223
#!/usr/bin/perl

my $html=<<ENDHTML;
<FONT FACE="Arial Black" SIZE="+1"><img src="../gd.gif"></FONT>Text 1<P>
<FONT FACE="Arial Narrow" SIZE="+1">Text 2</FONT>
<FONT FACE="Bembo" SIZE="+1"></FONT>
<FONT FACE="Bembo" SIZE="+1"><br>Text 3</FONT>
<FONT FACE="Bembo" SIZE="+1"><br></FONT>
ENDHTML

print "Before: $html\n";

$html =~ s/<FONT[^>]*?>(<br>)+<\/FONT>//gmi; # first remove font tags with only <br> inside
$html =~ s/<FONT[^>]*?><\/FONT>//gmi; # then remove font tags with nothing inside
$html =~ s/\n\n/\n/gm; #replace two consecutive linefeeds with one

print "After: $html";
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 84

Expert Comment

by:ozo
ID: 7028989
or
$html =~ s/<FONT[^>]*>(<br>)*<\/FONT>\s*//gi;
0
 
LVL 10

Accepted Solution

by:
rj2 earned 300 total points
ID: 7030145
0
 

Author Comment

by:pdistant
ID: 7038613
That'll do!
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
On Microsoft Windows, if  when you click or type the name of a .pl file, you get an error "is not recognized as an internal or external command, operable program or batch file", then this means you do not have the .pl file extension associated with …
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

650 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question