Solved

How to filter out unwanted text using RE

Posted on 2002-05-22
Last Modified: 2010-03-05
I'm trying to filter out unwanted HTML start and end tags that do not have text or other empty tags inside them. For example, given the following HTML string:

<FONT FACE="Arial Black" SIZE="+1"><img src="../gd.gif"></FONT>Text 1<P>
<FONT FACE="Arial Narrow" SIZE="+1">Text 2</FONT>
<FONT FACE="Bembo" SIZE="+1"></FONT>
<FONT FACE="Bembo" SIZE="+1"><br>Text 3</FONT>
<FONT FACE="Bembo" SIZE="+1"><br></FONT>

Then I would expect the following to be returned:

<FONT FACE="Arial Black" SIZE="+1"><img src="../gd.gif"></FONT>Text 1<P>
<FONT FACE="Arial Narrow" SIZE="+1">Text 2</FONT>
<FONT FACE="Bembo" SIZE="+1"><br>Text 3</FONT>

Does anyone have any ideas?

Thanks rj2 and ozo, your solution was bang on! :-)

I should have been clearer in the question. What I really meant was that the HTML text could contain any valid HTML tags that need filtering, e.g. table tags with no content, bold tags with no content, etc. (assuming that the document is fully HTML 4.0 compliant). The above was merely an example of what may need filtering.
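
For the general case described above (any tag, not just FONT), one possible approach is to capture the tag name and require the same name in the closing tag via a backreference. The following is only a sketch, not something posted in this thread: the helper name strip_empty_elements is made up for illustration, it assumes no attribute value contains a ">" character, and it treats whitespace and <br> as "no content". Note that stripping empty table cells may change a table's layout, so it may not be what you want for every tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove any element whose
# content is empty, whitespace only, or only <br> tags, whatever its name.
sub strip_empty_elements {
    my ($html) = @_;

    # ([a-z][a-z0-9]*) captures the tag name and \1 requires the closing tag
    # to carry the same name (compared case-insensitively because of /i).
    # The substitution is repeated until it no longer matches, so an element
    # that only becomes empty after an inner one is removed
    # (e.g. <b><i></i></b>) is caught on a later pass.
    1 while $html =~ s{<([a-z][a-z0-9]*)\b[^>]*>(?:\s|<br\s*/?>)*</\1\s*>\s*}{}gi;

    return $html;
}

my $sample = qq{<TABLE><TR><TD></TD></TR></TABLE>\n<B><I> <br></I></B>\n<B>kept</B>\n};
print strip_empty_elements($sample);   # prints: <B>kept</B>
```

Elements without a closing tag, such as <img> and <br>, never match the pattern, so a FONT tag wrapping only an image is left alone. For anything beyond simple, well-formed markup, a real HTML parser is likely the safer route.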
Question by:pdistant
6 Comments
 
LVL 12

Expert Comment

by:lexxwern
ID: 7026586
Which editor made this crappy piece of HTML...
 
LVL 1

Expert Comment

by:GorGor1
ID: 7028064
The perl script is simple.  Coming up with your filter criteria is the hard part.  It's almost pointless to show the script without the proper filter (i.e. search/rejection criteria).  I'll keep thinking about this one...
 
LVL 10

Expert Comment

by:rj2
ID: 7028223
#!/usr/bin/perl
use strict;
use warnings;

my $html=<<ENDHTML;
<FONT FACE="Arial Black" SIZE="+1"><img src="../gd.gif"></FONT>Text 1<P>
<FONT FACE="Arial Narrow" SIZE="+1">Text 2</FONT>
<FONT FACE="Bembo" SIZE="+1"></FONT>
<FONT FACE="Bembo" SIZE="+1"><br>Text 3</FONT>
<FONT FACE="Bembo" SIZE="+1"><br></FONT>
ENDHTML

print "Before: $html\n";

$html =~ s/<FONT[^>]*>(<br>)+<\/FONT>//gi; # first remove font tags with only <br> inside
$html =~ s/<FONT[^>]*><\/FONT>//gi; # then remove font tags with nothing inside
$html =~ s/\n{2,}/\n/g; # collapse the blank lines left behind into a single linefeed

print "After: $html";
LVL 84

Expert Comment

by:ozo
ID: 7028989
or
$html =~ s/<FONT[^>]*>(<br>)*<\/FONT>\s*//gi;
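
To see how this single substitution behaves on the sample from the question, it can be wrapped in a small function. This is only a sketch: the name strip_empty_font is made up for illustration. Because (<br>)* also matches zero <br> tags, one pattern covers both the empty case and the <br>-only case, and the trailing \s* swallows the newline left behind, so no separate cleanup pass is needed. Be aware that \s* also eats any blank lines or indentation that follow a removed tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the single substitution shown above.
sub strip_empty_font {
    my ($html) = @_;
    # Matches <FONT ...> followed by zero or more <br> and then </FONT>,
    # plus any whitespace after the closing tag.
    $html =~ s/<FONT[^>]*>(<br>)*<\/FONT>\s*//gi;
    return $html;
}

# Sample input from the question; the quoted heredoc avoids interpolation.
my $html = <<'ENDHTML';
<FONT FACE="Arial Black" SIZE="+1"><img src="../gd.gif"></FONT>Text 1<P>
<FONT FACE="Arial Narrow" SIZE="+1">Text 2</FONT>
<FONT FACE="Bembo" SIZE="+1"></FONT>
<FONT FACE="Bembo" SIZE="+1"><br>Text 3</FONT>
<FONT FACE="Bembo" SIZE="+1"><br></FONT>
ENDHTML

# Lines 3 and 5 of the sample are removed; the other three are unchanged.
print strip_empty_font($html);
```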
 
LVL 10

Accepted Solution

by:rj2 (earned 100 total points)
ID: 7030145
 

Author Comment

by:pdistant
ID: 7038613
That'll do!