Solved

Remove HTML Comment Tags Unless They Contain Certain Words

Posted on 2014-01-03
Last Modified: 2014-01-03
I have HTML code stored in the variable $page_entire_code. I am trying to strip out all HTML comments except those that contain the word "Test1" or "Test2". My current code is:

$page_entire_code =~ s/<!--.+-->//g;

Thanks!
Question by:webstuck5

Expert Comment

by:käµfm³d 👽
What you have now is quite dangerous. You have a "greedy" pattern: it removes everything from the very first opening comment tag to the very last closing comment tag, not just to the tag that closes the first comment. Any HTML sitting between two comments is removed along with them. You need a non-greedy version of your pattern, which takes just one additional character:

$page_entire_code =~ s/<!--.+?-->//g;

The additional question mark makes the pattern non-greedy. Now the pattern matches from an opening comment tag only up to the very next closing comment tag.
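To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample HTML string and the variable names $greedy and $lazy are made up for illustration; only the two substitutions come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: two comments on one line with markup between them.
my $html = '<p>one</p><!-- a --><p>two</p><!-- b --><p>three</p>';

# Greedy: .+ runs as far as it can, so the match ends at the LAST --> on the line.
(my $greedy = $html) =~ s/<!--.+-->//g;

# Non-greedy: .+? stops at the NEXT -->, so each comment is removed separately.
(my $lazy = $html) =~ s/<!--.+?-->//g;

print "greedy: $greedy\n";   # greedy: <p>one</p><p>three</p>
print "lazy:   $lazy\n";     # lazy:   <p>one</p><p>two</p><p>three</p>
```

Note that the greedy version silently drops the <p>two</p> element that sat between the two comments.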

Now, this doesn't solve your overall issue. We need to tweak your pattern just a bit more. To look for certain words you will need to employ a lookahead. For your needs, you will need a negative lookahead. Let's try the following pattern:

$page_entire_code =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
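A quick way to see what the lookahead does is to run it over a small sample. The sketch below uses made-up input and variable names. It also shows one caveat that is worth checking against your own pages: when two comments share a line, the lookahead can read past the first --> and find "Test1" in the next comment, so the first comment is kept by mistake. The last substitution is one possible tightening (not the pattern posted above), where each dot is replaced by (?:(?!-->).) so that neither the lookahead nor the match can cross a closing tag.

```perl
use strict;
use warnings;

# Hypothetical input: every comment sits on its own line.
my $html = join "\n",
    '<!-- Test1 keep me -->',
    '<!-- remove me -->',
    '<p>body</p>',
    '<!-- Test2 keep me too -->';

# The pattern from the post: remove a comment unless Test1/Test2 follows <!--.
(my $out = $html) =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
print "$out\n";   # the "remove me" comment is gone, leaving an empty line

# Caveat: two comments on the SAME line. The lookahead sees "Test1" in the
# second comment, so the first comment is not removed either.
my $same_line = '<!-- remove me --><!-- Test1 keep me -->';
(my $loose = $same_line) =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
print "$loose\n";  # unchanged

# Assumed alternative: (?:(?!-->).) matches one character that does not
# start a closing tag, so the check stays inside the current comment.
(my $strict = $same_line)
    =~ s/<!--(?!(?:(?!-->).)*?Test[12])(?:(?!-->).)*-->//g;
print "$strict\n"; # <!-- Test1 keep me -->
```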


Author Comment

by:webstuck5
I tested my greedy pattern on HTML code that had a lot of comment tags, and it removed only the comment tags themselves, not the HTML in between. If I used the words "TEST_ON" and "TEST_OFF", would your code still work if I did:

$page_entire_code =~ s/<!--(?!.*?(TEST_[OFFON]).*?-->).*?-->//g;

Accepted Solution

by:käµfm³d 👽
Ah, I forgot one important detail: by default, the dot matches any character except a newline. So if each of your comments is contained on its own line, then you're fine.
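If some comments do span several lines, the /s modifier is the usual way to let the dot match newlines as well. A minimal sketch with made-up input and variable names:

```perl
use strict;
use warnings;

# Hypothetical input: one comment that spans two lines.
my $html = "<p>a</p><!-- line one\nline two --><p>b</p>";

# Without /s the dot stops at "\n", so the pattern never reaches -->.
(my $without_s = $html) =~ s/<!--.+?-->//g;

# With /s the dot also matches "\n", so the whole comment is removed.
(my $with_s = $html) =~ s/<!--.+?-->//gs;

print $without_s eq $html ? "without /s: unchanged\n" : "without /s: changed\n";
print "with /s: $with_s\n";   # with /s: <p>a</p><p>b</p>
```

With /s in place the non-greedy quantifier matters even more, because a greedy .+ could then run from the first comment to the last one in the entire page.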

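For this new requirement, that could work, but only by accident: [OFFON] is a character class, so it matches a single character (O, F, or N) rather than either word. I'd prefer an alternation (i.e. an OR condition, written with the vertical bar) myself: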

$page_entire_code =~ s/<!--(?!.*?(TEST_(OFF|ON)).*?-->).*?-->//g;
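The difference between the bracket version and the alternation shows up on words that are not TEST_ON or TEST_OFF. This is a small sketch; the helper names class_matches and alt_matches and the test words are invented for the comparison.

```perl
use strict;
use warnings;

# [OFFON] is a character class: ONE character out of O, F, N.
sub class_matches { return $_[0] =~ /TEST_[OFFON]/ ? 1 : 0 }

# (?:OFF|ON) is an alternation: the literal text OFF or the literal text ON.
sub alt_matches { return $_[0] =~ /TEST_(?:OFF|ON)/ ? 1 : 0 }

for my $word (qw(TEST_ON TEST_OFF TEST_NO TEST_FOO)) {
    printf "%-9s class=%d alternation=%d\n",
        $word, class_matches($word), alt_matches($word);
}
# TEST_ON   class=1 alternation=1
# TEST_OFF  class=1 alternation=1
# TEST_NO   class=1 alternation=0
# TEST_FOO  class=1 alternation=0
```

So the bracket version would also protect comments containing something like TEST_NO, which is probably not what you want.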


Author Closing Comment

by:webstuck5
Thanks so much!