Solved

Remove HTML Comment Tags Unless They Contain Certain Words

Posted on 2014-01-03
4
296 Views
Last Modified: 2014-01-03
I have HTML code stored in the variable, $page_entire_code. I am trying to strip out all HTML comment tags except ones that contain the word, "Test1" or "Test2". My current code is:

$page_entire_code =~ s/<!--.+-->//g;

Thanks!
0
Comment
Question by:webstuck5
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 39754734
What you have now is quite dangerous. You have a "greedy" pattern. What it will do is remove everything between the very first opening comment tag, to the very last closing comment tag, not the comment tag that closes the first opening tag. This is bad. You need a non-greedy version of your pattern. You simply need one additional character:

$page_entire_code =~ s/<!--.+?-->//g;

Open in new window


The addition question mark makes the pattern not greedy. Now the pattern will match an opening comment tag up until the very next closing comment tag.

Now, this doesn't solve your overall issue. We need to tweak your pattern just a bit more. To look for certain words you will need to employ a lookahead. For your needs, you will need a negative lookahead. Let's try the following pattern:

$page_entire_code =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;

Open in new window

0
 

Author Comment

by:webstuck5
ID: 39754763
I had tested my greedy pattern on HTML code that had a lot of comment tags and it only looked to remove each comment tag and not the HTML in between. If I used the words, "TEST_ON" and "TEST_OFF" would your code still work, if I did:

$page_entire_code =~ s/<!--(?!.*?(TEST_[OFFON]).*?-->).*?-->//g;
0
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 39754785
Ah, I forgot one important detail:  The dot will match, by default, everything that is not a newline character. So if each of your comments is contained on its own line, then you're fine.

For this new requirement, that could work, but I'd prefer an alternation (i.e. an OR condition--the vertical bar) myself:

$page_entire_code =~ s/<!--(?!.*?(TEST_(OFF|ON)).*?-->).*?-->//g;

Open in new window

0
 

Author Closing Comment

by:webstuck5
ID: 39754835
Thanks so much!
0

Featured Post

Why Off-Site Backups Are The Only Way To Go

You are probably backing up your data—but how and where? Ransomware is on the rise and there are variants that specifically target backups. Read on to discover why off-site is the way to go.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article demonstrates how to create a simple responsive confirmation dialog with Ok and Cancel buttons using HTML, CSS, jQuery and Promises
When crafting your “Why Us” page, there are a plethora of pitfalls to avoid. Follow these five tips, and you’ll be well on your way to creating an effective page.
The viewer will learn the benefit of using external CSS files and the relationship between class and ID selectors. Create your external css file by saving it as style.css then set up your style tags: (CODE) Reference the nav tag and set your prop…
HTML5 has deprecated a few of the older ways of showing media as well as offering up a new way to create games and animations. Audio, video, and canvas are just a few of the adjustments made between XHTML and HTML5. As we learned in our last micr…

626 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question