Solved

Remove HTML Comment Tags Unless They Contain Certain Words

Posted on 2014-01-03
Last Modified: 2014-01-03
I have HTML code stored in the variable $page_entire_code. I am trying to strip out all HTML comments except those that contain the word "Test1" or "Test2". My current code is:

$page_entire_code =~ s/<!--.+-->//g;

Thanks!
Question by:webstuck5

Expert Comment

by:käµfm³d 👽
ID: 39754734
What you have now is quite dangerous. You have a "greedy" pattern: it removes everything from the very first opening comment tag to the very last closing comment tag, rather than stopping at the tag that closes the first comment. This is bad. You need a non-greedy version of your pattern, which takes just one additional character:

$page_entire_code =~ s/<!--.+?-->//g;

The additional question mark makes the quantifier non-greedy. Now the pattern matches from an opening comment tag up to the very next closing comment tag.
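
As a rough illustration of the difference, here is a small sketch. The sample string and the helper names strip_greedy and strip_lazy are made up for this example and are not from the question; it only assumes two comments sitting on the same line.

```perl
use strict;
use warnings;

# Hypothetical sample: two comments on one line with markup between them.
my $html = '<p>a</p><!-- one --><p>b</p><!-- two --><p>c</p>';

# Greedy: .+ runs as far as it can, so the match ends at the LAST "-->".
sub strip_greedy {
    my ($text) = @_;
    $text =~ s/<!--.+-->//g;
    return $text;
}

# Non-greedy: .+? stops at the FIRST "-->" after each "<!--".
sub strip_lazy {
    my ($text) = @_;
    $text =~ s/<!--.+?-->//g;
    return $text;
}

print strip_greedy($html), "\n";   # <p>a</p><p>c</p>
print strip_lazy($html), "\n";     # <p>a</p><p>b</p><p>c</p>
```

Note that the greedy version also swallows the `<p>b</p>` that sat between the two comments.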

That alone doesn't solve your overall issue, though. To skip comments that contain certain words, you need a lookahead, specifically a negative lookahead. Let's try the following pattern:

$page_entire_code =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
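
One caveat that may be worth checking before relying on this: the lookahead scans to the end of the line, not just to the end of the current comment, so a keyword in a later comment on the same line can protect an earlier one. Below is a minimal sketch of that, assuming a sample string of my own. The strip_scoped variant is a hypothetical alternative that tests every character inside the comment; it is one possible approach, not the pattern given in this thread.

```perl
use strict;
use warnings;

# Pattern from the post above, wrapped in a helper for comparison.
sub strip_post {
    my ($text) = @_;
    $text =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
    return $text;
}

# Assumed variant: consume one character at a time, and only if neither
# the closing "-->" nor a keyword starts at that position. A comment that
# contains Test1 or Test2 can therefore never reach its "-->", so it is
# left alone. The /s flag lets the dot cross newlines as well.
sub strip_scoped {
    my ($text) = @_;
    $text =~ s/<!--(?:(?!-->|Test[12]).)*-->//gs;
    return $text;
}

# Hypothetical sample: a plain comment followed, on the same line,
# by a comment that should be kept.
my $html = '<!-- drop me --><b>x</b><!-- Test1 keep -->';

print strip_post($html), "\n";     # <!-- drop me --><b>x</b><!-- Test1 keep -->
print strip_scoped($html), "\n";   # <b>x</b><!-- Test1 keep -->
```

If every comment is on its own line, both helpers should behave the same way.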

Author Comment

by:webstuck5
ID: 39754763
I had tested my greedy pattern on HTML code with a lot of comment tags, and it appeared to remove only each comment tag, not the HTML in between. If I used the words "TEST_ON" and "TEST_OFF", would your code still work if I did:

$page_entire_code =~ s/<!--(?!.*?(TEST_[OFFON]).*?-->).*?-->//g;

Accepted Solution

by:käµfm³d 👽 (earned 500 total points)
ID: 39754785
Ah, I forgot one important detail: by default, the dot matches everything except a newline character, which is why your greedy pattern never ran past the end of a line. So if each of your comments is contained on its own line, then you're fine.
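
If a comment does span more than one line, the /s modifier (which lets the dot match newlines too) is the usual way around this. A small sketch, with a made-up sample string and hypothetical helper names:

```perl
use strict;
use warnings;

# Hypothetical sample: one comment that spans two lines.
my $html = "<p>a</p><!-- spans\ntwo lines --><p>b</p>";

# Without /s the dot cannot cross the newline, so nothing matches.
sub strip_default {
    my ($text) = @_;
    $text =~ s/<!--.+?-->//g;
    return $text;
}

# With /s the dot also matches "\n", so the whole comment is removed.
sub strip_dotall {
    my ($text) = @_;
    $text =~ s/<!--.+?-->//gs;
    return $text;
}

print strip_default($html) eq $html ? "unchanged\n" : "changed\n";   # unchanged
print strip_dotall($html), "\n";                                     # <p>a</p><p>b</p>
```

With /s in play the non-greedy quantifier matters even more, because a greedy .+ could then run across the whole document.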

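For this new requirement, that could work, but note that [OFFON] is a character class: it matches a single O, F, or N, so it only tests the first letter after the underscore. I'd prefer an alternation (i.e. an OR condition, written with the vertical bar) myself: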

$page_entire_code =~ s/<!--(?!.*?(TEST_(OFF|ON)).*?-->).*?-->//g;
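
To see how the character class and the alternation differ, here is a quick comparison sketch. The word list and the helper names matches_class and matches_alternation are invented for the example; TEST_FOO and TEST_NEW are not words from the question.

```perl
use strict;
use warnings;

# Character class: one character out of O, F, N after "TEST_".
sub matches_class {
    my ($word) = @_;
    return $word =~ /TEST_[OFFON]/ ? 1 : 0;
}

# Alternation: the literal text OFF or ON after "TEST_".
sub matches_alternation {
    my ($word) = @_;
    return $word =~ /TEST_(?:OFF|ON)/ ? 1 : 0;
}

# TEST_ON and TEST_OFF match both patterns; TEST_FOO and TEST_NEW
# match only the character class.
for my $word (qw(TEST_ON TEST_OFF TEST_FOO TEST_NEW)) {
    printf "%-9s class=%d alternation=%d\n",
        $word, matches_class($word), matches_alternation($word);
}
```

Neither pattern is anchored, so something like TEST_ONLY would still match the alternation; a trailing \b could tighten that if it matters.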

Author Closing Comment

by:webstuck5
ID: 39754835
Thanks so much!