Solved

Remove HTML Comment Tags Unless They Contain Certain Words

Posted on 2014-01-03
Last Modified: 2014-01-03
I have HTML code stored in the variable $page_entire_code. I am trying to strip out all HTML comments except those that contain the word "Test1" or "Test2". My current code is:

$page_entire_code =~ s/<!--.+-->//g;

Thanks!
Question by:webstuck5

Expert Comment

by:käµfm³d 👽
ID: 39754734
What you have now is quite dangerous. Your pattern is "greedy": it removes everything from the very first opening comment tag to the very last closing comment tag, rather than stopping at the closing tag that belongs to the first opening tag. This is bad. You need a non-greedy version of your pattern, which takes just one additional character:

$page_entire_code =~ s/<!--.+?-->//g;



The additional question mark makes the pattern non-greedy. Now the pattern matches from an opening comment tag up to the very next closing comment tag.
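To see the difference, here is a minimal sketch that runs both substitutions on the same input. The sample string and the variable names ($html, $greedy, $lazy) are made up for illustration; the two patterns are the ones from this thread.

```perl
use strict;
use warnings;

# Hypothetical sample: two comments on ONE line with markup between them.
my $html = '<p>a</p><!-- one --><p>b</p><!-- two --><p>c</p>';

# Greedy: .+ runs to the LAST --> on the line, taking <p>b</p> with it.
(my $greedy = $html) =~ s/<!--.+-->//g;

# Non-greedy: .+? stops at the NEXT -->, so only the comments go.
(my $lazy = $html) =~ s/<!--.+?-->//g;

print "greedy: $greedy\n";   # greedy: <p>a</p><p>c</p>
print "lazy:   $lazy\n";     # lazy:   <p>a</p><p>b</p><p>c</p>
```

The greedy version silently drops the paragraph between the two comments, which is the danger described above.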

Now, this doesn't solve your overall issue. We need to tweak your pattern just a bit more. To look for certain words you will need to employ a lookahead. For your needs, you will need a negative lookahead. Let's try the following pattern:

$page_entire_code =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;
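As a rough check of how the negative lookahead behaves, the sketch below applies that substitution to a small sample. The sample HTML and the names $html and $out are assumptions for illustration, and each comment is deliberately placed on its own line.

```perl
use strict;
use warnings;

# Hypothetical sample with one comment per line.
my $html = join "\n",
    '<!-- remove me -->',
    '<p>kept</p>',
    '<!-- Test1: keep me -->',
    '<!-- Test2 -->',
    '<!-- also remove -->';

# The lookahead rejects a comment if "Test1" or "Test2" appears
# before a closing --> on the same line; everything else is removed.
(my $out = $html) =~ s/<!--(?!.*?(Test[12]).*?-->).*?-->//g;

print "$out\n";
```

With this input, the two comments containing Test1 and Test2 survive, and the other two are replaced by empty lines.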


Author Comment

by:webstuck5
ID: 39754763
I had tested my greedy pattern on HTML code that had a lot of comment tags, and it appeared to remove only each comment tag and not the HTML in between. If I used the words "TEST_ON" and "TEST_OFF" instead, would your code still work if I did:

$page_entire_code =~ s/<!--(?!.*?(TEST_[OFFON]).*?-->).*?-->//g;

Accepted Solution

by:käµfm³d 👽
ID: 39754785
Ah, I forgot one important detail: by default, the dot matches any character except a newline. That is why your greedy pattern appeared to work in your test. So if each of your comments is contained on its own line, then you're fine.
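A small sketch of that newline behavior, using a made-up sample in which one comment spans two lines. The /s modifier is standard Perl and lets the dot match newlines as well; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: a single comment that spans two lines.
my $html = "<p>a</p><!-- spans\ntwo lines --><p>b</p>";

# Without /s the dot cannot cross the newline, so nothing matches.
(my $no_s = $html) =~ s/<!--.+?-->//g;

# With /s the dot also matches newlines, so the comment is removed.
(my $with_s = $html) =~ s/<!--.+?-->//gs;

print "without /s: $no_s\n";
print "with /s:    $with_s\n";   # with /s:    <p>a</p><p>b</p>
```

Note that adding /s to the greedy pattern would make it far more destructive, since it could then run from the first comment in the file to the last.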

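For this new requirement, that could work, but only by accident: [OFFON] is a character class that matches a single character (O, F, or N), so TEST_[OFFON] would also match something like TEST_NONE. I'd prefer an alternation (i.e. an OR condition, written with the vertical bar) myself: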

$page_entire_code =~ s/<!--(?!.*?(TEST_(OFF|ON)).*?-->).*?-->//g;
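One caveat if the one-comment-per-line assumption does not hold: when several comments share a line, the lookahead can scan past the first --> and find the keyword in a later comment, so earlier comments are kept too. The sketch below is one possible variant, not the accepted answer itself. It swaps the plain dot for (?:(?!-->).), which refuses to step over a closing tag, and adds /s so multi-line comments are handled. The sample input and variable names are assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample: comments sharing a line, plus one spanning two lines.
my $html = "<!-- a --><p>x</p><!-- TEST_ON --><!-- b\nc -->";

# Pattern from the answer above: here it removes nothing, because the
# lookahead at "<!-- a -->" sees TEST_ON in the following comment.
(my $naive = $html) =~ s/<!--(?!.*?(TEST_(OFF|ON)).*?-->).*?-->//g;

# Variant: every "any character" step first checks it is not at -->.
(my $robust = $html) =~ s{
    <!--
    (?! (?:(?!-->).)*? TEST_(?:OFF|ON) )   # keyword inside THIS comment? then skip it
    (?:(?!-->).)*                          # comment body, never crossing -->
    -->
}{}gsx;

print "naive:  $naive\n";
print "robust: $robust\n";   # robust: <p>x</p><!-- TEST_ON -->
```

For anything beyond simple, well-formed comments, an HTML parser is usually a safer tool than a regex.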


Author Closing Comment

by:webstuck5
ID: 39754835
Thanks so much!