Regular expression to replace tags

Posted on 2004-10-20
Last Modified: 2013-12-24
Does anyone have a regular expression that will remove any tag, along with the content between its opening and closing tags, and replace it with a new value for that tag? For instance, I want to remove the "<head></head>" tags, together with any characters and/or nested tags between them, and replace them with "<head><title>My Title</title></head>".
Question by:McHack

Expert Comment

ID: 12362885
You could do it in two steps.

Step 1) Replace <head>....</head> with <head></head>

You could use a regular expression similar to:

Step 2) Then you can replace <head></head> with what you want using a simple replace.

You could probably combine these into one step; it's up to you.
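
As a rough illustration of the two steps described above, here is a minimal sketch. The pattern in step 1 is an assumption rather than the expression from this comment, and the names samplePage, emptied and result, along with the sample markup, are hypothetical; in practice the input would be cfhttp.FileContent.

```cfml
<!--- Hypothetical sample input standing in for cfhttp.FileContent --->
<cfset samplePage = "<html><head><meta name='x'><script>var a = 1;</script></head><body>Hi</body></html>">

<!--- Step 1: collapse everything between the head tags into an empty pair (assumed pattern) --->
<cfset emptied = reReplaceNoCase(samplePage, "<head>.*?</head>", "<head></head>", "ALL")>

<!--- Step 2: plain, non-regex replace of the empty pair with the desired markup --->
<cfset result = replaceNoCase(emptied, "<head></head>", "<head><title>My Title</title></head>", "ALL")>

<!--- Escape the markup so it is visible in the browser --->
<cfoutput>#htmlEditFormat(result)#</cfoutput>
```

Passing the final markup as the third argument of the reReplaceNoCase() call in step 1 would combine the two steps into one.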

Author Comment

ID: 12363580
OK, that didn't work. When I ran that example, the page's "<HEAD></HEAD>" tags were not removed, and neither was the content between them, of which there is a ton (JavaScript, meta tags, etc.).

Here is the code I was running:

       <cfhttp method="get" url="" resolveurl="yes">
            <CFSET ThrowError = true>
<cfset ApScriptEdit = #ReReplaceNoCase(cfhttp.FileContent, "<head>*</head>", "", "ALL")#>

Expert Comment

ID: 12366895

Almost there; try this:

<cfset ApScriptEdit  = rereplacenocase(cfhttp.FileContent, "<head>.*?</head>","","ALL")>
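
To see why the earlier attempt matched nothing: in "<head>*</head>" the * applies to the preceding ">" character, so the pattern only matches "<head" followed by zero or more ">" and then "</head>". Writing ".*?" instead means "any characters, as few as possible". A small sketch of the difference follows; the sample string and the variable names sample, unchanged and stripped are hypothetical.

```cfml
<!--- Hypothetical sample input --->
<cfset sample = "<HEAD><meta name='x'></HEAD><body>Hi</body>">

<!--- The star repeats only the closing angle bracket, so the meta tag prevents any match --->
<cfset unchanged = reReplaceNoCase(sample, "<head>*</head>", "", "ALL")>

<!--- The dot matches any character; the lazy quantifier stops at the first closing head tag --->
<cfset stripped = reReplaceNoCase(sample, "<head>.*?</head>", "", "ALL")>

<cfoutput>
Original attempt: #htmlEditFormat(unchanged)#<br>
Corrected pattern: #htmlEditFormat(stripped)#
</cfoutput>
```

With this input, the first call should leave the string untouched, while the second should leave only the body element.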

Accepted Solution

umbrae earned 500 total points
ID: 12374351
       <cfhttp method="get" url="" resolveurl="yes">
            <CFSET ThrowError = true>
<cfset ApScriptEdit = ReReplaceNoCase(cfhttp.FileContent,
"<head[^>]*>.*</head[^>]*>","<head><title>My Title</title></head>","ALL")>

To catch all of the possibilities, I'd probably use something like this. It still matches if there is anything inside the head tag itself (standard HTML does not have anything there, but there may be a space or something like that).

Just my 2 cents.
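
One caveat with "[^>]*" directly after the tag name: it also accepts tag names that merely start with "head", such as the HTML5 "header" element, and a greedy ".*" runs to the last closing tag that fits. The following is a hedged variant, not the accepted answer's code, that requires whitespace before any attributes and uses a lazy quantifier. The sample markup and the names samplePage, newHead and result are hypothetical.

```cfml
<!--- Hypothetical input: a head tag with an attribute, plus a header element in the body --->
<cfset samplePage = "<html><head profile='p'><meta name='x'></head><body><header>Top</header></body></html>">
<cfset newHead = "<head><title>My Title</title></head>">

<!--- The optional group allows attributes only after whitespace, so header tags are not treated as head tags;
      the lazy quantifier stops at the first closing head tag --->
<cfset result = reReplaceNoCase(samplePage, "<head(\s[^>]*)?>.*?</head\s*>", newHead, "ALL")>

<!--- Escape the markup so it is visible in the browser --->
<cfoutput>#htmlEditFormat(result)#</cfoutput>
```

For pages with a single head element and no tag names beginning with "head", both patterns should behave the same.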

Expert Comment

ID: 12374358
Whoops. You probably want to remove the htmlCodeFormat() from that output; I was doing that for debugging.


Author Comment

ID: 12376494

Works great, just what I was looking for.

It looks so simple when I see the solution, but somehow I never seem to get the regular expressions right.

Thanks for the help!

