regular expression replacements

Posted on 2005-04-24
Last Modified: 2008-01-09
I have an HTML document in a string, and I want to remove some tags from it. There are two basic cases:
1. Remove a simple tag. Examples:
   example 1.1 <sometag1 attribute="value"> needs to be removed.
   example 1.2 </sometag1> needs to be removed (if it exists)
   example 1.3 <sometag /> needs to be removed
2. Remove tags and everything between them. Example:
   example 2.1 <sometag2>blah dont worry there are no sometag2s here blah</sometag2> needs to be removed entirely.

In this case all instances of sometag1 and sometag2 can be removed, although it would be better to have a solution that removes only those that are between the HEAD tags.
Question by:alberthendriks
6 Comments
 
LVL 49

Expert Comment

by:Roonaan
ID: 13853164
To remove only elements between <head> and </head>, use strpos to find both tags, then take a slice with substr. Run the <sometag> replacements on that slice and put the result back in place of the original slice:

<?php

$head1 = strpos(strtolower($text), '<head');
$head2 = strpos(strtolower($text), '</head');

$slice = substr($text, $head1, $head2 - $head1);

$slice = preg_replace('/<sometag [^>]+>/i','',$slice);
$slice = preg_replace('/<[^>]+ sometag>/i','',$slice);

$new = substr($text, 0, $head1).$slice.substr($text, $head2);
?>

-r-
 
LVL 49

Expert Comment

by:Roonaan
ID: 13853173
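Note that the $head1 and $head2 lines need the closing parenthesis of strtolower() directly after $text, as in strpos(strtolower($text), '<head'). I also wrote you a complete example: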

<?php
$text = '
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> 
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>PHP: regular expression replacements</title>
  <link href="/images/ee.ico" rel="shortcut icon">
  <link href="/scripts/ee.6.css" rel="stylesheet" type="text/css">
  <link href="/scripts/eeExpert.css" rel="stylesheet" type="text/css">
<script src="/scripts/eeSubs.1.js" type="text/javascript"></script>
<meta name="description" content="I have a html document in a string, and I want to remove some tags from it. There are two basic cases 1. remove a simple tag. examples: example 1.1 <sometag1 attribute= value > needs to be removed....">
</head>
<body>

</body>
</html>
';

$head1 = strpos(strtolower($text), '<head');
$head2 = strpos(strtolower($text), '</head');

$slice = substr($text, $head1, $head2 - $head1);

$slice = preg_replace('/<link [^>]+>/i','',$slice);
$slice = preg_replace('/<[^>]+ link>/i','',$slice);

$new = substr($text, 0, $head1).$slice.substr($text, $head2);

echo '<pre>'.htmlspecialchars($text).'</pre>';
echo '<hr/>';
echo '<pre>'.htmlspecialchars($new).'</pre>';
?>

-r-
 
LVL 3

Expert Comment

by:designbai
ID: 13853216
Use strip_tags.

Specify the tags you want to keep in its allowed-tags argument; all other tags are stripped (their contents are kept).

Then use a regular expression for whatever remains.

hope this helps.
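
As a rough illustration of the strip_tags suggestion (the input string is made up, and the tag names are the placeholders from the question): the second argument lists the tags to keep, so it works as a whitelist rather than a list of tags to remove.

```php
<?php
// Hypothetical input; sometag1 is the placeholder tag from the question.
$html = '<p>Keep <b>this</b> <sometag1 attribute="value">and this</sometag1></p>';

// The second argument lists the tags to KEEP. Every other tag is stripped,
// but the text between the stripped tags stays in the output.
$clean = strip_tags($html, '<p><b>');

echo $clean, "\n";
```

Because the text between stripped tags is kept, this only covers case 1 from the question. Case 2 (dropping the contents as well) still needs something else, and every tag that should survive has to be listed explicitly.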
 
LVL 2

Author Comment

by:alberthendriks
ID: 13854051
Roonaan, what does the 2nd slice do?
$slice = preg_replace('/<[^>]+ link>/i','',$slice);

Also, I don't see how <link>bla</link> gets removed entirely (the 2nd case in my description). Maybe you misinterpreted my question: the remark at the end applies to both cases.
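
For reference, a quick check of what that second pattern matches. This is only a sketch: the adjusted pattern is an assumption about what a closing-tag match could look like, not code from the thread.

```php
<?php
$closing = '</link>';

// The quoted pattern needs at least one character and then a space
// before "link", so a plain closing tag does not match it.
$original = preg_match('/<[^>]+ link>/i', $closing);

// Assumed alternative: match "</link>" directly, allowing optional
// whitespace before the closing bracket.
$adjusted = preg_match('/<\/link\s*>/i', $closing);

var_dump($original, $adjusted); // int(0) and int(1)
```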
 
LVL 49

Accepted Solution

by:
Roonaan earned 500 total points
ID: 13854116
You are correct. My code just removes the tags, and not the tag contents.

The second $slice replacement was meant to remove </link> tags. However, if you also need to remove the tag contents, it is better to use the following two preg_replace statements instead of the ones I wrote earlier:

$slice = preg_replace('/<link\b[^>]*>.*?<\/link\s*>/is','',$slice);  // <link ...>......</link> tags with content (lazy match, may span lines)
$slice = preg_replace('/<\/?link\b[^>]*>/i','',$slice);    // all remaining <link ...>, <link /> or stray </link> tags
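
Putting the two cases together, a minimal sketch might look like the following. The function name strip_head_tags and its parameters are hypothetical and not part of this thread. It assumes a single, well-formed <head>...</head> section and that the listed tags are not nested inside themselves. Regular expressions are not a full HTML parser, so treat this as an illustration only.

```php
<?php
/**
 * Hypothetical helper: removes tags only inside <head>...</head>.
 * $inline: tag names to drop while keeping any text between them (case 1).
 * $block:  tag names to drop together with their contents (case 2).
 */
function strip_head_tags(string $html, array $inline, array $block): string
{
    $result = preg_replace_callback(
        '/<head\b[^>]*>.*?<\/head\s*>/is',
        function (array $m) use ($inline, $block): string {
            $head = $m[0];
            foreach ($block as $tag) {
                $t = preg_quote($tag, '/');
                // Opening tag, lazily matched contents, closing tag.
                $head = preg_replace("/<$t\\b[^>]*>.*?<\\/$t\\s*>/is", '', $head);
            }
            foreach ($inline as $tag) {
                $t = preg_quote($tag, '/');
                // Opening, closing, and self-closing forms of the tag.
                $head = preg_replace("/<\\/?$t\\b[^>]*>/i", '', $head);
            }
            return $head;
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

// Usage with the placeholder tags from the question (made-up input).
$page = '<head><sometag1 a="b">x</sometag1><sometag2>y</sometag2></head>'
      . '<body><sometag2>z</sometag2></body>';
echo strip_head_tags($page, ['sometag1'], ['sometag2']), "\n";
```

The block tags are handled first so that their contents are gone before the simpler tag-only patterns run. The \b after the tag name keeps sometag from also matching sometag1 or sometag2.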

-r-
 
LVL 14

Expert Comment

by:huji
ID: 16002849
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.
I will leave the following recommendation for this question in the Cleanup topic area:
Accept: Roonaan {http:#13854116}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

Huji
EE Cleanup Volunteer