regular expression replacements

I have an HTML document in a string, and I want to remove some tags from it. There are two basic cases:
1. Remove a simple tag. Examples:
   example 1.1: <sometag1 attribute="value"> needs to be removed.
   example 1.2: </sometag1> needs to be removed (if it exists).
   example 1.3: <sometag /> needs to be removed.
2. Remove tags and everything between them. Example:
   example 2.1: <sometag2>blah don't worry there are no sometag2s here blah</sometag2> needs to be removed entirely.

In this case all instances of sometag1 and sometag2 can be removed, although it would be better to have a solution that removes only those between the HEAD tags.
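Case 1 can be handled with a single pattern that covers the opening, closing and self-closing forms at once. This is a minimal sketch: `remove_tag_only()` is a hypothetical helper name, and `sometag1` is the question's placeholder tag name.

```php
<?php
// Case 1: remove <sometag1 ...>, </sometag1> and <sometag1 /> but keep any
// text between them. remove_tag_only() is a hypothetical helper name and
// "sometag1" is the question's placeholder tag.
function remove_tag_only(string $html, string $tag): string
{
    // <\/?    optional "/" so closing tags match as well
    // \b      the name must end here, so <sometag1x> is left alone
    // [^>]*>  any attributes, including the "/" of a self-closing tag
    $pattern = '/<\/?' . preg_quote($tag, '/') . '\b[^>]*>/i';
    return preg_replace($pattern, '', $html);
}

$html = '<p><sometag1 attribute="value">text</sometag1><sometag1 /></p>';
echo remove_tag_only($html, 'sometag1'), "\n"; // <p>text</p>
```

Because `[^>]*` cannot cross a `>`, each match stops at the end of its own tag, so the text between an opening and closing `sometag1` survives.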
alberthendriks asked:
Roonaan commented:
You are correct. My code only removes the tags, not the tag contents.

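The second $slice line was meant to remove </link> tags (although, as written, '/<[^>]+ link>/i' requires whitespace directly before "link", so it does not actually match </link>). However, since you also need to remove the tag contents, we'd better use the following two preg_replace statements instead of the ones I wrote earlier: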

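$slice = preg_replace('/<link\b[^>]*>.*?<\/link\s*>/is','',$slice);  //<link ...>......</link> tags with content (lazy .*? stops at the nearest </link>; /s also matches across newlines)
$slice = preg_replace('/<link\b[^>]*>/i','',$slice);    //<link ...> (all remaining tags, including <link> and <link />)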

-r-
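The difference between a greedy `(.*)` and a lazy `.*?` with the `s` modifier is easiest to see side by side. This is a minimal sketch for case 2: `remove_tag_and_content()` is a hypothetical helper name and `sometag2` is the question's placeholder tag.

```php
<?php
// Case 2: remove <sometag2 ...>...</sometag2> including everything between.
// remove_tag_and_content() is a hypothetical helper, not a PHP built-in.
function remove_tag_and_content(string $html, string $tag): string
{
    $name = preg_quote($tag, '/');
    // \b    the tag name must end here (<sometag2x> is not matched)
    // .*?   lazy: stop at the nearest closing tag, not the last one
    // /s    let "." match newlines, so multi-line elements are removed too
    $pattern = '/<' . $name . '\b[^>]*>.*?<\/' . $name . '\s*>/is';
    return preg_replace($pattern, '', $html);
}

$oneLine   = '<sometag2>a</sometag2>keep<sometag2>b</sometag2>';
$multiLine = "<sometag2>a\nb</sometag2>keep";

// Greedy pattern shape from the first answer, without /s:
$greedy = '/<sometag2(.*)\/sometag2>/i';
echo var_export(preg_replace($greedy, '', $oneLine), true), "\n";   // '' ("keep" is lost too)
echo var_export(preg_replace($greedy, '', $multiLine), true), "\n"; // unchanged: "." stops at "\n"

// Lazy version with /s:
echo remove_tag_and_content($oneLine, 'sometag2'), "\n";   // keep
echo remove_tag_and_content($multiLine, 'sometag2'), "\n"; // keep
```

Like any regex approach, this breaks on nested `sometag2` elements; the question states there are none, so the lazy match is sufficient here.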
Roonaan commented:
To remove only the elements between <head> and </head>, use strpos to find both tags and cut out that slice with substr. Run the replacements on the slice, then put the new slice back in place of the original one:

<?php

$head1 = strpos(strtolower($text, '<head'));
$head2 = strpos(strtolower($text, '</head'));

$slice = substr($text, $head1, $head2 - $head1);

$slice = preg_replace('/<sometag [^>]+>/i','',$slice);
$slice = preg_replace('/<[^>]+ sometag>/i','',$slice);

$new = substr($text, 0, $head1).$slice.substr($text, $head2);
?>

-r-
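The same slice-and-splice idea can apply both cases inside the head only. This is a minimal sketch, not the thread's code: `clean_head()` is a hypothetical helper, it uses `stripos()` instead of `strtolower()` + `strpos()`, and it assumes at most one head section, returning the HTML unchanged when either `<head` or `</head` is missing.

```php
<?php
// Apply case 2 and case 1 only between <head ...> and </head>.
// clean_head() is a hypothetical helper name.
function clean_head(string $html, string $tagOnly, string $tagWithContent): string
{
    // stripos() is case-insensitive, so no lowercase copy is needed;
    // strpos/stripos return false (not -1) when nothing is found
    $start = stripos($html, '<head');
    if ($start === false) {
        return $html;
    }
    $end = stripos($html, '</head', $start);
    if ($end === false) {
        return $html;
    }

    $slice = substr($html, $start, $end - $start);

    $only  = preg_quote($tagOnly, '/');
    $whole = preg_quote($tagWithContent, '/');
    // Case 2: drop <whole ...>...</whole> elements including their content
    $slice = preg_replace('/<' . $whole . '\b[^>]*>.*?<\/' . $whole . '\s*>/is', '', $slice);
    // Case 1: drop <only ...>, </only> and <only /> tags, keeping text between
    $slice = preg_replace('/<\/?' . $only . '\b[^>]*>/i', '', $slice);

    return substr($html, 0, $start) . $slice . substr($html, $end);
}

$html = '<html><head><sometag1 attribute="value"><title>t</title>'
      . '<sometag2>gone</sometag2></head>'
      . '<body><sometag1 /></body></html>';
echo clean_head($html, 'sometag1', 'sometag2'), "\n";
// <html><head><title>t</title></head><body><sometag1 /></body></html>
```

Note that searching for the prefix `<head` would also hit a `<header>` tag appearing before the real head; a stricter pattern is needed if that can happen.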
Roonaan commented:
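Apart from a typo in the $head1 and $head2 lines (the closing parenthesis of strtolower() belongs right after $text, as in strpos(strtolower($text), '<head')), here is a complete example: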

<?php
$text = '
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> 
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>PHP: regular expression replacements</title>
  <link href="/images/ee.ico" rel="shortcut icon">
  <link href="/scripts/ee.6.css" rel="stylesheet" type="text/css">
  <link href="/scripts/eeExpert.css" rel="stylesheet" type="text/css">
<script src="/scripts/eeSubs.1.js" type="text/javascript"></script>
<meta name="description" content="I have a html document in a string, and I want to remove some tags from it. There are two basic cases 1. remove a simple tag. examples: example 1.1 <sometag1 attribute= value > needs to be removed....">
</head>
<body>

</body>
</html>
';

$head1 = strpos(strtolower($text), '<head');
$head2 = strpos(strtolower($text), '</head');

$slice = substr($text, $head1, $head2 - $head1);

$slice = preg_replace('/<link [^>]+>/i','',$slice);
$slice = preg_replace('/<[^>]+ link>/i','',$slice);

$new = substr($text, 0, $head1).$slice.substr($text, $head2);

echo '<pre>'.htmlspecialchars($text).'</pre>';
echo '<hr/>';
echo '<pre>'.htmlspecialchars($new).'</pre>';
?>

-r-
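The second preg_replace in the example, '/<[^>]+ link>/i', only matches when whitespace sits directly before "link>", so a plain </link> slips through while something like <foo link> would be removed. A quick check, assuming '/<\/link\s*>/i' as the intended closing-tag pattern (an editorial assumption, not from the thread):

```php
<?php
// Compare the example's "closing tag" pattern with a corrected version.
$original  = '/<[^>]+ link>/i';  // "<", anything, a space, then "link>"
$corrected = '/<\/link\s*>/i';   // "</link>", optionally "</link >"

foreach (['</link>', '</LINK >', '<foo link>'] as $input) {
    printf(
        "%-12s original=%d corrected=%d\n",
        $input,
        preg_match($original, $input),   // preg_match returns 1 or 0
        preg_match($corrected, $input)
    );
}
```

For </link> and </LINK > the original pattern reports 0 and the corrected one 1; for <foo link> it is the other way round.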
designbai commented:
Use strip_tags().

Pass the tags you still need to parse as its second argument (the allowed tags); strip_tags() then removes every other tag, while keeping the text between the removed tags.

Then use a regular expression on the tags that remain.

Hope this helps.
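A short illustration of how the allowed-tags argument behaves, using the question's placeholder tags. It also shows why case 2 still needs a regex: strip_tags() drops the tags but keeps their text.

```php
<?php
// strip_tags() removes every tag EXCEPT the ones named in its second
// argument; the text between the removed tags is kept.
$html = '<p>Hello <sometag1 attribute="value">world</sometag1><sometag1 />!</p>';

// Keep <p>, strip everything else (here: every form of sometag1).
echo strip_tags($html, '<p>'), "\n"; // <p>Hello world!</p>

// Case 2 is not solved: the sometag2 tags go, but "blah" stays.
echo strip_tags('<sometag2>blah</sometag2>', '<p>'), "\n"; // blah
```

Because the second argument is a list of tags to keep, a full HTML page would need every tag it uses listed there; applying it only to the head slice keeps that list short.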
alberthendriks (author) commented:
Roonaan, what does the second $slice line do?
$slice = preg_replace('/<[^>]+ link>/i','',$slice);

Also, I don't see how <link>bla</link> would be removed entirely (the second case in my description). Maybe you misinterpreted my question: the remark at the end (restricting the removal to the HEAD section) applies to both cases.
huji commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.
I will leave the following recommendation for this question in the Cleanup topic area:
Accept: Roonaan {http:#13854116}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

Huji
EE Cleanup Volunteer
Question has a verified solution.
