mindlink

Problem with preg_replace

I still have the same problem. I want to remove <br /> inside [tab][/tab]

Example:

$msg = "
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
";

I want to remove the <br /> tag, but only inside the places where [tab][/tab] exist.

I have a variable, $msg, that contains the string listed above. Anyone know a good solution?
neester

hmm.

I'm usually pretty good with these,
but this one is out of my league :(

Sorry
Marcus Bointon

Hm. Tricky. I've sorted one part of it out: this selects all text between [tab] and [/tab]:

 (?<=\[tab\])(.*?)(?=\[\/tab\])

The trick now is to come up with a pattern to match multiple instances of <br /> between those delimiters while ignoring other text. I had initially thought that this would do it:

(?<=\[tab\])((?<=.*?)(<br \/>)(?=.*?))+(?=\[\/tab\])

but unfortunately you can't have variable length lookbehind assertions like (?<=.*?). I know _GeG_ is into this stuff...
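For reference, here is a minimal sketch of what the first pattern picks out when run through preg_match_all. The shortened sample string and the variable names are assumptions for illustration only, and the s modifier is an addition (the pattern as posted does not specify it) so that . can span newlines inside a section.

```php
<?php
// Hypothetical sample input, shortened from the question.
$msg = "text<br />\n[tab] one <br /> two[/tab]<br />\n[tab] three[/tab]";

// The fixed-length lookbehind and the lookahead keep the [tab] and [/tab]
// delimiters out of the match itself.
// The "s" modifier lets "." match newlines inside a [tab] section.
preg_match_all('/(?<=\[tab\])(.*?)(?=\[\/tab\])/s', $msg, $found);

// $found[1] holds the text between each [tab] ... [/tab] pair.
print_r($found[1]);
```

This only selects the sections; removing the <br /> tags inside them still needs a second step.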
inq123

Hi mindlink,

Squinky's on the right track.  Here's a Perl script that would do what you want (read on for php equivalent):

while($msg =~ s/(?<=\[tab\])(.*?)<br\s*\/>(?=.*?\[\/tab\])/$1/is) { }

Now the PHP equivalent.  The tricky thing is that there seems to be a bug in my PHP version that makes the pattern fail, which shouldn't happen.  I suspect versions newer than mine (4.2.3) have it fixed.  Anyway, here's the exact equivalent of the above Perl script:

while(preg_match("/(?<=\[tab\])(.*?)<br\s*\/>(?=.*?\[\/tab\])/is", $msg))
  $msg = preg_replace("/(?<=\[tab\])(.*?)<br\s*\/>(?=.*?\[\/tab\])/is", "\$1", $msg);

Not particularly efficient (divide and conquer in more rounds of match and replace would be faster, I think), but it does what you ask for in one regex.

Cheers!
OK, here's a php script that definitely works:

<?php
$msg = "
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
";
while(preg_match("/\[tab\].*?<br\s*\/>.*?\[\/tab\]/is", $msg))
  $msg = preg_replace("/(\[tab\].*?)<br\s*\/>(.*?\[\/tab\])/is", "\$1\$2", $msg);
print "replaced is:$msg\n";
?>
Oops, sorry, found the bug in my script.  The problem is that you're using a multi-character delimiter, which cannot be handled in just one or two regexes.  There are some modules in Perl that deal with this situation, but for PHP, I guess we'll have to go the hard way.  I'll try to work it out when I have time later.
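To make the bug concrete, here is a small sketch that runs the same loop on a shortened, hypothetical input (the sample string is an assumption, not from the thread). Once the <br /> inside the first section is gone, the lazy .*? is free to run past the first [/tab] and pair the first [tab] with a later [/tab], so the <br /> sitting between the two sections gets removed as well.

```php
<?php
// Hypothetical, shortened input: one <br /> inside a section,
// and one between the two sections that should be left alone.
$msg = "[tab]a<br />b[/tab]<br />[tab]c[/tab]";

// Same loop as in the script above.
while (preg_match('/\[tab\].*?<br\s*\/>.*?\[\/tab\]/is', $msg)) {
    $msg = preg_replace('/(\[tab\].*?)<br\s*\/>(.*?\[\/tab\])/is', '$1$2', $msg);
}

// The <br /> between the two sections has been stripped too.
echo $msg, "\n";
```

Nothing in the pattern stops a match from crossing a [/tab], which is why the loop cannot tell "inside a section" from "between two sections".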
That will work, but will be a relatively expensive proposition, especially as you're evaluating the regex at least twice for each match. The core of the problem is in matching multiple occurrences of <br />. I started defining the central pattern to search for as (.*?<br />.*?)* which would match any number of occurrences of the tag, but then it's hard to separate it out from the surrounding text. It might be more efficient to pick out the [tab] sections with my first regex and process them separately, but keeping track of the structure might be messy.
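One way to process the [tab] sections separately without tracking offsets by hand is preg_replace_callback, which passes each match to a function and splices the result back in. The sketch below is an alternative not proposed in the thread; the helper name strip_br_in_tabs is hypothetical, the sample string is made up, and the anonymous function assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper name; not from the thread.
function strip_br_in_tabs($msg)
{
    return preg_replace_callback(
        // Match one whole [tab]...[/tab] section ("s": "." spans newlines).
        '/\[tab\].*?\[\/tab\]/is',
        function ($m) {
            // $m[0] is the full section; strip <br>, <br/> and <br /> inside it.
            return preg_replace('/<br\s*\/?>/i', '', $m[0]);
        },
        $msg
    );
}

// Made-up sample: <br /> tags before, inside, and after a section.
$msg = "a<br />\n[tab] x <br /> y <br /> z[/tab]<br />\ntail<br />";
echo strip_br_in_tabs($msg), "\n";
```

Text outside the sections, including anything after the last [/tab], is left untouched because preg_replace_callback only rewrites the matched spans.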
Squinky: I agree with your assessment, which is why I said it's not particularly efficient.  And I also think divide and conquer's the more efficient way, and in fact, the only correct way in this case, as demonstrated by the script below:

mindlink,

Now this one truly works:

<?php
$msg = "
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
";
$matches = array();
$newmsg = '';
while(preg_match("/(.*?)(\[tab\].*?\[\/tab\])(.*)/is", $msg, $matches))
{
  $msg = $matches[3];
  $newmsg .= $matches[1] . preg_replace("/<br\s*\/>/is", "", $matches[2]);
}
print "replaced is:$newmsg\n";
?>
I'd have to agree...  Picking out the tab sections and processing them separately is the most straightforward way to solve this one using PHP regexes.  My code below does the trick:

-----------------------------------BEGIN CODE-----------------------------------
<?php
$msg = "
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
Lots of text<br />
a new line<br />
[tab] this is a test2 <br /> and another test2 <br /> and another test2[/tab]<br />
";

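//break-to-newline conversion function: replaces each <br>, <br/> or <br /> with "\n"
function br2nl( $data ) {
   return preg_replace( '/<br\s*\/?>/i', "\n", $data );
}

//find all [tab]...[/tab] sections with their offsets
//("s" lets . match newlines inside a section, "i" also matches [TAB])
preg_match_all( "/\[tab\].*?\[\/tab\]/is", $msg, $tagArea, PREG_OFFSET_CAPTURE );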

$tempstr = "";      //result string
$curOffset = 0;      //initialize offset

//loop through each tag found and append to result string
foreach( $tagArea[0] as $matchArray ) {
      $matchText = $matchArray[0];      //the tab text
      $newOffset = $matchArray[1];      //offset position of text
      //append text before the current tag
      $tempstr .= substr( $msg, $curOffset, $newOffset - $curOffset );
      $tempstr .= br2nl( $matchText ); //process & append our matched text
      //update offset
      $curOffset = $newOffset + strlen($matchText);
}
//append text after last tag
$tempstr .= substr( $msg, $curOffset );

echo "<pre>\n";
echo "\$tempstr is: $tempstr<br />\n";
echo "</pre>\n";
?>
-----------------------------------END CODE-----------------------------------
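inq123 obviously has the more elegant solution.  The only problem is that it drops any text left over after the last [/tab]: once the loop stops matching, the remainder is still sitting in $msg and never gets appended to $newmsg.  You just need to add one line after the while loop: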

<?php
$msg = "
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
Lots of text<br />
a new line<br />
[tab] this is a test <br /> and another test <br /> and another test[/tab]<br />
";
$matches = array();
$newmsg = '';
while(preg_match("/(.*?)(\[tab\].*?\[\/tab\])(.*)/is", $msg, $matches))
{
  $msg = $matches[3];
  $newmsg .= $matches[1] . preg_replace("/<br\s*\/>/is", "", $matches[2]);
}
$newmsg .= $msg; //append remainder of text
print "replaced is:$newmsg\n";
?>
ASKER CERTIFIED SOLUTION
inq123

mindlink

ASKER

Thank you all. It seems to be working now. I'm gonna run some more testing, but it does look like the code is working.