xybx asked:

Get text between elements with regex

OK, this ought to be a piece of cake, but for whatever reason, it's become a big deal. I am simply trying to return text between two elements.

For example, suppose I have a form, and it contains

<text>
whatever I want here

maybe there are line breaks
</text>

Now, using PHP, I ought to be able to grab what is in the <text> tags and do other stuff with it.

I've tried madness like, preg_match_all("/(<text>blah<\/text>)/",$form_input,$matches) and then looping through it, to no avail. Note that it won't even match on '<t'. I'm [obviously?] new to PHP, and I have read the docs for this, but regardless of my variations, nothing has worked.

thanks for the help

KennyTM

Hi.

Try

$form_input = '3453535<text>2342345345423</text>asdfafsf<text>2342cvxcv423</text>sfsdfsdf';
 preg_match_all("|<text>(.*)</text>|U",$form_input,$matches);
print_r ($matches); // should display the text between the <text>'s.
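
A note on the line breaks mentioned in the question: without the `s` modifier, `.` does not match a newline, so the pattern above finds nothing when the text between the tags spans several lines. A minimal sketch, assuming the form input looks like the sample in the question (the variable names `$without`, `$with`, `$m1` and `$m2` are just for illustration):

```php
<?php
// Sample input shaped like the question's example, with line breaks inside the tags.
$form_input = "<text>\nwhatever I want here\n\nmaybe there are line breaks\n</text>";

// Without s, "." stops at a newline, so a multi-line body cannot match.
$without = preg_match_all('|<text>(.*)</text>|U', $form_input, $m1);

// With s, "." also matches newlines, so the whole body is captured.
$with = preg_match_all('|<text>(.*)</text>|Us', $form_input, $m2);

var_dump($without);  // int(0)
var_dump($with);     // int(1)
var_dump($m2[1][0]); // the text between the tags, line breaks included
```

So for multi-line form input, `Us` (or `Uis` to ignore case as well) is probably what you want.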

[KennyTM's asker-certified solution and one further solution are visible to Experts Exchange members only.]

Something very close to what you want was discussed here: https://www.experts-exchange.com/questions/21817029/Extracting-the-text-between-two-given-tags-or-patterns.html

If I understood correctly, you want to parse something posted through your form. The posted data will then be in a variable, for example:
<?php
$_POST['message'] = "<text>
whatever I want here

maybe there are line breaks
</text>";
?>

So you can try to use Roonaan's variant:
<?php
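function getTagText($textString, $tagName)
{
    $tag_begin = $tagName;
    // An optional third argument names a different closing tag.
    $tag_end = (3 === func_num_args()) ? func_get_arg(2) : $tagName;
    $open = stripos($textString, '<'.$tag_begin.'>');
    if ($open === false) return ''; // opening tag not found
    $start = $open + strlen($tag_begin) + 2; // skip past "<tag>"
    $end = stripos($textString, '</'.$tag_end.'>', $start);
    if ($end === false) return ''; // closing tag not found
    return substr($textString, $start, $end - $start);
}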

var_dump(getTagText($_POST['message'], 'text'));
?>

Or, if you want to extract the text from all tags, for example when someone posts data with more than one tag set:
<?php
$_POST['message'] = "<text>
whatever I want here

maybe there are line breaks
</text><text> another text
goes here</text>";
?>

You can modify my variant so that it looks like this:
<?php
function getTagText($textString, $tagName)
{
    $tag_begin = $tagName;
    $tag_end = (3 === func_num_args()) ? func_get_arg(2) : $tagName;
    preg_match_all("/<{$tag_begin}[^>]*>(?P<text>.*)<\/{$tag_end}>/Usi", $textString, $matches, PREG_PATTERN_ORDER);
    return $matches["text"];
}

var_dump(getTagText($_POST['message'], 'text'));
?>


PS The one thing you have to fix or keep in mind: it does not understand nested tags, such as "<text> some text <text>cascade goes...</text> here</text>". So either take note of that, or modify it so that it can handle them. It's not hard to do.
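
For the nested case, here is one possible sketch (not from the thread), assuming PCRE's recursion syntax `(?R)` is acceptable and the tag is literally `text`. It compares the ungreedy pattern with a recursive one; the variable names `$flat` and `$nested` are just for illustration:

```php
<?php
$input = '<text> some text <text>cascade goes...</text> here</text>';

// The ungreedy variant stops at the first closing tag it sees.
preg_match('/<text>(.*)<\/text>/Usi', $input, $flat);

// Recursive variant: (?R) re-enters the whole pattern for an inner <text>...</text>,
// and the negative lookahead keeps "." from consuming an opening or closing tag.
preg_match('~<text>((?:(?!</?text>).|(?R))*)</text>~si', $input, $nested);

var_dump($flat[1]);   // " some text <text>cascade goes..."
var_dump($nested[1]); // " some text <text>cascade goes...</text> here"
```

This only returns the outermost block; if the input is real HTML or XML rather than a made-up tag, a proper parser is likely the safer choice.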

Use
 preg_match("|<text>(.*)</text>|Uis",$input,$matches);

print_r($matches);

To make that not greedy add the ? after the .*:
 preg_match("|<text>(.*?)</text>|Uis",$input,$matches);

Re BogoJoker:
The /U flag marks ungreediness already. The extra ? makes the search greedy again (double negative = positive).

Oh, didn't know that, hehe thanks =)
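
To see the double negative in action, here is a small sketch with a made-up input string containing two tag pairs (the variable names are only for illustration):

```php
<?php
$input = '<text>one</text> and <text>two</text>';

// U alone: quantifiers are ungreedy, so the match ends at the first </text>.
preg_match('|<text>(.*)</text>|Us', $input, $ungreedy);

// U plus ?: the ? flips the quantifier back to greedy, so it runs to the last </text>.
preg_match('|<text>(.*?)</text>|Us', $input, $greedyAgain);

// No U, with ?: the usual way to make a single quantifier ungreedy.
preg_match('|<text>(.*?)</text>|s', $input, $lazy);

var_dump($ungreedy[1]);    // "one"
var_dump($greedyAgain[1]); // "one</text> and <text>two"
var_dump($lazy[1]);        // "one"
```

In other words, use either the `U` modifier or the `?` suffix, not both.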

xybx (ASKER)

Thanks everyone for the help. I ended up solving this one the way I needed.

Here's what I ended up with after too much work.

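function ReadBetweenTags($tag,$strInput) {
      // $strInput is the raw form text to convert.
      $test = $strInput;
      // [\w\W\r\n] matches any character, including line breaks. Note that with
      // the U modifier set, *? is greedy and a bare * is ungreedy.
      $pattern = "|^([\w\W\r\n]*?)<".$tag.">([\w\W\r\n]*?)</".$tag.">([\w\W\r\n]*)$|Ui";
      
      // Each pass wraps one <tag>...</tag> pair in <pre>, until none are left.
      while(preg_match($pattern, $test, $result)) {
            $before = $result[1];
            $match = $result[2];
            $after = $result[3];
            // HTMLOutput() is not a PHP built-in; it is a helper from my own code (not shown).
            $test="$before<pre class=\"text\">".HTMLOutput($match)."</pre>$after";
      }
      return $test;
}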

I'll just divvy up the points among those who I felt helped the most.