Extracting data between HTML tags using regex in PHP

I want to extract the text between HTML tags. For example:
  <div>
     <br>
        love to marry is an free matrimonial website
     </br>
     <div id='first'>
        Search for your partner ends here.
     </div>
     <div  class="test">
        love to marry will let you know what kanyadaan means
     </div>
     <span>
        Provide your ads on our website at very cheap rate
     </span>
     <div id="second">
        Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual
        <table>
        <tr>
           <td>mangal sutra bandhan information</td>
           <td>Find your partner for free </td>
        </tr>
        <tr>
           <td>send your love sms at http://lovetomarry.com and win exciting prizes  </td>
           <td>Get your horoscope built from love to marry , to know more about your future your future enroll our website </td>
        </tr>
        </table>
        <div>
           Watch latest online movies for free
        </div>

        Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com

     </div>

  </div>



########### Expected output ###########

If the request is for "BR" tags:

   love to marry is an free matrimonial website

If the request is for "DIV" tags with id "first":

   Search for your partner ends here.

If the request is for "DIV" tags with class "test":

   love to marry will let you know what kanyadaan means

If the request is for "SPAN" tags:

   Provide your ads on our website at very cheap rate

If the request is for "DIV" tags with id "second":

   Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual

   mangal sutra bandhan information

   Find your partner for free

   send your love sms at http://lovetomarry.com and win exciting prizes

   Get your horoscope built from love to marry , to know more about your future your future enroll our website

   Watch latest online movies for free

   Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com

########### End of expected output ###########


For the "DIV" with id "second", I want to exclude the table, but if the div contains any other HTML tags, I need to keep the text together with its tags.
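One way to get the id "second" result with the table removed but other tags kept is to let PHP's built-in DOM extension do the parsing instead of a regex. This is only an illustrative sketch, not a solution posted in the thread: the helper name `innerHtmlWithoutTables` is hypothetical, and it assumes the markup is already in a string and that every `<table>` inside the target element should be dropped.

```php
<?php
// Sketch: return the inner HTML of the element with the given id, with every
// <table> inside it removed and all other child markup kept.
// innerHtmlWithoutTables() is a hypothetical helper, not a built-in.

function innerHtmlWithoutTables(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about the stray </br>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@id="' . $id . '"]')->item(0);
    if ($node === null) {
        return null; // no element with that id
    }

    // Gather the tables into an array first, then detach them from the tree.
    foreach (iterator_to_array($xpath->query('.//table', $node)) as $table) {
        $table->parentNode->removeChild($table);
    }

    // Serialize each remaining child node to rebuild the inner HTML.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return trim($inner);
}

$html = '<div><div id="second">Ritual text <table><tr><td>cell</td></tr></table>'
      . '<div>Watch latest online movies for free</div> Horoscope text</div></div>';

echo innerHtmlWithoutTables($html, 'second');
```

Because the parser builds a tree, the nested `<div>` is kept intact instead of ending the match early.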

The element to extract may be selected by "id", by "class", or by tag name alone with no attributes.
I have managed to extract the data between tags by id, but it fails when there is another div nested inside.
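Selecting by id, by class, or by tag name alone maps naturally onto XPath, and since a parser builds a tree, a nested `<div>` does not cut the result short. A minimal sketch, assuming the DOM extension is available: `textOf` and its parameters are hypothetical, and the id/class value is assumed not to contain double quotes because it is pasted straight into the XPath expression. Note that `<br>` is a void element in HTML, so a parser will not report the text after `<br>` as its content; the `</br>` in the sample is not valid HTML.

```php
<?php
// Sketch: extract the text of elements chosen by tag name, optionally narrowed
// by id or class, with PHP's DOM extension. textOf() is a hypothetical helper;
// $value is assumed not to contain double quotes (it is pasted into XPath).

function textOf(string $html, string $tag, string $attr = '', string $value = ''): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // e.g. //div, //div[@id="first"], or a class test that also matches
    // elements carrying several classes such as class="test other".
    $query = '//' . $tag;
    if ($attr === 'id') {
        $query .= '[@id="' . $value . '"]';
    } elseif ($attr === 'class') {
        $query .= '[contains(concat(" ", normalize-space(@class), " "), " ' . $value . ' ")]';
    }

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        // textContent includes the text of nested elements; runs of
        // whitespace are collapsed to single spaces.
        $texts[] = trim(preg_replace('/\s+/', ' ', $node->textContent));
    }
    return $texts;
}

$html = '<div><div id="first"> Search for your partner ends here. </div>'
      . '<div  class="test">love to marry will let you know what kanyadaan means</div>'
      . '<span>Provide your ads on our website at very cheap rate</span>'
      . '<div id="second">Ritual <div>Watch latest online movies for free</div> Horoscope</div></div>';

print_r(textOf($html, 'div', 'id', 'first'));
print_r(textOf($html, 'div', 'class', 'test'));
print_r(textOf($html, 'span'));
```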
Asked by: Insoftservice
käµfm³d 👽 commented:
You should really be using an HTML parser for something like this. However, since you asked about regex, you can try:

$tag = 'br';
$attr = '';

// OR

$tag = 'div';
$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

if (preg_match("/<$tag(?: +[^>]*$attr[^>]*)?>((?:<|[^<](?!/$tag))+)</$tag>/", $html, $matches) {
    echo $matches[1];
}


Insoftservice (Author) commented:
Thanks for the info; an HTML parser could also be used.
As you mentioned, will this return the right content for id "second", given that it contains a div inside a div?
I am also trying an HTML parser for this issue.

I tried it but got this error:

syntax error, unexpected '^', expecting T_STRING or T_VARIABLE or T_NUM_STRING
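That parse error comes from PHP's string interpolation rather than from the regex itself: inside double quotes, `$attr[^>]` is read as an attempt to access an element of `$attr`, and `^` is not a valid key. The posted `if` line is also missing a closing parenthesis, and the unescaped `/` in `</$tag>` would end a `/`-delimited pattern early. A sketch of one way around all three, assuming the same `div` / `class="test"` inputs: wrap variables in braces, pass the delimiter to `preg_quote`, and use `#` as the delimiter. It only fixes the syntax; the lazy `(.*?)` still stops at the first `</div>`, so nested divs are not handled.

```php
<?php
// Sketch: build the pattern from variables without tripping PHP's string
// interpolation. In double quotes, "$attr[^>]" is parsed as an array access
// on $attr, which is what raises "unexpected '^'".

$tag  = preg_quote('div', '#');          // escape regex metacharacters...
$attr = preg_quote('class="test"', '#'); // ...and the chosen delimiter

// {$var} marks exactly where each variable name ends. With '#' as the
// delimiter, the '/' in the closing tag needs no escaping.
$pattern = "#<{$tag}\\b[^>]*{$attr}[^>]*>(.*?)</{$tag}>#s";

$html = '<div  class="test">love to marry will let you know what kanyadaan means</div>';

if (preg_match($pattern, $html, $matches)) {   // both closing parentheses present
    echo trim($matches[1]);
}
```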
käµfm³d 👽 commented:
Give me a little while to test. I'm not currently near a machine with PHP installed, but I'll have one shortly. I'll post back with a correction.

Insoftservice (Author) commented:
Not a problem, take your time.

Enjoy your New Year; that's more important.

Happy New Year to you and your family, and to the whole EE family.
käµfm³d 👽 commented:
OK, let's try it like this:

$tag = 'br';
$attr = '';

// OR

//$tag = 'div';
//$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag, '/');   // escape regex metacharacters and the "/" delimiter
$attr = preg_quote($attr, '/');

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo $matches[1];
}


Insoftservice (Author) commented:
Not working, dude. It shows a blank result. The HTML parser isn't working either; it gives an &nbsp; entity error.
käµfm³d 👽 commented:
Hmm... These are the results I get:

<html>
  <head>
  </head>
  <body>

<?php

$html = <<<HTML
<div>
         <br>
              love to marry is an free matrimonial website
         </br>
        <div id='first'>
              Search for your partner ends here.
        </div>
        <div  class="test">
               love to marry will let you know what kanyadaan means
            </div>
       <span>
           Provide your ads on our website at very cheap rate
      </span>
      <div id="second">
        Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual
      <table>
      <tr>
            <td>mangal sutra bandhan information</td>
            <td>Find your partner for free </td>
      </tr>
      <tr>
            <td>send your love sms at http://lovetomarry.com and win exciting prizes  </td>
            <td>Get your horoscope built from love to marry , to know more about your future your future enroll our website </td>
      </tr>
      </table>
        <div>
            Watch latest online movies for free 
        </div>
      
       Get your daily, weekly, monthly,yearly      Horoscope for free only on lovetomarry.com 

      </div> 

</div>
HTML;

$tag = 'br';
$attr = '';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo "***********  &lt;br&gt;  ***********<br />";
    echo $matches[1];
    echo "<br /><br />";
}


$tag = 'div';
$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo "***********  &lt;div class=\"test\"&gt;  ***********<br />";
    echo $matches[1];
}

?>

  </body>
</html>



Screenshot
Insoftservice (Author) commented:
The HTML has other tags too; I only gave the div as an example. It could also contain &nbsp;, <br>, <p> and other HTML tags.

When I tried the code you provided, it gave me a blank result, and when I tried an HTML parser as you suggested, it gave me an &nbsp; entity error.
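An "&nbsp; entity" error is typical of parsing the page as XML: XML predefines only `&lt;`, `&gt;`, `&amp;`, `&quot;` and `&apos;`, so `&nbsp;` is undefined there, while an HTML parser knows the full HTML entity set. The thread doesn't show which parser call was used, so the sketch below assumes an XML-based parse was the cause and compares `DOMDocument::loadXML` with `DOMDocument::loadHTML` on a one-line sample.

```php
<?php
// Sketch: the same snippet parsed as XML and as HTML. Assumption: the
// "&nbsp; entity" error came from an XML-based parse.

$html = '<div id="second">Find&nbsp;your partner for free</div>';

libxml_use_internal_errors(true); // collect parse errors instead of printing them

// XML predefines only &lt; &gt; &amp; &quot; &apos;, so &nbsp; is an error.
$xml = new DOMDocument();
$xmlOk = $xml->loadXML($html);

// The HTML parser knows the HTML entity set and accepts &nbsp;.
$dom = new DOMDocument();
$htmlOk = $dom->loadHTML($html);
libxml_clear_errors();

$div = (new DOMXPath($dom))->query('//*[@id="second"]')->item(0);

// &nbsp; becomes U+00A0 in the UTF-8 text; swap it for an ordinary space.
$text = str_replace("\u{00A0}", ' ', $div->textContent);

var_dump($xmlOk, $htmlOk);
echo $text, "\n";
```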
Bernard S. (CTO) commented:
May I suggest that you have a look at "jQuery-php", a PHP library heavily inspired by jQuery that makes parsing DOM documents easy.

You would then be using functions that have already been tested.

See http://jquery.hohli.com/
Aaron Tomosky (Director of Solutions Consulting) commented:
I used this once; it worked great and is really easy to use:
http://simplehtmldom.sourceforge.net/

Insoftservice (Author) commented:
Thanks @fibo and @aarontomosky, I will definitely try these.
Bernard S. (CTO) commented:
Hi,

What are the results?
Derek Jensen commented:
Well, I can't seem to get any closer than this...

Working from kaufmed's code, the pattern should look something like this (assuming all LFs and CRs have been stripped out first):

$pattern = "/<" . $tag . "(>|.+?(?<!<|>)>).+?(?<!<" . $tag . ")(.+?)<\/" . $tag . ">/i";
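Like the earlier patterns, this one has no way to count how deeply divs are nested. PCRE, the engine behind PHP's `preg_*` functions, does support recursive subpatterns such as `(?&name)`, so balanced `<div>...</div>` pairs can be matched. This is a sketch under strong assumptions (properly nested divs, no `<div` inside comments, scripts or attribute values, no nested tables), and the pattern is a hypothetical illustration rather than anything posted in the thread; the parser-based suggestions earlier in the thread remain the more robust route.

```php
<?php
// Sketch: match <div id="second"> plus any nested <div>...</div> pairs with a
// PCRE recursive subpattern, then drop <table> blocks from the captured body.
// Assumes well-formed, properly nested divs and non-nested tables.

$html = '<div><div id="second">Ritual <table><tr><td>cell</td></tr></table>'
      . '<div>Watch latest online movies for free</div> Horoscope</div></div>';

// body   = text, any "<" that does not start <div or </div>, or a nested div
// nested = <div ...> ... </div>, recursing into itself via (?&nested)
$pattern = '~<div\b[^>]*\bid=["\']second["\'][^>]*>'
         . '(?<body>(?:[^<]++|<(?!/?div\b)|(?<nested><div\b[^>]*>(?:[^<]++|<(?!/?div\b)|(?&nested))*</div>))*)'
         . '</div>~i';

if (preg_match($pattern, $html, $m)) {
    // Remove each <table>...</table> (lazy match, so tables must not nest).
    $body = preg_replace('~<table\b.*?</table>~is', '', $m['body']);
    echo trim($body);
}
```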


Insoftservice (Author) commented:
Thanks.