Insoftservice inso (India) asked:

Extracting data between HTML tags using regex in PHP

I want to extract the text between HTML tags.
For example:
  <div>
      <br>
          love to marry is an free matrimonial website
      </br>
      <div id='first'>
          Search for your partner ends here.
      </div>
      <div class="test">
          love to marry will let you know what kanyadaan means
      </div>
      <span>
          Provide your ads on our website at very cheap rate
      </span>
      <div id="second">
          Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual
          <table>
              <tr>
                  <td>mangal sutra bandhan information</td>
                  <td>Find your partner for free </td>
              </tr>
              <tr>
                  <td>send your love sms at http://lovetomarry.com and win exciting prizes  </td>
                  <td>Get your horoscope built from love to marry , to know more about your future your future enroll our website </td>
              </tr>
          </table>
          <div>
              Watch latest online movies for free
          </div>

          Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com

      </div>

  </div>



########### Expected output ###########

If the request is for the BR tag:

love to marry is an free matrimonial website

If the request is for the DIV tag with id "first":

Search for your partner ends here.

If the request is for the DIV tag with class "test":

love to marry will let you know what kanyadaan means.

If the request is for the SPAN tag:

Provide your ads on our website at very cheap rate


If the request is for the DIV tag with id "second":

Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual

mangal sutra bandhan information

Find your partner for free

send your love sms at http://lovetomarry.com and win exciting prizes

Get your horoscope built from love to marry , to know more about your future your future enroll our website

Watch latest online movies for free

Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com


######### End of expected output #########


For the DIV with id "second", I want to exclude the table markup. If the div contains other HTML tags, I still need to keep the text inside them.

The element to extract may be identified by "id", by "class", or by the tag name alone with no attributes.
I can already extract the data between tags by id, but it fails when another div is nested inside.
Topics: PHP, Regular Expressions
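The following quick demonstration shows why id-based extraction breaks on nested divs. It is a minimal sketch. The shortened `$html` string is an assumption that stands in for the question's markup:

```php
<?php
// Shortened stand-in for the question's div with id "second".
$html = '<div id="second">Intro text <div>Inner text</div> Trailing text</div>';

// The lazy (.*?) stops at the FIRST </div>, which closes the inner div.
if (preg_match('#<div id="second">(.*?)</div>#s', $html, $m)) {
    echo $m[1], "\n"; // prints: Intro text <div>Inner text
}
```

A greedy `(.*)` has the opposite problem. It runs to the last `</div>` in the input, which is only correct when nothing follows the target element. A simple pattern like this has no notion of nesting depth, and that is the usual argument for an HTML parser.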

Last comment: Insoftservice inso, 8/22/2022 (Mon)

kaufmed

You should really be using an HTML parser for something like this. However, since you asked about regex, you can try:

$tag = 'br';
$attr = '';

// OR

$tag = 'div';
$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

if (preg_match("/<$tag(?: +[^>]*$attr[^>]*)?>((?:<|[^<](?!/$tag))+)</$tag>/", $html, $matches) {
    echo $matches[1];
}


Insoftservice inso

ASKER
Thanks for the info; an HTML parser could also be used.
Will your pattern give the right result for id "second", given that it contains a div inside a div?
I am also trying an HTML parser for this issue.


I tried it, but got this error:

syntax error, unexpected '^', expecting T_STRING or T_VARIABLE or T_NUM_STRING
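That message is consistent with PHP's string interpolation. Inside double quotes, `$attr[` is read as the start of an array index, so the `^` in `[^>]` is a parse error. The first snippet also has two other problems: the `/` in `</$tag>` collides with the `/` regex delimiter, and the `if (preg_match(...)` line is missing a closing parenthesis.

Below is a small sketch of the interpolation fix. It uses `{$var}` braces and `#` as the delimiter; both choices are assumptions, not taken from the thread. The `preg_quote()` calls are left out only to keep the printed pattern readable.

```php
<?php
$tag  = 'div';
$attr = 'class="test"';

// In a double-quoted string, "$attr[^>]" makes PHP expect an array index after
// "$attr[", hence the parse error. {$attr} marks where the variable name ends.
// Using '#' as the regex delimiter means the '/' of the closing tag needs no escaping.
$pattern = "#<{$tag}(?:\s+[^>]*{$attr}[^>]*)?>(.*?)</{$tag}>#s";

echo $pattern, "\n";
```

Running it prints the assembled pattern, `#<div(?:\s+[^>]*class="test"[^>]*)?>(.*?)</div>#s`. Real code should still pass `$tag` and `$attr` through `preg_quote($value, '#')` first.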
kaufmed

Give me a little while to test. I'm not currently near a machine with PHP installed, but I'll have one shortly. I'll post back with a correction.
Insoftservice inso

ASKER
No problem, take your time.

Enjoy the New Year; that's more important.


Happy New Year to you and your family, and to the whole EE family.
kaufmed

OK, let's try it like this:

$tag = 'br';
$attr = '';

// OR

//$tag = 'div';
//$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo $matches[1];
}
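This version has two properties that can produce surprising matches.

First, the attribute group `(?: +[^>]*...[^>]*)?` is optional. With `$attr = 'class="test"'`, a plain `<div>` earlier in the document still satisfies the pattern.

Second, the `<` alternative in `(?:<|[^<](?!\/div>))+` matches unconditionally. The repetition can therefore run past the first closing tag and backtrack to the last one.

Below is a sketch of a pattern builder that makes the attribute part optional only when no attribute was requested. `buildTagPattern` is a hypothetical name. The lazy `(.*?)` still breaks on nested tags of the same name.

```php
<?php
// Hypothetical helper: build a pattern for one element, optionally filtered by a
// literal attribute string such as 'class="test"' or 'id="first"'.
function buildTagPattern(string $tag, string $attr = ''): string
{
    $t = preg_quote($tag, '#');
    // The attribute part is optional only when no attribute was requested;
    // otherwise a bare <div> earlier in the document would satisfy it.
    $attrPart = ($attr === '')
        ? '(?:\s[^>]*)?'
        : '\s[^>]*?' . preg_quote($attr, '#') . '[^>]*';
    // Lazy (.*?) stops at the first matching close tag, so nested tags of the
    // same name (the id "second" case) still break this approach.
    return '#<' . $t . $attrPart . '>(.*?)</' . $t . '>#is';
}

$html = '<div><div class="test"> love to marry will let you know what kanyadaan means </div></div>';

if (preg_match(buildTagPattern('div', 'class="test"'), $html, $m)) {
    echo trim($m[1]), "\n"; // prints: love to marry will let you know what kanyadaan means
}
```

With `$attr` left empty, the same helper handles the question's BR case, because the tag name must be followed directly by whitespace or `>`.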


Insoftservice inso

ASKER
It's not working, dude; it returns a blank result. The HTML parser isn't working either; it reports an &nbsp; entity error.
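The parser that complained may have been an XML one, such as `simplexml_load_string()` or `DOMDocument::loadXML()`. That would explain the error: XML predefines only `&lt;`, `&gt;`, `&amp;`, `&quot;` and `&apos;`, so `&nbsp;` is undefined. `DOMDocument::loadHTML()` uses libxml's HTML parser instead, which knows the HTML entities.

Here is a minimal sketch, assuming the DOM extension is enabled. It uses a shortened stand-in for the real markup.

```php
<?php
// Shortened stand-in for the question's markup, with an &nbsp; added.
$html = '<div id="first">Search&nbsp;for your partner ends here.</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);              // HTML parser: &nbsp; is a known entity here
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$node  = $xpath->query('//*[@id="first"]')->item(0);

// &nbsp; is decoded to U+00A0; swap it for a normal space before printing.
$text = str_replace("\u{00A0}", ' ', $node->textContent);
echo $text, "\n"; // prints: Search for your partner ends here.
```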
SOLUTION
kaufmed

Insoftservice inso

ASKER
The HTML has other tags too; I only gave the div as an example. It could also contain &nbsp;, <br>, <p> and other HTML tags.

When I tried the code you provided, it gave me a blank result. When I tried an HTML parser as you suggested, it gave me an &nbsp; entity error.
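With a parser, the id, class and tag-only cases from the question map onto XPath queries. Nested divs stop being a problem because the parser tracks nesting.

Below is a sketch, assuming the DOM extension is available:

- `extractText` is a hypothetical helper name.
- The id and class values are assumed not to contain quotes, since they are pasted straight into the query.
- It returns the non-empty text nodes of each match, one per line. This drops the table markup but keeps the cell text, as in the expected output for id "second".

One caveat: `<br>` is a void element, so a parser will most likely not treat the text after `<br>` as its content. The BR case therefore has no direct equivalent here.

```php
<?php
// Hypothetical helper: text of every <$tag> element, optionally filtered by id
// or by a single class name. Assumes $id/$class contain no quote characters.
function extractText(string $html, string $tag, ?string $id = null, ?string $class = null): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy markup such as </br>
    $doc->loadHTML($html);
    libxml_clear_errors();

    $query = '//' . strtolower($tag);
    if ($id !== null) {
        $query .= "[@id='" . $id . "']";
    }
    if ($class !== null) {
        // Match one token of a space-separated class attribute.
        $query .= "[contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')]";
    }

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query($query) as $element) {
        $lines = [];
        // Every text node below the element, including table cells and inner divs.
        foreach ($xpath->query('.//text()', $element) as $textNode) {
            $line = trim($textNode->nodeValue);
            if ($line !== '') {
                $lines[] = $line;
            }
        }
        $results[] = implode("\n", $lines);
    }
    return $results;
}

$html = '<div><div id="first"> Search for your partner ends here. </div>'
      . '<div  class="test"> love to marry will let you know what kanyadaan means </div>'
      . '<div id="second"> Intro <table><tr><td>cell one</td><td>cell two</td></tr></table>'
      . '<div> Inner text </div> Tail </div></div>';

echo extractText($html, 'div', 'second')[0], "\n";
```

The last line prints "Intro", "cell one", "cell two", "Inner text" and "Tail" on separate lines. The inner div no longer cuts the match short.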
SOLUTION
Bernard Savonet

ASKER CERTIFIED SOLUTION
Aaron Tomosky

Insoftservice inso

ASKER
Thanks @fibo and @aarontomosky, I will surely try this.
Bernard Savonet

Hi,

What are the results?
SOLUTION
Derek Jensen

Insoftservice inso

ASKER
Thanks.