Solved

Data between html tags using regex in php

Posted on 2012-12-31
Last Modified: 2014-01-28
I want to extract the text between HTML tags.
For example:
<div>
  <br>
    love to marry is an free matrimonial website
  </br>
  <div id='first'>
    Search for your partner ends here.
  </div>
  <div  class="test">
    love to marry will let you know what kanyadaan means
  </div>
  <span>
    Provide your ads on our website at very cheap rate
  </span>
  <div id="second">
    Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual
    <table>
      <tr>
        <td>mangal sutra bandhan information</td>
        <td>Find your partner for free </td>
      </tr>
      <tr>
        <td>send your love sms at http://lovetomarry.com and win exciting prizes  </td>
        <td>Get your horoscope built from love to marry , to know more about your future your future enroll our website </td>
      </tr>
    </table>
    <div>
      Watch latest online movies for free
    </div>

    Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com

  </div>

</div>



########### Expected output ##############

If the request is for "BR" tags:

love to marry is an free matrimonial website

If the request is for "DIV" tags with id "first":

Search for your partner ends here.

If the request is for "DIV" tags with class "test":

love to marry will let you know what kanyadaan means

If the request is for "SPAN" tags:

Provide your ads on our website at very cheap rate

If the request is for "DIV" tags with id "second":

Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual

mangal sutra bandhan information

Find your partner for free

send your love sms at http://lovetomarry.com and win exciting prizes

Get your horoscope built from love to marry , to know more about your future your future enroll our website

Watch latest online movies for free

Get your daily, weekly, monthly,yearly Horoscope for free only on lovetomarry.com

########### End of expected output ##############


For the request on "DIV" tags with id "second", I want to exclude the table, but if the element contains other HTML tags, I have to keep the text with their respective tags.

The element may be identified by "id", by "class", or by neither attribute.
I have managed to extract data between tags by id, but it fails when another div is nested inside.
Question by:Insoftservice
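Since the thread below converges on using an HTML parser, here is a rough sketch of that route using PHP's bundled DOM extension (`DOMDocument::loadHTML()` plus `DOMXPath`) rather than either of the third-party libraries suggested later. The helper name `extractText()` and its parameters are hypothetical, and the sketch assumes the id/class value contains no double quote, since it is pasted straight into an XPath string. It collects every non-empty text node under the matching element in document order, so nested divs and table cells come out one per line, as in the expected output for id "second" (table markup dropped, table text kept). It does not cover the `<br>` case: `<br>` is a void element in HTML, so a parser does not treat the text after it as its content.

```php
<?php
// Sketch: select elements by tag name plus an optional id or class with
// DOMXPath, then return their trimmed, non-empty descendant text nodes.
// extractText() is a hypothetical helper; $attr is 'id', 'class' or ''.
// Assumes $value contains no double quote (it is embedded in the XPath).

function extractText(string $html, string $tag, string $attr = '', string $value = ''): array
{
    $doc = new DOMDocument();
    // loadHTML() repairs sloppy markup; keep its warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser lower-cases element names, so do the same here.
    $query = '//' . strtolower($tag);
    if ($attr === 'id') {
        $query .= '[@id="' . $value . '"]';
    } elseif ($attr === 'class') {
        // Match one token inside a space-separated class list.
        $query .= '[contains(concat(" ", normalize-space(@class), " "), " ' . $value . ' ")]';
    }

    $xpath = new DOMXPath($doc);
    $lines = [];
    foreach ($xpath->query($query) as $element) {
        // Every descendant text node, including those in nested <div>s and <td>s.
        foreach ($xpath->query('.//text()', $element) as $textNode) {
            $text = trim($textNode->nodeValue);
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }
    return $lines;
}

// A shortened, made-up version of the sample markup from the question.
$html = <<<HTML
<div>
  <div id='first'> Search for your partner ends here. </div>
  <div class="test"> love to marry will let you know what kanyadaan means </div>
  <div id="second">
    Mangal sutra bandhan is an hindu ritual
    <table><tr><td>mangal sutra bandhan information</td></tr></table>
    <div> Watch latest online movies for free </div>
    Get your daily Horoscope for free
  </div>
</div>
HTML;

print_r(extractText($html, 'div', 'id', 'second'));
```

Because the nesting is resolved by the parser rather than by counting tags in a regex, the nested `<div>` inside id "second" no longer cuts the result short.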
15 Comments
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 38732999
You should really be using an HTML parser for something like this. However, since you asked about regex, you can try:

$tag = 'br';
$attr = '';

// OR

$tag = 'div';
$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

if (preg_match("/<$tag(?: +[^>]*$attr[^>]*)?>((?:<|[^<](?!/$tag))+)</$tag>/", $html, $matches) {
    echo $matches[1];
}

 
LVL 15

Author Comment

by:Insoftservice
ID: 38733013
Thanks for the info; an HTML parser could also be used.
Will the code you posted return the right result for id "second", since it contains a div inside a div?
I am also trying an HTML parser for this issue.

I tried the code but got this error:

syntax error, unexpected '^', expecting T_STRING or T_VARIABLE or T_NUM_STRING
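This error comes from PHP string interpolation, not from the regex itself: inside a double-quoted string, `$attr[` is read as the start of an array element access, so PHP then expects a key and fails on the `^`. A minimal illustration of one fix, wrapping the variables in braces (string concatenation works too). It also passes `'/'` as the second argument of `preg_quote()` and escapes the `/` of the closing tag, because `/` is the pattern delimiter. Only the interpolation is fixed here; the matching logic is the one posted above.

```php
<?php
// "$attr[^>]" inside double quotes is parsed as an array access, which
// produces: unexpected '^', expecting T_STRING or T_VARIABLE or T_NUM_STRING.
// {$var} ends the variable name explicitly, so "[^>]*" stays regex text.
$tag  = preg_quote('div', '/');
$attr = preg_quote('class="test"', '/');   // preg_quote escapes "=" as "\="

// "\/" is not a PHP escape sequence, so the backslash reaches PCRE and
// escapes the "/" delimiter inside the pattern.
$pattern = "/<{$tag}(?: +[^>]*{$attr}[^>]*)?>((?:<|[^<](?!\/{$tag}))+)<\/{$tag}>/";

echo $pattern, "\n";

if (preg_match($pattern, '<div class="test">x</div>', $matches)) {
    echo $matches[1], "\n";   // x
}
```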
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 38733028
Give me a little while to test. I'm not currently near a machine with PHP installed, but I'll have one shortly. I'll post back with a correction.
 
LVL 15

Author Comment

by:Insoftservice
ID: 38733045
No problem, take your time.

Enjoy your New Year; that's more important.

Happy New Year to you and your family, and to the whole EE family.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 38733327
OK, let's try it like this:

$tag = 'br';
$attr = '';

// OR

//$tag = 'div';
//$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo $matches[1];
}
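One weakness of this pattern worth knowing: the whole attribute group `(?: +[^>]*...)?` is optional, so a bare `<div>` satisfies it just as well as `<div class="test">`, and the greedy `+` then keeps consuming and backtracks to the last `</div>` it can reach. Below is a sketch of a variant that makes the attribute mandatory whenever one is given and uses a lazy `(.*?)` body; `buildPattern()` is a hypothetical helper name and the sample string is made up. It also shows the limit the question raised for id "second": a lazy match ends at the first `</div>`, which belongs to the nested div, so a regex alone cannot balance nested tags.

```php
<?php
// Sketch: build a pattern in which the attribute is required when given.
// buildPattern() is a hypothetical helper, not part of any library.
function buildPattern(string $tag, string $attr = ''): string
{
    $t = preg_quote($tag, '/');
    $a = preg_quote($attr, '/');
    // No attribute requested: allow any attributes, or none.
    // Attribute requested: it must appear inside the opening tag.
    $open = ($attr === '')
        ? '<' . $t . '(?:\s[^>]*)?>'
        : '<' . $t . '\s[^>]*' . $a . '[^>]*>';
    // Lazy (.*?) stops at the first closing tag; "s" lets "." cross
    // newlines and "i" ignores tag-name case.
    return '/' . $open . '(.*?)<\/' . $t . '>/is';
}

$html = "<div>\n  <div class=\"test\">\n    kanyadaan\n  </div>\n"
      . "  <div id=\"second\">outer <div>inner</div> tail</div>\n</div>";

if (preg_match(buildPattern('div', 'class="test"'), $html, $m)) {
    echo trim($m[1]), "\n";   // the class="test" block: kanyadaan
}
if (preg_match(buildPattern('div', 'id="second"'), $html, $m)) {
    echo trim($m[1]), "\n";   // cut short at the nested </div>: outer <div>inner
}
```

This works for flat elements like `class="test"` but still fails on nested ones, which is the case where a real HTML parser is needed.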

 
LVL 15

Author Comment

by:Insoftservice
ID: 38734354
Not working, dude; it shows a blank result. The HTML parser isn't working either; it shows an &nbsp; entity error.
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 50 total points
ID: 38734611
Hmmm...  These are the results I get:

<html>
  <head>
  </head>
  <body>

<?php

$html = <<<HTML
<div>
         <br>
              love to marry is an free matrimonial website
         </br>
        <div id='first'>
              Search for your partner ends here.
        </div>
        <div  class="test">
               love to marry will let you know what kanyadaan means
            </div>
       <span>
           Provide your ads on our website at very cheap rate
      </span>
      <div id="second">
        Mangal sutra bandhan is an hindu ritual where groom .. to read more please visit http://lovetomarry.com/ritual
      <table>
      <tr>
            <td>mangal sutra bandhan information</td>
            <td>Find your partner for free </td>
      </tr>
      <tr>
            <td>send your love sms at http://lovetomarry.com and win exciting prizes  </td>
            <td>Get your horoscope built from love to marry , to know more about your future your future enroll our website </td>
      </tr>
      </table>
        <div>
            Watch latest online movies for free 
        </div>
      
       Get your daily, weekly, monthly,yearly      Horoscope for free only on lovetomarry.com 

      </div> 

</div>
HTML;

$tag = 'br';
$attr = '';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo "***********  &lt;br&gt;  ***********<br />";
    echo $matches[1];
    echo "<br /><br />";
}


$tag = 'div';
$attr = 'class="test"';

// ******************* //

$tag = preg_quote($tag);
$attr = preg_quote($attr);

$pattern = "/<" . $tag . "(?: +[^>]*" . $attr . "[^>]*)?>((?:<|[^<](?!\/" . $tag . ">))+)<\/" . $tag . ">/";

if (preg_match($pattern, $html, $matches)) {
    echo "***********  &lt;div class=\"test\"&gt;  ***********<br />";
    echo $matches[1];
}

?>

  </body>
</html>

 
LVL 15

Author Comment

by:Insoftservice
ID: 38742858
The HTML has other tags too; I only provided the div as an example. It could also contain &nbsp;, <br> and other HTML tags such as <p>, and so on.

When I tried the code provided, it gave me a blank result, but when I tried an HTML parser as you suggested, it gave me an &nbsp; entity error.
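A possible explanation for the &nbsp; error: `&nbsp;` is one of HTML's named character entities, but XML predefines only five (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so a parser that reads the markup as XML reports it as undefined. The comment does not say which parser was used; assuming it was an XML-based one, this sketch compares `DOMDocument::loadXML()` with `DOMDocument::loadHTML()`, which understands HTML entities and decodes `&nbsp;` to a non-breaking space (U+00A0). The sample fragment is made up.

```php
<?php
// Sketch: the same fragment parsed as XML and as HTML.
$fragment = '<div>Search&nbsp;for your partner</div>';

// Collect libxml errors instead of printing them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$asXml = new DOMDocument();
$asXml->loadXML($fragment);                 // XML has no "nbsp" entity
$xmlErrors = count(libxml_get_errors());    // undefined-entity error(s)
libxml_clear_errors();

$asHtml = new DOMDocument();
$asHtml->loadHTML($fragment);               // HTML parser knows &nbsp;
libxml_clear_errors();
libxml_use_internal_errors($previous);

// textContent holds a UTF-8 non-breaking space; swap it for a plain space.
$div  = $asHtml->getElementsByTagName('div')->item(0);
$text = str_replace("\u{00A0}", ' ', $div->textContent);

echo $xmlErrors > 0 ? "XML parse reported errors\n" : "XML parse clean\n";
echo $text, "\n";   // Search for your partner
```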
 
LVL 29

Assisted Solution

by:fibo
fibo earned 150 total points
ID: 38834188
May I suggest that you have a look at "jQuery-php", a PHP library heavily inspired by jQuery that makes parsing DOM documents easy.

You would then use already tested functions...

see http://jquery.hohli.com/
 
LVL 39

Accepted Solution

by:
Aaron Tomosky earned 150 total points
ID: 38835690
I used this once; it worked great and is really easy:
http://simplehtmldom.sourceforge.net/
 
LVL 15

Author Comment

by:Insoftservice
ID: 38836142
Thanks @fibo & @aarontomosky, I will surely try these.
 
LVL 29

Expert Comment

by:fibo
ID: 38948954
Hi,

What are the results?
 
LVL 9

Assisted Solution

by:Derek Jensen
Derek Jensen earned 150 total points
ID: 39612615
Well, I can't seem to get any closer than this...

Working off kaufmed's code, it should look something like this (assuming all line feeds and carriage returns have been stripped out first):

$pattern = "/<" . $tag . "(>|.+?(?<!<|>)>).+?(?<!<" . $tag . ")(.+?)<\/" . $tag . ">/i";
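The "stripped out first" condition is needed because `.` in a PCRE pattern does not match a newline by default. A small sketch, using a made-up `<span>` sample, showing that the `s` (dotall) modifier is an alternative to stripping CR/LF before matching:

```php
<?php
// Sketch: "." vs. newlines in preg_match, with and without the "s" modifier.
$html = "<span>\n  Provide your ads\n</span>";

// Without "s", "." cannot consume the "\n" right after <span>, so no match.
$withoutS = preg_match('/<span>(.+?)<\/span>/', $html, $m1);

// With "s", "." also matches newlines.
$withS = preg_match('/<span>(.+?)<\/span>/s', $html, $m2);

// Stripping CR and LF first, as the comment above assumes, also works.
$flat     = str_replace(["\r", "\n"], '', $html);
$stripped = preg_match('/<span>(.+?)<\/span>/', $flat, $m3);

var_dump($withoutS, $withS, $stripped);   // int(0), int(1), int(1)
echo trim($m2[1]), "\n";                  // Provide your ads
```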

 
LVL 15

Author Closing Comment

by:Insoftservice
ID: 39814712
Thanks.