Solved

Get <div> with preg_replace

Posted on 2008-06-16
Last Modified: 2010-04-21
I have a function that extracts all <div> elements from HTML code. It gives me each div's id and content, so they can be processed in a function.

It works very well, but the problem is that I now also need to extract divs nested inside a div!

Ex.

<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>

How can I do that? I don't necessarily need the parent div, just the inner ones (those not containing any child divs).

// Function to extract divs from HTML code
$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';

// With the e modifier, the replacement string is evaluated as PHP code
$replacement = 'get_div_content("$1", "$2", "$3")';

$proccesed_html = preg_replace($pattern, $replacement, $html);
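A note for anyone running this on a current PHP version: the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, and preg_replace_callback() is the usual substitute. Here is a minimal sketch of the same call. It assumes a stand-in get_div_content() that simply returns a marker string (the real function body is not shown in the question), and it writes the digits in the id character class as the range 0-9.

```php
<?php
// Stand-in for the poster's function; its real body is not shown in the thread.
function get_div_content($orgDiv, $id, $content)
{
    return "[div $id: $content]";
}

$html = 'before <div id="foo_1">Hello</div> after';

// Same pattern as in the question, minus the e modifier, with 0-9 as a range.
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/is';

// The callback receives the match array:
// $m[1] = opening tag, $m[2] = id, $m[3] = content.
$processed_html = preg_replace_callback(
    $pattern,
    function (array $m) {
        return get_div_content($m[1], $m[2], $m[3]);
    },
    $html
);

echo $processed_html, "\n";
```

This only swaps the evaluation mechanism; it does not by itself solve the nesting problem.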

Question by:Thingmand
9 Comments
 
LVL 49

Expert Comment

by:Roonaan
ID: 21796521
I think you'll have a hard time getting this done using preg_replace.

You could try using arrays:
      $parts = explode('</div>', $html);
      foreach($parts as $i => $p) {
            $div_start = strrpos($p, '<div');
            if($div_start === false) {
                  continue;
            }
            $div = substr($p, $div_start).'</div>';
            
            // ... do something with the div HTML in $div ...
      }
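One possible way to fill in the "do something" step, assuming the goal is still to get each inner div's id and content. The helper name split_div() is made up for this sketch, and it only handles a single, non-nested div string such as the loop produces.

```php
<?php
// Hypothetical helper (the name is an assumption): split one isolated,
// non-nested "<div ...>content</div>" string into its id and inner content.
function split_div(string $divHtml): ?array
{
    $pattern = '/^<div\b[^>]*\bid="([a-z0-9_]+)"[^>]*>(.*)<\/div>$/is';
    if (!preg_match($pattern, $divHtml, $m)) {
        return null; // no id attribute, or not a single <div>...</div>
    }
    return ['id' => $m[1], 'content' => $m[2]];
}

$html = '<div id="foo">This is just fill <div id="foo2">This is a new block</div> This is more fill</div>';

$found = [];
foreach (explode('</div>', $html) as $part) {
    // The last "<div" before a "</div>" opens a div with no child divs.
    $start = strrpos($part, '<div');
    if ($start === false) {
        continue; // this "</div>" closed an outer div opened in an earlier part
    }
    $info = split_div(substr($part, $start) . '</div>');
    if ($info !== null) {
        $found[] = $info;
    }
}

print_r($found);
```

Note that explode() and strrpos() are case-sensitive, so this sketch assumes lowercase tags.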
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796524
Sending $proccesed_html[0] back through the same algorithm should isolate the inner <div>. Basically, make this a function with the possibility of recursion.

For example, say I have this HTML:

This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again

The first preg should match the entire first div (assuming you're using 'greedy' mode).  Submitting the contents of that div (your '(.*?)' marker) back in should isolate the second div.

BTW, (.*?) is a little redundant, yes?  Any character (.) repeated 0 or more times (*), repeated 0 or 1 times (?).  
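A small caveat on that last point: in PCRE, a ? placed directly after * does not mean "0 or 1 times". It makes the quantifier lazy (non-greedy), so the two forms can capture different text. A short sketch using the sample HTML from this comment, trimmed to the div part:

```php
<?php
$html = '<div>div Stuff<div>more div stuff</div>last stuff</div>';

// Greedy: .* runs as far as it can, up to the LAST </div>.
preg_match('/<div>(.*)<\/div>/s', $html, $greedy);

// Lazy: .*? stops as early as it can, at the FIRST </div>.
preg_match('/<div>(.*?)<\/div>/s', $html, $lazy);

echo $greedy[1], "\n"; // div Stuff<div>more div stuff</div>last stuff
echo $lazy[1], "\n";   // div Stuff<div>more div stuff
```

Neither form pairs an opening tag with its own closing tag in general, which is the nesting problem in a nutshell.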
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796535
Roonaan's comment addresses the point I left out: use preg_match() or preg_match_all() instead.  If you need to replace text, it might be easier to accomplish once you've already isolated the inner divs (working from the inside out)
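A minimal sketch of the preg_match_all() route, under the assumption that only divs without child divs are wanted. The pattern is not from the original post: it adds a negative lookahead so the captured content can never step onto another "<div".

```php
<?php
$html = '<div id="foo">This is just fill <div id="foo2">This is a new block</div> This is more fill</div>';

// (?:(?!<div\b).)*? consumes one character at a time and refuses to start
// on another "<div", so only divs without child divs can match.
$pattern = '/<div\b[^>]*\bid="([a-z0-9_]+)"[^>]*>((?:(?!<div\b).)*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$leaves = [];
foreach ($matches as $m) {
    $leaves[$m[1]] = $m[2]; // id => inner content
}

print_r($leaves);
```

Once the innermost divs are isolated like this, replacing them and re-running the match would work from the inside out.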
 

Author Comment

by:Thingmand
ID: 21797056
Roonaan: Thanks for the idea

routinet: It had occurred to me that I would need to re-run the results, but I thought there might be a trick with regexes. I don't quite follow the suggestion to use preg_match instead. Could you give a simple example?
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21810619
On further reflection, I have to agree with Roonaan.  I don't know of a way to express the potential for nesting.  Searching the string manually sounds like what you need.
 

Author Comment

by:Thingmand
ID: 21826845
Well, the strange thing is that it partly works! The attached test code finds these ids:

ID: header
ID: menu
ID: page
ID: newsletter
ID: news
ID: content
ID: footer

I can't see the logic in it!
<?php

	$html = <<<END
<body>
<!-- start header -->
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<!-- end header -->
<!-- start page -->
<div id="page">
	<!-- start sidebar -->
	<div id="sidebar">
		<div id="box5"></div>
		<!-- start newsletter form -->
		<div id="newsletter"></div>
		<!-- end newsletter form -->
		<!-- start recent news -->
		<div id="news"></div>
		<!-- end recent news -->
	</div>
	<!-- end sidebar -->
	<!-- start content -->
	<div id="content">
		<div id="box6"></div>
	</div>
	<!-- end content -->
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<!-- end page -->
<div id="footer"></div>
</body>
END;

	$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';

	$replacement = 'get_div_content("$1", "$2", "$3")';

	$proccesed_html = preg_replace($pattern, $replacement, $html);

	function get_div_content($orgDiv, $classID, $rules) {
		echo "ID: $classID<br>\n";
	}

	echo "<br>Pattern: <pre>" . htmlentities($pattern) . "</pre><br>\n";

	echo "<pre>" . htmlentities($html) . "</pre><br><br>\n";

?>
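For what it is worth, the list of ids can be explained by how the pattern scans. Matches are found left to right and never overlap, and the lazy (.*?) stops at the first </div> it meets. An outer div therefore ends at its first child's closing tag and swallows that child: header swallows logo, page swallows sidebar and box5, content swallows box6. Later siblings (menu, newsletter, news) are then matched on their own. And since . also matches newlines under the s modifier, <div.*?id= can start at the div that has no id and run on to id="footer". A reduced sketch, assuming the same pattern without the e modifier and with 0-9 as a range, that prints what each match actually covers:

```php
<?php
$html = '<div id="header"><div id="logo"></div><div id="menu"></div></div>';

// The thread's pattern, minus the e modifier, with 0-9 written as a range.
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $m) {
    $ids[] = $m[2];
    // $m[0] is the full text consumed by this match.
    echo "ID: {$m[2]} | consumed: {$m[0]}\n";
}
```

Here "logo" never shows up as an id because its opening tag is already inside the text consumed by the "header" match.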

 
LVL 49

Accepted Solution

by:
Roonaan earned 500 total points
ID: 21828571
If your HTML is XHTML, you could try parsing it as XML.
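A minimal sketch of that idea using DOMDocument and XPath, assuming the input is well-formed XML (every tag closed, a single root element). Entities such as &nbsp; are not predefined in plain XML, so input containing them may need a DOCTYPE or some preprocessing first. The XPath expression is one possible way to say "divs with an id and no div descendants".

```php
<?php
// Assumption: $xhtml is well-formed XML.
$xhtml = <<<XML
<body>
<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>
</body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xhtml);

$xpath = new DOMXPath($doc);

// Select every <div> that has an id attribute and contains no nested <div>.
$leafDivs = [];
foreach ($xpath->query('//div[@id][not(.//div)]') as $div) {
    $leafDivs[$div->getAttribute('id')] = $div->textContent;
}

print_r($leafDivs);
```

Dropping the [not(.//div)] predicate would return the parent divs as well.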
 

Author Comment

by:Thingmand
ID: 21829039

// $pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
// Should be:
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/ise';
// It doesn't change the result, though...

 

Author Closing Comment

by:Thingmand
ID: 31467728
Damn, that's a brilliant idea! It works like a charm with xml_parser :o)
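The final code was not posted, so the following is only a guess at what an event-based version with PHP's XML parser functions might look like. The function name and the stack layout are assumptions made for this sketch. It relies on the parser's default case folding, which reports tag and attribute names in upper case.

```php
<?php
// Sketch: collect id => text for every <div> that contains no child <div>.
// Assumes well-formed XML input.
function leaf_divs_via_xml_parser(string $xhtml): array
{
    $stack = []; // one entry per currently open <div>
    $found = [];

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start tag: names arrive upper-cased because case folding is on by default.
        function ($p, $name, $attrs) use (&$stack) {
            if ($name !== 'DIV') {
                return;
            }
            if ($stack) {
                $stack[count($stack) - 1]['hasChild'] = true; // parent is not a leaf
            }
            $stack[] = ['id' => $attrs['ID'] ?? null, 'text' => '', 'hasChild' => false];
        },
        // End tag: keep the div only if no nested div was seen inside it.
        function ($p, $name) use (&$stack, &$found) {
            if ($name !== 'DIV') {
                return;
            }
            $div = array_pop($stack);
            if (!$div['hasChild'] && $div['id'] !== null) {
                $found[$div['id']] = $div['text'];
            }
        }
    );

    // Text is appended to the innermost open div.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
        if ($stack) {
            $stack[count($stack) - 1]['text'] .= $data;
        }
    });

    if (!xml_parse($parser, $xhtml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    return $found;
}

$result = leaf_divs_via_xml_parser(
    '<body><div id="header"><div id="logo"></div><div id="menu">Menu</div></div><div id="footer"></div></body>'
);
print_r($result);
```

Unlike the regex attempts, the parser pairs each opening tag with its own closing tag, so nesting depth does not matter.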