Solved

Get <div> with preg_replace

Posted on 2008-06-16
9
3,489 Views
Last Modified: 2010-04-21
I have a function to extract all <div> from a html code. It gives med the div's id and content, so it can be treated in a function.

It is working very, but the problem is that I now have to get div's inside a div extracted also!

Ex.

<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>

How can I do that? I don't necessarily need the parent div, just the ones inside (not containing any child div's)  
// Function to extract div's from HTML code
 
$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
		
$replacements = get_div_content("$1", "$2", "$3");
		
$proccesed_html = preg_replace($pattern, $replacement, $html);

Open in new window

0
Comment
Question by:Thingmand
  • 4
  • 3
  • 2
9 Comments
 
LVL 49

Expert Comment

by:Roonaan
ID: 21796521
I think you get a hard time getting this done using preg_replace.

You could try using arrays:
      $parts = explode('</div>', $html);
      foreach($parts as $i => $p) {
            $div_start = strrpos($p, '<div');
            if($div_start === false) {
                  continue;
            }
            $div = substr($p, $div_start).'</div>';
            
                       .. do something with the div html ...
      }
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796524
Sending $proccesed_html[0] back through the same algorithm should isolate the inner <div>.  Basically,make this a function with the possibility of recursion.

For example, say I have this HTML:

This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again

The first preg should match the entire first div (assuming you're using 'greedy' mode).  Submitting the contents of that div (your '(.*?)' marker) back in should isolate the second div.

BTW, (.*?) is a little redundant, yes?  Any character (.) repeated 0 or more times (*), repeated 0 or 1 times (?).  
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796535
Roonaan's comment addresses the point I left out: use preg_match() or preg_match_all() instead.  If you need to replace text, it might be easier to accomplish once you've already isolated the inner divs (working from the inside out)
0
Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

 

Author Comment

by:Thingmand
ID: 21797056
Roonaan: Thanks for the idea

routinet: It was on my mind, that I needed to re-run the results, but I thought maybe there was a trick with regs. I don't quite follow the use of preg_match instead? Could you give a simple example?
0
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21810619
On further reflection, I have to agree with Roonaan.  I don't know of a way to express the potential for nesting.  Searching the string manually sounds like what you need.
0
 

Author Comment

by:Thingmand
ID: 21826845
Well, the strange thing is that its working partly! The attach test code find these class id's:

ID: header
ID: menu
ID: page
ID: newsletter
ID: news
ID: content
ID: footer

I can't see the system!
<?php
 
	$html = <<<END
<body>
<!-- start header -->
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<!-- end header -->
<!-- start page -->
<div id="page">
	<!-- start sidebar -->
	<div id="sidebar">
		<div id="box5"></div>
		<!-- start newsletter form -->
		<div id="newsletter"></div>
		<!-- end newsletter form -->
		<!-- start recent news -->
		<div id="news"></div>
		<!-- end recent news -->
	</div>
	<!-- end sidebar -->
	<!-- start content -->
	<div id="content">
		<div id="box6"></div>
	</div>
	<!-- end content -->
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<!-- end page -->
<div id="footer"></div>
</body>
		
END;
 
 
		
	$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
	
	$replacement = 'get_div_content("$1", "$2", "$3")';
	
	$proccesed_html = preg_replace($pattern, $replacement, $html);
		
		
	function get_div_content($orgDiv, $classID, $rules) {
 
			echo "ID: $classID<br>\n";
	}
	
	echo "<br>Pattern: <pre>" . htmlentities($pattern) . "</pre><br>\n";
	
	echo "<pre>" . htmlentities($html) . "</pre><br><br>\n";
?>

Open in new window

0
 
LVL 49

Accepted Solution

by:
Roonaan earned 500 total points
ID: 21828571
When your html is xhtml you could try parsing it as xml?
0
 

Author Comment

by:Thingmand
ID: 21829039

// $pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
 
// Should be:
 
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/ise';
 
// It dosn't change the result, though...

Open in new window

0
 

Author Closing Comment

by:Thingmand
ID: 31467728
Good damn, thats a brilliant idea! It works like a charm with xml_parser :o)
0

Featured Post

Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

This article will explain how to display the first page of your Microsoft Word documents (e.g. .doc, .docx, etc...) as images in a web page programatically. I have scoured the web on a way to do this unsuccessfully. The goal is to produce something …
These days socially coordinated efforts have turned into a critical requirement for enterprises.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

785 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question