Solved

Get <div> with preg_replace

Posted on 2008-06-16
Last Modified: 2010-04-21
I have a function that extracts all <div> elements from HTML code. It gives me each div's id and content, so they can be processed in a function.

It is working very well, but the problem is that I now also have to extract divs nested inside a div!

Example:

<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>

How can I do that? I don't necessarily need the parent div, just the ones inside (not containing any child div's)  
// Function to extract div's from HTML code
 
$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
		
$replacement = 'get_div_content("$1", "$2", "$3")';
		
$proccesed_html = preg_replace($pattern, $replacement, $html);

Question by:Thingmand
9 Comments
 
LVL 49

Expert Comment

by:Roonaan
ID: 21796521
I think you'll have a hard time getting this done using preg_replace.

You could try using arrays:
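      $parts = explode('</div>', $html);
      foreach ($parts as $i => $p) {
            // The last '<div' before each '</div>' opens the innermost div.
            $div_start = strrpos($p, '<div');
            if ($div_start === false) {
                  // This '</div>' closed a parent div; nothing new to extract.
                  continue;
            }
            $div = substr($p, $div_start) . '</div>';

            // ... do something with the div HTML ...
      }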
LVL 51

Expert Comment

by:Steve Bink
ID: 21796524
Sending $proccesed_html[0] back through the same algorithm should isolate the inner <div>. Basically, make this a function with the possibility of recursion.

For example, say I have this HTML:

This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again

The first preg should match the entire first div (assuming you're using 'greedy' mode).  Submitting the contents of that div (your '(.*?)' marker) back in should isolate the second div.

BTW, (.*?) is a little redundant, yes?  Any character (.) repeated 0 or more times (*), repeated 0 or 1 times (?).  
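
A note on the quantifier: in PCRE, a `?` placed directly after `*` makes the quantifier lazy (match as little as possible) rather than optional, so `(.*?)` and `(.*)` behave differently. A minimal sketch on the sample HTML above, using simplified patterns that are only for illustration:

```php
<?php
$html = 'This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again';

// Lazy: (.*?) stops at the first </div> it can reach.
preg_match('/<div>(.*?)<\/div>/s', $html, $lazy);

// Greedy: (.*) runs on to the last </div> in the string.
preg_match('/<div>(.*)<\/div>/s', $html, $greedy);

echo $lazy[1], "\n";   // div Stuff<div>more div stuff
echo $greedy[1], "\n"; // div Stuff<div>more div stuff</div>last stuff
```

Neither variant pairs an opening tag with its own closing tag once divs are nested; the lazy form cuts the outer div short, and the greedy form would swallow sibling divs as well.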
LVL 51

Expert Comment

by:Steve Bink
ID: 21796535
Roonaan's comment addresses the point I left out: use preg_match() or preg_match_all() instead.  If you need to replace text, it might be easier to accomplish once you've already isolated the inner divs (working from the inside out)
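
A minimal sketch of the preg_match_all() idea, assuming only the innermost divs (those without child divs) are wanted, as the question states. The pattern and the `$leaves` array are illustrative and not taken from the original code:

```php
<?php
$html = '<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>';

// (?:(?!<div\b).)*? consumes content one character at a time and refuses
// to step over another opening <div, so only divs without child divs match.
$pattern = '/<div\b[^>]*\bid="([a-z0-9_]+)"[^>]*>((?:(?!<div\b).)*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$leaves = [];
foreach ($matches as $m) {
    $leaves[$m[1]] = $m[2]; // id => inner content
}

print_r($leaves); // only foo2 is reported; the parent foo is skipped
```

Working from the inside out, each extracted leaf could then be replaced in the source string and the search repeated if the parents are needed too.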

Author Comment

by:Thingmand
ID: 21797056
Roonaan: Thanks for the idea

routinet: I had figured I would need to re-run the results, but I thought there might be a trick with regular expressions. I don't quite follow the use of preg_match instead. Could you give a simple example?
LVL 51

Expert Comment

by:Steve Bink
ID: 21810619
On further reflection, I have to agree with Roonaan.  I don't know of a way to express the potential for nesting.  Searching the string manually sounds like what you need.

Author Comment

by:Thingmand
ID: 21826845
Well, the strange thing is that it's partly working! The attached test code finds these IDs:

ID: header
ID: menu
ID: page
ID: newsletter
ID: news
ID: content
ID: footer

I can't see the system!
<?php
 
	$html = <<<END
<body>
<!-- start header -->
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<!-- end header -->
<!-- start page -->
<div id="page">
	<!-- start sidebar -->
	<div id="sidebar">
		<div id="box5"></div>
		<!-- start newsletter form -->
		<div id="newsletter"></div>
		<!-- end newsletter form -->
		<!-- start recent news -->
		<div id="news"></div>
		<!-- end recent news -->
	</div>
	<!-- end sidebar -->
	<!-- start content -->
	<div id="content">
		<div id="box6"></div>
	</div>
	<!-- end content -->
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<!-- end page -->
<div id="footer"></div>
</body>
		
END;
 
 
		
	$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
	
	$replacement = 'get_div_content("$1", "$2", "$3")';
	
	$proccesed_html = preg_replace($pattern, $replacement, $html);
		
		
	function get_div_content($orgDiv, $classID, $rules) {
 
			echo "ID: $classID<br>\n";
	}
	
	echo "<br>Pattern: <pre>" . htmlentities($pattern) . "</pre><br>\n";
	
	echo "<pre>" . htmlentities($html) . "</pre><br><br>\n";
?>
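
One possible reading of that result, sketched on a reduced version of the test page: the lazy `(.*?)` ends each match at the first `</div>` it reaches, so the first child of every parent is swallowed into the parent's match and never reported on its own. The sketch uses preg_match_all() without the /e modifier (removed in PHP 7); the `$ids` array is only for illustration:

```php
<?php
// Reduced version of the test page: header contains logo and menu.
$html = '<div id="header"><div id="logo"></div><div id="menu"></div></div>';

// Same pattern as in the post, with the character class fixed and /e dropped.
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $m) {
    $ids[] = $m[2];
    echo "ID: {$m[2]} | matched: {$m[0]}\n";
}
// The header match ends at logo's </div>, so logo is consumed and skipped.
// The scan then resumes at menu, which matches on its own.
```

This fits the list above: logo, sidebar, box5 and box6 are each the first div inside a parent whose match consumed them.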

LVL 49

Accepted Solution

by:
Roonaan earned 500 total points
ID: 21828571
If your HTML is XHTML, you could try parsing it as XML.
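
A minimal sketch of that idea, assuming the markup is well-formed XML and PHP's DOM extension is available. DOMDocument and DOMXPath are one way to do it, not necessarily the parser used in this thread, and the `$leaves` array is illustrative:

```php
<?php
$html = '<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>';

$doc = new DOMDocument();
$doc->loadXML($html); // fails on markup that is not well-formed XML

$xpath = new DOMXPath($doc);

// Select every div that has an id and no div descendants.
$leaves = [];
foreach ($xpath->query('//div[@id and not(.//div)]') as $div) {
    $leaves[$div->getAttribute('id')] = $div->textContent;
}

print_r($leaves);
```

Because the parser tracks nesting itself, the parent/child problem from the regex approach does not arise. Note that HTML entities such as `&nbsp;` are not defined in plain XML and would need handling before loading.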

Author Comment

by:Thingmand
ID: 21829039

// $pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
 
// Should be:
 
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/ise';
 
// It doesn't change the result, though...


Author Closing Comment

by:Thingmand
ID: 31467728
Damn, that's a brilliant idea! It works like a charm with xml_parser :o)
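
The code used is not shown in the thread, so the following is only a guess at what an xml_parser-based version might look like: a sketch that uses PHP's event-based XML parser functions and a stack to report divs without child divs. The `$stack` and `$leaves` names are hypothetical:

```php
<?php
$html = '<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>';

$stack  = []; // open divs: id, collected text, and whether a child div was seen
$leaves = []; // id => text for divs without child divs

$parser = xml_parser_create();
// Keep tag and attribute names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$stack) {
        if ($name !== 'div') {
            return;
        }
        if ($stack) {
            // The div currently open now has a child div.
            $stack[count($stack) - 1]['hasChild'] = true;
        }
        $stack[] = ['id' => $attrs['id'] ?? null, 'text' => '', 'hasChild' => false];
    },
    function ($p, $name) use (&$stack, &$leaves) {
        if ($name !== 'div') {
            return;
        }
        $div = array_pop($stack);
        if (!$div['hasChild'] && $div['id'] !== null) {
            $leaves[$div['id']] = $div['text'];
        }
    }
);

xml_set_character_data_handler($parser, function ($p, $data) use (&$stack) {
    if ($stack) {
        // Text belongs to the innermost open div.
        $stack[count($stack) - 1]['text'] .= $data;
    }
});

xml_parse($parser, $html, true);

print_r($leaves);
```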