Improve company productivity with a Business Account.Sign Up

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3571
  • Last Modified:

Get <div> with preg_replace

I have a function to extract all <div> from a html code. It gives med the div's id and content, so it can be treated in a function.

It is working very, but the problem is that I now have to get div's inside a div extracted also!

Ex.

<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>

How can I do that? I don't necessarily need the parent div, just the ones inside (not containing any child div's)  
// Function to extract div's from HTML code
 
$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
		
$replacements = get_div_content("$1", "$2", "$3");
		
$proccesed_html = preg_replace($pattern, $replacement, $html);

Open in new window

0
Thingmand
Asked:
Thingmand
  • 4
  • 3
  • 2
1 Solution
 
RoonaanCommented:
I think you get a hard time getting this done using preg_replace.

You could try using arrays:
      $parts = explode('</div>', $html);
      foreach($parts as $i => $p) {
            $div_start = strrpos($p, '<div');
            if($div_start === false) {
                  continue;
            }
            $div = substr($p, $div_start).'</div>';
            
                       .. do something with the div html ...
      }
0
 
Steve BinkCommented:
Sending $proccesed_html[0] back through the same algorithm should isolate the inner <div>.  Basically,make this a function with the possibility of recursion.

For example, say I have this HTML:

This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again

The first preg should match the entire first div (assuming you're using 'greedy' mode).  Submitting the contents of that div (your '(.*?)' marker) back in should isolate the second div.

BTW, (.*?) is a little redundant, yes?  Any character (.) repeated 0 or more times (*), repeated 0 or 1 times (?).  
0
 
Steve BinkCommented:
Roonaan's comment addresses the point I left out: use preg_match() or preg_match_all() instead.  If you need to replace text, it might be easier to accomplish once you've already isolated the inner divs (working from the inside out)
0
The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

 
ThingmandAuthor Commented:
Roonaan: Thanks for the idea

routinet: It was on my mind, that I needed to re-run the results, but I thought maybe there was a trick with regs. I don't quite follow the use of preg_match instead? Could you give a simple example?
0
 
Steve BinkCommented:
On further reflection, I have to agree with Roonaan.  I don't know of a way to express the potential for nesting.  Searching the string manually sounds like what you need.
0
 
ThingmandAuthor Commented:
Well, the strange thing is that its working partly! The attach test code find these class id's:

ID: header
ID: menu
ID: page
ID: newsletter
ID: news
ID: content
ID: footer

I can't see the system!
<?php
 
	$html = <<<END
<body>
<!-- start header -->
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<!-- end header -->
<!-- start page -->
<div id="page">
	<!-- start sidebar -->
	<div id="sidebar">
		<div id="box5"></div>
		<!-- start newsletter form -->
		<div id="newsletter"></div>
		<!-- end newsletter form -->
		<!-- start recent news -->
		<div id="news"></div>
		<!-- end recent news -->
	</div>
	<!-- end sidebar -->
	<!-- start content -->
	<div id="content">
		<div id="box6"></div>
	</div>
	<!-- end content -->
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<!-- end page -->
<div id="footer"></div>
</body>
		
END;
 
 
		
	$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
	
	$replacement = 'get_div_content("$1", "$2", "$3")';
	
	$proccesed_html = preg_replace($pattern, $replacement, $html);
		
		
	function get_div_content($orgDiv, $classID, $rules) {
 
			echo "ID: $classID<br>\n";
	}
	
	echo "<br>Pattern: <pre>" . htmlentities($pattern) . "</pre><br>\n";
	
	echo "<pre>" . htmlentities($html) . "</pre><br><br>\n";
?>

Open in new window

0
 
RoonaanCommented:
When your html is xhtml you could try parsing it as xml?
0
 
ThingmandAuthor Commented:

// $pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
 
// Should be:
 
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/ise';
 
// It dosn't change the result, though...

Open in new window

0
 
ThingmandAuthor Commented:
Good damn, thats a brilliant idea! It works like a charm with xml_parser :o)
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

  • 4
  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now