Solved

Get <div> with preg_replace

Posted on 2008-06-16
Last Modified: 2010-04-21
I have a function that extracts all <div> elements from HTML code. It gives me each div's id and content, so they can be processed in a function.

It is working very well, but the problem is that I now also have to extract divs nested inside a div!

Ex.

<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>

How can I do that? I don't necessarily need the parent div, just the ones inside (not containing any child div's)  
// Function to extract div's from HTML code
 
$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
		
$replacement = 'get_div_content("$1", "$2", "$3")';
		
$proccesed_html = preg_replace($pattern, $replacement, $html);

Question by:Thingmand
9 Comments
 
LVL 49

Expert Comment

by:Roonaan
ID: 21796521
I think you'll have a hard time getting this done using preg_replace.

You could try using arrays:
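      $parts = explode('</div>', $html);
      foreach ($parts as $i => $p) {
            // Each part ends where a </div> was; the last "<div" in it
            // is the opening tag that this </div> closed.
            $div_start = strrpos($p, '<div');
            if ($div_start === false) {
                  // No opening tag in this part: the </div> closed an outer div.
                  continue;
            }
            $div = substr($p, $div_start) . '</div>';

            // ... do something with the div HTML ...
      }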
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796524
Sending $proccesed_html[0] back through the same algorithm should isolate the inner <div>.  Basically, make this a function with the possibility of recursion.

For example, say I have this HTML:

This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again

The first preg should match the entire first div (assuming you're using 'greedy' mode).  Submitting the contents of that div (your '(.*?)' marker) back in should isolate the second div.

BTW, (.*?) is a little redundant, yes?  Any character (.) repeated 0 or more times (*), repeated 0 or 1 times (?).  
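For reference, in PCRE a `?` placed directly after `*` does not mean "0 or 1 times"; it switches the quantifier from greedy to lazy. A minimal sketch of the difference, using the sample string from this comment (the variable names `$greedy` and `$lazy` are only for illustration):

```php
<?php
$html = 'This is not in a div<div>div Stuff<div>more div stuff</div>last stuff</div>Out of div again';

// Greedy: .* grabs as much as possible, so the match runs to the LAST </div>.
preg_match('/<div>(.*)<\/div>/s', $html, $greedy);

// Lazy: .*? grabs as little as possible, so the match stops at the FIRST </div>.
preg_match('/<div>(.*?)<\/div>/s', $html, $lazy);

echo $greedy[1], "\n"; // div Stuff<div>more div stuff</div>last stuff
echo $lazy[1], "\n";   // div Stuff<div>more div stuff
```

Neither variant pairs an opening tag with its own closing tag once divs are nested, which is the underlying difficulty in this question.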
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21796535
Roonaan's comment addresses the point I left out: use preg_match() or preg_match_all() instead.  If you need to replace text, it might be easier to accomplish once you've already isolated the inner divs (working from the inside out)
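A minimal sketch of what a preg_match_all() version might look like, assuming only the innermost divs (those with an id and no child div) are wanted. The negative lookahead `(?!<div\b)` is an addition that does not appear in the thread; it stops the content group from running across another opening `<div`, so outer divs simply fail to match.

```php
<?php
// Sample input taken from the question.
$html = '<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>';

// Group 1: the id. Group 2: content that contains no further "<div".
$pattern = '/<div\b[^>]*\bid="([a-z0-9_]+)"[^>]*>((?:(?!<div\b).)*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$inner = [];
foreach ($matches as $m) {
    $inner[$m[1]] = $m[2]; // id => content
}

print_r($inner); // only foo2 is reported: "This is a new block"
```

This only isolates leaf divs. Working outward from there, as suggested above, would still need a loop or a real parser.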

 

Author Comment

by:Thingmand
ID: 21797056
Roonaan: Thanks for the idea

routinet: It was on my mind, that I needed to re-run the results, but I thought maybe there was a trick with regs. I don't quite follow the use of preg_match instead? Could you give a simple example?
 
LVL 50

Expert Comment

by:Steve Bink
ID: 21810619
On further reflection, I have to agree with Roonaan.  I don't know of a way to express the potential for nesting.  Searching the string manually sounds like what you need.
 

Author Comment

by:Thingmand
ID: 21826845
Well, the strange thing is that it's partly working! The attached test code finds these ids:

ID: header
ID: menu
ID: page
ID: newsletter
ID: news
ID: content
ID: footer

I can't see the logic in it!
<?php
 
	$html = <<<END
<body>
<!-- start header -->
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<!-- end header -->
<!-- start page -->
<div id="page">
	<!-- start sidebar -->
	<div id="sidebar">
		<div id="box5"></div>
		<!-- start newsletter form -->
		<div id="newsletter"></div>
		<!-- end newsletter form -->
		<!-- start recent news -->
		<div id="news"></div>
		<!-- end recent news -->
	</div>
	<!-- end sidebar -->
	<!-- start content -->
	<div id="content">
		<div id="box6"></div>
	</div>
	<!-- end content -->
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<!-- end page -->
<div id="footer"></div>
</body>
		
END;
 
 
		
	$pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
	
	$replacement = 'get_div_content("$1", "$2", "$3")';
	
	$proccesed_html = preg_replace($pattern, $replacement, $html);
		
		
	function get_div_content($orgDiv, $classID, $rules) {
 
			echo "ID: $classID<br>\n";
	}
	
	echo "<br>Pattern: <pre>" . htmlentities($pattern) . "</pre><br>\n";
	
	echo "<pre>" . htmlentities($html) . "</pre><br><br>\n";
?>
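The list of ids can be explained by how the lazy pattern consumes text: each match runs from an opening `<div ... id="...">` to the first `</div>` that follows, and the next search starts after that point. The "header" match therefore swallows the opening tag of "logo", "page" swallows "sidebar" and "box5", and "content" swallows "box6". The div without an id still matches, because with the `s` modifier `.*?` can skip ahead to the next `id="..."`, which belongs to "footer". The sketch below reproduces this with preg_match_all() on a condensed copy of the test page. The `e` modifier is dropped because it was removed in PHP 7, and the `$ids` array is only for illustration.

```php
<?php
// Condensed copy of the test page from the post (HTML comments removed).
$html = <<<END
<div id="header">
	<div id="logo"></div>
	<div id="menu"></div>
</div>
<div id="page">
	<div id="sidebar">
		<div id="box5"></div>
		<div id="newsletter"></div>
		<div id="news"></div>
	</div>
	<div id="content">
		<div id="box6"></div>
	</div>
	<div style="clear: both; height: 30px;">&nbsp;</div>
</div>
<div id="footer"></div>
END;

// Same pattern as in the post, without the "e" modifier.
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/is';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$ids = [];
foreach ($matches as $m) {
    $ids[] = $m[2]; // the id captured by this match
}

echo implode(', ', $ids), "\n";
// header, menu, page, newsletter, news, content, footer
```

The ids "logo", "sidebar", "box5" and "box6" never show up because their opening tags sit inside text that an earlier match has already consumed.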

 
LVL 49

Accepted Solution

by:
Roonaan earned 500 total points
ID: 21828571
If your HTML is XHTML, you could try parsing it as XML?
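A minimal sketch of the XML route, assuming the markup is well-formed XHTML. It uses DOMDocument and DOMXPath rather than the event-based XML parser functions, and the sample input and the `$leaves` array are hypothetical. Entities such as `&nbsp;` are not predefined in XML, so real pages may need them replaced before loading.

```php
<?php
// Hypothetical sample input: a well-formed XHTML fragment.
$xhtml = <<<XML
<body>
<div id="foo">
This is just fill <div id="foo2">This is a new block</div> This is more fill
</div>
<div id="bar">No children here</div>
</body>
XML;

$doc = new DOMDocument();
$doc->loadXML($xhtml);

$xpath = new DOMXPath($doc);

// Select every div that has an id and no descendant div.
$leaves = [];
foreach ($xpath->query('//div[@id and not(.//div)]') as $div) {
    $leaves[$div->getAttribute('id')] = $div->textContent;
}

print_r($leaves); // foo2 and bar are reported; foo is skipped
```

Because the parser tracks nesting itself, the parent div is also available through `//div[@id]` if it is needed later.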
 

Author Comment

by:Thingmand
ID: 21829039

// $pattern = '/(<div.*?id="([a-z09_]+)".*?>)(.*?)<\/div>/ise';
 
// Should be:
 
$pattern = '/(<div.*?id="([a-z0-9_]+)".*?>)(.*?)<\/div>/ise';
 
// It doesn't change the result, though...

 

Author Closing Comment

by:Thingmand
ID: 31467728
Goddamn, that's a brilliant idea! It works like a charm with xml_parser :o)