Alter Code to Capture Content from DIVs

I got some code from the URL below (also attached) that captures metadata from a URL. How do I change the code so that, instead of capturing the content between the <TITLE> and </TITLE> tags, it captures the content between every <DIV> and </DIV> tag?

Ref. http://www.drquincy.com/resources/tutorials/webserverside/getremotewebpageinfo/
<?php
 
    $url = "http://www.drquincy.com/";
    
    $fp = fopen( $url, 'r' );
    
    $content = "";
    
 
    while( !feof( $fp ) ) {
    
       $buffer = trim( fgets( $fp, 4096 ) );
       $content .= $buffer;
       
    }
    
    $start = '<title>';
    $end = '<\/title>';
    
    preg_match( "/$start(.*)$end/s", $content, $match );
    $title = $match[ 1 ]; 
    
    $metatagarray = get_meta_tags( $url );
    $keywords = $metatagarray[ "keywords" ];
    $description = $metatagarray[ "description" ];
    
    echo "<div><strong>URL:</strong> $url</div>\n";
    echo "<div><strong>Title:</strong> $title</div>\n";
    echo "<div><strong>Description:</strong> $description</div>\n";
    echo "<div><strong>Keywords:</strong> $keywords</div>\n";
 
?>

Ray Paseur commented:
We don't have all of the code there, I think.  But you might try changing lines 17 and 18 (the $start and $end assignments) as follows...
    $start = '<div>';
    $end = '<\/div>';

EMB01 (author) commented:
Yeah, but will that capture every DIV or just the first DIV?

Ray Paseur commented:
It may get only the first, but I'm experimenting with it now.  I'll post a working script for you shortly.
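
Whether every DIV is captured depends mostly on which function runs the pattern. The following is a minimal sketch, assuming a short made-up HTML string in place of the fetched page: preg_match() stops at the first match, preg_match_all() returns every match, and the greedy (.*) used for the title would run from the first <div> to the last </div>.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<div>one</div><p>gap</p><div>two</div>';

// preg_match() returns only the first match; the lazy (.*?) stops
// at the nearest closing tag.
preg_match('/<div>(.*?)<\/div>/s', $content, $first);

// preg_match_all() collects every non-overlapping match.
preg_match_all('/<div>(.*?)<\/div>/s', $content, $all);

// The greedy (.*) from the title pattern, applied to DIVs, swallows
// everything between the first <div> and the last </div>.
preg_match('/<div>(.*)<\/div>/s', $content, $greedy);

echo $first[1] . "\n";            // one
echo implode(', ', $all[1]) . "\n"; // one, two
echo $greedy[1] . "\n";           // one</div><p>gap</p><div>two
```

The greedy form is harmless for <title> because a page normally has only one, but it is the wrong choice for a tag that repeats.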

Ray Paseur commented:
See if this does the trick.  Best regards, ~Ray
<?php // RAY_find_div_tags.php
// MODIFIED TO GRAB ALL THE DIVs
// SEE http://us.php.net/manual/en/function.preg-match-all.php#76388
function findinside($start, $end, $string)
{
   preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);
   return $m[1];
}
 
 
    $url = "http://www.drquincy.com/";
 
    $fp = fopen( $url, 'r' );
 
    $content = "";
 
 
    while( !feof( $fp ) ) {
 
       $buffer = trim( fgets( $fp, 4096 ) );
       $content .= $buffer;
 
    }
 
    $start = '<title>';
    $end = '<\/title>';
 
    preg_match( "/$start(.*)$end/s", $content, $match );
    $title = $match[ 1 ];
 
// MODIFIED TO GRAB ALL THE DIVs
    $start = '<div>';
    $end = '</div>';
    $all_divs = findinside($start, $end, $content);
 
 
    $metatagarray = get_meta_tags( $url );
    $keywords = $metatagarray[ "keywords" ];
    $description = $metatagarray[ "description" ];
 
    echo "<div><strong>URL:</strong> $url</div>\n";
    echo "<div><strong>Title:</strong> $title</div>\n";
    echo "<div><strong>Description:</strong> $description</div>\n";
    echo "<div><strong>Keywords:</strong> $keywords</div>\n";
 
// MODIFIED TO GRAB ALL THE DIVs
    echo "<div><strong>" . number_format(count($all_divs)) . " DIVs:</strong></div>\n";
    foreach ($all_divs as $one_div)
    {
       echo "<br/><strong>DIV: </strong>" . htmlentities($one_div) . "\n";
    }
?>

EMB01 (author) commented:
Hey, Ray. Thanks for your code. I have a question: when I point the $url at a page I know is composed entirely of DIVs, such as this:
http://www.emarketbuilders.com

It doesn't pick up anything. Is this because the DIV tags look like this:
<div class="divclass">Content</div>

And, not:
<div>Content</div>

How can I alter the script so that it picks up DIVs with attributes, as most will have?

Ray Paseur commented:
I would try this on line 32 (the $start assignment).  See if that does the trick.
    $start = '<div';
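
Another way to allow for attributes is to let the pattern itself skip over them. This is only a sketch under the assumption that the DIVs are not nested; find_div_bodies() is a hypothetical helper name, not part of the script above, and the sample markup is made up.

```php
<?php
// Hypothetical helper: returns the body of each <div ...>...</div> pair.
function find_div_bodies($html)
{
    // "<div", a word boundary, any attributes up to ">", then a lazy body.
    // The i flag accepts <DIV>; the s flag lets "." cross line breaks.
    preg_match_all('/<div\b[^>]*>(.*?)<\/div>/is', $html, $m);
    return $m[1];
}

$html   = '<div class="divclass">Content</div><DIV id="x">More</DIV><div>Plain</div>';
$bodies = find_div_bodies($html);

foreach ($bodies as $body) {
    echo htmlentities($body) . "\n";
}
```

With nested DIVs the lazy match ends at the first </div> it meets, so an outer DIV would be cut short.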

Ray Paseur commented:
This should be a little more useful.  HTH, ~Ray
<?php // RAY_find_div_tags.php
 
// MODIFIED TO GRAB ALL THE DIVs
// SEE http://us.php.net/manual/en/function.preg-match-all.php#76388
function findinside($start, $end, $string)
{
   preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);
   return $m[1];
}
 
 
$url = "http://www.drquincy.com/";
$content = file_get_contents($url);
 
$start = '<title>';
$end = '<\/title>';
preg_match( "/$start(.*)$end/s", $content, $match );
$title = $match[ 1 ];
 
// MODIFIED TO GRAB ALL THE DIVs
$start = '<div';
$end = '</div>';
$all_divs = findinside($start, $end, $content);
 
$metatagarray = get_meta_tags( $url );
$keywords = $metatagarray[ "keywords" ];
$description = $metatagarray[ "description" ];
 
echo "<div><strong>URL:</strong> $url</div>\n";
echo "<div><strong>Title:</strong> $title</div>\n";
echo "<div><strong>Description:</strong> $description</div>\n";
echo "<div><strong>Keywords:</strong> $keywords</div>\n";
 
// MODIFIED TO GRAB ALL THE DIVs
echo "<div><strong>" . number_format(count($all_divs)) . " DIVs:</strong></div>\n";
foreach ($all_divs as $one_div)
{
   echo "<br/><strong>DIV: </strong>" . htmlentities('<div'.$one_div) . "\n";
}
?>

EMB01 (author) commented:
That was the first thing I tried before my response ($start = '<div';). Is the code you posted different than that single change?

Ray Paseur commented:
Only slightly.  See line 38 (the echo inside the foreach loop, which now prepends '<div' to each match).

EMB01 (author) commented:
I see. That only reads one DIV while there are many in the page:
1 DIVs:

DIV: <div class="arrowlink"><a href="/articles/" title="Articles"><strong>Articles</strong></a> <br /> <a href="/promos/" title="Promotions"><strong>Promotions</strong></a></div> <h4>Testimonials</h4> <div class="tcontainer"> <div class="tquote">"Excellent post-production support!"</div> <div class="tfrom">Ben Baier of New Wave Lending</div>

Ray Paseur commented:
That's not the output I get from running what I posted.  I tested it before posting.
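
A likely reason the two pages behave differently is the capture group ([^\.)]+) inside findinside(). It is a negated character class: it accepts any run of characters except a period or a closing parenthesis, and it is greedy. The sketch below reuses the function as posted with made-up input strings to show both effects: a body containing "." or ")" is skipped, and adjacent DIVs without those characters are merged into one match.

```php
<?php
// Same pattern as findinside() in the posted script.
function findinside($start, $end, $string)
{
    preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)' . preg_quote($end, '/') . '/i', $string, $m);
    return $m[1];
}

// Bodies containing a period or a closing parenthesis never match.
$skipped = findinside('<div>', '</div>',
    '<div>no dots here</div><div>Ends with a period.</div><div>call()</div>');

// With no "." or ")" in between, the greedy class runs on to the last </div>.
$merged = findinside('<div>', '</div>', '<div>a</div><div>b</div>');

print_r($skipped); // only "no dots here"
print_r($merged);  // a single entry: a</div><div>b
```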

EMB01 (author) commented:
I mean when pointing the $url at the following address (as in #23644818):
http://www.emarketbuilders.com

Ray Paseur commented:
It would have been useful to get the desired test case up front, eh?!

Try this.  It seems to find more divs, but then you get to the issue of the nested DIV statements, etc.  How you would want to handle that is up to you.  You might consider a recursive algorithm if you wanted to go to "N" levels of nesting.
<?php // RAY_parse_div_tags.php
 
// MODIFIED TO GRAB ALL THE DIVs
// SEE http://us.php.net/manual/en/function.preg-match-all.php#76388
function findinside($start, $end, $string)
{
   preg_match_all('/' . preg_quote($start, '/') . '([^\.)]+)'. preg_quote($end, '/').'/i', $string, $m);
   return $m[1];
}
 
 
$url = "http://www.emarketbuilders.com/";
$content = file_get_contents($url);
 
$start = '<title>';
$end = '<\/title>';
preg_match( "/$start(.*)$end/s", $content, $match );
$title = $match[ 1 ];
 
 
$metatagarray = get_meta_tags( $url );
$keywords = $metatagarray[ "keywords" ];
$description = $metatagarray[ "description" ];
 
echo "<div><strong>URL:</strong> $url</div>\n";
echo "<div><strong>Title:</strong> $title</div>\n";
echo "<div><strong>Description:</strong> $description</div>\n";
echo "<div><strong>Keywords:</strong> $keywords</div>\n";
 
// MODIFIED TO GRAB ALL THE DIVs
$content = strip_tags($content, '<div>');
$start = '<div';
$end = '</div>';
$all_divs = findinside($start, $end, $content);
 
// MODIFIED TO GRAB ALL THE DIVs
echo "<div><strong>" . number_format(count($all_divs)) . " DIVs:</strong></div>\n";
foreach ($all_divs as $one_div)
{
   echo "<br/><strong>DIV: </strong>" . htmlentities('<div'.$one_div) . "\n";
}
?>
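
The recursive idea mentioned above could be sketched with a recursive PCRE pattern, which treats a nested <div>...</div> as a unit so that the outer DIV is not cut short. This is an illustration under assumptions, not a tested replacement for the script: collect_divs() is a hypothetical helper, the sample markup is made up, and badly nested HTML or very large pages may defeat it or hit PCRE's backtracking limits.

```php
<?php
// Hypothetical helper: returns [depth, body] pairs for every DIV,
// outer DIVs first, followed by the DIVs nested inside them.
function collect_divs($html, $depth = 0, array &$out = [])
{
    // (?R) re-enters the whole pattern, so a nested <div>...</div> is
    // consumed as one unit. The possessive ++ limits backtracking over
    // runs of text that contain no DIV tag.
    $pattern = '~<div\b[^>]*>((?:(?:(?!</?div\b).)++|(?R))*)</div>~is';

    if (preg_match_all($pattern, $html, $m)) {
        foreach ($m[1] as $inner) {
            $out[] = [$depth, $inner];               // record this DIV's body
            collect_divs($inner, $depth + 1, $out);  // then look inside it
        }
    }
    return $out;
}

$html  = '<div id="a">x<div class="b">y</div>z</div><div>w</div>';
$found = collect_divs($html);

foreach ($found as $pair) {
    list($level, $body) = $pair;
    echo str_repeat('  ', $level) . htmlentities($body) . "\n";
}
```

Each entry carries its nesting depth, so the caller can decide whether to keep only the outermost DIVs, only the innermost, or all of them.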

EMB01 (author) commented:
Excuse my misunderstanding. I just wanted to change the script so that it would capture all DIVs from a webpage. I had no idea about nested DIV statements. With the newest script, I get the following output; however, there are more DIVs than are echoed, and some don't seem accurate:
8 DIVs:

DIV: <div id="container"> <div class="header"> <div class="logot"> </div> <div class="navigation"> <div id="cssMenu1" class="horizontal"> Solutions Web Development Web Design E-Commerce Content Management Graphic Design Multimedia Design Print Media Internet Marketing Logo Design Services SEO SEM SEC Link Campaign Corporate Identity Consulting Email Marketing Guerilla Marketing Web Analytics Company Portfolio Contact Sign In </div> </div>
DIV: <div class="sscontainer"> <div class="solutions">Solutions Web Development Web Design E-Commerce Content Management Graphic Design Multimedia Design Print Media Internet Marketing Logo Design </div> <div class="services">Services SEO SEM SEC Link Campaign Corporate Identity Consulting Email Marketing Guerilla Marketing Web Analytics </div>
DIV: <div class="ads"> </div> </div> <div class="centerpanel" id="centercolumn"> Announcements <div class="acontainer"> <div class="aphoto">
DIV: <div class="acontainer"> <div class="aphoto">
DIV: <div class="acontainer"> <div class="aphoto"> </div> <div class="atext">C3 Finished! With more than they originally bargained for, C3 is now complete!</div> </div> <div class="acontainer"> <div class="aphoto">
DIV: <div class="acontainer"> <div class="aphoto">
DIV: <div class="rightpanel" id="rightcolumn"> Request Free Quote Fill out a quick contact form and let EMB do the rest! Advertisement <div class="ads"> </div> Resources <div class="arrowlink">Articles Promotions</div> Testimonials <div class="tcontainer"> <div class="tquote">"Excellent post-production support!"</div> <div class="tfrom">Ben Baier of New Wave Lending</div> </div> Read More Testimonials Showcase <div class="scontainer"> <div class="sshot"> </div> <div class="stext">New Wave Lending Web Design</div> </div> </div>
DIV: <div class="logo"></div>

Ray Paseur commented:
Which ones seem to be missing?  Which are inaccurate?

EMB01 (author) commented:
Well, attached is the source of that page. Notice that the entire page is made up of DIVs. The following DIVs seem inaccurate:
<div class="acontainer"> <div class="aphoto">
(It's empty.)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>EMB Web Design</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<meta name="Keywords" content="EMB, web design, web development, seo, ecommerce, emarketing, full service, ohio, canton" />
<meta name="Description" content="EMB is a full-service Web Design &amp; Development company that offers such solutions to and services for web design, web development, SEO, and much more!" />
 
<script type="text/javascript" src="http://127.0.0.1:37935/xpopup.js"></script><script src="includes/cssmenus2/js/cssmenus.js" type="text/javascript"></script>
<script type="text/javascript" src="includes/nifty/equalcolumns.js"></script>
<link href="home_style.css" rel="stylesheet" type="text/css" />
<link href="includes/cssmenus2/skins/emb/horizontal.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="container">
  <div class="header">
    <div class="logot"><a href="/index.php" title="EMB Web Design"><img src="logos/logo_100.gif" alt="EMB Web Design" width="100" height="42" border="0" /></a> </div>
    <div class="navigation">
	
<div id="cssMenu1" class="horizontal">
  <ul class="emb">
    <li> <a href="/solutions/" title="Solutions">Solutions</a>
        <ul>
          <li> <a href="/solutions/web_development.php" title="Web Development">Web Development</a> </li>
          <li> <a href="/solutions/web_design.php" title="Web Design">Web Design</a> </li>
          <li> <a href="/solutions/e_commerce.php" title="E-Commerce">E-Commerce</a> </li>
          <li> <a href="/solutions/content_management.php" title="Content Management">Content Management</a> </li>
          <li> <a href="/solutions/graphic_design.php" title="Graphic Design">Graphic Design</a> </li>
          <li> <a href="/solutions/multimedia_design.php" title="Multimedia Design">Multimedia Design</a> </li>
          <li> <a href="/solutions/print_media.php" title="Print Media">Print Media</a> </li>
          <li> <a href="/solutions/internet_marketing.php" title="Internet Marketing">Internet Marketing</a> </li>
          <li> <a href="/solutions/logo_design.php" title="Logo Design">Logo Design</a> </li>
        </ul>
    </li>
    <li> <a href="/services/" title="Services">Services</a>
        <ul>
          <li> <a href="/services/seo.php" title="SEO">SEO</a> </li>
          <li> <a href="/services/sem.php" title="SEM">SEM</a> </li>
          <li> <a href="/services/sec.php" title="SEC">SEC</a> </li>
          <li> <a href="/services/link_campaign.php" title="Link Campaign">Link Campaign</a> </li>
          <li> <a href="/services/corporate_identity.php" title="Corporate Identity">Corporate Identity</a> </li>
          <li> <a href="/services/consulting.php" title="Consulting">Consulting</a> </li>
          <li> <a href="/services/email_marketing.php" title="Email Marketing">Email Marketing</a> </li>
          <li> <a href="/services/guerilla_marketing.php" title="Guerilla Marketing">Guerilla Marketing</a> </li>
          <li> <a href="/services/web_analytics.php" title="Web Analytics">Web Analytics</a> </li>
        </ul>
    </li>
    <li> <a href="/emb.php" title="Company">Company</a> </li>
    <li> <a href="/portfolio/" title="Portfolio">Portfolio</a> </li>
	<li> <a href="/admin/" title="Account">Account</a> </li>	<li> <a href="/admin/users_LogOut.php" title="Sign Out">Sign Out</a> </li>  </ul>
  <br />
  <script type="text/javascript">
	<!--
    var obj_cssMenu1 = new CSSMenu("cssMenu1");
    obj_cssMenu1.setTimeouts(400, 200, 800);
    obj_cssMenu1.setSubMenuOffset(0, 0, 5, 0);
    obj_cssMenu1.setHighliteCurrent(true);
    obj_cssMenu1.show();
   //-->
  </script>
</div>
    </div>
  </div>
    <div class="flash">
      <object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,29,0" width="720" height="213">
        <param name="movie" value="flash/header.swf" />
        <param name="wmode" value="opaque" />
        <param name="quality" value="high" />
        <param name="menu" value="false" />
<object data="flash/header.swf"
width="720" height="213" type="application/x-shockwave-flash">
          <param name="wmode" value="opaque" />
          <param name="quality" value="high" />
          <param name="menu" value="false" />
          <param name="pluginurl" value="http://www.macromedia.com/go/getflashplayer" />
          FAIL (the browser should render some flash content, not this).
        </object>
      </object>
    </div>
</div>
  <div id="contentwrapper">
<div class="leftpanel" id="leftcolumn">
<h1>Solutions &amp; Services </h1>
<p>Welcome! Please feel free to browse our solutions and services below... </p>
	<div class="sscontainer">
	<div class="solutions"><strong>Solutions<br />
      </strong><a href="/solutions/web_development.php" class="p">Web Development</a><br />
      <a href="/solutions/web_design.php" class="p">Web Design</a><br />
      <a href="/solutions/e_commerce.php" class="p">E-Commerce</a><br />
      <a href="/solutions/content_management.php" class="p">Content Management</a><br />
      <a href="/solutions/graphic_design.php" class="p">Graphic Design</a><br />
      <a href="/solutions/multimedia_design.php" class="p">Multimedia Design</a><br />
      <a href="/solutions/print_media.php" class="p">Print Media</a><br />
      <a href="/solutions/internet_marketing.php" class="p">Internet Marketing</a><br />
      <a href="/solutions/logo_design.php" class="p">Logo Design</a>
	  </div>
    <div class="services"><strong>Services<br />
      </strong><a href="/services/seo.php" class="p">SEO</a><br />
      <a href="/services/sem.php" class="p">SEM</a><br />
      <a href="/services/sec.php" class="p">SEC</a><br />
      <a href="/services/link_campaign.php" class="p">Link Campaign</a><br />
      <a href="/services/corporate_identity.php" class="p">Corporate Identity</a><br />
      <a href="/services/consulting.php" class="p">Consulting</a><br />
      <a href="/services/email_marketing.php" class="p">Email Marketing</a><br />
      <a href="/services/guerilla_marketing.php" class="p">Guerilla Marketing</a><br />
    <a href="/services/web_analytics.php" class="p">Web Analytics</a>
	</div></div>
	<p><strong>About EMB<br />
	</strong>e-Market Builders was established in March 2004, to provide small to medium sized organizations with cost effective, innovative e-commerce and marketing solutions that enable them to compete with industry leaders. </p>
	<p><a href="/emb.php" class="p"><strong>Read About Us</strong></a></p>
	<p><strong>What Can EMB Do For Me?</strong><br />
	Since EMB is a full-service web development and online marketing firm, the question should be more like: <em>What can't EMB do for me!</em></p>
	<p>Read our full list of <a href="/solutions/" class="p"><strong>Solutions</strong></a> or <a href="/services/" class="p"><strong>Services</strong></a> and be sure to <a href="/contact_form.php" class="p"><strong>request a free quote</strong></a> to see how low are prices really are!</p>
	  
<div class="ads">
  <a href="http://www.emarketbuilders.com/promos/"><img src="http://www.emarketbuilders.com/images/redesign_launch_special_2.gif" alt="Website Redesign Launch Special!" width="215" height="143" border="0" /></a></div>
</div>
<div class="centerpanel" id="centercolumn">
<h2>Announcements</h2>
  
<div class="acontainer">
  <div class="aphoto">
  <a href="/announcement.php?title=Holiday Savings Announced!" class="p"><img src="/images/thumbnails/happy_holiday_specials_70x0.jpg" alt="Holiday Savings Announced!" border="0" /></a></div>
  <div class="atext"><strong><a href="/announcement.php?title=Holiday Savings Announced!" class="p" title="Holiday Savings Announced!">Holiday Savings Announced!</a><br />
  </strong>We're announcing a big discount for the last month of the year.</div>
</div>
<div class="acontainer">
  <div class="aphoto">
  <a href="/announcement.php?title=ReLogisTech's Website Now Complete!" class="p"><img src="/images/thumbnails/relogistechs_ss_large_70x0.jpg" alt="ReLogisTech's Website Now Complete!" border="0" /></a></div>
  <div class="atext"><strong><a href="/announcement.php?title=ReLogisTech's Website Now Complete!" class="p" title="ReLogisTech's Website Now Complete!">ReLogisTech's Website Now Complete!</a><br />
  </strong>The ReLogisTechs.com online storefront is now complete!</div>
</div>
<div class="acontainer">
  <div class="aphoto">
  <a href="/announcement.php?title=C3 Finished!" class="p"><img src="/images/thumbnails/c3_ss_large_70x0.jpg" alt="C3 Finished!" border="0" /></a></div>
  <div class="atext"><strong><a href="/announcement.php?title=C3 Finished!" class="p" title="C3 Finished!">C3 Finished!</a><br />
  </strong>With more than they originally bargained for, C3 is now complete!</div>
</div>
<div class="acontainer">
  <div class="aphoto">
  <a href="/announcement.php?title=New EMB Website Complete!" class="p"><img src="/images/thumbnails/emb_70x0.jpg" alt="New EMB Website Complete!" border="0" /></a></div>
  <div class="atext"><strong><a href="/announcement.php?title=New EMB Website Complete!" class="p" title="New EMB Website Complete!">New EMB Website Complete!</a><br />
  </strong>Finally, our new W3C-Compliant website has arrived.</div>
</div>
<div class="acontainer">
  <div class="aphoto">
  <a href="/announcement.php?title=c3CyberClub.com Scheduled" class="p"><img src="/images/thumbnails/c3cyberclub_logo_70x0.jpg" alt="c3CyberClub.com Scheduled" border="0" /></a></div>
  <div class="atext"><strong><a href="/announcement.php?title=c3CyberClub.com Scheduled" class="p" title="c3CyberClub.com Scheduled">c3CyberClub.com Scheduled</a><br />
  </strong>C3 is a progressive computer-based learning center and is now expanding online via EMB's web development services!</div>
</div>
<p><strong>Miss An Item?<br />
</strong>Make sure that you keep up to date on all the latest in the <strong><a href="/announcement_archive.php" class="p">EMB Announcement Archive</a></strong>.</p>
</div>
<div class="rightpanel" id="rightcolumn">
 
<h3>Shopping Cart</h3>
<p>Check out your shopping cart! </p>
<a href="/clients/cart.php" class="scbutton" title="Go to Shopping Cart!"></a>
<h4>Advertisement</h4>
        
<div class="ads">
  <a href="http://www.anrdoezrs.net/4c106hz74z6MPQPROTTMONRQPRWT" target="_top">
<img src="http://www.awltovhc.com/sl72tkocig145463881326546B8" alt="" border="0"/></a></div>
<h4>Resources</h4>
<div class="arrowlink"><a href="/articles/" title="Articles"><strong>Articles</strong></a>
<br />
<a href="/promos/" title="Promotions"><strong>Promotions</strong></a></div>
<h4>Testimonials</h4>
 
<div class="tcontainer">
  <div class="tquote">"Excellent post-production support!"</div>
  <div class="tfrom">Ben Baier of New Wave Lending</div>
</div>
<p><a href="/testimonials.php" class="p"><strong>Read More Testimonials</strong></a></p>
<h4>Showcase</h4>
  
<div class="scontainer">
  <div class="sshot"> <a href="/portfolio/client.php?name=New Wave Lending" class="p"><img src="/images/thumbnails/newwavelending_ss_large_60x0.jpg" alt="New Wave Lending" border="0" /></a></div>
  <div class="stext"><strong><a href="/portfolio/client.php?name=New Wave Lending" class="p">New Wave Lending</a><br />
  </strong>Web Design</div>
</div>
</div>
</div>
  <div class="footer">
  <a href="/index.php" class="f">Home</a> | <a href="/emb.php" class="f">About Us</a> | <a href="/portfolio/" class="f">Portfolio</a> | <a href="/contact_us.php" class="f">Contact Us</a> | <a href="/solutions/web_design.php" class="f">Web Design</a> | <a href="/solutions/internet_marketing.php" class="f">Internet Marketing</a> | <a href="/solutions/graphic_design.php" class="f">Graphic Design</a> | <a href="/solutions/multimedia_design.php" class="f">Multimedia Design</a><br />
      <a href="/services/seo.php" class="f">SEO</a> | <a href="/services/consulting.php" class="f">Consulting</a> | <a href="/promos/" class="f">Promotions</a> | <a href="/links.php" class="f">Links</a> | <a href="/sitemap.php" class="f">Sitemap</a>
      <div class="copyright"><span class="f">Copyright &copy; 2006 EMB. <a href="/terms_of_use.php" class="f">All rights reserved.</a><br />
            <a href="http://jigsaw.w3.org/css-validator/check/referer" class="f" style="background-color: #FFF"> <img src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!" width="32" height="32" border="0" align="bottom" style="border:0;width:88px;height:31px" /> </a> <a href="http://validator.w3.org/check?uri=referer" class="f" style="background-color: #FFF"><img src="http://www.w3.org/Icons/valid-xhtml10" alt="Valid XHTML 1.0 Transitional" align="bottom" style="border:0;width:88px;height:31px" /></a> </span></div>
      <div class="logo"><img src="/logos/logo_footer.gif" alt="EMB Web Design" width="101" height="57" align="right" /></div>
  </div>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
var pageTracker = _gat._getTracker("UA-1915512-2");
pageTracker._initData();
pageTracker._trackPageview();
</script>
</body>
<script type="text/javascript">_popupControl();</script>
</html>

Ray Paseur commented:
I'll try a different approach - back in a few...

Ray Paseur commented:
See if this is helpful.  
<?php // RAY_parse_divs.php
 
// TRAP ERROR MESSAGES
ob_start();
 
$dom = new DomDocument;
$dom->preserveWhiteSpace = TRUE;
$dom->loadHTMLFile('http://www.emarketbuilders.com');
 
// RETRIEVE ERROR MESSAGES
$dom_errors = ob_get_clean();
 
// LOCATE THE DIVS
$divs = $dom->getElementsByTagName('div');
 
foreach ($divs as $div) {
   $div_class = $div->getAttribute("class");
   $div_id    = $div->getAttribute("id");
   $div_data  = $div->nodeValue;
 
   echo "<br/><strong>DIV ";
   if ($div_id    != '') echo "ID=$div_id ";
   if ($div_class != '') echo "CLASS=$div_class ";
   echo "</strong>";
 
   echo htmlentities($div_data) . "<br/>\n";
}
 
 
echo "<br/><br/>$dom_errors \n";
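
The same DOM approach can be tried without a network request, and the parser warnings can be collected through libxml instead of output buffering. A minimal sketch, assuming a made-up HTML string with a stray closing tag; the variable names are illustrative only.

```php
<?php
// Hypothetical markup; the stray </span> is there to provoke a parser warning.
$html = '<div id="a" class="box">Hello</span></div><div>World</div>';

// Ask libxml to store parse problems instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Fetch whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

// Collect the text content of every DIV, as the script above does.
$texts = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    $texts[] = $div->nodeValue;
}

print_r($texts);
echo count($errors) . " parser message(s)\n";
```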

EMB01 (author) commented:
Thanks. One last thing, out of curiosity: if I wanted to output HTML rather than text, how would I do that, since I'm using the DOM?

EMB01 (author) commented:
Thanks, I'm unfamiliar with DOM so it's good to see it in action!

Ray Paseur commented:
Thanks for the points!  I'm not sure how I would get HTML out of it.  It might be necessary to take some of the DOM data and go back into the HTML to look for the HTML pieces.  What with nesting of <div> tags, etc. this may not be a trivial task.  One sticky wicket: unmatched or mis-nested tags.  Without XML-like rules, it's potentially difficult to figure out where a <div> ends.  Maybe that is one of the reasons that browsers render things differently in quirks mode!
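
One option that may cover the HTML question: DOMDocument::saveHTML() accepts an optional node argument (added in PHP 5.3.6, as far as I know) and serializes just that node, tags included. The sketch below assumes a made-up HTML string rather than the live page. The parser repairs unmatched tags while loading, so the markup returned is the repaired tree, not necessarily the original source text.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<html><body><div id="outer" class="box">Text <strong>bold</strong>'
      . '<div class="inner">Nested</div></div></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$markup = [];
foreach ($dom->getElementsByTagName('div') as $div) {
    // Passing a node makes saveHTML() return that node's markup only.
    $markup[] = $dom->saveHTML($div);
}

foreach ($markup as $one) {
    echo htmlentities($one) . "\n";
}
```

On PHP versions older than 5.3.6, $dom->saveXML($div) is a possible substitute, though it writes XML-style output.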

Ray Paseur commented:
Thinking about this a little more, I came across this:
http://simplehtmldom.sourceforge.net/

I haven't tried it, but it looks interesting.

best regards, ~Ray