Solved

Get content of an element with XPath?

Posted on 2009-04-26
Last Modified: 2013-11-11
<haha>
  <yeah>
    <div>
       yeah
       <blah>
         asdoijasd
       </blah>
       <more>asdasd</more>
     </div>
   </yeah>
</haha>


How do I get all content inside <div>? I don't just want the text content, I need the markup as well.



I'm using this code in PHP, but it just gets the text content.

$xpath = new DOMXPath($domdoc);
$nodes = $xpath->evaluate("//div[@id='removedforprivacy']/div[@class='privacy-concerns']");
$element = $nodes->item(0);
$description = $element->textContent;
Question by:skiingisfun
12 Comments
 
LVL 3

Expert Comment

by:GodDoesntExist
ID: 24239241
Why don't you just add it to a variable?

$content = " <haha>
  <yeah>
    <div>
       yeah
       <blah>
         asdoijasd
       </blah>
       <more>asdasd</more>
     </div>
   </yeah>
</haha>";
 

Author Comment

by:skiingisfun
ID: 24239413
unfortunately that has nothing to do with what i'm trying to accomplish

i need to get the contents of the div using xpath
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 24254606
@skiingisfun: It looks like your sample code and your test data are out of sync.  If you can post some of the REAL data, our answers may be more helpful.  Thanks, ~Ray
LVL 40

Expert Comment

by:Richard Quadling
ID: 24254787
http://docs.php.net/domxpath has a good example in the user notes...

<?php
// to retrieve selected html data, try these DomXPath examples:

// $DOCUMENT_ROOT only exists with register_globals; read it from $_SERVER instead
$file = $_SERVER['DOCUMENT_ROOT'] . "/test.html";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);

$xpath = new DOMXPath($doc);

// example 1: for everything with an id
//$elements = $xpath->query("//*[@id]");

// example 2: for node data in a selected id
//$elements = $xpath->query("/html/body/div[@id='yourTagIdHere']");

// example 3: same as above with wildcard
// (this path is relative; "//div[@id='yourTagIdHere']" matches at any depth)
$elements = $xpath->query("*/div[@id='yourTagIdHere']");

// query() returns false for a malformed expression, so test for that rather than null
if ($elements !== false) {
  foreach ($elements as $element) {
    echo "<br/>[". $element->nodeName. "]";

    // nodeValue is the text of each child node; tags inside it are dropped
    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }
  }
}
?>
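
The loop in that user note prints nodeValue, which is text only, so it has the same limitation as textContent in the question. A possible variation, sketched here on a made-up inline HTML string rather than test.html, is to pass each matched element to saveHTML(), which accepts a node argument as of PHP 5.3.6. The id and class values mirror the asker's expression; the HTML itself and the $markup variable are assumptions for illustration.

```php
<?php
// Inline stand-in for a real page; the content is invented for this sketch.
$html = '<html><body><div id="productDescription">'
      . '<div class="content"><b>Title</b><br>Some <i>styled</i> text</div>'
      . '</div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@id='productDescription']/div[@class='content']");

$markup = '';
foreach ($elements as $element) {
    // saveHTML($element) serializes the element together with its tags;
    // $element->nodeValue would return only the text.
    $markup .= $doc->saveHTML($element);
}

echo $markup, "\n";
```

The only change from the user note's loop is which property or method produces the output, so the XPath expressions from examples 1 to 3 can be swapped in unchanged.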

 
LVL 111

Expert Comment

by:Ray Paseur
ID: 24254789
This will show you one way of getting some of the information.  If you give me some of the real data, I may be able to expand on the example.

Best regards, ~Ray
<?php // RAY_temp_skiingisfun.php
error_reporting(E_ALL);
echo "<pre>\n";
 
// TEST DATA FROM THE POST AT EE
$xml = '
<haha>
  <yeah>
    <div>
       yeah
       <blah>
         asdoijasd
       </blah>
       <more>asdasd</more>
     </div>
   </yeah>
</haha>';
 
// REMOVE UNNECESSARY WHITE SPACE AND NEWLINES
// (ereg_replace() WAS REMOVED IN PHP 7; str_replace() AND preg_replace() DO THE SAME JOB)
$xml = str_replace("\n", '', $xml);
$xml = preg_replace('/ +</', '<', $xml);
$xml = preg_replace('/> +/', '>', $xml);
echo htmlentities($xml);
echo "\n\n";
 
// MAKE AN OBJECT
$obj = simplexml_load_string($xml);
 
// VISUALIZE THE OBJECT
var_dump($obj);
 
// LOOK AT SOME PARTS
$thing = trim( (string)$obj->yeah->div );
var_dump($thing);
 
$thing = trim( (string)$obj->yeah->div->blah );
var_dump($thing);
 
$thing = trim( (string)$obj->yeah->div->more );
var_dump($thing);
 
// GET THE PARTIAL XML FROM THE OBJECT
$thing = $obj->yeah->div->asXML();
 
// VISUALIZE THE PARTIAL XML STRING
echo "\n\n";
echo htmlentities($thing);
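
Since the question asks for XPath specifically, the same idea can probably be expressed with SimpleXML's own xpath() method instead of walking the object by property name. This is only a sketch on the sample XML from the question, collapsed onto one line; $matches and $partial are names chosen for illustration.

```php
<?php
// Sample data from the question, with the whitespace already removed.
$xml = '<haha><yeah><div>yeah<blah>asdoijasd</blah><more>asdasd</more></div></yeah></haha>';

$obj = simplexml_load_string($xml);

// xpath() returns an array of SimpleXMLElement objects (empty if nothing matches).
$matches = $obj->xpath('//div');

$partial = '';
if (!empty($matches)) {
    // asXML() on a matched element returns that element's markup as a string.
    $partial = $matches[0]->asXML();
}

echo htmlentities($partial), "\n";
```

Note that asXML() includes the enclosing <div> tags in the returned string, not just what is between them.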

 

Author Comment

by:skiingisfun
ID: 24255053
thanks very much for the help

here is the real code:

      $xpath = new DOMXPath($domdoc);

      $nodes = $xpath->evaluate("//div[@id='productDescription']/div[@class='content']");
      $element = $nodes->item(0);

      $description = $element->textContent;

and content: http://www.amazon.com/Toshiba-Satellite-A305-S6908-15-4-Inch-Laptop/dp/B001NEJO0Y/ref=pd_bbs_sr_1?ie=UTF8&s=electronics&qid=1240952161&sr=8-1


I'm trying to scrape the product description off that page. My code gets the text of the product description, but any tags inside the product description are removed. I want all of the HTML in the product description.

 
LVL 40

Accepted Solution

by:
Richard Quadling earned 2000 total points
ID: 24255342
Does this work for you? The code at the end of this comment outputs:

<div class="content">
 

      <b>Amazon.com Product Description</b><br/>
  Designed to meet the demands of a multimedia-hungry family, students on the go, and small businesses with demanding computing needs, the Toshiba Satellite A305-S6908 offers a versatile base for all your needs. It includes a generous 15.4-inch widescreen TruBrite high-

[SNIPPED]
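<?php
$o_Doc = new DOMDocument();
// @ silences the warnings libxml raises for the page's invalid HTML
@$o_Doc->loadHTMLFile('http://www.amazon.com/Toshiba-Satellite-A305-S6908-15-4-Inch-Laptop/dp/B001NEJO0Y/ref=pd_bbs_sr_1?ie=UTF8&s=electronics&qid=1240952161&sr=8-1');
 
$o_XPath = new DOMXPath($o_Doc);
 
// Select the content <div> inside the product description block
$o_NodeList = $o_XPath->evaluate("//div[@id='productDescription']/div[@class='content']");
$o_Node = $o_NodeList->item(0);

// item(0) is null when nothing matched; saveXML($node) returns the node's markup, not just its text
if ($o_Node !== null) {
  echo $o_Doc->saveXML($o_Node);
}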
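
One thing to be aware of: saveXML() with a node argument returns the selected element itself, including its own opening and closing tags. If only what sits inside the <div> is wanted, as the original question puts it, one option is to serialize the child nodes one by one and join the results. The sketch below does that on the sample XML from the question; innerMarkup() is a hypothetical helper written for this example, not a function of the DOM extension.

```php
<?php
// Hypothetical helper: returns the markup inside $node,
// without the node's own start and end tags.
function innerMarkup(DOMNode $node): string
{
    $markup = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes just that subtree.
        $markup .= $node->ownerDocument->saveXML($child);
    }
    return $markup;
}

// Sample data from the question, collapsed onto one line.
$doc = new DOMDocument();
$doc->loadXML('<haha><yeah><div>yeah<blah>asdoijasd</blah><more>asdasd</more></div></yeah></haha>');

$xpath = new DOMXPath($doc);
$div = $xpath->query('//div')->item(0);

$textOnly = $div->textContent;     // text with all tags stripped
$outer    = $doc->saveXML($div);   // the <div> element and everything in it
$inner    = innerMarkup($div);     // only what is inside the <div>

echo $textOnly, "\n", $outer, "\n", $inner, "\n";
```

For HTML documents loaded with loadHTML() or loadHTMLFile(), saveHTML() can be used in place of saveXML() in the helper (it accepts a node argument as of PHP 5.3.6), which avoids XML-style output such as <br/>.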

 

Author Closing Comment

by:skiingisfun
ID: 31574812
perfect, thanks, you rule!
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 24258194
See that Ray? I rule!!! ;-)
 

Author Comment

by:skiingisfun
ID: 24258329
haha, but if it wasn't for ray requesting i post exactly what i needed i wouldn't have got your solution

so i'm not sure who's more awesome

lol
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 24258360
Probably Ray.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 24260087
Richard has been here longer!  Yes, you rock!!
