Using DOMNodeList to get links with rel=nofollow using PHP

I have this code, but it does not seem to be working, or I cannot figure out how to get the data I need into a PHP variable.


<?
$html = 'http://www.website.com';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// returns a list of all links with rel=nofollow
$nlist = $xpath->query('//a[@rel="nofollow"]');
foreach ($nlist as $link) {
      //condition here
}
?>

The foreach loop does not seem to be finding the links. Is there a way I can do this? I need to extract the URL from each <a> that contains rel=nofollow.

Also, the "nofollow" in my query is wrapped in double quotes. What if a page has it wrapped in single quotes? Will the query still find those links? I need to make sure I find both.
Asked: cbielich

Robert Schutt, Software Engineer, commented:
loadHTML() works with a string of HTML, not a URL. To parse directly from a file or URL, use DOMDocument::loadHTMLFile() (DOMDocument::load() expects well-formed XML). My website provider apparently doesn't allow loading remote URLs, or maybe I don't know how to set the option correctly. It does work, though, with a local .htm file I loaded:
<?php

// also posted at: http://schutt.nl/ee/Q_27858934/

ini_set("display_errors", "on");
ini_set("allow_url_fopen", "on"); // no effect: this setting can only be changed in php.ini
error_reporting(E_ALL);

//$html = 'http://www.website.com';
//$html = 'http://schutt.nl/ee/Q_27858934/test.htm';
$html = 'test.htm'; // local file, does work

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$doc->loadHTMLFile($html); // parses the file as HTML
$xpath = new DOMXPath($doc);
// returns a list of all links with rel=nofollow
$nlist = $xpath->query('//a[@rel="nofollow"]');

foreach ($nlist as $link) {
	// print the link text, then the URL taken from the href attribute
	echo htmlentities($link->nodeValue) . ", href = " . htmlentities($link->getAttribute('href')) . "<br>\n";
}
?>
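
On the single versus double quote question: the quotes inside the XPath expression only delimit the string literal "nofollow" in the query. By the time the query runs, the HTML parser has already stripped whichever quotes the page used around the attribute value, so both styles should match. What an exact comparison would miss is a rel attribute with several tokens, such as rel="noopener nofollow". The sketch below uses made-up sample markup and hypothetical example.com URLs (neither is from the original question) and swaps in a token-based XPath test to cover that case. The match is still case-sensitive.

```php
<?php
// Hypothetical sample markup: double quotes, single quotes,
// several rel tokens, and one link without rel.
$html = <<<'HTML'
<html><body>
<a href="http://example.com/a" rel="nofollow">A</a>
<a href='http://example.com/b' rel='nofollow'>B</a>
<a href="http://example.com/c" rel="noopener nofollow">C</a>
<a href="http://example.com/d">D</a>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parse warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Pad the normalized rel value with spaces so "nofollow" is matched
// as a whole token, wherever it appears in the list.
$query = '//a[contains(concat(" ", normalize-space(@rel), " "), " nofollow ")]';

$urls = [];
foreach ($xpath->query($query) as $link) {
    $urls[] = $link->getAttribute('href');
}

print_r($urls); // lists the hrefs of links A, B and C, but not D
```

After the loop, $urls is an ordinary PHP array holding the extracted URLs, which is the "data in a PHP variable" the question asks for.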

Robert Schutt, Software Engineer, commented:
By the way, googling around just now, I see that it is a bad idea to use:
ini_set("allow_url_fopen","on");

If it even works on your system, the least you could do is switch it off again after use. I suspect that letting DOMDocument fetch a remote page directly is not very secure, and there are better ways to load the content of another website (like cURL, although I have never used that yet).
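
For reference, a minimal sketch of the cURL route mentioned above, assuming the PHP cURL extension is installed. The function names fetchHtml() and nofollowLinks() are hypothetical helpers introduced here for illustration, and the 10 second timeout is an arbitrary choice. The page is downloaded as a string first, which is the kind of input loadHTML() expects.

```php
<?php
// Hypothetical helper: download a page with cURL.
// Returns the response body, or null if the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow HTTP redirects
        CURLOPT_TIMEOUT        => 10,   // give up after 10 seconds (arbitrary)
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: collect the href of every <a rel="nofollow">
// found in a string of HTML.
function nofollowLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parse warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $urls = [];
    foreach ($xpath->query('//a[@rel="nofollow"]') as $link) {
        $urls[] = $link->getAttribute('href');
    }
    return $urls;
}

// Usage (needs network access; the URL is the placeholder from the question):
// $html = fetchHtml('http://www.website.com');
// $urls = ($html === null) ? [] : nofollowLinks($html);
```

Keeping the download and the parsing in separate functions means the parsing part can be tried on a local string or file even where outbound requests are blocked, as described for the hosting provider above.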

Ray Paseur commented:
$html = 'http://www.website.com';
I am guessing that is not the real URL you want to inspect.  Please post the ACTUAL URL and I will see if I can show you how to find the information you want.