cbielich asked:

Using DOMNodeList to get links with rel=nofollow using php

I have this code, but it does not seem to be working, or I cannot figure out how to get the data I need into a PHP variable.


<?php
$html = 'http://www.website.com';
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
// returns a list of all links with rel=nofollow
$nlist = $xpath->query('//a[@rel="nofollow"]');
foreach ($nlist as $link) {
      //condition here
}
?>

The foreach loop does not seem to be finding the links. Is there a way I can do this? I need to extract the URL from each <a> that has rel=nofollow.

Also, the "nofollow" in my query is wrapped in double quotes. What if a page wraps it in single quotes? Will the query still find those links? I need to make sure I find both.
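
One possible way to approach this, offered only as a sketch and not as the accepted solution: DOMDocument::loadHTML() parses the string it is given as markup, so passing the URL itself produces a document with no links in it. The helper below, extractNofollowLinks() (a hypothetical name, as are the example.com URLs in the sample), takes markup that has already been fetched and returns the href of every <a> whose rel contains the token nofollow. The quotes inside the XPath expression only delimit the XPath string; the parser normalizes attribute quoting, so single-quoted and double-quoted attributes in the page should match alike.

```php
<?php
// Sketch only: extractNofollowLinks() is a hypothetical helper, not code from the thread.
function extractNofollowLinks(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match "nofollow" as one whitespace-separated token of rel, so that
    // rel="external nofollow" is found too. The match is case-sensitive.
    $query = "//a[@href][contains(concat(' ', normalize-space(@rel), ' '), ' nofollow ')]";

    $urls = [];
    foreach ($xpath->query($query) as $link) {
        $urls[] = $link->getAttribute('href');
    }
    return $urls;
}

// Sample markup (assumed for illustration) mixing double and single quotes.
$sample = '<html><body>'
    . '<a href="http://example.com/a" rel="nofollow">double quotes</a>'
    . "<a href='http://example.com/b' rel='nofollow'>single quotes</a>"
    . '<a href="http://example.com/c" rel="external nofollow">two tokens</a>'
    . '<a href="http://example.com/d">no rel attribute</a>'
    . '</body></html>';

print_r(extractNofollowLinks($sample));
```

To run this against a live page, the markup has to be fetched first, for example with file_get_contents($url) or $doc->loadHTMLFile($url). Both of those depend on allow_url_fopen being enabled for remote URLs.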
ASKER CERTIFIED SOLUTION
Robert Schutt
By the way, googling around just now, I see that it is a bad idea to use:
ini_set("allow_url_fopen","on");

If it even works on your system, the least you could do is switch it off again after use. I guess loading a remote page directly through DOMDocument is not really secure, and there are better ways to load the content of another website (like cURL, though I have never used it myself).
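
For what it is worth, here is a minimal sketch of the cURL route mentioned above, assuming the curl extension is installed. As far as I know, allow_url_fopen is a system-level setting that can only be changed in php.ini, so the ini_set() call would have no effect anyway. fetchHtml() is a hypothetical helper name, and the timeout and redirect limits are arbitrary choices, not recommendations from the thread.

```php
<?php
// Sketch only: fetchHtml() is a hypothetical helper. It returns the response
// body as a string, or null if the request fails or cURL is unavailable.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // curl extension not loaded
    }

    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // but not indefinitely
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error statuses as failure.
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

The returned string can then be handed to DOMDocument::loadHTML(), for example `$html = fetchHtml('http://www.website.com');` followed by a null check before parsing.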
$html = 'http://www.website.com';
I am guessing that is not the real URL you want to inspect.  Please post the ACTUAL URL and I will see if I can show you how to find the information you want.