ChilliSauce
asked on
PHP - Strange results when I try to retrieve some websites: get_meta_tags & cURL
When I run the code below, it works fine for the domain http://moz.com, but for http://seolinkdirectory.org the get_meta_tags function returns nothing and the GetWebPage function returns a load of gibberish!
Can anyone identify why this is happening?
Here's my code:
<?php
$URL[1] = 'http://seolinkdirectory.org';
$URL[2] = 'http://moz.com';

foreach ($URL as $Website)
{
    // get_meta_tags() returns false on failure, so guard before looping
    $Meta = get_meta_tags($Website);
    if ($Meta !== false)
    {
        foreach ($Meta as $Key1 => $Value1)
        {
            echo $Website . ' ' . $Key1 . ' = ' . $Value1 . '<br>';
        }
    }

    // Fetch the raw page with cURL and print it after the meta tags
    $Page = GetWebPage($Website);
    echo $Website . "<br><br><br>Webpage <br><br><br><br>" . $Page;
}

function GetWebPage($URL)
{
    $ua = 'Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 (ROBOT)';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $URL);
    curl_setopt($ch, CURLOPT_USERAGENT, $ua);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_NOBODY, false);          // fetch the body, not just the headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);   // has had no effect since PHP 5.1.3
    $content = curl_exec($ch);                        // string on success, false on failure
    curl_close($ch);
    return $content;
}
?>
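Before drawing conclusions, it can help to check what the server actually sent. One common source of binary-looking output from cURL is a compressed (gzip) response body that was never decoded; nothing in this thread confirms that is the case here, so treat it as an assumption to rule in or out. The sketch below uses hypothetical helper names (`looksGzipped`, `decodeIfGzipped`, `GetWebPageDecoded`); `gzencode`/`gzdecode` come from PHP's zlib extension, and an empty `CURLOPT_ENCODING` asks cURL to advertise and transparently decode every encoding it supports.

```php
<?php
// Hypothetical diagnostic helpers (not from the thread). Assumption: the
// "gibberish" may be a compressed response body that was never decoded.
// gzip data always starts with the two magic bytes 0x1f 0x8b.
function looksGzipped(string $data): bool
{
    return strncmp($data, "\x1f\x8b", 2) === 0;
}

// Return readable text: decode gzip data if that is what we were given,
// otherwise hand the input back unchanged. Requires the zlib extension.
function decodeIfGzipped(string $data): string
{
    if (looksGzipped($data)) {
        $decoded = gzdecode($data);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $data;
}

// Variant of GetWebPage() that lets cURL negotiate and decode compression itself:
// an empty CURLOPT_ENCODING sends an Accept-Encoding header listing every
// encoding this libcurl build supports and decodes the response automatically.
function GetWebPageDecoded(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $content = curl_exec($ch);
    curl_close($ch);
    return ($content === false) ? '' : $content;
}
```

For example, `var_dump(looksGzipped((string) GetWebPage('http://seolinkdirectory.org')));` with the original function would show whether the raw bytes are gzip data. If it prints `bool(false)`, compression is not the explanation and the page content itself needs a closer look.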
Check these pages with "View Source." It looks like the SEO link directory is generated with JavaScript libraries. Some publishers use this technique to prevent "screen scraping" with cURL and to protect their copyrighted content. If you want access to their content and they want you to have programmatic access (this is probably a paid relationship), the publisher will usually expose an API. You might ask about that.
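The "View Source" check can also be done in code. Below is a minimal sketch that parses HTML you have already fetched (for example with GetWebPage()) and reports the meta tags, the number of `<script>` elements, and how much visible body text a non-JavaScript client receives. `summarisePage()` is a hypothetical helper name, and reading "many scripts, little text" as "rendered by JavaScript" is only a heuristic assumption.

```php
<?php
// Hypothetical helper (not from the thread): summarise what a client that does not
// run JavaScript, such as cURL, actually receives. Heuristic assumption: many
// <script> elements plus little visible body text suggests client-side rendering.
function summarisePage(string $html): array
{
    $summary = ['meta' => [], 'scripts' => 0, 'text_length' => 0];
    if (trim($html) === '') {
        return $summary; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect <meta name="..." content="..."> pairs; keys are lower-cased,
    // similar to the array get_meta_tags() returns.
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = $tag->getAttribute('name');
        if ($name !== '') {
            $summary['meta'][strtolower($name)] = $tag->getAttribute('content');
        }
    }

    // Copy the live node list to an array before removing scripts from the tree,
    // so their code is not counted as visible text.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    $summary['scripts'] = count($scripts);
    foreach ($scripts as $script) {
        $script->parentNode->removeChild($script);
    }

    // Measure the visible body text with runs of whitespace collapsed.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        $text = trim(preg_replace('/\s+/', ' ', $body->textContent));
        $summary['text_length'] = strlen($text);
    }
    return $summary;
}
```

Running `print_r(summarisePage((string) GetWebPage($Website)));` for both URLs in the question would show side by side whether the directory's meta tags and text are present in the raw HTML at all, or only appear after scripts run in a browser.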