PHP - Strange results when I try and retrieve some websites ?? get_meta_tags & Curl

Here's my code:

<?php
$URL[1] = 'http://seolinkdirectory.org';
$URL[2] = 'http://moz.com';

foreach ($URL as $Key => $Website)
{
	// Print every named <meta> tag that get_meta_tags() finds on the page.
	$Meta = get_meta_tags($Website);
	foreach ($Meta as $Key1 => $Value1)
	{
		echo $Website . ' ' . $Key1 . ' = ' . $Value1 . '<br>';
	}

	// Fetch the same page with cURL and dump the raw response.
	$Page = GetWebPage($Website);
	echo $Website . "<br><br><br>Webpage <br><br><br><br>" . $Page;
}

// Download a page with cURL, following redirects, and return the body as a string.
function GetWebPage($URL)
{
	$ua = 'Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0 (ROBOT)';

	$ch = curl_init();

	curl_setopt($ch, CURLOPT_URL,            $URL);
	curl_setopt($ch, CURLOPT_USERAGENT,      $ua);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
	curl_setopt($ch, CURLOPT_NOBODY,         false);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
	curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);

	$content = curl_exec($ch);
	curl_close($ch);
	return $content;
}
?>

When I run this code, it works fine for http://moz.com, but for http://seolinkdirectory.org the get_meta_tags function returns nothing and the GetWebPage function returns a load of gibberish!

Can anyone identify why this is happening?

Ray Paseur

Check these pages with "view source." It looks like the SEO link directory is generated using JavaScript libraries. This is a technique that some publishers have adopted to prevent "screen scraping" with cURL; it is a way to protect their copyrighted content. If you want access to their content and they want you to have programmatic access (this is probably a paid relationship), the publisher will usually expose an API. You might ask about that.
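
One way to do a similar check without a browser is to inspect the bytes that cURL actually returned. The sketch below uses a hypothetical helper, describeBody() (the name and its result labels are illustrative, not part of the original code), to tell a gzip-compressed body apart from plain HTML that simply has no named meta tags. Whether either case applies to seolinkdirectory.org is an assumption; nothing in this thread confirms the cause.

```php
<?php
// Hypothetical diagnostic helper: classifies a response body so that
// "gibberish" output can be narrowed down. The labels are illustrative.
function describeBody(string $body): string
{
    if ($body === '') {
        return 'empty';
    }
    // Gzip streams always begin with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        return 'gzip';
    }
    // Readable markup: report whether any <meta name=...> tag is present,
    // since those are the tags get_meta_tags() looks for.
    if (preg_match('/<meta\s[^>]*name\s*=/i', $body) === 1) {
        return 'html-with-meta';
    }
    return 'html-without-meta';
}

// Offline demonstration with synthetic inputs (no network access needed).
echo describeBody('<head><meta name="description" content="x"></head>'), "\n";
echo describeBody("\x1f\x8b\x08\x00"), "\n";
echo describeBody('<html><script src="app.js"></script></html>'), "\n";
```

In the question's code this could be called as describeBody($Page) inside the loop. If it reports gzip, adding curl_setopt($ch, CURLOPT_ENCODING, '') to GetWebPage asks cURL to advertise the encodings it supports and decode the response itself, which may be worth trying. If it reports html-without-meta, the page probably does build its content with scripts, as suggested above.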

ASKER CERTIFIED SOLUTION

gr8gonzo

