Solved

Check PageRank using multiple cURL handles in parallel

Posted on 2009-05-18
Last Modified: 2012-05-07
Hi Experts, the snippet below shows the function I use to check the PageRank of each page. Every script has about ten URLs to check, and checking them one by one takes a long time.
I know cURL can run multiple handles in parallel with "curl_multi_init", but I don't know how to use it, so I need the experts' help.

Basically, what I want to do is:
- Load the URLs one by one from the database
- Call the function (in parallel)
- Save the results in the database

How do I do this?

Regards, JC
 

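<?php

  // Initial value for the checksum hash (the $init parameter of _GoogleCH).
  define('GOOGLE_MAGIC', 0xE6359A60);

  // Zero-fill (logical) right shift of a 32-bit value, like JavaScript's >>>.
  // PHP's >> is an arithmetic shift, so the sign bit is cleared by hand.
  function _zeroFill($a, $b){
    $z = 0x80000000;
    if ($z & $a){
      $a = ($a >> 1);
      $a &= (~$z);
      $a |= 0x40000000;
      $a = ($a >> ($b - 1));
    } else {
      $a = ($a >> $b);
    }
    return $a;
  }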
 

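  // Bob Jenkins' 96-bit mix step (from his lookup2 hash).
  // Note: this checksum code assumes 32-bit integers that wrap on overflow,
  // as on 32-bit PHP builds; on 64-bit builds the results will differ.
  function _mix($a, $b, $c){
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c, 13));
    $b -= $c; $b -= $a; $b ^= ($a << 8);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b, 13));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c, 12));
    $b -= $c; $b -= $a; $b ^= ($a << 16);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b, 5));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c, 3));
    $b -= $c; $b -= $a; $b ^= ($a << 10);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b, 15));
    return array($a, $b, $c);
  }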
 

  // Computes the toolbar checksum of $url, given as an array of byte values (see _strord).
  function _GoogleCH($url, $length = null, $init = GOOGLE_MAGIC){
    if (is_null($length))
      $length = count($url);
    $a = $b = 0x9E3779B9;  // golden ratio constant used by Jenkins' hash
    $c = $init;
    $k = 0;
    $len = $length;
    // Consume the input 12 bytes at a time, packed little-endian into $a, $b, $c.
    while ($len >= 12){
      $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
      $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
      $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
      $_mix = _mix($a, $b, $c);
      $a = $_mix[0]; $b = $_mix[1]; $c = $_mix[2];
      $k += 12;
      $len -= 12;
    }
    $c += $length;
    // Add the remaining 0-11 bytes; the cases deliberately fall through.
    switch ($len){
      case 11: $c += ($url[$k + 10] << 24);
      case 10: $c += ($url[$k + 9] << 16);
      case 9 : $c += ($url[$k + 8] << 8);
      case 8 : $b += ($url[$k + 7] << 24);
      case 7 : $b += ($url[$k + 6] << 16);
      case 6 : $b += ($url[$k + 5] << 8);
      case 5 : $b += ($url[$k + 4]);
      case 4 : $a += ($url[$k + 3] << 24);
      case 3 : $a += ($url[$k + 2] << 16);
      case 2 : $a += ($url[$k + 1] << 8);
      case 1 : $a += ($url[$k + 0]);
    }
    $_mix = _mix($a, $b, $c);
    return $_mix[2];
  }
 

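  // Converts a string into an array of its byte values.
  function _strord($string){
    $result = array();
    for ($i = 0; $i < strlen($string); $i++)
      $result[$i] = ord($string[$i]);  // the $string{$i} syntax was removed in PHP 8
    return $result;
  }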
 

  // Asks the Google toolbar service for the PageRank of $url.
  // Returns -1 when no rank could be read.
  function getPageRank($url){
    $pagerank = -1;
    $ch = "6" . _GoogleCH(_strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if ($fp){
      $out  = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=info:" . $url . " HTTP/1.1\r\n";
      $out .= "Host: www.google.com\r\n";
      $out .= "Connection: Close\r\n\r\n";
      fwrite($fp, $out);
      while (!feof($fp)){
        $data = fgets($fp, 128);
        if ($data === false)
          break;
        $pos = strpos($data, "Rank_");
        // Skip "Rank_" plus the 4 characters after it; the rest of the line is the rank.
        if ($pos !== false)
          $pagerank = (int) trim(substr($data, $pos + 9));
      }
      fclose($fp);
    }
    return $pagerank;
  }
 

  //////////////////////////////////////////
  // Main script (pseudocode):
  // 1. Load the URLs to check from the database.
  // 2. Loop over them with a while loop:
  //      $pr = getPageRank($pagerankurl);
  // 3. Save $pr in the database.

?>
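To use cURL instead of fsockopen(), the two jobs getPageRank() does on the socket (building the request and scanning the reply for "Rank_") need to work on their own, because curl_multi hands back whole response bodies. A minimal sketch that splits them out; pageRankQueryUrl() and parsePageRank() are hypothetical helper names, the query string is copied from getPageRank(), and the "Rank_<a>:<b>:<rank>" layout is an assumption inferred from the 9-character offset the original code skips.

```php
<?php
// Hypothetical helpers splitting getPageRank() into "build request" and
// "parse response" so they can be used with cURL instead of fsockopen().

// Builds the same toolbar query URL that getPageRank() requests.
// $checksum is expected to be "6" . _GoogleCH(_strord("info:" . $pageUrl)).
function pageRankQueryUrl(string $pageUrl, string $checksum): string
{
    return 'http://www.google.com/search?client=navclient-auto&ch=' . $checksum
        . '&features=Rank&q=info:' . $pageUrl;
}

// Extracts the rank from a complete response body, or returns -1 like getPageRank().
// Assumption: the rank line looks like "Rank_<a>:<b>:<rank>".
function parsePageRank(string $body): int
{
    if (preg_match('/Rank_[^:]*:[^:]*:(\d+)/', $body, $m) === 1) {
        return (int) $m[1];
    }
    return -1;
}
```

Because parsePageRank() takes the full body, it works the same whether the body came from a single curl_exec() call or from curl_multi_getcontent().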

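The database loop in the commented main script can be arranged so that URLs are checked in parallel batches instead of one at a time. A minimal sketch, assuming the URLs are first loaded into an array keyed by database id; checkPageRanks() and its three callbacks are hypothetical, so the actual parallel fetch, rank parsing and UPDATE query are plugged in rather than hard-coded.

```php
<?php
/**
 * Hypothetical driver for: load URLs -> check them in parallel -> save.
 *
 * @param array    $urls       page URLs keyed by database id
 * @param int      $batchSize  how many URLs to request at the same time
 * @param callable $fetchBatch function (array $urls): array, returns the response
 *                             bodies keyed like its input (false on failure)
 * @param callable $parse      function (string $body): int, extracts the rank
 * @param callable $save       function ($id, int $rank): void, stores the rank
 */
function checkPageRanks(array $urls, int $batchSize, callable $fetchBatch, callable $parse, callable $save): void
{
    // preserve_keys = true keeps the database ids attached to each URL.
    foreach (array_chunk($urls, $batchSize, true) as $batch) {
        $bodies = $fetchBatch($batch);  // all URLs of this batch in parallel
        foreach ($batch as $id => $url) {
            $body = $bodies[$id] ?? false;
            // -1 matches getPageRank()'s value for "no rank found".
            $save($id, is_string($body) ? $parse($body) : -1);
        }
    }
}
```

With about ten URLs per script, a batch size of 10 or more sends every request at once; a smaller batch size caps how many connections are open at the same time.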

Question by: Pedro Chagas

Accepted Solution

by: BrianMM (earned 500 total points)
ID: 24420552
Hi,

Recently(ish) I implemented a web scraping tool that runs cURL requests in parallel, following some hints from http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/

Check it out and see if it gives you some pointers.

If not, let me know and I'll see what I can do when I have more time to spend.
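Along the lines of that link, here is a minimal sketch of driving several cURL handles at once with curl_multi_init(). fetchAllParallel() is a hypothetical name; it returns each response body under the same key as its URL (for example a database id), or false if that transfer failed. It has not been tested against Google's toolbar endpoint.

```php
<?php
// Hypothetical helper: fetch several URLs at once using PHP's curl_multi API.
// Returns the bodies keyed like $urls; false marks a transfer that failed.
function fetchAllParallel(array $urls, int $timeout = 30): array
{
    $multi = curl_multi_init();
    $handles = [];

    // One easy handle per URL, all attached to the same multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body in memory
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // per-request limit in seconds
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    $failed = [];
    do {
        // Let cURL make progress on every transfer that is ready.
        $status = curl_multi_exec($multi, $running);
        // Remember the handles whose transfer finished with an error.
        while (($info = curl_multi_info_read($multi)) !== false) {
            if ($info['result'] !== CURLE_OK) {
                $failed[] = $info['handle'];
            }
        }
        if ($running > 0) {
            // Wait until some handle has activity (or 1 second passes).
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies and release all handles.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = in_array($ch, $failed, true) ? false : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

For the PageRank script, each page URL would first be turned into the same query URL getPageRank() requests (with the "6" . _GoogleCH(...) checksum), and each returned body scanned for "Rank_" as getPageRank() does with the socket output. Waiting in curl_multi_select() instead of calling curl_multi_exec() in a tight loop keeps the script from spinning the CPU while the requests are in flight.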