
Check PageRank with multiple cURL handles in parallel

Posted on 2009-05-18
Last Modified: 2012-05-07
Hi Experts, the code snippet below shows the function I use to check the PageRank of each page. Every script has about ten URLs to check, and checking them one by one takes a lot of time.
I know cURL can run multiple handles in parallel with "curl_multi_init", but I don't know how to use it, so I need some expert help.

Basically, what I want to do is:
- fetch the URLs from the database one by one
- call the function for them (in parallel)
- save the results in the database

How do I do this?

Regards, JC

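<?php

  define('GOOGLE_MAGIC', 0xE6359A60);

  // Unsigned (zero-fill) right shift of $a by $b bits, written for 32-bit
  // integers: the PHP equivalent of JavaScript's >>> operator.
  function _zeroFill($a, $b){
    $z = hexdec('80000000'); // sign bit of a 32-bit integer
    if ($z & $a){
      $a = ($a>>1);
      $a &= (~$z);
      $a |= 0x40000000;
      $a = ($a>>($b-1));
    }else{
      $a = ($a>>$b);
    }
    return $a;
  }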

  // Mixing step of Bob Jenkins' lookup2 hash (assumes 32-bit integers).
  function _mix($a,$b,$c){
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,13));
    $b -= $c; $b -= $a; $b ^= ($a<<8);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,13));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,12));
    $b -= $c; $b -= $a; $b ^= ($a<<16);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,5));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,3));
    $b -= $c; $b -= $a; $b ^= ($a<<10);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,15));
    return array($a,$b,$c);
  }

  // Checksum of the byte array $url (see _strord) sent as the "ch" parameter.
  function _GoogleCH($url, $length=null, $init=GOOGLE_MAGIC){
    if(is_null($length))
      $length = sizeof($url);
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    // Consume the input 12 bytes at a time.
    while($len >= 12){
      $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
      $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
      $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
      $_mix = _mix($a,$b,$c);
      $a = $_mix[0]; $b = $_mix[1]; $c = $_mix[2];
      $k += 12;
      $len -= 12;
    }
    $c += $length;
    // Add the remaining 0-11 bytes; the cases fall through on purpose.
    switch($len){
      case 11: $c += ($url[$k + 10] << 24);
      case 10: $c += ($url[$k + 9] << 16);
      case 9 : $c += ($url[$k + 8] << 8);
      case 8 : $b += ($url[$k + 7] << 24);
      case 7 : $b += ($url[$k + 6] << 16);
      case 6 : $b += ($url[$k + 5] << 8);
      case 5 : $b += ($url[$k + 4]);
      case 4 : $a += ($url[$k + 3] << 24);
      case 3 : $a += ($url[$k + 2] << 16);
      case 2 : $a += ($url[$k + 1] << 8);
      case 1 : $a += ($url[$k + 0]);
    }
    $_mix = _mix($a,$b,$c);
    return $_mix[2];
  }

  // Convert a string into an array of byte values.
  function _strord($string){
    $result = array();
    for($i = 0;$i < strlen($string);$i++)
      $result[$i] = ord($string[$i]);
    return $result;
  }

  // Query Google's toolbar PageRank service for $url.
  // Returns -1 if no rank could be read from the response.
  function getPageRank($url){
    $pagerank = -1;
    $ch = "6"._GoogleCH(_strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if($fp){
      $out = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=info:" . $url . " HTTP/1.1\r\n";
      $out .= "Host: www.google.com\r\n";
      $out .= "Connection: Close\r\n\r\n";
      fwrite($fp, $out);
      // Read the response line by line and keep the value after "Rank_x:x:".
      while (!feof($fp)){
        $data = fgets($fp, 128);
        $pos = strpos($data, "Rank_");
        if($pos !== false)
          $pagerank = trim(substr($data, $pos + 9));
      }
      fclose($fp);
    }
    return $pagerank;
  }

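//////////////////////////////////////////

// Fetch the URLs from the database (replace with your own query)
$urls = array();

// Check each URL in turn: this sequential loop is the slow part
foreach ($urls as $pagerankurl) {
  $pr = getPageRank($pagerankurl);
  // Save $pr in the database here
}

?>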
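To run the lookups in parallel, the socket code inside getPageRank() has to be separated from the two parts that do not depend on how the request is sent: building the query URL and reading the rank out of the response. The following is a minimal sketch of that split; buildPageRankUrl() and parsePageRank() are hypothetical helper names, and the query string is simply copied from getPageRank(). The checksum is passed in so the block stands alone; in the script above it would be "6" . _GoogleCH(_strord("info:" . $url)). Note that the Google toolbar service behind this endpoint may no longer return a rank at all.

```php
<?php
// Hypothetical helpers: getPageRank() split into "build the request" and
// "read the answer", so the HTTP transfer itself can be handed to cURL.

// Build the query URL. $checksum is the value getPageRank() computes as
// "6" . _GoogleCH(_strord("info:" . $url)). $url is inserted unencoded,
// exactly as in the original request line.
function buildPageRankUrl($url, $checksum) {
    return 'http://www.google.com/search?client=navclient-auto&ch=' . $checksum
        . '&features=Rank&q=info:' . $url;
}

// Extract the rank from a response body such as "Rank_1:1:5".
// Returns -1 when no rank is present, matching getPageRank().
function parsePageRank($body) {
    $pos = strpos($body, 'Rank_');
    if ($pos === false) {
        return -1;
    }
    // Skip "Rank_" plus the next four characters (e.g. "1:1:"), as getPageRank() does.
    return (int) trim(substr($body, $pos + 9));
}
```

Each lookup then becomes one plain cURL request to buildPageRankUrl(...), and the body it returns goes through parsePageRank().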

Question by: Pedro Chagas

Accepted Solution

by: BrianMM
Hi,

Recently(ish) I implemented a web-scraping tool that runs cURL requests in parallel, following some hints from this article: http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/

Check it out and see if it gives you some pointers.
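The linked article is about PHP's curl_multi_* functions. A minimal sketch of the same idea, assuming the cURL extension is enabled; fetchAllParallel() is a hypothetical name and not taken from the article. It attaches one easy handle per URL to a single multi handle, drives all transfers together, and returns the response bodies keyed by URL.

```php
<?php
// Minimal sketch: download several URLs at once with the curl_multi_* API.
// Returns url => response body ("" if that transfer failed).
function fetchAllParallel(array $urls, $timeout = 30) {
    $multi = curl_multi_init();
    $handles = array();

    // One easy handle per distinct URL, all attached to the same multi handle.
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            // Wait for activity on any handle instead of busy-looping.
            curl_multi_select($multi, 1.0);
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies and detach the handles; the easy handles are
    // released when $handles goes out of scope.
    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

For the question's case, the URLs passed in would be the PageRank query URLs for the roughly ten pages, and each body would then be scanned for the "Rank_" line the same way getPageRank() does it. All requests run at once, so the total time is roughly that of the slowest request rather than the sum of all of them.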

If not, let me know and I'll see what I can do when I have more time.
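The three steps listed in the question (read the URLs, look them up in parallel, save the results) could be wired together roughly as follows. This is a minimal sketch using PDO; the table pages(url, pagerank), the function name updatePageRanks() and the $lookupRanks callback are all hypothetical, and the callback stands in for whatever parallel lookup you build on curl_multi.

```php
<?php
// Sketch of the question's plan: read URLs from the database, look them up
// in one batch, then store each rank. The table "pages" with columns "url"
// and "pagerank" is hypothetical. $lookupRanks takes a list of URLs and
// returns an array of url => rank.
function updatePageRanks(PDO $db, callable $lookupRanks) {
    // 1. Read the URLs to check.
    $urls = $db->query('SELECT url FROM pages')->fetchAll(PDO::FETCH_COLUMN);
    if (count($urls) === 0) {
        return 0;
    }

    // 2. Look up all ranks in a single batch instead of one request at a time.
    $ranks = $lookupRanks($urls);

    // 3. Save the results with one prepared statement inside a transaction.
    $update = $db->prepare('UPDATE pages SET pagerank = :rank WHERE url = :url');
    $db->beginTransaction();
    foreach ($ranks as $url => $rank) {
        $update->execute(array(':rank' => $rank, ':url' => $url));
    }
    $db->commit();
    return count($ranks);
}
```

Keeping the lookup behind a callback means the database code does not change whether the ranks come from a sequential loop or from a parallel fetch.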