Solved

Check pagerank in multiple cURL handles in parallel

Posted on 2009-05-18
Medium Priority
Last Modified: 2012-05-07
Hi Experts, the code snippet below shows the function I use to check the PageRank of each page. Each of my scripts has about ten URLs to check, and checking them one by one takes a lot of time.
I know cURL can run multiple handles in parallel with "curl_multi_init", but I don't know how to use it, so I need an expert's help.

Basically, what I want to do is:
- Read the URLs one by one from the database
- Call the function for them (in parallel)
- Save the results in the database

How do I do this?

Regards, JC
 

<?php
 
  define('GOOGLE_MAGIC', 0xE6359A60);
 
  function _zeroFill($a, $b){
    $z = hexdec(80000000);
    if ($z & $a){
      $a = ($a>>1);
      $a &= (~$z);
      $a |= 0x40000000;
      $a = ($a>>($b-1));
    }else
      $a = ($a>>$b);
    return $a;
  }
 
  function _mix($a,$b,$c){
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,13));
    $b -= $c; $b -= $a; $b ^= ($a<<8);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,13));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,12));
    $b -= $c; $b -= $a; $b ^= ($a<<16);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,5));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,3));
    $b -= $c; $b -= $a; $b ^= ($a<<10);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,15));
    return array($a,$b,$c);
  }
 
  function _GoogleCH($url, $length=null, $init=GOOGLE_MAGIC){
    if(is_null($length))
      $length = sizeof($url);
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    while($len >= 12){
      $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
      $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
      $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
      $_mix = _mix($a,$b,$c);
      $a = $_mix[0]; $b = $_mix[1]; $c = $_mix[2];
      $k += 12;
      $len -= 12;
    }
    $c += $length;
    switch($len){
      case 11: $c += ($url[$k + 10] << 24);
      case 10: $c += ($url[$k + 9] << 16);
      case 9 : $c += ($url[$k + 8] << 8);
      case 8 : $b += ($url[$k + 7] << 24);
      case 7 : $b += ($url[$k + 6] << 16);
      case 6 : $b += ($url[$k + 5] << 8);
      case 5 : $b += ($url[$k + 4]);
      case 4 : $a += ($url[$k + 3] << 24);
      case 3 : $a += ($url[$k + 2] << 16);
      case 2 : $a += ($url[$k + 1] << 8);
      case 1 : $a += ($url[$k + 0]);
    }
    $_mix = _mix($a,$b,$c);
    return $_mix[2];
  }
 
  // Convert a string into an array of byte values.
  function _strord($string){
    $result = array();
    for($i = 0;$i < strlen($string);$i++)
      $result[$i] = ord($string[$i]);
    return $result;
  }
 
  function getPageRank($url){
    $pagerank = -1;
    $ch = "6"._GoogleCH(_strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if($fp){
      $out = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=info:" . $url . " HTTP/1.1\r\n";
      $out .= "Host: www.google.com\r\n";
      $out .= "Connection: Close\r\n\r\n";
      fwrite($fp, $out);
      while (!feof($fp)){
        $data = fgets($fp, 128);
        $pos = strpos($data, "Rank_");
        if($pos !== false)
          $pagerank = substr($data, $pos + 9);
      }
      fclose($fp);
    }
    return $pagerank;
}
 
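  //////////////////////////////////////////
  // Main loop (pseudocode): read each URL from the database,
  // look up its PageRank, then save the result.
  // while (there is another $pagerankurl in the database) {
  //   $pr = getPageRank($pagerankurl);
  //   save $pr in the database
  // }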
 
?>

Question by:Pedro Chagas
1 Comment
 
Accepted Solution

by:
BrianMM earned 1500 total points
ID: 24420552
Hi,

Recently(ish) I implemented a web scraping tool following some hints from http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/, which runs cURL requests in parallel.

Check it out and see if it gives you some pointers.

If not, let me know and I'll see what can be done when I have more time to spend.
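
For readers who want a starting point, here is a minimal sketch of the curl_multi approach. It is not taken from the linked tutorial, and the helper names fetchAllParallel() and parseRank() are hypothetical. It assumes the cURL extension is loaded and that the response contains a line of the form "Rank_1:1:N", which is what the `$pos + 9` offset in the question's getPageRank() implies.

```php
<?php
// Sketch only: fetch several URLs concurrently with the curl_multi API.
// fetchAllParallel() and parseRank() are illustrative names, not library functions.

/**
 * Fetch every URL in $urls concurrently.
 * Returns an array with the same keys; each value is the response body,
 * or null when the transfer did not end with HTTP 200.
 */
function fetchAllParallel(array $urls, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];

    // One easy handle per URL, all attached to the same multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select failed; back off briefly instead of spinning
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Collect the bodies and release every handle.
    $results = [];
    foreach ($handles as $key => $ch) {
        $ok = curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}

/**
 * Extract the rank from a response body, mirroring the strpos/substr
 * logic in the question. Returns -1 when no rank is found.
 */
function parseRank(?string $body): int
{
    if ($body === null) {
        return -1;
    }
    $pos = strpos($body, 'Rank_');
    if ($pos === false) {
        return -1;
    }
    return (int) substr($body, $pos + 9);
}

// Usage (assumes $queryUrls maps each page to its full query URL):
// foreach (fetchAllParallel($queryUrls) as $page => $body) {
//     $pr = parseRank($body);
//     // save $page and $pr in the database
// }
```

For the PageRank case, each entry passed to fetchAllParallel() would be the full query URL that getPageRank() in the question assembles by hand for fsockopen (host www.google.com plus the /search?client=navclient-auto... path with the computed ch value). All ten requests then run at the same time, so the total wait is roughly that of the slowest request rather than the sum of all of them.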