Solved

Check pagerank in multiple cURL handles in parallel

Posted on 2009-05-18
Last Modified: 2012-05-07
Hi experts, the code snippet below shows the function I use to check the PageRank of a page. Each of my scripts has about ten URLs to check, and checking them one by one takes a lot of time.
I know cURL can run multiple handles in parallel with "curl_multi_init", but I don't know how to use it, so I need an expert's help.

Basically, what I want to do is:
- Fetch the URLs one by one from the database
- Call the function for each URL (in parallel)
- Save the results in the database

How do I do this?

Regards, JC
 

<?php
 
  define('GOOGLE_MAGIC', 0xE6359A60);
 
  function _zeroFill($a, $b){
    $z = 0x80000000; // sign bit of a 32-bit integer
    if ($z & $a){
      $a = ($a>>1);
      $a &= (~$z);
      $a |= 0x40000000;
      $a = ($a>>($b-1));
    }else
      $a = ($a>>$b);
    return $a;
  }
 
  function _mix($a,$b,$c){
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,13));
    $b -= $c; $b -= $a; $b ^= ($a<<8);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,13));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,12));
    $b -= $c; $b -= $a; $b ^= ($a<<16);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,5));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,3));
    $b -= $c; $b -= $a; $b ^= ($a<<10);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,15));
    return array($a,$b,$c);
  }
 
  function _GoogleCH($url, $length=null, $init=GOOGLE_MAGIC){
    if(is_null($length))
      $length = sizeof($url);
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    while($len >= 12){
      $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
      $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
      $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
      $_mix = _mix($a,$b,$c);
      $a = $_mix[0]; $b = $_mix[1]; $c = $_mix[2];
      $k += 12;
      $len -= 12;
    }
    $c += $length;
    switch($len){
      case 11: $c += ($url[$k + 10] << 24);
      case 10: $c += ($url[$k + 9] << 16);
      case 9 : $c += ($url[$k + 8] << 8);
      case 8 : $b += ($url[$k + 7] << 24);
      case 7 : $b += ($url[$k + 6] << 16);
      case 6 : $b += ($url[$k + 5] << 8);
      case 5 : $b += ($url[$k + 4]);
      case 4 : $a += ($url[$k + 3] << 24);
      case 3 : $a += ($url[$k + 2] << 16);
      case 2 : $a += ($url[$k + 1] << 8);
      case 1 : $a += ($url[$k + 0]);
    }
    $_mix = _mix($a,$b,$c);
    return $_mix[2];
  }
 
  // Convert a string into an array of byte values.
  function _strord($string){
    $result = array();
    for($i = 0;$i < strlen($string);$i++)
      $result[$i] = ord($string[$i]);
    return $result;
  }
 
  function getPageRank($url){
    $pagerank = -1;
    $ch = "6"._GoogleCH(_strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if($fp){
      $out = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=" . urlencode("info:" . $url) . " HTTP/1.1\r\n";
      $out .= "Host: www.google.com\r\n";
      $out .= "Connection: Close\r\n\r\n";
      fwrite($fp, $out);
      while (!feof($fp)){
        $data = fgets($fp, 128);
        $pos = strpos($data, "Rank_");
        // The rank value follows the 9-character "Rank_x:x:" prefix.
        if($pos !== false)
          $pagerank = substr($data, $pos + 9);
      }
      fclose($fp);
    }
    return $pagerank;
}
 
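  //////////////////////////////////////////
  // Main loop (sketch): load the URLs from the database, check each one,
  // and save the result. $pagerankurls is a placeholder for the rows
  // fetched from the database.
  $pagerankurls = array(); // TODO: fill from the database
  foreach ($pagerankurls as $pagerankurl) {
    $pr = getPageRank($pagerankurl);
    // TODO: save $pr in the database
  }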
 
?>

Question by:Pedro Chagas
Accepted Solution

by:
BrianMM earned 500 total points
ID: 24420552
Hi,

Recently(ish) I implemented a web scraping tool following some hints from http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/, which runs cURL requests in parallel.

Check it out and see if it gives you some pointers.

If not, let me know and I'll see what can be done when I have more time to spend.
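
As a rough illustration of the curl_multi approach mentioned above, here is a minimal sketch rather than a finished solution. The function names `fetchAllParallel` and `parseRank` are hypothetical and do not come from the question or the linked tutorial. `parseRank` assumes the response contains a line of the form `Rank_1:1:N`, which is inferred from the `substr($data, $pos + 9)` offset in the question's code. Building the request URLs (including the `ch` checksum from `_GoogleCH`) and the database reads and writes are left to the caller.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with curl_multi.
// Returns an array that maps each input key to the response body,
// or to null when that transfer failed.
function fetchAllParallel(array $urls, int $timeout = 30): array
{
    $results = [];
    if (count($urls) === 0) {
        return $results;
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select failed; back off briefly to avoid a busy loop
        }
    } while ($running > 0 && $status === CURLM_OK);

    // Each call to curl_multi_info_read reports one finished transfer.
    $failed = [];
    while (($info = curl_multi_info_read($multi)) !== false) {
        if ($info['result'] !== CURLE_OK) {
            $failed[] = $info['handle'];
        }
    }

    foreach ($handles as $key => $ch) {
        $results[$key] = in_array($ch, $failed, true) ? null : curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    // The handles are released when they go out of scope at the end of the function.
    return $results;
}

// Hypothetical helper: extract the rank from a response body.
// Assumes a "Rank_1:1:N" line; returns -1 when no such line is found.
function parseRank(string $body): int
{
    return preg_match('/Rank_\d+:\d+:(\d+)/', $body, $m) === 1 ? (int) $m[1] : -1;
}
```

The result array keeps the keys of the input array, so calling `fetchAllParallel(array('a' => $urlA, 'b' => $urlB))` returns entries under 'a' and 'b'. Using database row ids as keys makes it easy to match each rank back to its row before saving.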