Check PageRank with multiple cURL handles in parallel

Posted on 2009-05-18
Last Modified: 2012-05-07
Hi Experts, the code snippet below shows the function I use to check the PageRank of each page. Each script has about ten URLs to check, and checking them one by one takes a lot of time.
I know cURL can run multiple handles in parallel with "curl_multi_init", but I don't know how to use it, so I need some expert help.

Basically, what I want to do is:
- Load the URLs one by one from the database
- Call the function for them in parallel
- Save the results in the database

How do I do this?

Regards, JC
 

<?php
 
  define('GOOGLE_MAGIC', 0xE6359A60);
 
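  // Emulates a 32-bit zero-fill (unsigned) right shift.
  // Note: this checksum code was written for 32-bit integer arithmetic and
  // may give different results on 64-bit PHP builds.
  function _zeroFill($a, $b){
    $z = hexdec('80000000');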
    if ($z & $a){
      $a = ($a>>1);
      $a &= (~$z);
      $a |= 0x40000000;
      $a = ($a>>($b-1));
    }else
      $a = ($a>>$b);
    return $a;
  }
 
  function _mix($a,$b,$c){
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,13));
    $b -= $c; $b -= $a; $b ^= ($a<<8);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,13));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,12));
    $b -= $c; $b -= $a; $b ^= ($a<<16);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,5));
    $a -= $b; $a -= $c; $a ^= (_zeroFill($c,3));
    $b -= $c; $b -= $a; $b ^= ($a<<10);
    $c -= $a; $c -= $b; $c ^= (_zeroFill($b,15));
    return array($a,$b,$c);
  }
 
  // Computes the toolbar checksum over an array of byte values (see _strord()).
  function _GoogleCH($url, $length=null, $init=GOOGLE_MAGIC){
    if(is_null($length))
      $length = count($url);
    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;
    while($len >= 12){
      $a += ($url[$k + 0] + ($url[$k + 1] << 8) + ($url[$k + 2] << 16) + ($url[$k + 3] << 24));
      $b += ($url[$k + 4] + ($url[$k + 5] << 8) + ($url[$k + 6] << 16) + ($url[$k + 7] << 24));
      $c += ($url[$k + 8] + ($url[$k + 9] << 8) + ($url[$k + 10] << 16) + ($url[$k + 11] << 24));
      $_mix = _mix($a,$b,$c);
      $a = $_mix[0]; $b = $_mix[1]; $c = $_mix[2];
      $k += 12;
      $len -= 12;
    }
    $c += $length;
    // Intentional fall-through: mix in the remaining 0-11 bytes.
    switch($len){
      case 11: $c += ($url[$k + 10] << 24);
      case 10: $c += ($url[$k + 9] << 16);
      case 9 : $c += ($url[$k + 8] << 8);
      case 8 : $b += ($url[$k + 7] << 24);
      case 7 : $b += ($url[$k + 6] << 16);
      case 6 : $b += ($url[$k + 5] << 8);
      case 5 : $b += ($url[$k + 4]);
      case 4 : $a += ($url[$k + 3] << 24);
      case 3 : $a += ($url[$k + 2] << 16);
      case 2 : $a += ($url[$k + 1] << 8);
      case 1 : $a += ($url[$k + 0]);
    }
    $_mix = _mix($a,$b,$c);
    return $_mix[2];
  }
 
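  // Converts a string into an array of byte values for _GoogleCH().
  function _strord($string){
    $result = array();
    for($i = 0;$i < strlen($string);$i++)
      $result[$i] = ord($string[$i]); // the $string{$i} syntax was removed in PHP 8
    return $result;
  }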
 
  function getPageRank($url){
    $pagerank = -1;
    $ch = "6"._GoogleCH(_strord("info:" . $url));
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    if($fp){
      $out = "GET /search?client=navclient-auto&ch=" . $ch . "&features=Rank&q=info:" . $url . " HTTP/1.1\r\n";
      $out .= "Host: www.google.com\r\n";
      $out .= "Connection: Close\r\n\r\n";
      fwrite($fp, $out);
      while (!feof($fp)){
        $data = fgets($fp, 128);
        $pos = strpos($data, "Rank_");
        // The rank follows a 9-character prefix, e.g. "Rank_1:1:6".
        if($pos !== false)
          $pagerank = trim(substr($data, $pos + 9));
      }
      fclose($fp);
    }
    return $pagerank;
  }
 
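  // Pseudocode from the question: load the URLs from the database,
  // loop over them, and save each result.
  // while (/* fetch the next $pagerankurl from the database */) {
  //     $pr = getPageRank($pagerankurl);
  //     /* save $pr in the database */
  // }
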
?>
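One way to reuse this code with cURL is to split getPageRank() into a function that builds the request URL and one that parses the response body. This is a sketch, not part of the original question: the names buildRankQueryUrl and parsePageRank are hypothetical, the URL layout and the "Rank_" parsing are copied from getPageRank() above, and the checksum is passed in rather than recomputed so the block stands alone.

```php
<?php
// Hypothetical helpers: getPageRank() split into "build request URL" and
// "parse response body" so the same logic can be reused with cURL.

// Build the toolbar query URL that getPageRank() requests over fsockopen.
// $checksum is assumed to be "6" . _GoogleCH(_strord("info:" . $pageUrl)),
// computed by the caller with the functions from the question.
function buildRankQueryUrl(string $pageUrl, string $checksum): string
{
    return 'http://www.google.com/search?client=navclient-auto'
        . '&ch=' . $checksum
        . '&features=Rank&q=info:' . $pageUrl;
}

// Extract the rank from a response body containing e.g. "Rank_1:1:6".
// Returns -1 when no rank line is present, matching getPageRank().
function parsePageRank(string $body): int
{
    $pos = strpos($body, 'Rank_');
    if ($pos === false) {
        return -1;
    }
    // Skip 9 characters from the start of "Rank_" (as getPageRank() does)
    // and drop the trailing newline before converting to an integer.
    return (int) trim(substr($body, $pos + 9));
}
```

Because parsePageRank() only looks at the body text, it works the same whether the response was read with fsockopen() or downloaded with cURL.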

Question by: Pedro Chagas

Accepted Solution by: BrianMM

Hi,

Recently(ish) I implemented a web scraping tool that runs cURL requests in parallel, following some hints from http://www.developertutorials.com/blog/php/parallel-web-scraping-in-php-curl-multi-functions-375/

Check it out and see if it gives you some pointers.

If not let me know and I'll see what can be done when I have more time to spend.
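A minimal curl_multi sketch is shown below; it is an illustration, not the code from the linked tutorial. It assumes the PHP cURL extension is installed, and fetchAllParallel is a hypothetical name. It downloads every URL concurrently and returns the bodies under the same array keys as the input, so results can be matched back to database rows. For PageRank, each URL would be the navclient-auto query URL that getPageRank() sends; that 2009-era endpoint may no longer respond, so the sketch itself works with any URL.

```php
<?php
// Minimal curl_multi sketch (hypothetical helper; requires ext-curl).
// Fetches all URLs concurrently and returns the bodies keyed like $urls.
function fetchAllParallel(array $urls, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = [];

    // One easy handle per URL, all attached to the same multi handle.
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as a string
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // per-transfer limit in seconds
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            // Wait (up to 1 second) for activity instead of busy-looping.
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    // Collect each body (possibly empty or partial if a transfer failed)
    // and detach the easy handles from the multi handle.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = (string) curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Usage, assuming $requestUrls maps database IDs to the query URLs built with the same checksum logic as getPageRank(): call $bodies = fetchAllParallel($requestUrls);, then look for "Rank_" in each body as getPageRank() does and save the result under the same ID. Keeping the input keys means no extra bookkeeping is needed to match responses to rows.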