Scrape Google SERPs

Posted on 2009-04-23
Last Modified: 2012-05-06

I've been at this for a bit today: I had the bright idea to build a script that crawls Google SERPs to see how well we're ranking for certain keywords. It will do a thing or two extra, but that's the easy part and I don't need help there. The intent of this class is to return all links found in the results pages and store them in a text file. I'm stuck at finding the links.

I've spent quite a bit of time ON Google today trying to find some good info on how to go about this. I found a few decent approaches to crawling Google, but ultimately they all failed because Google returns its source as JavaScript. Using cURL has worked for fetching the page, but I can't seem to extract what I need.


class crawl {

	public function __construct($keyword, $se) {
		$this->keyword         = $keyword;
		$this->prepend_txt_raw = 'raw_';

		switch ($se) {
			case 'google':
				$engine = 'google';
				// Fetch the results page into raw_google.txt
				$this->crawl_google($this->keyword, $this->prepend_txt_raw . $engine);
				break;
		}

		$this->extract_links($this->prepend_txt_raw . $engine . '.txt');
	}

	private function crawl_google($keyword, $fn) {
		$ch   = curl_init(''.urlencode($keyword).'&btnG=Google+Search&meta=');
		$file = fopen($fn . '.txt', "w");

		// Write the response body straight to the file, without headers
		curl_setopt($ch, CURLOPT_FILE, $file);
		curl_setopt($ch, CURLOPT_HEADER, 0);

		curl_exec($ch);
		curl_close($ch);
		fclose($file);
	}

	private function extract_links($page) {
		$myFile = $page;
		$fh = fopen($myFile, 'r');
		$theData = fread($fh, filesize($myFile));

		// Isolate the <body>, then collect everything that looks like a URL
		preg_match('#<body[^>]*>.*?</body>#is', $data, $body);
		preg_match_all('/https?:[^\'" <>]+/i',$body[0],$matches);

		for ($i = 0; $i < count($matches[0]); $i++) {
			$urls[] = $matches[0][$i];
		}

		$urls = array_unique($urls);

		return $urls;
	}
}

$c = new crawl('bird+is+the+word', 'google');



Question by:SOakley54

    Accepted Solution

    Use this pattern

    "return clk\(this\.href,'','','res','([0-9]+)',''\)\"

    and the position in the Google rankings is given by the ([0-9]+) group. This was pulled from an eregi() call rather than a preg one, so a bit of recoding is probably needed.
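
A rough PCRE version of that pattern might look like the sketch below. It assumes the old result markup, in which each result link carries an onmousedown="return clk(...)" handler after its href. The function name extract_ranked_links, the extra href capture, and the sample HTML are all made up for illustration; the markup Google serves today may look nothing like this.

```php
<?php
// Hypothetical helper: maps ranking position => result URL.
function extract_ranked_links(string $html): array
{
    // PCRE form of the accepted pattern, extended (an assumption) to also
    // capture the href that precedes the handler inside the same tag.
    $pattern = '#href="([^"]+)"[^>]*return clk\(this\.href,\'\',\'\',\'res\',\'([0-9]+)\',\'\'\)"#i';

    $ranked = [];
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // $m[1] is the URL, $m[2] is the position from the clk() call.
            $ranked[(int) $m[2]] = $m[1];
        }
    }
    return $ranked;
}

// Made-up sample in the old markup style.
$html = "<a href=\"http://example.com/a\" onmousedown=\"return clk(this.href,'','','res','1','')\">A</a>\n"
      . "<a href=\"http://example.org/b\" onmousedown=\"return clk(this.href,'','','res','2','')\">B</a>\n";

print_r(extract_ranked_links($html));
```

The '#' delimiters and trailing 'i' flag stand in for what eregi() did implicitly; the rest of the expression is unchanged from the pattern above.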

    Assisted Solution

    by:Ray Paseur
    Check Line 31:   $theData = fread($fh, filesize($myFile));
    Check Line 33:    preg_match('#<body[^>]*>.*?</body>#is', $data, $body);

    Should both of those be using the same variable - $theData vs $data - ?
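
To illustrate the point, here is a minimal sketch of the extraction step with a single variable carried from the read through to the match. The name extract_links_from_file and the sample page are hypothetical, and file_get_contents() is swapped in for the fopen()/fread() pair purely for brevity.

```php
<?php
// Hypothetical rewrite of extract_links(): one variable from read to match.
function extract_links_from_file(string $path): array
{
    $theData = file_get_contents($path);
    if ($theData === false) {
        return [];
    }

    // Limit the search to <body>; fall back to the whole page if none is found.
    $scope = preg_match('#<body[^>]*>.*?</body>#is', $theData, $body) ? $body[0] : $theData;

    preg_match_all('/https?:[^\'" <>]+/i', $scope, $matches);

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($matches[0]));
}

// Made-up page saved to a temporary file for the demonstration.
$tmp = tempnam(sys_get_temp_dir(), 'serp');
file_put_contents($tmp, '<html><head><link href="http://ignored.example/x.css"></head>'
    . '<body><a href="http://example.com/a">A</a> <a href="http://example.com/a">A again</a> '
    . '<a href="https://example.org/b">B</a></body></html>');
$links = extract_links_from_file($tmp);
unlink($tmp);
print_r($links);
```

With the mismatched names, preg_match() receives an undefined variable, so $body stays empty and nothing is ever extracted.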

    Expert Comment

    by:Ray Paseur
    Thanks for the points.  

    Sidebar note to Brian and all those who have ereg functions in their code (and I have a LOT of ereg functions - both code that I wrote and code that I inherited).  See the note here:

    That change will account for probably 75% of my time in refactoring.  If somebody wanted to do the community a great favor, a function-to-function map that converted ereg to preg would be pretty spiffy!

    Best to all, ~Ray
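
As a starting point for that kind of map, a small wrapper can cover the simple cases. This is only a sketch: ereg_to_preg is a hypothetical name, and it assumes the pattern uses plain POSIX extended syntax that PCRE reads the same way. Patterns that rely on POSIX-only behaviour (for example a literal backslash inside a bracket expression, or longest-match alternation) would still need a manual review. The function-level mapping is roughly ereg() to preg_match(), eregi() to preg_match() with the 'i' flag, ereg_replace() to preg_replace(), and split() to preg_split().

```php
<?php
// Hypothetical helper: wrap a POSIX (ereg-style) pattern for the preg functions.
function ereg_to_preg(string $pattern, bool $caseInsensitive = false): string
{
    // Use '#' as the delimiter and escape any literal '#' inside the pattern.
    $escaped = str_replace('#', '\#', $pattern);
    return '#' . $escaped . '#' . ($caseInsensitive ? 'i' : '');
}

// eregi('^rank-([0-9]+)$', $s, $regs) becomes:
$found = preg_match(ereg_to_preg('^rank-([0-9]+)$', true), 'Rank-42', $regs);

var_dump($found, $regs);
```

Keeping the conversion in one function means the delimiter and flag choices live in a single place while the old calls are migrated one by one.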

    Expert Comment

    by:Beverley Portlock
    Ray - same problem here with ereg.  I always liked ereg; it was less of a PITA than preg...

    Expert Comment

    by:Ray Paseur
    Yep, me too.  But if PHP can have a GOTO statement, maybe we can get ereg carried forward.  


