Solved

Google scraper code

Posted on 2012-03-15
Last Modified: 2012-03-19
I'm using a Google scraper script (included below) to pull the top 50 results for specific search terms.
It dumps the information into a Google Docs spreadsheet.
How can I adjust the code so that, instead of giving me the top 50 results in one block, it splits them into results 1-10, then 11-20, then 21-30, etc.?
That way I can better track exactly which page each result came from.
/* ---------------------------------------------------------------------------
 * google_scraper.js
 * https://github.com/chrisle/google_scraper.js
 *
 * @desc    Google Scraper for Google Docs Spreadsheet.
 * @author  Chris Le - @djchrisle - chrisl at seerinteractive.com
 * @license MIT (see: http://www.opensource.org/licenses/mit-license.php)
 * @version 1.0.1
 * -------------------------------------------------------------------------*/

var SeerJs_GoogleScraper = (function() {

  var errorOccurred;

  /**
   * Gets stuff inside two tags
   * @param  {string} haystack String to look into
   * @param  {string} start Starting tag
   * @param  {string} end Ending tag
   * @return {string} Stuff inside the two tags
   */
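  function getInside(haystack, start, end) {
    var startPos = haystack.indexOf(start);
    if (startPos === -1) { return ''; }
    var startIndex = startPos + start.length;
    // Look for the end tag only after the start tag, so an earlier
    // occurrence of `end` cannot produce a negative length.
    var endIndex = haystack.indexOf(end, startIndex);
    if (endIndex === -1) { return ''; }
    return haystack.substring(startIndex, endIndex);
  }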

  /**
   * Fetch keywords from Google.  Returns error message if an error occurs.
   * @param {string} kw Keyword
   * @param {number} optResults Number of results to request (default 10)
   */
  function fetch(kw, optResults) {
    errorOccurred = false;
    optResults = optResults || 10;
    try {
      // Encode the keyword so spaces and symbols survive in the query string.
      var url = 'http://www.google.com/search?q=' + encodeURIComponent(kw) + "&num=" + optResults;
      return UrlFetchApp.fetch(url).getContentText();
    } catch(e) {
      errorOccurred = true;
      return e;
    }
  }

  /**
   * Extracts the URL from an organic result. Returns false if nothing is found.
   * @param {string} result XML string of the result
   */
  function extractUrl(result) {
    var url;
    if (result.match(/\/url\?q=/)) {
      url = getInside(result, "?q=", "&amp");
      return (url != '') ? url : false;
    }
    return false;
  }

  /**
   * Extracts the organic results from the page and puts them into an array.
   * One per element.  Each element is a URL string.
   */
  function extractOrganic(html) {
    html = html.replace(/\n|\r/g, '');
    var matches = html.match(/<li class=\"g\">(.*)<\/li>/gi);
    // No organic results in the markup: return an empty list instead of
    // calling toString() on null.
    if (!matches) { return []; }
    var allOrganic = matches.toString(),
        results = allOrganic.split("<li class=\"g\">"),
        organicData = [],
        i = 0,
        len = results.length,
        url;
    while(i < len) {
      url = extractUrl(results[i]);
      if (url && url.indexOf('http') == 0) {
        organicData.push(url);
      }
      i++;
    }
    return organicData;
  }

  /**
   * Transpose an array from row to cols
   */
  function transpose(ary) {
    var i = 0, len = ary.length, ret = [];
    while(i < len) {
      ret.push([ary[i]]);
      i++;
    }
    return ret;
  }

  //--------------------------------------------------------------------------

  return {
    /**
     * Returns Google SERPs for a given keyword
     * @param  {string} kw Keyword
     */
    get: function(kw, optResults) {
      var result = fetch(kw, optResults);
      if (errorOccurred) { return result; }
      return transpose(extractOrganic(result));
    }
  };
  
})();

function googleScraper(keyword, optResults) {
  return SeerJs_GoogleScraper.get(keyword, optResults);
}

function test() { 
  var withArg = googleScraper("seer interactive", 20);
  var noArg = googleScraper("seer interactive");
  return 0;
}

Question by:rivkamak
LVL 23

Expert Comment

by:Tony McCreath
ID: 37728056
If it goes to Excel, can't you just add a column that works out the page number from the position?

Author Comment

by:rivkamak
ID: 37730128
That’s the least of my problems. The main thing is to get the top 50 search results in a spreadsheet broken up by page.
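
One possible way to get that layout, sketched under stated assumptions: fetch all 50 URLs as before, then slice the flat list into groups of 10, one group per results page. `splitIntoPages` is a hypothetical helper, not part of google_scraper.js, and it assumes Google shows 10 organic results per page.

```javascript
// Hypothetical helper, not part of google_scraper.js.
// Assumes each results page holds `perPage` organic results (10 by default).
function splitIntoPages(urls, perPage) {
  perPage = perPage || 10;
  var pages = [];
  for (var i = 0; i < urls.length; i += perPage) {
    // slice() stops at the end of the array, so the last page may be short.
    pages.push(urls.slice(i, i + perPage));
  }
  return pages;
}
```

With 50 URLs this would return five arrays: index 0 holds results 1-10, index 1 holds results 11-20, and so on.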
LVL 23

Accepted Solution

by:
Tony McCreath earned 1500 total points
ID: 37731508
I proposed it as an answer to your problems.

Instead of hacking into this code, why not solve it in the spreadsheet? I'm sure it's a lot easier to create a column where its value is based on the row number divided by 10.
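
A rough sketch of that calculation, for anyone who would rather do it in script than in a cell formula: the page number is the 1-based position divided by 10, rounded up. `addPageColumn` is a hypothetical helper, not part of google_scraper.js; it assumes 10 results per page and takes the one-column array that `googleScraper` returns.

```javascript
// Hypothetical helper, not part of google_scraper.js.
// `rows` is the one-column array googleScraper() returns: [[url1], [url2], ...].
// Assumes `perPage` results per Google results page (10 by default).
function addPageColumn(rows, perPage) {
  perPage = perPage || 10;
  var out = [];
  for (var i = 0; i < rows.length; i++) {
    var position = i + 1;                      // 1-based rank in the results
    var page = Math.ceil(position / perPage);  // 1-10 -> 1, 11-20 -> 2, ...
    out.push([rows[i][0], position, page]);
  }
  return out;
}
```

Calling `addPageColumn(googleScraper(keyword, 50))` from script code would then give three columns per row: URL, position, and page.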