Chris Andrews

Hit counter counting bots

I have written a script that counts hits and divides an adshare up between me and my authors.

It is included on the page via include('counter.inc');

I'm running into a problem in that it's counting bots and referrer spam. That's a problem because my script divides up AdSense revenue between us, and counting the bots and referrer spam throws off the counter's accuracy.

I have quite a bit of traffic so I need to somehow very efficiently weed out most (I understand I'll never get it 100%) of the bots/spam and only include the file for real hits.

What's the best way to do that?

Thanks, Chris
jlindler

Why not use something like Google Analytics?  It is free and will give you a ton more information (and may help you increase your $$).
Chris Andrews (ASKER)

But I don't think it will split up adshare for me, unless I missed something.
Check out this page for the Data API. You may be able to slice and dice what you need from it.
http://code.google.com/apis/analytics/docs/gdata/home.html
Beverley Portlock
A simple method (but not 100% effective) is to look at the user agent string provided by $_SERVER['HTTP_USER_AGENT'] and either pick it apart manually or use get_browser() if your installation has browscap configured (see http://www.php.net/manual/en/function.get-browser.php).
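
As a rough sketch of the get_browser() route: the function returns browscap's properties for the current user agent, and browscap files normally carry a "crawler" flag. The helper name is_crawler_info() is made up for this example, and the snippet assumes the "browscap" ini directive points at a browscap.ini file.

```php
<?php
// Sketch only: is_crawler_info() is a hypothetical helper, not part of PHP.
// $info is expected to be the array returned by get_browser(null, true).
function is_crawler_info($info)
{
    if (!is_array($info) || !isset($info['crawler'])) {
        return false; // no browscap data, so no opinion either way
    }
    // The flag may arrive as "1"/"" or as "true"/"false"; accept both forms.
    return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// get_browser() only works when the "browscap" ini directive is set.
if (ini_get('browscap') && isset($_SERVER['HTTP_USER_AGENT'])) {
    $info = get_browser(null, true);
    if (!is_crawler_info($info)) {
        include('counter.inc'); // count the hit
    }
}
```

A browscap lookup tends to be slower than a couple of strpos() calls, so on a busy site it is worth timing before relying on it.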

In essence you can look for certain strings, as many bots DO identify themselves. For instance, scan the string for "bot" or "slurp" (that's Yahoo, for some odd reason) and reject any that match:

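// A missing header is treated as an empty string to avoid an undefined-index notice
$userAgent = isset( $_SERVER['HTTP_USER_AGENT'] ) ? strtolower( strip_tags( $_SERVER['HTTP_USER_AGENT'] ) ) : '';

if ( strpos( $userAgent, "bot" ) === false ) {

     if ( strpos( $userAgent, "slurp") === false ) {

          // Got to here, so many bots have been eliminated
          //
          include('counter.inc');   // count the hit
     }
}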

You could turn this on its head and look for the user agent strings of known browsers, using substring checks or regexes. For instance:

if ( strpos( $userAgent, "gecko") !== false ||
     strpos( $userAgent, "msie") !== false ||
     strpos( $userAgent, "opera") !== false ||
     strpos( $userAgent, "chrome") !== false ) {

     // Known browser......

}

None of these are perfect, but they'll do 95% of it.
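
The two checks above can also be combined, so that a hit counts only when the agent does not announce itself as a bot and does name a known browser. This is a minimal sketch: is_probable_human() is a made-up name, the word lists are just the strings mentioned above, and treating an empty user agent as a bot is an assumption.

```php
<?php
// Sketch only: is_probable_human() is a hypothetical helper, and the two
// word lists are the examples from this thread, not a complete list.
function is_probable_human($userAgent)
{
    $ua = strtolower((string) $userAgent);
    if ($ua === '') {
        return false; // assumption: a missing user agent is treated as a bot
    }
    // Step 1: reject anything that announces itself as a bot.
    foreach (array('bot', 'slurp') as $needle) {
        if (strpos($ua, $needle) !== false) {
            return false;
        }
    }
    // Step 2: accept only agents that name a known browser.
    foreach (array('gecko', 'msie', 'opera', 'chrome') as $needle) {
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (is_probable_human($agent)) {
    include('counter.inc'); // count the hit
}
```

Both loops are plain substring scans over short lists, so the cost per request stays small even with a lot of traffic.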
A very old script, since I do not use ereg functions any more, but hopefully illustrative of the principles.  You might want to watch your server logs to see what the HTTP_USER_AGENT strings contain.  For my "real" browser, it looks like this:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13

And that is the sort of thing I like to use when I use cURL to read a web site.
<?php // RAY_bad_robots.php
error_reporting(E_ALL);

// A FUNCTION TO IDENTIFY THE BOTS
// SAMPLE USE:
//   if (bad_robots()) { /* THIS IS A BOT ON MY SITE */ }

function bad_robots() 
{

// THE BOTS WE WANT TO IGNORE
   $bad_robots[]='crawler';
   $bad_robots[]='spider';
   $bad_robots[]='robot';
   $bad_robots[]='slurp';
   $bad_robots[]='Atomz';
   $bad_robots[]='googlebot';
   $bad_robots[]='VoilaBot';
   $bad_robots[]='msnbot';
   $bad_robots[]='Gaisbot';
   $bad_robots[]='Gigabot';
   $bad_robots[]='SBIder';
   $bad_robots[]='Zyborg';
   $bad_robots[]='FunWebProducts';
   $bad_robots[]='findlinks';
   $bad_robots[]='ia_archiver';
   $bad_robots[]='MJ12bot';
   $bad_robots[]='Ask Jeeves';
   $bad_robots[]='NG/2.0';
   $bad_robots[]='voyager';
   $bad_robots[]='Exabot';
   $bad_robots[]='Nutch';
   $bad_robots[]='Hercules';
   $bad_robots[]='psbot';
   $bad_robots[]='LocalcomBot';

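// COMPARE THE BOT STRINGS TO THE USER AGENT STRING
// A MISSING USER AGENT HEADER IS TREATED AS AN EMPTY STRING
   $my_agent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : '';
   foreach ($bad_robots as $spider) 
   {
      // stripos() IS A CASE-INSENSITIVE SUBSTRING TEST; IT STANDS IN FOR eregi(), WHICH WAS REMOVED IN PHP 7
      if (stripos($my_agent, $spider) !== FALSE) { return TRUE; }
   }
   return FALSE; 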
}
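
To follow the suggestion about watching the server logs, something like the tally below can show which user agent strings actually reach the site. It is only a sketch: count_user_agents() is a made-up helper, it assumes the Apache "combined" log format (where the user agent is the last quoted field on each line), and the log path in the usage comment is a placeholder.

```php
<?php
// Sketch only: count_user_agents() is a hypothetical helper. It assumes the
// Apache "combined" log format, where the user agent is the last quoted field.
function count_user_agents(array $logLines)
{
    $counts = array();
    foreach ($logLines as $line) {
        // Grab the final "..." field on the line.
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $agent = $m[1];
            $counts[$agent] = isset($counts[$agent]) ? $counts[$agent] + 1 : 1;
        }
    }
    arsort($counts); // most frequent agents first
    return $counts;
}

// Example use (the path is an assumption; adjust it for your server):
// $lines = file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// print_r(array_slice(count_user_agents($lines), 0, 20, true));
```

Agents that appear near the top of the tally but never show up in the bot list are good candidates for adding to it.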


ASKER CERTIFIED SOLUTION
Ray Paseur

Thank you all for the help on this. I am going with this solution.

My apologies for not getting back to this question sooner. I got involved in other issues, and now I'm back to working on this :)