PHP script to tell me when a robot accesses a file

I want a PHP script that I can put in my robots.txt file to let me know when a search engine spider accesses it.

I would like the script to email me the referrer, the agent (for example, Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)), the IP address, and the date and time that it accessed my robots.txt file.

I would also like it to email me the domain name where the robots.txt file resides. If this can be done automatically, that would be great; otherwise I would like to set it as a variable near the top of the file. I would like the domain name to appear in the subject line of the email, for example "Spider Access on www.mysite.com".

I usually have some information in my robots.txt file, such as
User-agent: *
Disallow: /myfile.php

I would still like to be able to instruct the spider not to access certain files.

Thank You.
(asked by timshank)
_GeG_ commented:
ok,ok, a typo ;)
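<?php
header('Content-Type: text/plain'); // robots.txt should be plain text, not PHP's default text/html
echo file_get_contents('robots.txt'); // this sends the file robots.txt to the spider
// mail the date, user agent and IP; the domain in the subject comes from the Host header
mail('you@domain.com', "Spider Access on {$_SERVER['HTTP_HOST']}", 'Date: '.date('d.m.Y H:i:s')."\n{$_SERVER['HTTP_USER_AGENT']}\n".
    "{$_SERVER['REMOTE_ADDR']}\n");
?>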

I forgot a } after $_SERVER['HTTP_USER_AGENT'] :(
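Pulling the pieces of the thread together, here is a minimal sketch of a sendmail.php that covers every item in the original request: it serves robots.txt as plain text, puts the domain in the subject (falling back to a hand-set value when no Host header is sent), and tolerates the missing Referer header that spiders normally produce. It assumes PHP 7.1 or later (for `??` and `[$a, $b] =` destructuring), the Apache rewrite from the .htaccess answer, and a robots.txt in the same directory. `buildSpiderReport()`, `$notifyAddress` and `$fallbackDomain` are hypothetical names, and the addresses are placeholders.

```php
<?php
// Hypothetical sendmail.php: serves robots.txt and emails a short report.
// Assumes Apache rewrites /robots.txt to this script and that robots.txt
// lives in the same directory as this file.

$notifyAddress  = 'you@domain.com';  // placeholder: where the report goes
$fallbackDomain = 'www.mysite.com';  // placeholder: used if no Host header is sent

/**
 * Build the subject and body of the notification from a $_SERVER-style array.
 * Missing keys (spiders normally send no Referer) are reported as "(none)".
 */
function buildSpiderReport(array $server, string $fallbackDomain, int $timestamp): array
{
    $domain = $server['HTTP_HOST'] ?? '';
    if ($domain === '') {
        $domain = $fallbackDomain;
    }
    // Strip line breaks so a client-supplied Host value cannot add mail headers.
    $domain = str_replace(["\r", "\n"], '', $domain);

    $fields = [
        'Date'     => date('Y-m-d H:i:s', $timestamp),
        'Agent'    => $server['HTTP_USER_AGENT'] ?? '(none)',
        'IP'       => $server['REMOTE_ADDR'] ?? '(none)',
        'Referrer' => $server['HTTP_REFERER'] ?? '(none)',
    ];

    $body = '';
    foreach ($fields as $label => $value) {
        $body .= "$label: $value\n";
    }

    return ["Spider Access on $domain", $body];
}

// Only serve the file and send mail under a web server, so the function
// above can be exercised from the command line without side effects.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain; charset=utf-8');
    readfile(__DIR__ . '/robots.txt');

    [$subject, $body] = buildSpiderReport($_SERVER, $fallbackDomain, time());
    mail($notifyAddress, $subject, $body);
}
```

Because the mail-sending part only runs under a web server SAPI, you can include the file from the command line and call `buildSpiderReport()` with a hand-built array to check the subject and body before deploying it.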
neorush commented:
It's not formatted all pretty... you could send an HTML email if you wanted to pretty it up, but it works, and it works quickly. It also probably gives you more info than you wanted.
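<?php

// 'l jS \of F Y h:i:s A' prints e.g. "Monday 8th of August 2005 03:12:46 PM";
// the "o" is escaped because an unescaped "o" is date()'s ISO-year format character.
$message = "Robots.txt Access Log: ".date('l jS \of F Y h:i:s A')."\n\n";

// Append every server variable for this request as "KEY : value".
foreach($_SERVER as $key => $value){

      $message .= $key." : ".$value."\n\n";

}

// Note: this only sends the email; it does not output robots.txt to the spider.
mail("youraddress@example.com","Robots.txt Access Log",$message);

?>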
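neorush's loop mails every entry in `$_SERVER`, which includes many variables unrelated to the spider. A minimal sketch of a trimmed-down variant, assuming you only want the fields named in the original question; `formatServerSubset()` and `$wanted` are hypothetical names and the address is a placeholder. Like the original, it only sends mail and does not output robots.txt itself.

```php
<?php
// Hypothetical variant of the $_SERVER dump: report only selected keys
// instead of every server variable.

/**
 * Format the wanted entries of a $_SERVER-style array as "KEY : value" lines,
 * in the order given; keys missing from the request are skipped.
 */
function formatServerSubset(array $server, array $wantedKeys): string
{
    $message = '';
    foreach ($wantedKeys as $key) {
        if (isset($server[$key])) {
            $message .= $key . ' : ' . $server[$key] . "\n";
        }
    }
    return $message;
}

// Assumed selection: the fields the question asked for.
$wanted = ['HTTP_HOST', 'HTTP_USER_AGENT', 'HTTP_REFERER', 'REMOTE_ADDR'];

// Only send mail when running under a web server, not from the command line.
if (PHP_SAPI !== 'cli') {
    $message = 'Robots.txt Access Log: ' . date('l jS \of F Y h:i:s A') . "\n\n";
    $message .= formatServerSubset($_SERVER, $wanted);
    mail('youraddress@example.com', 'Robots.txt Access Log', $message);
}
```

Listing the keys explicitly also fixes their order in the email, which makes successive reports easier to compare.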
_GeG_ commented:
The problem is that when robots.txt is requested, the web server simply returns the static file, so no PHP script is run.
Try this:
Create a normal robots.txt file.
Make a .htaccess file in your domain's root directory:
____________
RewriteEngine On
RewriteRule ^robots\.txt$ sendmail.php [L]
____________

and create sendmail.php:
____________
<?php
echo file_get_contents('robots.txt'); // this sends the file robots.txt to the spider
mail('you@domain.com', "Spider Access on {$_SERVER['HTTP_HOST']}", 'Date: '.date('d.m.Y H:i:s')."\n{$_SERVER['HTTP_USER_AGENT']\n".
    "{$_SERVER['REMOTE_ADDR']}\n");
?>
_____________

I don't think it makes sense to include the referrer: spiders request robots.txt directly before they access your pages, so there won't be a Referer header.
By the way, the .htaccess rewrite will only work on Apache (with mod_rewrite enabled).
timshank (author) commented:
GeG, I'm getting the following error when I try to access robots.txt in my browser:

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in /home/dogs/public_html/sendmail.php on line 3

Parse error: parse error, unexpected T_STRING, expecting '}' in /home/dogs/public_html/sendmail.php on line 3
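Both messages come from PHP's curly ("complex") interpolation syntax: everything between `{$` and the matching `}` is parsed as a PHP expression rather than string text, so in `{$_SERVER['HTTP_USER_AGENT']\n` the parser meets the backslash of `\n` while it is still expecting `}`. A small illustrative sketch, with hypothetical function names, showing the corrected interpolation next to an equivalent `sprintf()` call that avoids the braces altogether:

```php
<?php
// Two equivalent ways to build the "agent + IP" lines from a $_SERVER-style array.

function agentLinesInterpolated(array $server): string
{
    // Complex syntax: the closing brace must come right after the array
    // element, before any escape sequence such as \n.
    return "{$server['HTTP_USER_AGENT']}\n{$server['REMOTE_ADDR']}\n";
}

function agentLinesSprintf(array $server): string
{
    // sprintf keeps array lookups out of the format string entirely.
    return sprintf("%s\n%s\n", $server['HTTP_USER_AGENT'], $server['REMOTE_ADDR']);
}
```

Either form produces the same text; `sprintf()` is simply harder to get wrong when several array values are mixed with escape sequences.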