Solved

PHP script to tell me when a robot accesses a file

Posted on 2005-03-27
Last Modified: 2006-11-18
I want a PHP script that I can put in my robots.txt file to notify me whenever a search engine spider accesses that file.

I would like the script to email me the referrer, the agent (for example, Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)), the ip address and time and date that it accessed my robots.txt file.

I would also like it to email me the domain name where the robots.txt file resides. If this can be done automatically that would be great; otherwise I would like to be able to set it as a variable near the top of the file. I would like the domain name to appear in the subject line of the email, for example "Spider Access on www.mysite.com".

I usually have some information in my robots.txt file, such as
User-agent: *
Disallow: /myfile.php

I would still like to be able to instruct the spider not to access certain files.

Thank You.
Question by:timshank
4 Comments
 
LVL 6

Expert Comment

by:neorush
ID: 13641669
It's not formatted all pretty... you could send an HTML email if you wanted to pretty it up... but it works, and it works quickly. It also probably gives you more info than you wanted.
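<?php

// Header line with a readable timestamp, followed by a blank line.
// Single quotes keep the escaped literal "of" out of date()'s format parsing.
$message = "Robots.txt Access Log: " . date('l jS \o\f F Y h:i:s A') . "\n\n";

// Append every server variable; non-scalar values (e.g. argv) go through print_r().
foreach ($_SERVER as $key => $value) {
    $message .= $key . " : " . (is_scalar($value) ? $value : print_r($value, true)) . "\n\n";
}

mail("youraddress@example.com", "Robots.txt Access Log", $message);

?>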
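
If the full $_SERVER dump is more than is needed, one option is to pick out only the fields the question asks for. This is a minimal sketch, not part of the original post: the helper name `pick_request_fields` and the "(not sent)" placeholder are assumptions made for illustration.

```php
<?php
// Hypothetical helper: keep only the fields requested in the question.
function pick_request_fields(array $server): array
{
    $wanted = ['HTTP_HOST', 'HTTP_REFERER', 'HTTP_USER_AGENT', 'REMOTE_ADDR'];
    $fields = [];
    foreach ($wanted as $key) {
        // Request headers such as Referer are optional, so use a placeholder.
        $fields[$key] = isset($server[$key]) ? (string) $server[$key] : '(not sent)';
    }
    return $fields;
}

// Usage: build the mail body from the trimmed list instead of all of $_SERVER.
$message = '';
foreach (pick_request_fields($_SERVER) as $key => $value) {
    $message .= $key . ' : ' . $value . "\n";
}
```

The resulting $message can be passed to mail() in place of the longer dump.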
LVL 9

Expert Comment

by:_GeG_
ID: 13644964
The problem is that when robots.txt is requested, normally no PHP script is run, because the web server just serves it as a static file.
Try this:
Create a normal robots.txt file.
Make a .htaccess file in your domain's root directory:
____________
RewriteEngine On
RewriteRule ^robots\.txt$ sendmail.php [L]
____________

and create sendmail.php:
____________
<?php
echo file_get_contents('robots.txt'); // this sends the file robots.txt to the spider
mail('you@domain.com', "Spider Access on {$_SERVER['HTTP_HOST']}", 'Date: '.date('d.m.Y H:i:s')."\n{$_SERVER['HTTP_USER_AGENT']\n".
    "{$_SERVER['REMOTE_ADDR']}\n");
?>
_____________

I don't think it makes sense to include the referer, because all spiders test for robots.txt before they access the page, so there won't be a referer.
Btw, the referer thing will only work on Apache.
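
The question asks for the domain either automatically or from a variable near the top of the file. One way to support both is sketched below. The variable `$siteDomain` and the helper `resolve_domain` are hypothetical names, and falling back to SERVER_NAME is an assumption rather than something suggested in the thread.

```php
<?php
// Hypothetical setting: leave empty to detect the domain automatically.
$siteDomain = '';

// Hypothetical helper: prefer the manual setting, then the Host header,
// then the server's configured name.
function resolve_domain(string $manual, array $server): string
{
    if ($manual !== '') {
        return $manual;
    }
    foreach (['HTTP_HOST', 'SERVER_NAME'] as $key) {
        if (!empty($server[$key])) {
            return (string) $server[$key];
        }
    }
    return 'unknown host';
}

$subject = 'Spider Access on ' . resolve_domain($siteDomain, $_SERVER);
```

Setting $siteDomain to something like 'www.mysite.com' overrides detection, which may help when several host names point at the same files.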

Author Comment

by:timshank
ID: 13645220
GeG, I'm getting the following error when I try to access robots.txt in my browser:

Warning: Unexpected character in input: '\' (ASCII=92) state=1 in /home/dogs/public_html/sendmail.php on line 3

Parse error: parse error, unexpected T_STRING, expecting '}' in /home/dogs/public_html/sendmail.php on line 3
LVL 9

Accepted Solution

by:
_GeG_ earned 2000 total points
ID: 13645329
OK, OK, a typo ;)
<?php
echo file_get_contents('robots.txt'); // this sends the file robots.txt to the spider
mail('you@domain.com', "Spider Access on {$_SERVER['HTTP_HOST']}", 'Date: '.date('d.m.Y H:i:s')."\n{$_SERVER['HTTP_USER_AGENT']}\n".
    "{$_SERVER['REMOTE_ADDR']}\n");
?>

I forgot a } after $_SERVER['HTTP_USER_AGENT'] :(
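
A slightly more defensive variant of the same idea, offered as a sketch rather than as the accepted answer: it sends a text/plain content type (PHP defaults to text/html), reads robots.txt relative to the script's own directory, and tolerates missing headers. The helper name `build_spider_mail`, the field labels and the placeholder strings are assumptions. The address is the same placeholder used above.

```php
<?php
// Hypothetical helper: returns [subject, body] for the notification mail.
function build_spider_mail(array $server, int $time): array
{
    // Strip CR/LF defensively, since this value ends up in a mail header.
    $host = str_replace(["\r", "\n"], '', $server['HTTP_HOST'] ?? 'unknown host');
    $subject = "Spider Access on {$host}";
    $body = 'Date: ' . date('d.m.Y H:i:s', $time) . "\n"
          . 'Agent: ' . ($server['HTTP_USER_AGENT'] ?? '(not sent)') . "\n"
          . 'IP: ' . ($server['REMOTE_ADDR'] ?? '(unknown)') . "\n";
    return [$subject, $body];
}

// Only serve the file and send mail when running under a web server.
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/plain');      // robots.txt is plain text
    readfile(__DIR__ . '/robots.txt');       // send the real file to the spider
    [$subject, $body] = build_spider_mail($_SERVER, time());
    mail('you@domain.com', $subject, $body); // placeholder address
}
```

Keeping the message building in its own function makes it possible to check the subject and body from the command line without sending any mail.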

Question has a verified solution.