Marco Gasi
asked on
Prevent robots' visit to be registered in database
Hi all.
On my site I use a robots.txt file to try to prevent crawlers' visits from being recorded in the database.
I used this code:
$allrobots = file_get_contents( 'allrobots.txt' ); // list of known robots, one "robot-id:" line per robot
preg_match_all( '/(?<=robot-id:\s).*(?=$)/im', $allrobots, $crawlers );
if ( !in_array( strtolower( $_SERVER['HTTP_USER_AGENT'] ), $crawlers[0] ) )
{
    // here write the visitor's data to the database
}
But this seems to fail, since I still get recorded visits from crawlers: for instance, I still see database entries for pages that no longer exist, visited from Mountain View (that's Google, isn't it?).
So what is the best way to accomplish my goal?
Thanks to all for any advice.
Cheers
ASKER
Hi, Dave, thanks for your reply.
What I need is not a way to prevent robots from scanning my pages. I only want to avoid storing visits made by crawlers, and by non-human visitors in general, in the database, but I don't know if that's possible.
ASKER CERTIFIED SOLUTION
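The accepted solution text is member-only and not reproduced here. As a hedged sketch of one common approach (not necessarily the accepted answer): the exact `in_array` match in the question's code rarely fires, because a real User-Agent string contains far more than the bare robot name, so a substring test against common bot tokens is often used instead. The `is_probable_bot` function and token list below are illustrative assumptions, not the expert's code.

```php
<?php
// Sketch: substring-based bot detection. The token list is an
// illustrative assumption; real deployments tune it to their traffic.
function is_probable_bot($userAgent)
{
    $ua = strtolower($userAgent);
    // Tokens that appear in most crawler User-Agent strings.
    $tokens = array('bot', 'crawl', 'spider', 'slurp', 'archiver');
    foreach ($tokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;  // treat this visit as a crawler; skip the DB insert
        }
    }
    return false;  // looks like a human visitor; record it
}

// Googlebot's UA contains "bot", so it is flagged; a desktop browser UA is not.
var_dump(is_probable_bot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // bool(true)
var_dump(is_probable_bot('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')); // bool(false)
```

Note that this only filters crawlers that identify themselves; it cannot reach 100% accuracy, which matches the asker's stated needs.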
ASKER
Hi, Ray. I don't need a 100% solution, and I'm sure your suggestion will satisfy my needs in the best possible way. I'll certainly run the suggested tests.
Thank you.
Marco
ASKER
Thank you both for your help. Have a nice week-end.
Thanks, Marco. You too!
This page, http://www.robotstxt.org/robotstxt.html, explains that to tell all (obedient) robots not to scan your pages, you should put the following code in a file called 'robots.txt' in the root of your web directory.
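The snippet that page gives is the standard disallow-all robots.txt:

```
User-agent: *
Disallow: /
```

Note this only keeps obedient crawlers away; it does not, by itself, stop anything from being written to the database.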