sysautomation

robots.txt

Hi

I want to block all robots except Google, Yahoo and MSN from my whole website, AND I want to block one directory, /i, for ALL robots INCLUDING Google, Yahoo and MSN. I have created the following robots.txt. Can someone please confirm whether it is correct? I also don't want robots to kill my server, so they should crawl as slowly as they can.

User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
Crawl-delay: 120
Disallow: /i/


I am using Apache on some machines and Nginx on others, all running CentOS.
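
One way to check a file like this is to feed it to a parser and ask what each crawler may fetch. The sketch below uses Python's standard urllib.robotparser; the REVISED text is only one possible rearrangement (an assumption, not the accepted answer), and the agent name SomeOtherBot and the sample paths are made up for illustration. Crawl-delay is a non-standard directive, so how far each crawler honors it is up to that crawler; Google's documentation says Googlebot does not support it.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt exactly as posted in the question.
POSTED = """\
User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
Crawl-delay: 120
Disallow: /i/
"""

# One possible rearrangement (an assumption, not the accepted answer):
# the three named crawlers share a group that blocks /i/ and carries the
# crawl delay; every other robot is blocked from the whole site.
REVISED = """\
User-agent: Googlebot
User-agent: MSNBot
User-agent: Slurp
Disallow: /i/
Crawl-delay: 120

User-agent: *
Disallow: /
"""


def load(text):
    """Parse robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


posted = load(POSTED)
revised = load(REVISED)

# Print what each parser allows: (agent, path, posted result, revised result).
for agent in ("Googlebot", "MSNBot", "Slurp", "SomeOtherBot"):
    for path in ("/index.html", "/i/page.html"):
        print(agent, path, posted.can_fetch(agent, path), revised.can_fetch(agent, path))
```

Under this parser, the posted file lets Googlebot, MSNBot and Slurp fetch /i/ because each has its own group with an empty Disallow, and the Crawl-delay line only sits in the group for robots that are already blocked from everything.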
ASKER CERTIFIED SOLUTION
Dave Baldwin
This solution is only available to members.
sysautomation

ASKER

Thanks. Any advice on blocking 'bad' bots that don't respect robots.txt? I see many bots coming from very different blocks of IPs, so it doesn't seem possible to block them at the firewall.
No, it's almost impossible to block them. The only reason I block search bots from one directory is to keep them from trying to index every day and time on my calendar page. Other than that, I don't bother. The 'good robots' aren't going to access your site very often. Google typically visits about once every 3 months unless you become very popular and change content very often. Bing is similar, and Yahoo gets its search results from Bing now, so I don't know if they even have search bots anymore.
For what it's worth, Baidu, the Chinese search engine, hits my site more than anyone else.

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
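
To see which crawlers hit a site most often, one option is to tally the User-Agent strings taken from the access log. This is a minimal sketch: the helper names bot_name and tally_bots, the matching heuristic, and the sample list are all assumptions for illustration (only the Baiduspider string comes from this thread). User-Agent strings can be forged, so the counts are only a rough guide.

```python
import re
from collections import Counter

# Heuristic (an assumption): crawler names usually contain
# "bot", "spider" or "slurp" somewhere in the product token.
BOT_TOKEN = re.compile(r"\b([\w-]*(?:bot|spider|slurp)[\w-]*)", re.IGNORECASE)


def bot_name(user_agent):
    """Return the first crawler-looking token in a User-Agent string, or None."""
    match = BOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


def tally_bots(user_agents):
    """Count requests per crawler name, skipping agents that do not look like bots."""
    counts = Counter()
    for ua in user_agents:
        name = bot_name(ua)
        if name:
            counts[name] += 1
    return counts


# Made-up sample input; in practice these would be read from the server's access log.
sample = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
]

# Print crawler names with their hit counts, most frequent first.
print(tally_bots(sample).most_common())
```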