robots.txt

Hi

I want to block all robots except Google, Yahoo and MSN from my entire website, AND I want to block one directory, /i, for ALL robots, INCLUDING Google, Yahoo and MSN. I have created the following robots.txt. Can someone please confirm whether it is correct? I also don't want robots to overload my server, so they should crawl as slowly as they can.

User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
Crawl-delay: 120
Disallow: /i/

I am using Apache on some machines and Nginx on others, all running CentOS.
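
A note for anyone checking a file like the one above: in the robots.txt convention, a crawler follows only the group that matches it most specifically, so the rules under `User-agent: *` do not apply to Googlebot, MSNBot or Slurp, and their empty `Disallow:` lines leave /i/ open to them. The sketch below is one possible rewrite, not the accepted answer, checked with Python's standard `urllib.robotparser`. The name `PROPOSED`, the helper `load`, the agent `SomeOtherBot` and the sample paths are all made up for illustration. Note that `Crawl-delay` is a non-standard directive that Googlebot ignores, so it only helps with crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rewrite of the asker's rules (an assumption, not the accepted answer).
# The three named crawlers share one group that blocks only /i/;
# every other crawler falls through to the catch-all group and is blocked entirely.
PROPOSED = """\
User-agent: Googlebot
User-agent: MSNBot
User-agent: Slurp
Disallow: /i/
Crawl-delay: 120

User-agent: *
Disallow: /
"""


def load(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


proposed = load(PROPOSED)

# Sample agent/path pairs; the paths are invented for the demonstration.
for agent, path in [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/i/secret.html"),
    ("SomeOtherBot", "/index.html"),
]:
    print(agent, path, proposed.can_fetch(agent, path))

# Crawl delay reported for a crawler in the named group.
print("Slurp crawl delay:", proposed.crawl_delay("Slurp"))
```

The rule is written as `Disallow: /i/` with a trailing slash on purpose: a bare `Disallow: /i` is a prefix match and would also cover paths such as /index.html.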

ASKER CERTIFIED SOLUTION

Dave Baldwin

This solution is only available to members.

sysautomation

ASKER

Thanks. Any advice on blocking 'bad' bots that ignore robots.txt? I see many bots coming from widely different IP blocks, so blocking them at the firewall doesn't seem possible.

Dave Baldwin

No, it's almost impossible to block them. The only reason I block search bots on one directory is to keep them from trying to index every day and time on my calendar page. Other than that, I don't bother. The 'good robots' aren't going to access your site very often. Google typically visits about once every 3 months unless you become very popular and change content very often. Bing is similar, and Yahoo gets its search results from Bing now, so I don't know if they even have search bots anymore.

Dave Baldwin

For what it's worth, Baidu, the Chinese search engine, hits my site more than anyone else.

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
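
To see which crawlers visit most often, one option is to tally the bot names that appear in the User-Agent strings from the access log. The sketch below is only a heuristic: it assumes the `(compatible; Name/version; +URL)` layout seen in the Baiduspider string above, so crawlers that format their User-Agent differently are not counted. The function `crawler_name` and the list `sample_agents` are made up for illustration, and the sample data is not taken from a real log.

```python
import re
from collections import Counter

# Heuristic pattern (an assumption): "compatible; <Name>/<version>; +http(s)://..."
# This matches the Baiduspider string quoted above but will miss other layouts.
BOT_TOKEN = re.compile(r"compatible;\s*([A-Za-z][\w-]*)/[\d.]+;\s*\+https?://")


def crawler_name(user_agent):
    """Return the crawler name from a User-Agent string, or None if no match."""
    match = BOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


# Invented sample of User-Agent values, standing in for an access log column.
sample_agents = [
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",  # a browser, not a crawler
]

# Count only the entries recognised as crawlers.
hits = Counter(name for name in map(crawler_name, sample_agents) if name)
print(hits.most_common())
```

Keep in mind that a User-Agent string is self-reported, so a tally like this only describes bots that identify themselves honestly.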