sysautomation
asked on
robots.txt
Hi
I want to block all robots except Google, Yahoo and MSN from my entire website, AND I want to block one directory, /i, for ALL robots INCLUDING Google, Yahoo and MSN. I have created the following robots.txt. Can someone please confirm whether it is correct? I also want the robots to crawl as slowly as they can so they don't kill my server.
User-agent: Googlebot
Disallow:
User-agent: MSNBot
Disallow:
User-agent: Slurp
Disallow:
User-agent: *
Disallow: /
Crawl-delay: 120
Disallow: /i/
I am using Apache on some machines and Nginx on others, all running CentOS.
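One way to sanity-check a robots.txt file is Python's built-in urllib.robotparser. The sketch below feeds it the file from the question (the URL paths are made-up examples) and shows that, as written, the named bots are NOT kept out of /i/, because a bot that matches its own User-agent group ignores the * group entirely:

```python
# Check how crawlers would interpret the proposed robots.txt,
# using Python's standard-library parser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: MSNBot
Disallow:

User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
Crawl-delay: 120
Disallow: /i/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Googlebot matches its own group, which allows everything --
# so /i/ is still crawlable for it, contrary to the stated goal.
print(rp.can_fetch("Googlebot", "/i/page.html"))    # True
# Any other bot falls through to the * group and is blocked entirely.
print(rp.can_fetch("SomeOtherBot", "/index.html"))  # False
# The Crawl-delay comes from the * group; the named groups set none.
print(rp.crawl_delay("SomeOtherBot"))               # 120
```

Running this confirms the file blocks unknown bots completely but leaves /i/ open to the three named bots.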
ASKER CERTIFIED SOLUTION
No, it's almost impossible to block them. The only reason I block search bots from one directory is to keep them from trying to index every day and time on my calendar page. Other than that, I don't bother. The 'good' robots aren't going to access your site very often: Google is typically once every three months unless you become very popular and change content very often. Bing is similar, and Yahoo gets its search results from Bing now, so I don't know whether they even have search bots anymore.
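For the directory exclusion the question asks about: a crawler obeys only the most specific User-agent group that matches it, so the /i exclusion has to be repeated inside each named group rather than relying on the * group. A sketch of that layout (note that Crawl-delay is non-standard, and Googlebot in particular ignores it):

```text
User-agent: Googlebot
Disallow: /i/

User-agent: MSNBot
Disallow: /i/

User-agent: Slurp
Disallow: /i/

User-agent: *
Disallow: /
Crawl-delay: 120
```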
For what it's worth, Baidu, the Chinese search engine, hits my site more than anyone else.
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)