pacumming asked:

Need help with Robots.txt to exclude my root directory only

I have a website, www.mysite.com.
I want to keep the root from being indexed. As on most UNIX web servers, the root is the public_html directory.
However, all other folders are in that same directory.

I have a sub site at www.mysite.com/mydirectory which I do want indexed.

I currently have the following, which is supposed to keep anything from being indexed. Now that I have added the mydirectory subfolder off the root, I do want that subfolder indexed. I also want it re-indexed as often as possible.

User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /

What robots.txt file should I use so that the root of www.mysite.com (index.php) is NOT indexed, but the subfolders are indexed and re-indexed frequently for better SEO results?

Thanks
Peter


dfxdeimos

Well, the issue is that the original robots.txt standard does not support a simple Allow directive.

Check out this resource ( http://www.robotstxt.org/robotstxt.html ). It explains all the options you have with robots.txt files and some common usage scenarios.
pacumming (ASKER)

I need a concrete example for my case. I had already been to that site before I wrote the question.
ASKER CERTIFIED SOLUTION
dfxdeimos

[Solution text not included: members-only content.]
pacumming (ASKER)

Hmm, interesting. I could disallow just the docs; good idea.
Peter
dfxdeimos

Just a side note:
Apparently some "major crawlers" (like Google) now honor the "Allow" directive, according to Wikipedia (http://en.wikipedia.org/wiki/Robots.txt). In that case you could do something like:

User-agent: *
Disallow: /
Allow: /pcsupport/
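
For anyone wondering why the Allow line can win even though "Disallow: /" comes first: crawlers that honor Allow, Google among them, generally pick the most specific (longest) matching path rather than the first one, with ties going to Allow. Here is a minimal Python sketch of that precedence. The function name is_allowed and the RULES list are made up for illustration, /mydirectory/ stands in for the asker's folder, and wildcards (* and $) are left out, so treat it as an approximation and not as any crawler's actual code.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/")


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Longest-match evaluation: the rule with the longest matching prefix wins.

    Ties go to Allow; a path matched by no rule is allowed.
    Wildcards (* and $) are deliberately not handled in this sketch.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are ignored
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# Hypothetical rule set: the example above with the asker's folder substituted in.
RULES = [("Disallow", "/"), ("Allow", "/mydirectory/")]

print(is_allowed(RULES, "/index.php"))              # False
print(is_allowed(RULES, "/mydirectory/page.html"))  # True
```

Under this model the order of the two lines does not matter; only the length of the matching path does.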


scrathcyboy

You cannot and should not deny access to your root (i.e. public_html). Almost ALL robots START in the root; in fact some, like Google, will NOT start anywhere else but the root of the website. So to try to deny access to the root is to deny all search engines access to your entire site. You have to rethink this one ...
dfxdeimos

scrathcyboy:
As long as you understand the implications, it is perfectly acceptable (and doable) to block indexing of your root directory.
The examples that I have listed above are all valid robots.txt files, including the one that blocks indexing of the root and then specifically grants it to a subdirectory.
The key here is that you must understand the implications of blocking indexing of the root, which Peter clearly does.
scrathcyboy

Sure, the implication being that his site will never be indexed at all by Google or Yahoo. If you don't believe me, just try it for some real-world experience.
dfxdeimos

Umm... that was the objective, scrathcyboy. =\

I did a test on a domain that I own: if you disallow / and then allow /subfolder/, Google will index the contents of the subfolder.
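
One caveat on that test: it shows how Google resolves the two rules. A parser that instead takes the first matching rule in file order would read "Disallow: /" first and stop there. That is an assumption about other crawlers, not something tested in this thread. The Python sketch below uses a made-up helper, first_match_allowed, to show the difference rule order makes under that model, and why listing the Allow line first is the safer ordering.

```python
from typing import List, Tuple


def first_match_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Return the verdict of the first rule whose prefix matches the path."""
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            return directive.lower() == "allow"
    return True  # no rule matched, so the path is allowed


# Same two rules in both possible orders (hypothetical /subfolder/ path).
disallow_first = [("Disallow", "/"), ("Allow", "/subfolder/")]
allow_first = [("Allow", "/subfolder/"), ("Disallow", "/")]

print(first_match_allowed(disallow_first, "/subfolder/page.html"))  # False
print(first_match_allowed(allow_first, "/subfolder/page.html"))     # True
print(first_match_allowed(allow_first, "/index.php"))               # False
```

With Allow listed first, the subfolder stays reachable and the root stays blocked under first-match evaluation, and the result under longest-match evaluation is unchanged.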