Solved

Robots.txt -- I only want to allow access to the public root directory

Posted on 2006-10-31
Last Modified: 2010-05-19
What is the best way to write a robots.txt file so that robots only have access to the website's main public root directory? In this case the root is named public_html.

Thanks

Rowby
Question by:Rowby Goren
5 Comments
 
LVL 8

Expert Comment

by:radnor
ID: 17843859
 
LVL 3

Assisted Solution

by:jsev1995
jsev1995 earned 100 total points
ID: 17846049
You need a separate Disallow line for each folder you want to protect; the original robots.txt standard gives you no way to do it with a whitelist approach.

User-agent: *
Disallow: /folder/
Disallow: /folder2/

Please note that the / is relative to the root of the domain, NOT THE SERVER. Nobody except other users on your server can reach any path outside of your public_html folder, so DO NOT EVER INCLUDE public_html IN A LINK!
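As a rough sketch of the one-Disallow-line-per-folder rule, the short Python script below builds the robots.txt text from a list of folder names. The function name build_robots_txt and the folder names are made up for illustration; adjust them to whatever actually sits under your domain root.

```python
def build_robots_txt(folders, user_agent="*"):
    """Return robots.txt text with one Disallow line per folder name.

    `folders` is a hypothetical list of top-level folder names as they
    appear under the domain root (not the server path).
    """
    lines = [f"User-agent: {user_agent}"]
    for name in sorted(folders):
        # Paths are relative to the domain root, so public_html never appears.
        lines.append(f"Disallow: /{name.strip('/')}/")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["folder", "folder2"]), end="")
```

Run as a script, it prints the same three lines as the example above.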
 
LVL 1

Accepted Solution

by:austerhaus
austerhaus earned 400 total points
ID: 17847884
This site has (I think) a better description of the format and parameters for the robots.txt file. Remember, as jsev1995 indicated, robots.txt works like a "blacklist" rather than a "whitelist": to disallow access to specific directories, you must explicitly list each directory in robots.txt. The original standard gives you no way to say "give access only to this directory". Instead, your rules must say "block access to every directory in this list".

http://www.robotstxt.org/wc/exclusion-admin.html

If possible, you may want to put all of your "unlisted" folders inside a parent directory and then simply block that parent directory in robots.txt. For example, if you structure the root directory like:
www.yoursite.com/
    -robots.txt
    -/alloweddirectory
         -/subdirectory1
         -/subdirectory2
         -/...
    -/blockeddirectory
         -/subdirectory1
         -/subdirectory2
         -/...

...and the contents of robots.txt looks like:
User-agent: *
Disallow: /blockeddirectory/

...this should allow search engines to access everything in "alloweddirectory" and deny access to everything in "blockeddirectory".
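If you want to sanity-check rules like these before uploading them, one option is Python's standard urllib.robotparser module. The snippet below is only a sketch: the is_allowed helper and the "ExampleBot" user agent are invented for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt example above.
RULES = """\
User-agent: *
Disallow: /blockeddirectory/
"""


def is_allowed(path, rules=RULES, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` according to `rules`.

    "ExampleBot" is a hypothetical crawler name; www.yoursite.com is the
    placeholder host from the directory listing above.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)


if __name__ == "__main__":
    for path in ("/", "/alloweddirectory/subdirectory1/", "/blockeddirectory/subdirectory1/"):
        print(path, is_allowed(path))
```

With the rules shown, the root and everything under alloweddirectory come back as allowed, and anything under /blockeddirectory/ comes back as blocked.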

 
LVL 3

Expert Comment

by:jsev1995
ID: 17848995
Depending on whether or not your site is already coded, moving the files will screw up all your links, so that might not work.
 
LVL 9

Author Comment

by:Rowby Goren
ID: 17849199
Thanks

Now I know *all about robots*

Rowby