Solved

Robots.txt -- I only want to allow access to the public root directory

Posted on 2006-10-31
5
470 Views
Last Modified: 2010-05-19
What is the best way to write a robots.txt file so that robots will only have access to the website's main public root directory? In this case the root is named public_html.

Thanks

Rowby
0
Comment
Question by:Rowby Goren
5 Comments
 
LVL 8

Expert Comment

by:radnor
ID: 17843859
0
 
LVL 3

Assisted Solution

by:jsev1995
jsev1995 earned 100 total points
ID: 17846049
You have to have a separate Disallow line for each folder you want to protect; there is no way to do it with a whitelist approach.

User-agent: *
Disallow: /folder/
Disallow: /folder2/

Please note that the / is relative to the root of the domain, NOT the server. Nobody but other users on your server can access any path outside of your public_html folder, so never include such a path in a link!
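
As a rough sketch of that approach (the directory names below are made up, since the actual folders under public_html weren't listed), a robots.txt in the web root that keeps crawlers out of every subdirectory while leaving files in the root itself crawlable would look something like:

User-agent: *
Disallow: /images/
Disallow: /includes/
Disallow: /admin/

Each folder you want off-limits needs its own Disallow line; anything not listed (including the pages sitting directly in the root) stays open to robots.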
0
 
LVL 1

Accepted Solution

by:
austerhaus earned 400 total points
ID: 17847884
This site has (I think) a better description of the format and parameters for the robots.txt file. Remember, as jsev1995 indicated, robots.txt works as a "blacklist" rather than a "whitelist": to disallow access to specific directories you must explicitly list each one in robots.txt. It is not possible to say "give access only to this directory"; instead, your rules must say "block access to every directory in this list".

http://www.robotstxt.org/wc/exclusion-admin.html

If possible, you may want to put all of your "unlisted" folders inside a parent directory and then simply block that parent directory in robots.txt. For example, if you structure the root directory like:
www.yoursite.com/
    -robots.txt
    -/alloweddirectory
         -/subdirectory1
         -/subdirectory2
         -/...
    -/blockeddirectory
         -/subdirectory1
         -/subdirectory2
         -/...

...and the contents of robots.txt looks like:
User-agent: *
Disallow: /blockeddirectory/

...this should allow search engines to access everything in "alloweddirectory" and deny access to everything in "blockeddirectory".
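
One assumption worth stating (it depends on your host's layout, which isn't given here): on a typical shared-hosting setup, public_html is the document root, so robots.txt needs to be uploaded directly into public_html in order to be served at the top of the domain, roughly like this:

public_html/                  <-- document root, i.e. www.yoursite.com/
    robots.txt                <-- served as www.yoursite.com/robots.txt
    alloweddirectory/
    blockeddirectory/

Crawlers only look for robots.txt at that top level; a copy placed inside a subdirectory will be ignored.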

0
 
LVL 3

Expert Comment

by:jsev1995
ID: 17848995
Depending on whether or not your site is already coded, moving the files will break all of your existing links, so that approach might not work.
0
 
LVL 9

Author Comment

by:Rowby Goren
ID: 17849199
Thanks

Now I know *all about robots*

Rowby
0
