Solved

Using Robots.txt with Add-on Domains and Subdomains

Posted on 2011-02-20
Last Modified: 2013-12-09
Hi - I have seen a lot of similar answers but none that completely convinces me that what I'm doing is right.

I have a master account for mydomain.com.

Within mydomain.com I have created a number of subdomains for testing purposes; they are stored in directories within the website's root directory. For example, mydomain.com/sub1 and mydomain.com/sub2 are also mapped to subdomains sub1.mydomain.com and sub2.mydomain.com.

Some of these are also mapped to add-on domains like sub3.com and sub4.com. These are very small-scale, low budget websites that are mainly blogs/personal/very small business sites I host for friends, not commercial accounts, so they really can't justify the expense or effort of creating and maintaining their own separate hosting accounts.

I recently discovered that sub3.mydomain.com and sub4.mydomain.com are being indexed by Google (even though neither is linked from anywhere, as they are test sites under development). We are using Google Wave for discussion of sub4, so it is possible Google used that semi-private info (disturbing, but another story). Not sure how they would know about sub3.

In the root directory for the website, I added robots.txt (oops, was sloppy before and didn't create it for subs1-20). This includes the lines:

    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

Is there anything else I should do? Thanks!
Question by:nhtechgal

Author Comment

by:nhtechgal
ID: 34940370
P.S. I used Webmaster Tools / Crawler Access to test my robots.txt, and the results I got for these subdomains were no different from what I got for my many other subdomains that I have yet to include in robots.txt (and which haven't been indexed by Google for whatever reason). Thanks.
LVL 29

Accepted Solution

by:
fibo earned 500 total points
ID: 34945341
    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

No, it won't (but leave it in place anyway).
Disallow does not say "it is forbidden to index this"; it only says "please be kind enough not to crawl this".
So any link to your unwanted addresses can still be indexed, even if the robot is smart enough not to go into these subdirs on its own.
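
A related point worth checking: crawlers fetch robots.txt separately for each hostname, so the file in the root of mydomain.com is not the one consulted for sub1.mydomain.com (that host serves whatever robots.txt sits in /sub1/). The sketch below uses Python's standard `urllib.robotparser` to illustrate this. The host-to-rules mapping and the `allowed` helper are hypothetical, and it assumes /sub1/ has no robots.txt of its own yet.

```python
from urllib.robotparser import RobotFileParser

def parser_for(lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# The rules from the question, as served at mydomain.com/robots.txt
ROOT_RULES = [
    "User-agent: *",
    "Disallow: /sub1/",
    "Disallow: /sub2/",
    "Disallow: /sub3/",
    "Disallow: /sub4/",
]

# Hypothetical mapping: each hostname answers with its own robots.txt.
# Assumption: /sub1/ contains no robots.txt, so the subdomain has no rules.
ROBOTS_BY_HOST = {
    "www.mydomain.com": parser_for(ROOT_RULES),
    "sub1.mydomain.com": parser_for([]),
}

def allowed(host, path):
    """Would a compliant crawler be allowed to fetch this URL?"""
    return ROBOTS_BY_HOST[host].can_fetch("*", f"http://{host}{path}")

print(allowed("www.mydomain.com", "/sub1/page.html"))
print(allowed("sub1.mydomain.com", "/page.html"))
```

Note also that sub3.com and sub3.mydomain.com are served from the same directory, so they share one robots.txt file; a robots.txt alone cannot block one of those hostnames while allowing the other.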

My suggested solution:
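In each secondary subdir / site, place a mod_rewrite rule in an .htaccess file, which will solve the issue.
I.e., for sub1, place in /sub1/ an .htaccess file like:

    RewriteEngine On
    # Any host other than www.mydomain.com (e.g. sub1.mydomain.com) gets a 301
    RewriteCond %{HTTP_HOST} !^www\.mydomain\.com [NC]
    # In /sub1/.htaccess, $1 is relative to /sub1/, so restore that prefix
    RewriteRule ^(.*)$ http://www.mydomain.com/sub1/$1 [L,R=301]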


This should solve your problem. If some "older" pages remain indexed, you can then remove them from the index with Webmaster Tools.
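
To see which hostnames the RewriteCond above would catch, here is a small Python sketch that mirrors only the host test (not the redirect target). The `is_redirected` helper and the `ADDON_HOST` pattern are hypothetical, and hostnames are assumed to be lowercase. It suggests that an add-on domain such as sub3.com would also be redirected to the main site unless the condition in that directory is adjusted.

```python
import re

# Pattern taken from the RewriteCond above. The leading "!" in the rule
# means "redirect when the host does NOT match this pattern".
MAIN_HOST = re.compile(r"^www\.mydomain\.com")

# Hypothetical variant for an add-on domain directory such as /sub3/,
# where sub3.com (with or without www) is the host to keep.
ADDON_HOST = re.compile(r"^(www\.)?sub3\.com")

def is_redirected(host, canonical=MAIN_HOST):
    """Return True if a request for `host` would trigger the 301 rule."""
    return canonical.search(host) is None

for host in ["www.mydomain.com", "mydomain.com", "sub1.mydomain.com", "sub3.com"]:
    action = "301 redirect" if is_redirected(host) else "served as-is"
    print(f"{host}: {action}")
```

With the hypothetical `ADDON_HOST` pattern in place of `MAIN_HOST`, sub3.com would be served as-is while sub3.mydomain.com would be redirected, which is closer to what the question asks for on the add-on domains.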
LVL 29

Expert Comment

by:fibo
ID: 35110813
Hi, have you been able to test my suggestions?

Author Closing Comment

by:nhtechgal
ID: 35951254
Had to search for a bit as I'm not a server administrator, but found most of the selections.
LVL 29

Expert Comment

by:fibo
ID: 35951394
B-) Glad it could help. Thx for the points and grade.