Solved

Using Robots.txt with Add-on Domains and Subdomains

Posted on 2011-02-20
493 Views
Last Modified: 2013-12-09
Hi - I have seen a lot of similar answers but none that completely convinces me that what I'm doing is right.

I have a master account for mydomain.com.

Within mydomain.com I have created a number of subdomains for testing purposes; they are stored in directories within the root directory for the website. For example, mydomain.com/sub1 and mydomain.com/sub2 are also mapped to the subdomains sub1.mydomain.com and sub2.mydomain.com.

Some of these are also mapped to add-on domains like sub3.com and sub4.com. These are very small-scale, low budget websites that are mainly blogs/personal/very small business sites I host for friends, not commercial accounts, so they really can't justify the expense or effort of creating and maintaining their own separate hosting accounts.

I recently discovered that sub3.mydomain.com and sub4.mydomain.com are being indexed by Google (even though they are NOT linked to from anywhere, as they are test sites under development). We are using Google Wave for discussion of sub4, so it is possible Google could have used that semi-private info (disturbing, but another story). Not sure how they would know about sub3.

In the root directory for the website, I added a robots.txt file (oops, I was sloppy before and didn't create one for subs 1-20). It includes the lines:

    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

Is there anything else I should do? Thanks!
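Edit: since each hostname fetches its own /robots.txt (so sub1.mydomain.com/robots.txt maps to /sub1/robots.txt, not to the file in the site root), I'm guessing each test subdomain may need its own blocking rules. Here's a per-directory .htaccess sketch of what I mean, assuming Apache with mod_rewrite; the filename robots-deny.txt is just a placeholder:

```apache
# In /sub1/.htaccess: serve a blocking robots file only when the
# request arrives via the test hostname sub1.mydomain.com.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^sub1\.mydomain\.com$ [NC]
RewriteRule ^robots\.txt$ robots-deny.txt [L]
```

where robots-deny.txt would contain just "User-agent: *" followed by "Disallow: /". Is that the right idea?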
Question by:nhtechgal

Author Comment

by:nhtechgal
ID: 34940370
P.S. I used Webmaster Tools / Crawler Access to test my robots.txt, and the results I got for these subdomains were no different from what I got for my many other subdomains that I have yet to include in robots.txt (and which haven't been indexed by Google, for whatever reason). Thanks.
 
LVL 29

Accepted Solution

by:
fibo earned 500 total points
ID: 34945341
    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

No, it won't (but leave it in place anyway).
Disallow does not say "it is forbidden to index this"; it just says "please, be kind enough not to index this".
So any link to your unwanted addresses can still get them indexed, even if the robot is polite enough not to crawl these subdirectories on its own.
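If you want something robots must obey rather than a polite request, an X-Robots-Tag response header is enforced at the server rather than left to the crawler's good will; Google drops pages served with "noindex" once they are recrawled. A sketch, assuming Apache with mod_headers and mod_setenvif enabled (the NOINDEX_HOST variable name is arbitrary):

```apache
# Send a "noindex" header on every response served via a test hostname.
<IfModule mod_headers.c>
    SetEnvIfNoCase Host ^sub[0-9]+\.mydomain\.com$ NOINDEX_HOST
    Header set X-Robots-Tag "noindex, nofollow" env=NOINDEX_HOST
</IfModule>
```

Note that this only works if the pages can still be fetched, so don't combine it with a robots.txt Disallow for the same URLs, or the crawler will never see the header.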

My suggested solution:
In each secondary subdirectory / site, place a mod_rewrite rule in an .htaccess file, which will solve the issue.
I.e., for sub1, place in /sub1/ an .htaccess like:
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.mydomain\.com
    RewriteRule ^(.*)$ http://www.mydomain.com/$1 [L,R=301]


This should solve your problem. If some "older" pages remain indexed, you can then remove them from the index with Webmaster Tools.
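One caveat for the directories that also serve an add-on domain (e.g. /sub3/ serving sub3.com): the condition !^www\.mydomain\.com matches sub3.com as well, so visitors to the add-on domain would be redirected away from their own site. Assuming both hostnames share the same document root, you can redirect only the unwanted test hostname instead; the target www.sub3.com below is just an assumed canonical hostname for the add-on domain:

```apache
# /sub3/.htaccess: redirect only the unwanted test hostname;
# requests arriving as sub3.com (or www.sub3.com) are left untouched.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^sub3\.mydomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.sub3.com/$1 [L,R=301]
```

With the 301, any already-indexed sub3.mydomain.com URLs should consolidate onto the add-on domain over time.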
 

Expert Comment

by:fibo
ID: 35110813
Hi, have you been able to test my suggestions?
 

Author Closing Comment

by:nhtechgal
ID: 35951254
I had to search for a bit, as I'm not a server administrator, but I found most of the settings.
 

Expert Comment

by:fibo
ID: 35951394
B-) Glad it could help. Thx for the points and grade.
