Solved

Using Robots.txt with Add-on Domains and Subdomains

Posted on 2011-02-20
Medium Priority
569 Views
Last Modified: 2013-12-09
Hi - I have seen a lot of similar answers but none that completely convinces me that what I'm doing is right.

I have a master account for mydomain.com.

Within mydomain.com I have created a number of subdomains for testing purposes; they are stored in directories within the root directory for the website. For example, mydomain.com/sub1 and mydomain.com/sub2 are also mapped to the subdomains sub1.mydomain.com and sub2.mydomain.com.

Some of these are also mapped to add-on domains like sub3.com and sub4.com. These are very small-scale, low budget websites that are mainly blogs/personal/very small business sites I host for friends, not commercial accounts, so they really can't justify the expense or effort of creating and maintaining their own separate hosting accounts.

I recently discovered that sub3.mydomain.com and sub4.mydomain.com are being indexed by Google (even though they are NOT linked to from anywhere, as they are test sites under development). We are using Google Wave for discussion of sub4, so it is possible Google could have used that semi-private info (disturbing, but another story). Not sure how they would know about sub3.

In the root directory for the website, I added robots.txt (oops, I was sloppy before and didn't create it for subs 1-20). This includes the lines:

    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

Is there anything else I should do? Thanks!
Question by:nhtechgal
Author Comment

by:nhtechgal
ID: 34940370
P.S. I used Webmaster Tools / Crawler Access to test my robots.txt, and the results I got for these subdomains were no different from what I got for my many other subdomains that I have yet to include in robots.txt (and which haven't been indexed by Google, for whatever reason). Thanks.
LVL 29

Accepted Solution

by:
fibo earned 1500 total points
ID: 34945341
    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

No, it won't (but leave it in place anyway).
Disallow does not say "it is forbidden to index this"; it just says "please be kind enough not to crawl this".
So any link to your unwanted addresses can still get them indexed, even if the robot is polite enough not to visit these subdirs on its own.
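
Also note that crawlers request robots.txt separately for every hostname, so Google resolves sub1.mydomain.com/robots.txt against that subdomain's document root (/sub1/), where no robots.txt exists; the rules in mydomain.com/robots.txt only cover the mydomain.com/subX/ URLs. A sketch for the test-only subdirectories (not the ones also mapped to add-on domains you want indexed, since they share the same document root) is a blocking robots.txt in each:

    # /sub1/robots.txt, served as sub1.mydomain.com/robots.txt
    User-agent: *
    Disallow: /

The same file is then also reachable as mydomain.com/sub1/robots.txt, which is harmless since that path is already disallowed in the main file.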

My suggested solution:
In each secondary subdirectory / site, place a mod_rewrite rule in an .htaccess file, which will solve the issue.
I.e., for sub1, place in /sub1/ an .htaccess like:

    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.mydomain\.com [NC]
    RewriteRule ^(.*)$ http://www.mydomain.com/sub1/$1 [L,R=301]

(For the directories that are also mapped to add-on domains you want indexed, such as /sub3/, add a second condition so those domains are not redirected away, e.g. RewriteCond %{HTTP_HOST} !^(www\.)?sub3\.com [NC].)


This should solve your problem. If some "older" pages remain indexed, you can then remove them from the index with Webmaster Tools.
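
If the host supports it, another option is to send a noindex header only for requests that arrive via the test hostname; this is a sketch assuming Apache with mod_setenvif and mod_headers enabled:

    # In /sub1/.htaccess: mark everything served under the test
    # hostname as not-to-be-indexed, without affecting other hosts
    SetEnvIfNoCase Host ^sub1\.mydomain\.com$ TESTHOST
    Header set X-Robots-Tag "noindex, nofollow" env=TESTHOST

Unlike a robots.txt Disallow, a noindex directive actually drops pages from the index once they are recrawled.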

Expert Comment

by:fibo
ID: 35110813
Hi, have you been able to test my suggestions?

Author Closing Comment

by:nhtechgal
ID: 35951254
Had to search for a bit as I'm not a server administrator, but found most of the selections.

Expert Comment

by:fibo
ID: 35951394
B-) Glad it could help. Thx for the points and grade.
