Solved

Using Robots.txt with Add-on Domains and Subdomains

Posted on 2011-02-20
5
480 Views
Last Modified: 2013-12-09
Hi - I have seen a lot of similar answers but none that completely convinces me that what I'm doing is right.

I have a master account for mydomain.com.

Within mydomain.com I have created a number of subdomains for testing purposes; these are stored in directories within the website's root directory. For example, mydomain.com/sub1 and mydomain.com/sub2 are also mapped to the subdomains sub1.mydomain.com and sub2.mydomain.com.

Some of these are also mapped to add-on domains like sub3.com and sub4.com. These are very small-scale, low budget websites that are mainly blogs/personal/very small business sites I host for friends, not commercial accounts, so they really can't justify the expense or effort of creating and maintaining their own separate hosting accounts.

I recently discovered that sub3.mydomain.com and sub4.mydomain.com are being indexed by Google (even though they are NOT linked anywhere, as they are test sites under development). We are using Google Wave for discussion of sub4, so it is possible Google could have used that semi-private info (disturbing, but another story). Not sure how they would know about sub3.

In the website's root directory, I added a robots.txt (oops, I was sloppy before and didn't create one for subs 1-20). It includes the lines:

    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

Is there anything else I should do? Thanks!
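For what it's worth, you can sanity-check which paths a robots.txt blocks with Python's standard-library parser. This is only a sketch using the placeholder hostnames from the question; note that a crawler visiting sub1.mydomain.com fetches sub1.mydomain.com/robots.txt, so rules in the root domain's file never apply to the subdomain hosts at all:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt served at mydomain.com/robots.txt (as shown above).
robots_txt = """\
User-agent: *
Disallow: /sub1/
Disallow: /sub2/
Disallow: /sub3/
Disallow: /sub4/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Paths under /subX/ on the main domain are disallowed...
print(rp.can_fetch("*", "http://mydomain.com/sub1/page.html"))  # False
# ...but when the same content is reached via sub1.mydomain.com,
# the requested path is just /page.html, which these rules never match.
print(rp.can_fetch("*", "http://mydomain.com/page.html"))       # True
```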
Question by:nhtechgal
5 Comments
 

Author Comment

by:nhtechgal
ID: 34940370
P.S. I used Webmaster Tools / Crawler Access to test my robots.txt, and the results I got for these subdomains were no different from what I got for my many other subdomains that I have yet to include in robots.txt (and which haven't been indexed by Google, for whatever reason). Thanks.
 
LVL 29

Accepted Solution

by:
fibo earned 500 total points
ID: 34945341
    User-agent: *
    Disallow: /sub1/
    Disallow: /sub2/
    Disallow: /sub3/
    Disallow: /sub4/

Will this effectively prevent mydomain.com/subX and subX.mydomain.com from being indexed while still allowing sub3.com and sub4.com to be indexed (and controlled using additional robots.txt in their root directories)?

No, it won't (but leave it in place anyway).
Disallow does not say "it is forbidden to index this"; it just says "please be kind enough not to index this."
So any link pointing to your unwanted addresses can still get them indexed, even if the robot is polite enough not to crawl those subdirectories on its own.
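If the goal is to keep pages out of the index outright rather than merely uncrawled, a stronger signal is an X-Robots-Tag: noindex response header. A sketch, assuming Apache with mod_headers enabled, placed in each subdirectory's .htaccess:

```apache
# /sub1/.htaccess -- ask crawlers not to index anything served from here
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```

Note that for Google to see this header, the URL must remain crawlable: a robots.txt Disallow on the same path would prevent the crawler from ever fetching the page and reading the header.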

My suggested solution:
In each secondary subdir / site, place a mod_rewrite rule in an .htaccess file, which will solve the issue.
I.e., for sub1, place in /sub1/ an .htaccess like:

    # Redirect any request whose host is not www.mydomain.com
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www\.mydomain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.mydomain.com/$1 [L,R=301]


This should solve your problem. If some "older" pages remain indexed, you can then remove them from the index with Webmaster Tools.
 
LVL 29

Expert Comment

by:fibo
ID: 35110813
Hi, have you been able to test my suggestions?
 

Author Closing Comment

by:nhtechgal
ID: 35951254
Had to search for a bit as I'm not a server administrator, but found most of the selections.
 
LVL 29

Expert Comment

by:fibo
ID: 35951394
B-) Glad it could help. Thx for the points and grade.
