Victor Kimura (Canada)

asked on

Disallow crawling by search engine bots: .htaccess or robots.txt?

Hi,
I have a few questions:

1) From an SEO perspective, is it better to disallow crawling of folders using .htaccess or the robots.txt file?

2) How do I disallow folders/files using .htaccess?

3) Sorry, another question. I have these subdomains, and I'd like to either create an .htaccess file or add a rule to robots.txt to block their index pages, since there is nothing on them:
http://finance.ultratrust.com/
http://estate-planning.ultratrust.com/

4) I have these pages on the subdomain estate-planning.ultratrust.com:
http://estate-planning.ultratrust.com/estate-planning-asset-protection.html
http://estate-planning.ultratrust.com/estate-taxes-grat-life-insurance-trusts.html

Since these are the only pages, how do I create an entry that allows search engine bots to crawl only these two pages? cPanel also creates other files when setting up a subdomain, such as the 400.shtml files, and I don't want those crawled or indexed.
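A minimal sketch of how a crawler might evaluate robots.txt rules for questions 3 and 4, using Python's standard-library `urllib.robotparser`. The rule set is an assumption for illustration, not the answer given in this thread: it allows only the two article pages and disallows everything else, which also blocks the empty index page. robots.txt is read per host, so each subdomain (finance.ultratrust.com, estate-planning.ultratrust.com) needs its own file at its root; a bare `Disallow: /` is the simplest file for a subdomain with nothing on it. Python's parser applies the first matching rule, so the `Allow` lines come before `Disallow: /`; Google uses the most specific (longest) matching path, so this ordering gives the same result there.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt for estate-planning.ultratrust.com (an assumption,
# not a file from the thread). Allow lines come first because
# urllib.robotparser returns the first rule whose path prefix matches.
ESTATE_PLANNING_ROBOTS = """\
User-agent: *
Allow: /estate-planning-asset-protection.html
Allow: /estate-taxes-grat-life-insurance-trusts.html
Disallow: /
"""


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (one file per host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ESTATE_PLANNING_ROBOTS)
    base = "http://estate-planning.ultratrust.com"
    for path in ("/", "/estate-planning-asset-protection.html",
                 "/estate-taxes-grat-life-insurance-trusts.html", "/400.shtml"):
        print(path, rp.can_fetch("Googlebot", base + path))
    print(robots_url("http://finance.ultratrust.com/"))
```

Running it should report `/` and `/400.shtml` as blocked and the two article pages as allowed. Keep in mind that robots.txt only stops crawling: a blocked URL can still be listed if other pages link to it, so keeping a page out of the index needs a noindex directive on a page the crawler is allowed to fetch.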


Thanks,
Victor
SOLUTION
Tony McCreath (Australia)

This solution is only available to members.
Victor Kimura (ASKER)

Hi, thanks for the feedback. For now, I'm placing an index.php file there so I won't get these errors.
I guess you wouldn't know about these obscure pages that Googlebot is crawling, would you? I'm seeing the errors in Google Webmaster Tools.
SOLUTION
Dave Baldwin (United States)

This solution is only available to members.
SOLUTION
This solution is only available to members.
Hi,

Yes, I've now created the index.php files with a PHP header() redirect. It wasn't server-wide, just on a couple of subdomains.
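For comparison, here is a hypothetical Python (WSGI) equivalent of an index.php whose only job is to redirect. `REDIRECT_TARGET` is an assumed URL, since the thread does not say where the subdomains redirect to, and this sketch only handles the index URL, answering 404 for anything else. Note that PHP's `header("Location: ...")` sends a 302 by default; the sketch sends a permanent 301, which in PHP would be `header("Location: ...", true, 301)`.

```python
# Hypothetical destination: the thread does not name the redirect target.
REDIRECT_TARGET = "http://www.ultratrust.com/"


def redirect_index(environ, start_response):
    """WSGI app: 301-redirect the subdomain's index URL, 404 everything else."""
    path = environ.get("PATH_INFO") or "/"
    if path in ("/", "/index.php"):
        # Permanent redirect, like header("Location: ...", true, 301) in PHP.
        start_response("301 Moved Permanently", [
            ("Location", REDIRECT_TARGET),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Moved Permanently\n"]
    # Anything else on the (empty) subdomain is reported as missing.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found\n"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_index).serve_forever()` serves it on port 8000, and `curl -I http://localhost:8000/` should show the 301 status and its Location header.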

Is it really necessary to add the robots meta tag to the .shtml files, though?

I found this post about the robots meta tag on Google's blog:
http://googlewebmastercentral.blogspot.ca/2007/03/using-robots-meta-tag.html

It should be <meta name='robots' content='noindex,nofollow' />, or just noindex.
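A rough sketch of how a crawler could read that tag, using Python's standard-library `html.parser`; the `RobotsMetaParser` class and `robots_directives` helper are hypothetical names for illustration. It also shows why the tag and robots.txt interact: the crawler can only see a noindex meta tag on a page it is allowed to fetch, so a page that is both disallowed in robots.txt and tagged noindex may never have its tag read.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; the default
        # handle_startendtag ("<meta ... />") also calls this method.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # content is a comma-separated list such as "noindex,nofollow".
            for token in attr_map.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

For the tag quoted above, `robots_directives("<meta name='robots' content='noindex,nofollow' />")` returns `{"noindex", "nofollow"}`.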

I also have a related problem regarding the Panda update:
https://www.experts-exchange.com/questions/27715947/SEO-after-Panda-update.html?anchorAnswerId=37968813#a37968813

Thank you.
SOLUTION
This solution is only available to members.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
Thank you all (Tiggerito, DaveBaldwin, mwecomputers).

mwecomputers, that Perl script looks interesting. I'll take a look at it in the near future. =)