nutnut

Robot.txt

Hi,

Can somebody please explain what this is telling Google to do in robots.txt?

User-agent: *
Disallow: /Blog/
Allow: /Blog/post

Is it valid?

Would (notice the line switch)

User-agent: *
Allow: /Blog/post
Disallow: /Blog/

be better?
Meir Rivkin

User-agent: * -> An entry that applies to all bots.
Disallow: /Blog/ -> URLs whose path starts with /Blog/ (everything in this directory) are disallowed for Googlebot and every other bot.
Allow: /Blog/post -> URLs whose path starts with /Blog/post (and everything below it) are allowed.

The answer depends on your requirement. If you want to allow /Blog/post but disallow the rest of /Blog/, both versions express that for Googlebot, which applies the most specific (longest) matching rule regardless of the order of the lines.
Both are valid, but to be compatible with all robots it is safer to place the Allow directive(s) first, followed by the Disallow, for example:

User-agent: *
Allow: /Blog/post
Disallow: /Blog/
The order matters only to robots that apply the first matching rule, as in the original robots.txt standard; for the Google and Bing bots, the order is not important.
Here is a reference on robots.txt:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

In this case, URLs under /Blog/ would be disallowed for Googlebot, except those starting with /Blog/post:

User-agent: *
Disallow: /Blog/
Allow: /Blog/post
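To make the point about ordering concrete, here is a minimal Python sketch that evaluates both orderings under two strategies: first-match (the earliest matching line decides) and longest-match (the most specific line decides, which is how Google documents its behaviour). The helper names `first_match` and `longest_match` are made up for illustration, and the matching is plain prefix comparison with no wildcard support, so treat it as a model of the idea rather than any crawler's real parser.

```python
def matching_rules(rules, path):
    """Return the (directive, prefix) pairs whose prefix starts the path."""
    return [(d, p) for d, p in rules if path.startswith(p)]


def first_match(rules, path):
    """First-match evaluation: the earliest matching rule decides."""
    hits = matching_rules(rules, path)
    return hits[0][0] == "Allow" if hits else True


def longest_match(rules, path):
    """Longest-match evaluation: the most specific rule decides; Allow wins ties."""
    hits = matching_rules(rules, path)
    if not hits:
        return True  # nothing matches, so the URL is allowed by default
    best = max(hits, key=lambda r: (len(r[1]), r[0] == "Allow"))
    return best[0] == "Allow"


# The two orderings from the question.
disallow_first = [("Disallow", "/Blog/"), ("Allow", "/Blog/post")]
allow_first = [("Allow", "/Blog/post"), ("Disallow", "/Blog/")]

url = "/Blog/post/My-Blog-Post.aspx"
for name, rules in [("disallow_first", disallow_first), ("allow_first", allow_first)]:
    print(name,
          "first-match:", first_match(rules, url),
          "longest-match:", longest_match(rules, url))
```

Under longest-match both orderings allow the post URL. Under first-match only the Allow-first ordering does, which is why putting Allow first is the safer choice.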

ASKER

Hi thanks for the responses.

So I need to

Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.

Disallow:
www.mysite.com/Blog/ ..everything underneath EXCEPT www.mysite.com/Blog/post

How would I do that in robots.txt?

The reason is that Google is seeing massive duplication of my site due to tags.

So,

/Blog?page=1, /Blog?page=2 and /Blog?page=3 are seen as the same.
/Blog?Tag=BlahBlah and /Blog?Tag=DohDoh are seen as the same.

I just want Google to read my main site www.mysite.com and then read nothing under www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post and below.

Is there a better way to do this outside of robots.txt? All posts are under www.mysite.com/Blog/post, so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is canonicalized fine.

Thanks
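There is no problem with using the following. Paths in robots.txt are case-sensitive, so the rules use /Blog exactly as it appears in your URLs:

User-agent: *
Disallow: /Blog?Tag
Disallow: /Blog?page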
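A rule like these is compared as a prefix against the URL path plus query string, and the comparison is case-sensitive. The short Python sketch below models that with a plain `startswith` check; `rule_matches` is a hypothetical helper, and real crawlers also handle wildcards and percent-encoding, which this ignores.

```python
def rule_matches(rule_path, url_path_and_query):
    """Plain case-sensitive prefix match, with no wildcard support."""
    return url_path_and_query.startswith(rule_path)


# URLs taken from the question above.
urls = [
    "/Blog?Tag=BlahBlah",
    "/Blog?page=2",
    "/Blog/post/My-Blog-Post.aspx",
]

# Show which URLs each candidate rule would cover.
for rule in ["/Blog?Tag", "/blog?Tag", "/Blog?page"]:
    print(rule, [u for u in urls if rule_matches(rule, u)])
```

In this model the lower-case `/blog?Tag` covers none of the URLs, while `/Blog?Tag` and `/Blog?page` each cover the matching query URL and leave the post URL alone.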

ASKER

How about

User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/

Is this OK syntax-wise?
ASKER CERTIFIED SOLUTION
TvMpt

ASKER

Thanks very much. I have gone for

User-agent: *
Allow: /Blog/post/
Disallow: /Blog
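As a rough sanity check of these two lines, the Python sketch below runs a few of the URLs mentioned in this thread through a simplified longest-match evaluation. `is_allowed` and `FINAL_RULES` are illustrative names, and the model is an assumption: it uses plain prefix matching and lets the longest matching rule win, without the wildcard handling a real crawler has.

```python
# The two rules chosen above, as (directive, path prefix) pairs.
FINAL_RULES = [("Allow", "/Blog/post/"), ("Disallow", "/Blog")]


def is_allowed(path, rules=FINAL_RULES):
    """Longest matching prefix decides; no matching rule means allowed."""
    hits = [(d, p) for d, p in rules if path.startswith(p)]
    if not hits:
        return True
    directive, _ = max(hits, key=lambda r: (len(r[1]), r[0] == "Allow"))
    return directive == "Allow"


for path in [
    "/",
    "/Blog",
    "/Blog?page=1",
    "/Blog?Tag=BlahBlah",
    "/Blog/post/My-Blog-Post.aspx",
    "/Blog/post",
]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Two side effects are worth noticing in this model: `/Blog/post` without the trailing slash is blocked, because the Allow line ends in a slash, and `Disallow: /Blog` also covers any other path that merely starts with `/Blog`.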