nutnut
asked on
robots.txt
Hi,
Can somebody please explain what this is telling Google to do in the robots.txt?
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Is it valid?
Would this (note the switched lines)
User-agent: *
Allow: /Blog/post
Disallow: /Blog/
be better?
Both are valid, but to be compatible with all robots you should place the Allow directive(s) first, followed by the Disallow, for example:
User-agent: *
Allow: /Blog/post
Disallow: /Blog/
Check here for an explanation:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
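For what it's worth, you can see why the order can matter to stricter parsers with Python's urllib.robotparser, which (as far as I know) applies the first rule that matches the URL; Googlebot instead picks the most specific match, so it is not affected either way. A minimal sketch (the mysite.com URL is just the example from this thread):

from urllib import robotparser

def allowed(lines, url):
    # Build a parser from a list of robots.txt lines and test one URL.
    rp = robotparser.RobotFileParser()
    rp.parse(lines)
    return rp.can_fetch("*", url)

url = "http://www.mysite.com/Blog/post/My-Blog-Post.aspx"

disallow_first = ["User-agent: *", "Disallow: /Blog/", "Allow: /Blog/post"]
allow_first    = ["User-agent: *", "Allow: /Blog/post", "Disallow: /Blog/"]

print(allowed(disallow_first, url))  # False - first matching rule is Disallow: /Blog/
print(allowed(allow_first, url))     # True  - first matching rule is Allow: /Blog/post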
The order only matters to robots that follow the original standard; for the Google and Bing bots, the order does not matter.
Here is a reference on robots.txt:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
In this case, only the URLs matching /Blog/ would be disallowed for Googlebot; URLs matching /Blog/post would still be allowed.
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
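According to that reference, Googlebot resolves conflicts by taking the most specific (longest) matching rule, regardless of the order in the file. A minimal sketch of that idea, my own simplification that ignores the * and $ wildcards Google also supports:

def googlebot_allows(path, rules):
    # Simplified model of Google's most-specific-match rule.
    # rules is a list of (directive, prefix) tuples, e.g. ("Disallow", "/Blog/").
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule matches -> crawling is allowed
    # Longest prefix wins; on a tie Google documents that the least
    # restrictive rule (Allow) wins, hence the boolean tiebreaker.
    return max(matches)[1]

rules = [("Disallow", "/Blog/"), ("Allow", "/Blog/post")]
print(googlebot_allows("/Blog/post/My-Blog-Post.aspx", rules))  # True  - allowed
print(googlebot_allows("/Blog/2012-archive.aspx", rules))       # False - disallowed (hypothetical URL)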
ASKER
Hi, thanks for the responses.
So I need to:
Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.
Disallow:
www.mysite.com/Blog/ ...everything underneath EXCEPT www.mysite.com/Blog/post
How would I do that in a robots.txt?
The reason is that Google is seeing massive duplication on my site due to tags.
So,
/Blog?page=1, /Blog?page=2 and /Blog?page=3 are seen as the same, and
/Blog?Tag=BlahBlah and /Blog?Tag=DohDoh are seen as the same.
I just want Google to read my main site www.mysite.com and then read nothing under www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post ...and below.
Is there a better way to do this outside of robots.txt? All posts are under www.mysite.com/Blog/post, so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is canonicalized fine.
Thanks
There is no problem using:
User-agent: *
Disallow: /blog?Tag
Disallow: /blog?page
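One thing worth double-checking: robots.txt path matching is case-sensitive, so if the live URLs start with /Blog (capital B, as in your examples) rather than /blog, the rules would need the same casing, e.g.:
User-agent: *
Disallow: /Blog?Tag
Disallow: /Blog?page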
ASKER
How about
User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/
Is this OK syntax-wise?
ASKER CERTIFIED SOLUTION
ASKER
Thanks very much. I have gone for
User-agent: *
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/ -> URLs matching /Blog/ would be disallowed for Googlebot (and everything in this directory).
Allow: /Blog/post -> URLs matching /Blog/post would be allowed for Googlebot (and everything in this directory).
The answer depends on your requirement: if you want to allow /Blog/post but disallow /Blog, use the first case; otherwise use the second one.
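If it helps, the file you went with can be sanity-checked with Python's urllib.robotparser (that parser applies the first matching rule, whereas Googlebot uses the most specific match, but for this particular file both should agree):

from urllib import robotparser

lines = [
    "User-agent: *",
    "Allow: /Blog/post/",
    "Disallow: /Blog",
]

rp = robotparser.RobotFileParser()
rp.parse(lines)

# URLs taken from the question; the expected outcomes are in the comments.
for url in [
    "http://www.mysite.com/Blog/post/My-Blog-Post.aspx",  # blog post - should stay crawlable
    "http://www.mysite.com/Blog?Tag=BlahBlah",            # tag listing - should be blocked
    "http://www.mysite.com/Blog?page=2",                  # paginated listing - should be blocked
    "http://www.mysite.com/",                             # main site - should stay crawlable
]:
    print(url, "->", "allowed" if rp.can_fetch("*", url) else "blocked")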