• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 615
  • Last Modified:

Robot.txt

Hi,

Can somebody please explain what this is telling google to do in the Robot.txt

User-agent: *
Disallow: /Blog/
Allow: /Blog/post

Is it valid?

Would (notice the line switch)

User-agent: *
Allow: /Blog/post
Disallow: /Blog/

be better?
0
nutnut
Asked:
nutnut
  • 4
  • 3
  • 2
  • +1
1 Solution
 
Meir RivkinFull stack Software EngineerCommented:
user-agent * -> An entry that applies to all bots
Disallow: /Blog/ -> URLs matching /Blog/ would be disallowed for Googlebot (and everything in this directory).
Allow: /Blog/post -> URLs matching /Blog/post would be allowed for Googlebot (and everything in this directory).

the answer depends of your requirement, if u want to allow blog/post but disallow blog then use the 1st case, other wise use the 2nd one.
0
 
TvMptCommented:
Both valid but in order to be compatible to all robots it is necessary to place the Allow directive(s) first, followed by the Disallow, for example:

User-agent: *
Allow: /Blog/post
Disallow: /Blog/
0
 
Meir RivkinFull stack Software EngineerCommented:
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

 
TvMptCommented:
The order is only important to robots that follow the standard; in the case of the Google or Bing bots, the order is not important.
0
 
Om PrakashCommented:
Here is a reference on robots.txt
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

in this case, only the URLs matching /Blog/ would be disallowed for Googlebot.

User-agent: *
Disallow: /Blog/
Allow: /Blog/post
0
 
nutnutAuthor Commented:
Hi thanks for the responses.

So I need to

Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.

Disallow:
www.mysite.com/Blog/ ..everything underneath EXCEPT www.mysite.com/Blog/post

How would I do that in a Robot.txt

Reason is that google is seeing massive duplication of my site due to tags

So,

/Blog?page=1 and /Blog?page=2 & /Blog?page=3 are seen as the same
/Blog?Tag=BlahBlah is seen /Blog?Tag=DohDoh are seen as the same.  

I just want google to read my main site www.mysite.com and then under read nothing under  www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post ..and below.

Is there a better way to do this outside of Robot.txt.  All post under www.mysite.com/Blog/post so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is Canonicalized fine.

Thanks
0
 
TvMptCommented:
There is no problem to use

User-agent: *
Disallow: /blog?Tag
Disallow: /blog?page
0
 
nutnutAuthor Commented:
How about

User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/

Is this ok syntax wise
0
 
TvMptCommented:
Example:
If you end with a "/" then it will specify that as the match.
That means this;
   Disallow: /wp-includes/
will block these;
   Disallow: /wp-includes/this.html
   Disallow: /wp-includes/that.php
   Disallow: /wp-includes/thisstoo.jpg
   Disallow: /wp-includes/here/here2/anythinginhere.aswell
etc.

If you use this;
   Disallow: /wp-includes
(without the / at the end)
then it would not only block the above, but also
   Disallow: /wp-includes-this
   Disallow: /wp-includesplusthis
   Disallow: /wp-includes-thistoo-andthebelow
   Disallow: /wp-includess

Open in new window


Did you see the difference? :)

Try using the robots.txt testing tool in Google WebMaster Tools.
0
 
nutnutAuthor Commented:
Thanks very much. I have gone for

User-agent: *
Allow: /Blog/post/
Disallow: /Blog
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Microsoft Windows 7 Basic

This introductory course to Windows 7 environment will teach you about working with the Windows operating system. You will learn about basic functions including start menu; the desktop; managing files, folders, and libraries.

  • 4
  • 3
  • 2
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now