Solved

Robot.txt

Posted on 2013-06-25
Last Modified: 2013-06-25
Hi,

Can somebody please explain what this tells Google to do in robots.txt?

User-agent: *
Disallow: /Blog/
Allow: /Blog/post

Is it valid?

Would this (note the swapped lines)

User-agent: *
Allow: /Blog/post
Disallow: /Blog/

be better?
Question by:nutnut
10 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 39274116
User-agent: * -> The entry applies to all bots.
Disallow: /Blog/ -> URLs matching /Blog/ (and everything in this directory) are disallowed for Googlebot.
Allow: /Blog/post -> URLs matching /Blog/post (and everything below it) are allowed for Googlebot.

The answer depends on your requirement: if you want to allow /Blog/post but disallow the rest of /Blog/, use the first version; otherwise use the second one.
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274117
Both are valid, but to be compatible with all robots you need to place the Allow directive(s) first, followed by the Disallow, for example:

User-agent: *
Allow: /Blog/post
Disallow: /Blog/
 
LVL 42

Expert Comment

by:sedgwick
ID: 39274118

 
LVL 9

Expert Comment

by:TvMpt
ID: 39274119
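The order only matters to robots that follow the original standard and apply the first rule that matches; for the Google or Bing bots, the order is not important, because Googlebot in particular picks the most specific (longest) matching rule.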
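
To make the ordering point concrete, here is a minimal Python sketch of the two evaluation strategies. It is an illustration only, not any crawler's actual code: the helper names `first_match` and `longest_match` are hypothetical, wildcards (`*`, `$`) are ignored, and rules are treated as plain, case-sensitive path prefixes.

```python
from typing import List, Tuple

Rule = Tuple[bool, str]  # (is_allow, path_prefix)


def first_match(rules: List[Rule], url_path: str) -> bool:
    """First-match evaluation: the first rule whose prefix matches decides."""
    for is_allow, prefix in rules:
        if url_path.startswith(prefix):
            return is_allow
    return True  # no rule matched, so crawling is allowed


def longest_match(rules: List[Rule], url_path: str) -> bool:
    """Most-specific evaluation: the longest matching prefix decides; Allow wins ties."""
    matching = [(len(prefix), is_allow) for is_allow, prefix in rules
                if url_path.startswith(prefix)]
    if not matching:
        return True
    # Tuples compare by length first, then True > False, so Allow wins a tie.
    return max(matching)[1]


disallow_first = [(False, "/Blog/"), (True, "/Blog/post")]
allow_first = [(True, "/Blog/post"), (False, "/Blog/")]
url = "/Blog/post/My-Blog-Post.aspx"

for name, rules in (("Disallow first", disallow_first), ("Allow first", allow_first)):
    print(name, "| first-match:", first_match(rules, url),
          "| longest-match:", longest_match(rules, url))
```

Under first-match evaluation the Disallow-first file blocks the post URL, while the Allow-first file lets it through; under longest-match evaluation both orderings allow it. That is why putting Allow first is the safer choice across robots.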
 
LVL 22

Expert Comment

by:Om Prakash
ID: 39274120
Here is a reference on robots.txt:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

In this case, only the URLs matching /Blog/ would be disallowed for Googlebot.

User-agent: *
Disallow: /Blog/
Allow: /Blog/post
 

Author Comment

by:nutnut
ID: 39274131
Hi, thanks for the responses.

So I need to

Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.

Disallow:
www.mysite.com/Blog/ ..everything underneath EXCEPT www.mysite.com/Blog/post

How would I do that in robots.txt?

The reason is that Google is seeing massive duplication of my site due to tags.

So,

/Blog?page=1, /Blog?page=2 and /Blog?page=3 are seen as the same
/Blog?Tag=BlahBlah and /Blog?Tag=DohDoh are seen as the same.

I just want Google to read my main site www.mysite.com and then read nothing under www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post and below.

Is there a better way to do this outside of robots.txt? All posts are under www.mysite.com/Blog/post, so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is canonicalized fine.

Thanks
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274224
There is no problem with using the following (rule paths are case-sensitive, so the case has to match your URLs):

User-agent: *
Disallow: /Blog?Tag
Disallow: /Blog?page
 

Author Comment

by:nutnut
ID: 39274237
How about

User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/

Is this OK syntax-wise?
 
LVL 9

Accepted Solution

by:
TvMpt earned 500 total points
ID: 39274255
Example:
If you end the path with a "/", the rule matches only URLs inside that directory.
That means this:
   Disallow: /wp-includes/
will block these:
   /wp-includes/this.html
   /wp-includes/that.php
   /wp-includes/thisstoo.jpg
   /wp-includes/here/here2/anythinginhere.aswell
etc.

If you use this:
   Disallow: /wp-includes
(without the / at the end)
then it will block not only the above, but also:
   /wp-includes-this
   /wp-includesplusthis
   /wp-includes-thistoo-andthebelow
   /wp-includess



Did you see the difference? :)
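
The trailing-slash difference comes down to plain prefix matching, which can be sketched in a few lines of Python. This is only an illustration under the assumption that the rule contains no wildcards; `blocked_by` is a hypothetical helper name, not part of any library.

```python
def blocked_by(disallow_prefix: str, url_path: str) -> bool:
    """A wildcard-free Disallow rule blocks every path that starts with its prefix."""
    return url_path.startswith(disallow_prefix)


paths = [
    "/wp-includes/this.html",
    "/wp-includes/here/here2/anythinginhere.aswell",
    "/wp-includes-this",
    "/wp-includess",
]

# With the trailing slash, only paths inside the directory are blocked.
with_slash = [p for p in paths if blocked_by("/wp-includes/", p)]
# Without it, anything that merely begins with the same characters is blocked too.
without_slash = [p for p in paths if blocked_by("/wp-includes", p)]

print("Disallow: /wp-includes/ blocks", with_slash)
print("Disallow: /wp-includes  blocks", without_slash)
```

The first list contains only the two paths inside the directory; the second contains all four.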

Try using the robots.txt testing tool in Google WebMaster Tools.
 

Author Closing Comment

by:nutnut
ID: 39274262
Thanks very much. I have gone for

User-agent: *
Allow: /Blog/post/
Disallow: /Blog
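
For anyone who wants a quick local sanity check of these final rules, a sketch using Python's standard `urllib.robotparser` module could look like the following. Python's parser is not Googlebot and may resolve conflicting rules differently, so treat this as a rough check only; the test paths are taken from the examples earlier in the thread, and "Googlebot" is simply the user-agent name passed in.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Allow: /Blog/post/
Disallow: /Blog
"""

parser = RobotFileParser()
# parse() takes the file content as a list of lines, so no network access is needed.
parser.parse(robots_txt.splitlines())

checks = {
    path: parser.can_fetch("Googlebot", path)
    for path in [
        "/",
        "/Blog/post/My-Blog-Post.aspx",
        "/Blog?page=1",
        "/Blog?Tag=BlahBlah",
        "/Blog/post",
    ]
}

for path, allowed in checks.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

The home page and the individual post stay crawlable, while the paged and tagged listings are blocked. Note that /Blog/post without a trailing slash is also blocked, because only /Blog/post/ is allowed.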