Solved

robots.txt

Posted on 2013-06-25
Last Modified: 2013-06-25
Hi,

Can somebody please explain what this tells Google to do in robots.txt?

User-agent: *
Disallow: /Blog/
Allow: /Blog/post

Is it valid?

Would (notice the line switch)

User-agent: *
Allow: /Blog/post
Disallow: /Blog/

be better?
Question by:nutnut
10 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 39274116
User-agent: * -> The entry applies to all bots, not only Googlebot.
Disallow: /Blog/ -> URLs starting with /Blog/ (everything in this directory) are disallowed.
Allow: /Blog/post -> URLs starting with /Blog/post are allowed, which overrides the Disallow for those URLs.

Both versions express the same intent: block /Blog/ but still allow /Blog/post. For Googlebot the most specific (longest) matching rule wins, so the order of the lines makes no difference. For crawlers that stop at the first matching rule, only the second version (Allow first) has the intended effect.
0
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274117
Both are valid, but to be compatible with all robots you need to place the Allow directive(s) first, followed by the Disallow. For example:

User-agent: *
Allow: /Blog/post
Disallow: /Blog/
0
 
LVL 42

Expert Comment

by:sedgwick
ID: 39274118
0
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274119
The order only matters to robots that follow the original standard, where the first matching rule wins. The Google and Bing bots pick the most specific (longest) matching rule, so for them the order is not important.
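To make the difference concrete, here is a minimal Python sketch of the two evaluation strategies. It is only an illustration: `first_match` and `longest_match` are hypothetical helper names, rules are modelled as plain `(allow, prefix)` pairs, wildcards are ignored, and the tie-break in favour of Allow is an assumption.

```python
def first_match(rules, path):
    """Original-standard style: the first rule whose prefix matches wins."""
    for allow, prefix in rules:
        if path.startswith(prefix):
            return allow
    return True  # no rule matched, so crawling is allowed


def longest_match(rules, path):
    """Google/Bing style: the longest matching prefix wins, whatever the order."""
    matches = [(len(prefix), allow) for allow, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True
    # Assumption: on equal length, Allow wins (True compares greater than False).
    return max(matches)[1]


# The two orderings from the question, as (allow, prefix) pairs.
disallow_first = [(False, "/Blog/"), (True, "/Blog/post")]
allow_first = [(True, "/Blog/post"), (False, "/Blog/")]

url = "/Blog/post/My-Blog-Post.aspx"
print(first_match(disallow_first, url))    # False: hits Disallow: /Blog/ first
print(first_match(allow_first, url))       # True
print(longest_match(disallow_first, url))  # True
print(longest_match(allow_first, url))     # True
```

Only the first-match crawler is sensitive to the order, which is why putting Allow first is the safer layout.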
0
 
LVL 22

Expert Comment

by:Om Prakash
ID: 39274120
Here is a reference on robots.txt:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

In this case, URLs under /Blog/ are disallowed for Googlebot, except those starting with /Blog/post:

User-agent: *
Disallow: /Blog/
Allow: /Blog/post
0
 

Author Comment

by:nutnut
ID: 39274131
Hi, thanks for the responses.

So I need to

Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.

Disallow:
www.mysite.com/Blog/ ..everything underneath EXCEPT www.mysite.com/Blog/post

How would I do that in robots.txt?

The reason is that Google is seeing massive duplication of my site due to tags.

So,

/Blog?page=1, /Blog?page=2 and /Blog?page=3 are seen as the same.
/Blog?Tag=BlahBlah and /Blog?Tag=DohDoh are seen as the same.

I just want Google to read my main site www.mysite.com and then read nothing under www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post and below.

Is there a better way to do this outside of robots.txt? All posts are under www.mysite.com/Blog/post, so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is canonicalized fine.

Thanks
0
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274224
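There is no problem with using the following. Note that paths in robots.txt are case-sensitive, so the rules need to use /Blog to match your URLs:

User-agent: *
Disallow: /Blog?Tag
Disallow: /Blog?page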
0
 

Author Comment

by:nutnut
ID: 39274237
How about

User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/

Is this OK syntax-wise?
0
 
LVL 9

Accepted Solution

by:
TvMpt earned 500 total points
ID: 39274255
Example:
If a rule ends with a "/", it matches only that directory and everything under it.
That means this;
   Disallow: /wp-includes/
will block these;
   /wp-includes/this.html
   /wp-includes/that.php
   /wp-includes/thisstoo.jpg
   /wp-includes/here/here2/anythinginhere.aswell
etc.

If you use this;
   Disallow: /wp-includes
(without the / at the end)
then it would not only block the above, but also
   /wp-includes-this
   /wp-includesplusthis
   /wp-includes-thistoo-andthebelow
   /wp-includess

Did you see the difference? :)

Try using the robots.txt testing tool in Google WebMaster Tools.
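
If you want to see the trailing-slash effect without the testing tool, it can be sketched in a few lines of Python. This assumes simple prefix matching with no wildcards; `blocked` is a hypothetical helper, and the paths are the ones from the example above.

```python
def blocked(disallow_prefix, path):
    """A Disallow value blocks every path that starts with it (plain prefix match)."""
    return path.startswith(disallow_prefix)


paths = [
    "/wp-includes/this.html",
    "/wp-includes/here/here2/anythinginhere.aswell",
    "/wp-includes-this",
    "/wp-includess",
]

# Compare the rule with and without the trailing slash.
for prefix in ("/wp-includes/", "/wp-includes"):
    hits = [p for p in paths if blocked(prefix, p)]
    print(prefix, "blocks", len(hits), "of", len(paths), "paths:", hits)
```

With the trailing slash only the first two paths are blocked; without it all four are.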
0
 

Author Closing Comment

by:nutnut
ID: 39274262
Thanks very much. I have gone for

User-agent: *
Allow: /Blog/post/
Disallow: /Blog
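
For anyone wanting to sanity-check rules like these locally, here is a rough sketch using Python's standard `urllib.robotparser`. Keep in mind this is not Google's parser: as far as I know it applies rules in file order, which is fine here because the Allow line comes first. The `/Blog/archive` path is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /Blog/post/
Disallow: /Blog
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# /Blog/archive is a hypothetical path used only to show a blocked URL.
for path in ("/", "/Blog/post/My-Blog-Post.aspx", "/Blog/archive", "/Blog/post"):
    allowed = parser.can_fetch("*", "http://www.mysite.com" + path)
    print(path, "allowed" if allowed else "blocked")
```

One thing this shows: /Blog/post without the trailing slash is blocked by these rules, since it does not start with /Blog/post/ but does start with /Blog.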