Solved

Robot.txt

Posted on 2013-06-25
Last Modified: 2013-06-25
Hi,

Can somebody please explain what this is telling Google to do in the robots.txt?

User-agent: *
Disallow: /Blog/
Allow: /Blog/post

Is it valid?

Would (notice the line switch)

User-agent: *
Allow: /Blog/post
Disallow: /Blog/

be better?
Question by:nutnut
10 Comments
 
LVL 42

Expert Comment

by:sedgwick
ID: 39274116
User-agent: * -> An entry that applies to all bots.
Disallow: /Blog/ -> URLs matching /Blog/ (and everything in this directory) would be disallowed.
Allow: /Blog/post -> URLs matching /Blog/post (and everything under it) would be allowed.

The answer depends on your requirement: if you want to allow /Blog/post but disallow the rest of /Blog/, use the first case; otherwise use the second one.
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274117
Both are valid, but to be compatible with all robots you need to place the Allow directive(s) first, followed by the Disallow, for example:

User-agent: *
Allow: /Blog/post
Disallow: /Blog/
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274119
The order is only important to robots that follow the standard; in the case of the Google or Bing bots, the order is not important.
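
To make the point about ordering concrete, here is a minimal Python sketch of the two evaluation strategies. It assumes that robots following the original standard stop at the first rule whose path prefix matches, while Google and Bing pick the longest matching prefix (with Allow winning a tie). The helper names `first_match` and `longest_match` are hypothetical and written only for illustration; they are not taken from any crawler, and wildcards are ignored.

```python
# Hypothetical helpers for illustration only; real crawlers also handle
# wildcards, percent-encoding and per-agent groups, which are omitted here.

def first_match(rules, path):
    """First rule whose prefix matches decides (order matters)."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "Allow"
    return True  # no rule matched, so the path may be crawled


def longest_match(rules, path):
    """Longest matching prefix decides; Allow wins ties (order is irrelevant)."""
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True
    return max(matches)[1]  # True sorts after False, so Allow wins equal lengths


disallow_first = [("Disallow", "/Blog/"), ("Allow", "/Blog/post")]
allow_first = [("Allow", "/Blog/post"), ("Disallow", "/Blog/")]
post = "/Blog/post/My-Blog-Post.aspx"

for name, rules in (("Disallow first", disallow_first), ("Allow first", allow_first)):
    print(name, "| first match:", first_match(rules, post),
          "| longest match:", longest_match(rules, post))
```

Under these assumptions the post URL is only blocked in one combination: first-match evaluation with the Disallow line listed first. That is why putting Allow first is the safer ordering for all robots.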
 
LVL 22

Expert Comment

by:Om Prakash
ID: 39274120
Here is a reference on robots.txt
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

In this case, URLs matching /Blog/ would be disallowed for Googlebot, except those matching /Blog/post.

User-agent: *
Disallow: /Blog/
Allow: /Blog/post
 

Author Comment

by:nutnut
ID: 39274131
Hi, thanks for the responses.

So I need to

Allow:
www.mysite.com
www.mysite.com/Blog/post ...and then everything below it.

Disallow:
www.mysite.com/Blog/ ...everything underneath EXCEPT www.mysite.com/Blog/post

How would I do that in a robots.txt?

The reason is that Google is seeing massive duplication of my site due to tags.

So,

/Blog?page=1, /Blog?page=2 and /Blog?page=3 are seen as the same
/Blog?Tag=BlahBlah and /Blog?Tag=DohDoh are seen as the same.

I just want Google to read my main site www.mysite.com and then read nothing under www.mysite.com/Blog EXCEPT www.mysite.com/Blog/post and below.

Is there a better way to do this outside of robots.txt? All posts are under www.mysite.com/Blog/post, so for example www.mysite.com/Blog/post/My-Blog-Post.aspx is canonicalized fine.

Thanks
 
LVL 9

Expert Comment

by:TvMpt
ID: 39274224
There is no problem using the following (paths are case-sensitive, so they should match the /Blog casing of your URLs):

User-agent: *
Disallow: /Blog?Tag
Disallow: /Blog?page
 

Author Comment

by:nutnut
ID: 39274237
How about

User-agent: *
Allow: /Blog/post
Allow: /Blog/post/
Disallow: /Blog
Disallow: /Blog/

Is this OK syntax-wise?
 
LVL 9

Accepted Solution

by:
TvMpt earned 2000 total points
ID: 39274255
Example:
If the rule ends with a "/", it matches only that directory and everything under it.
That means this:
   Disallow: /wp-includes/
will block these:
   /wp-includes/this.html
   /wp-includes/that.php
   /wp-includes/thisstoo.jpg
   /wp-includes/here/here2/anythinginhere.aswell
etc.

If you use this:
   Disallow: /wp-includes
(without the / at the end)
then it would block not only the above, but also
   /wp-includes-this
   /wp-includesplusthis
   /wp-includes-thistoo-andthebelow
   /wp-includess

Did you see the difference? :)
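
The trailing-slash difference described above comes down to plain prefix matching, which can be sketched in a few lines of Python. The helper `is_blocked` is a hypothetical name used only for this illustration, and the sketch assumes a single Disallow rule with no wildcards.

```python
# Illustrative sketch: a Disallow rule blocks any path that starts with its prefix.
def is_blocked(disallow_prefix, path):
    return path.startswith(disallow_prefix)


paths = [
    "/wp-includes/this.html",
    "/wp-includes/here/here2/anythinginhere.aswell",
    "/wp-includes-this",
    "/wp-includess",
]

for path in paths:
    with_slash = is_blocked("/wp-includes/", path)
    without_slash = is_blocked("/wp-includes", path)
    print(f"{path}: with slash={with_slash}, without slash={without_slash}")
```

Only the first two paths are blocked by the rule with the trailing slash, while the rule without it blocks all four.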

Try using the robots.txt testing tool in Google Webmaster Tools.
 

Author Closing Comment

by:nutnut
ID: 39274262
Thanks very much. I have gone for

User-agent: *
Allow: /Blog/post/
Disallow: /Blog
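
As a rough sanity check of these final rules, the Python sketch below evaluates them against the URLs mentioned in this thread. It assumes longest-prefix matching with Allow winning ties, as Google documents for its own crawler; the helper `is_allowed` is a hypothetical name and wildcards are not handled, so it is no substitute for the testing tool in Webmaster Tools.

```python
# Illustrative sketch of longest-prefix evaluation; not a full robots.txt parser.
RULES = [("Allow", "/Blog/post/"), ("Disallow", "/Blog")]


def is_allowed(path, rules=RULES):
    matches = [(len(prefix), directive == "Allow")
               for directive, prefix in rules
               if path.startswith(prefix)]
    if not matches:
        return True  # no rule applies, so the path may be crawled
    return max(matches)[1]  # longest prefix wins; Allow wins equal lengths


for path in ("/",
             "/Blog/post/My-Blog-Post.aspx",
             "/Blog?page=1",
             "/Blog?Tag=BlahBlah",
             "/Blog/post"):
    print(path, "->", "allowed" if is_allowed(path) else "blocked")
```

One thing this brings out: /Blog/post without a trailing slash does not match Allow: /Blog/post/, so it falls under Disallow: /Blog and is blocked, while everything below /Blog/post/ stays crawlable.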