Solved

robots.txt

Posted on 2012-04-12
2
279 Views
Last Modified: 2012-04-13
I have created a cache.mydomain that I use to spead up page load times and is a complete copy of the live site.
However, Google has started to index this site instead of my live site.
Is there an easy way to stop this happening? Using robots.txt maybe?

Thanks
0
Comment
Question by:d1114170
2 Comments
 
LVL 17

Assisted Solution

by:Anuroopsundd
Anuroopsundd earned 250 total points
ID: 37837229
The "/robots.txt" file is a text file, with one or more records. Usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:


When a robot looks for the "/robots.txt" file for URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html, it will remove the "/shop/index.html", and replace it with "/robots.txt", and will end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT.

http://www.robotstxt.org/robotstxt.html
0
 
LVL 15

Accepted Solution

by:
Ess Kay earned 250 total points
ID: 37837663
add a robots.txt file

with the following

User-agent: *
Disallow: /

Open in new window

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

E-commerce is quite a gambling world, and you should never entrust your business to a lucky chance. In order to outrun your competitors in a race to attract as many customers as possible, you need to have a well thought-out strategy under your belt.…
SSL stands for “Secure Sockets Layer” and an SSL certificate is a critical component to keeping your website safe, secured, and compliant. Any ecommerce website must have an SSL certificate to ensure the safe handling of sensitive information like…
This tutorial walks through the best practices in adding a local business to Google Maps including how to properly search for duplicates, marker placement, and inputing business details. Login to your Google Account, then search for "Google Mapmaker…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

914 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now