Posted on 2012-04-12
Medium Priority
Last Modified: 2012-04-13
I have created a cache.mydomain subdomain that I use to speed up page load times. It is a complete copy of the live site.
However, Google has started to index this site instead of my live site.
Is there an easy way to stop this happening? Using robots.txt maybe?

Question by:d1114170
LVL 17

Assisted Solution

Anuroopsundd earned 1000 total points
ID: 37837229
The "/robots.txt" file is a text file containing one or more records. It usually contains a single record that looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
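
As a rough way to check how a record like this is interpreted, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the agent name "ExampleBot" and the host www.example.com are assumptions made for illustration only; real crawlers may apply additional rules of their own.

```python
from urllib.robotparser import RobotFileParser

# The example record from above, one directive per line.
EXAMPLE_RECORD = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /tmp/",
    "Disallow: /~joe/",
]


def is_allowed(path, agent="ExampleBot"):
    """Return True if the example record lets `agent` fetch `path`.

    The agent name and host are hypothetical placeholders.
    """
    parser = RobotFileParser()
    parser.parse(EXAMPLE_RECORD)
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/index.html", "/tmp/scratch.html", "/cgi-bin/search"):
        print(path, is_allowed(path))
```

Each Disallow value is matched as a URL prefix, so "/tmp/scratch.html" is blocked while "/index.html" is not.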

Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif".

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve.

When a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", replace it with "/robots.txt", and end up with "http://www.example.com/robots.txt".
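
The path replacement described above can be sketched in a few lines of Python. The function name robots_url is made up for this illustration; it only mirrors the described behaviour and is not taken from any particular crawler.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Build the robots.txt URL for the host that serves `page_url`.

    Keeps the scheme and host, and replaces the path (plus any query
    string or fragment) with "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.example.com/shop/index.html"))
```

Because the host name is kept as-is, each subdomain is looked up separately and needs its own robots.txt.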

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

LVL 15

Accepted Solution

Ess Kay earned 1000 total points
ID: 37837663
Add a robots.txt file to the root of the cache subdomain with the following:

User-agent: *
Disallow: /
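
To see the effect of this record, here is a small sketch using Python's standard urllib.robotparser. The host names cache.example.com and www.example.com stand in for the real domains, and the empty rule list for the live site is an assumption (it simply models a site with no restrictions).

```python
from urllib.robotparser import RobotFileParser

# Rules served by the cache host (the record shown above).
CACHE_RULES = ["User-agent: *", "Disallow: /"]
# Assumption: the live site serves no restrictions at all.
LIVE_RULES = []


def can_crawl(rules, url, agent="ExampleBot"):
    """Return True if `rules` allow the (hypothetical) `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_crawl(CACHE_RULES, "http://cache.example.com/shop/index.html"))
    print(can_crawl(LIVE_RULES, "http://www.example.com/shop/index.html"))
```

"Disallow: /" matches every path on the host that serves it, so the record must only be served from the cache host and not from the live site.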




Question has a verified solution.
