Solved

Dynamic robots.txt to block search engines from https versions.

Posted on 2009-04-02
636 Views
Last Modified: 2013-12-08
Hi,
We have a site at http://www.xxx.com and also the secured URL https://secure.xxx.com, which serves the same content as http://www.xxx.com. We already have a robots.txt file in the root directory of http://www.xxx.com, which also serves for https://secure.xxx.com. What we need is to block Google (and other search engines) from crawling the https version, so we are planning to create a new robots file that tells them not to crawl the https version.
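In other words, the https version should serve the standard disallow-all rules, something like:

# robots.txt for https://secure.xxx.com - keep all compliant crawlers out
User-agent: *
Disallow: /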
          We came across this link: http://www.kleenecode.net/2007/11/17/dynamic-robotstxt-with-aspnet-20/
It is very helpful because it explains how to serve different robots.txt files for http and https, and it includes code for a robots.txt that prevents crawling of the https version. To implement this, we followed the steps given in the post:
1) Created a robots.txt with the given code in the root of our web project.
 To check it, I browsed to the robots.txt file and could see the full code exactly as I typed it.
2) Got the path to the ASPX engine:
    a) Opened IIS, right-clicked our website, and brought up the Properties screen
    b) Went to Home Directory > Configuration, then to the Mappings tab
    c) Located the .aspx item, clicked Edit, copied the path in the Executable field, and cancelled out of that window
 3) Created the ISAPI entry for .txt:
     a) Still on the Mappings tab
     b) Clicked Add
     c) Populated the Executable path with the value copied in the last step
     d) Entered GET in the "Limit to" field
     e) Entered ".txt" in the Extension field (the post does not specify this one)
     f) Pressed OK to save all the changes
To check the result, the post says that browsing to the robots.txt file should return a blank page. But we are not getting a blank page; we get a page with the full code as I typed it.
Could anyone help me resolve this? If anyone has tried this dynamic creation of robots.txt, please share your experience as well.
Thanks in advance for any help
Question by:olmuser
3 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 24067222
Are you addicted to IIS?
With Apache you can use mod_rewrite to serve a different robots.txt.
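For illustration, a minimal sketch of that approach, assuming the https rules live in a separate secure-robots.txt beside the regular file (my example names, not from the comment):

# Apache vhost/server config: serve secure-robots.txt when robots.txt is requested over HTTPS
RewriteEngine On
RewriteCond %{HTTPS} on
RewriteRule ^/?robots\.txt$ /secure-robots.txt [L]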
 
LVL 23

Accepted Solution

by: Tony McCreath (earned 250 total points)
ID: 24070089
Your configuration lets you intercept requests in ASP.NET.

So what you need to do now is write the code that intercepts requests to robots.txt and dynamically writes the response.

One way to do this is to write and register an HTTP module (IHttpModule).

Here's some code I quickly whipped up that should do what you want. A secure robots.txt request will return the data from a secure-robots.txt file.

// web.config for registering a module
<configuration>
	<system.web>
		<httpModules>
			<add type="FileMapperModule" name="FileMapperModule"/>
		</httpModules>
	</system.web>
</configuration>
 
// module that intercepts secure robots.txt requests
public class FileMapperModule : IHttpModule
{
    public void Init(System.Web.HttpApplication Appl)
    {
        Appl.BeginRequest += new System.EventHandler(Rewrite_BeginRequest); // intercept all incoming requests
    }

    public void Rewrite_BeginRequest(object sender, System.EventArgs args)
    {
        HttpApplication Appl = (HttpApplication)sender;

        // if it's a secure (https) request for the robots.txt file
        if (Appl.Request.AppRelativeCurrentExecutionFilePath.StartsWith("~/robots.txt") && Appl.Request.IsSecureConnection)
        {
            Appl.Context.RewritePath("~/secure-robots.txt"); // return the content of secure-robots.txt
        }
    }

    public void Dispose() { } // required by IHttpModule; nothing to release here
}
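For what it's worth, the <httpModules> registration above applies to the classic ASP.NET pipeline (IIS 6, matching your setup). If the site were running on IIS 7 in integrated mode, the module would instead be registered under <system.webServer>, roughly:

// web.config for IIS 7 integrated pipeline
<configuration>
	<system.webServer>
		<modules>
			<add name="FileMapperModule" type="FileMapperModule"/>
		</modules>
	</system.webServer>
</configuration>

The secure-robots.txt file itself would then contain the disallow-all rules shown in the question.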
