Solved

remove https from robots.txt file

Posted on 2010-01-11
13
Medium Priority
401 Views
Last Modified: 2013-12-24
From my robots.txt file, I need to prevent crawlers from indexing all pages that are served over HTTPS. How would I code that? If it means anything, I'm using ColdFusion 8.
0
Comment
Question by:COwebmaster
13 Comments
 
LVL 8

Expert Comment

by:Jon500
ID: 26286513
The easiest way is to ensure that your http and https root folders have their own copy of robots.txt.

Do your https pages have their own root folder or web server?

Regards,
Jon
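
If the HTTPS content did live under its own root folder, the copy of robots.txt served from that root could simply block everything — a standard robots.txt fragment, nothing site-specific assumed:

```
# robots.txt placed in the HTTPS document root only
User-agent: *
Disallow: /
```

The robots.txt in the HTTP root would stay as-is, so normal pages remain crawlable.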
0
 
LVL 2

Expert Comment

by:xpert13
ID: 26286571
I don't think you can do this with robots.txt alone, but you can do it with .htaccess.

This code should work:

RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|spider|Twiceler|heritrix|\
Combine|appie|boitho|e-SocietyRobot|Exabot|Nutch|OmniExplorer|\
MJ12bot|ZyBorg/1|Ask\ Jeeves|AskJeeves|ActiveTouristBot|\
JemmaTheTourist|agadine3|BecomeBot|Clustered-Search-Bot|\
MSIECrawler|freefind|galaxy|genieknows|INGRID|grub-client|\
MojeekBot|NaverBot|NetNose-Crawler|OnetSzukaj|PrassoSunner|\
Asterias\ Crawler|T-H-U-N-D-E-R-S-T-O-N-E|GeorgeTheTouristBot|\
VoilaBot|Vagabondo|fantomBrowser|stealthBrowser|cloakBrowser|\
fantomCrew\ Browser|Girafabot|Indy\ Library|Intelliseek|Zealbot|\
Windows\ 95|^Mozilla/4\.05\ \[en\]$|^Mozilla/4\.0$) [NC]
RewriteRule ^(https://.*)$ - [F]


0
 

Author Comment

by:COwebmaster
ID: 26287383
xpert13, what does your code do exactly?

Jon500, there is no separate root folder or web folder.
0
 
LVL 2

Expert Comment

by:xpert13
ID: 26287428
It should deny all search bots access to HTTPS links.

I haven't tested it, though.
0
 

Author Comment

by:COwebmaster
ID: 26287541
What if I do the following:

add to robots.txt:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

In robots_ssl.txt, add:
User-agent: *
Disallow: /

Should the above work?  Also, will all major search engines crawl the robots file though?
0
 
LVL 2

Accepted Solution

by:
xpert13 earned 1336 total points
ID: 26287627
Those lines:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

need to be added to ".htaccess" (not to robots.txt), and then it should work fine.
You can test it by requesting https://your-site.com/robots.txt
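
Putting the two pieces together, a minimal sketch of the accepted approach (assuming Apache with mod_rewrite enabled, and robots_ssl.txt sitting next to robots.txt in the web root):

```apache
# .htaccess in the web root
RewriteEngine On

# Requests arriving on port 443 (HTTPS) for robots.txt are
# internally rewritten to robots_ssl.txt; requests on port 80
# still receive the normal robots.txt.
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
```

```
# robots_ssl.txt - only ever served as robots.txt over HTTPS
User-agent: *
Disallow: /
```

Because the rewrite is internal ([L] with no redirect flag), crawlers never see the robots_ssl.txt filename; they just get different robots.txt content depending on the port.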
0
 
LVL 2

Expert Comment

by:xpert13
ID: 26287660
"Also, will all major search engines crawl the robots file though?"
All search engines read robots.txt, but this file like recommendation, not a rule.


0
 

Author Comment

by:COwebmaster
ID: 26287669
I saw it here: http://www.webmasterworld.com/google/3876287.htm (look at key_master).

So I'm assuming what it does is: if any requested page is https, it points crawlers to robots_ssl.txt?

Obviously, I want all my other normal http pages to be crawled, just not the https pages. Will this still work then?

0
 
LVL 2

Assisted Solution

by:xpert13
xpert13 earned 1336 total points
ID: 26287735
Yes. All HTTPS pages are served on port 443.

---

In .htaccess:
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
This rule means that every request on port 443 for robots.txt is internally rewritten to robots_ssl.txt, so requests over plain HTTP still receive the normal robots.txt.

0
 
LVL 36

Assisted Solution

by:SidFishes
SidFishes earned 664 total points
ID: 26288468
The rewrite might work for you, but I'd not count on robots.txt; as noted above, it's only a "suggestion".

The only way to properly protect secure pages is to protect them with some kind of login/session tracking, i.e. a spider can't crawl pages that require it to be logged in.

You can also use something like this:

<cfif cgi.SERVER_PORT NEQ "443">
    <!--- Not on HTTPS: send the visitor to the secure URL --->
    <cflocation url="https://#cgi.SERVER_NAME##cgi.SCRIPT_NAME#" addtoken="no">
<cfelse>
    <!--- Already on HTTPS: allow the request through --->
</cfif>

0
 

Author Comment

by:COwebmaster
ID: 26288864
The problem is that Bing.com indexed a secured version of a site of mine that does not exist; in other words, the site exists on http but not https. Why would they do that? I'm trying to find preventive measures so it doesn't happen in the future. I don't want to buy another cert; I just want to prevent all SEs from indexing https pages.
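
As a belt-and-braces measure on top of the robots.txt rewrite (not from this thread, just a sketch assuming Apache with both mod_rewrite and mod_headers enabled), you can also send a noindex header on every HTTPS response, which asks engines that have already fetched a page to drop it from the index:

```apache
# Flag requests that arrive on the HTTPS port with an env variable...
RewriteEngine On
RewriteCond %{SERVER_PORT} ^443$
RewriteRule .* - [E=IS_HTTPS:1]

# ...and tell crawlers not to index anything served over HTTPS
Header set X-Robots-Tag "noindex, nofollow" env=IS_HTTPS
```

Unlike robots.txt, X-Robots-Tag applies per response, so it also covers pages a crawler already knows the URL of.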
0
 

Author Comment

by:COwebmaster
ID: 26294391
One last question on this: if I block the crawlers from indexing https pages, yet I already have an existing https page with a Google PageRank of 4, will that page get dinged by Google and not rank as high?
0
 

Author Closing Comment

by:COwebmaster
ID: 31675684
Thanks!
0


