Solved

Development site is indexed by Google even though it is behind htpasswd

Posted on 2016-11-11
Last Modified: 2016-11-15
As stated in the title: my development site is protected with htpasswd, yet Google is somehow indexing it. How is that possible?
Question by:jblayney
7 Comments
 
LVL 14

Expert Comment

by:Phil Phillips
ID: 41884313
htpasswd should prevent Google from accessing the site, though you might want to double-check that your configuration actually locks down all URLs. In the web logs, do you still see Google accessing it with successful response codes?

Also, Google may simply have indexed a previous version of the site. If that's the case, it will take some time for those pages to age out. To speed up the process, you can log in to Google Webmaster Tools and remove the site from the index.
 
LVL 27

Expert Comment

by:Dr. Klahn
ID: 41884320
Update robots.txt to exclude all robots on that site:

User-agent: *
Disallow: /
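As a quick sanity check (a sketch, not part of the original answer), Python's standard-library `urllib.robotparser` can confirm that these two lines disallow every path for every crawler. The host `dev.example.com` is a placeholder. One caveat: this only shows how a compliant crawler reads the file once it can fetch it. If robots.txt itself sits behind the htpasswd prompt, the crawler gets a 401 for it, and per Google's documentation a 4xx on robots.txt is treated as if no robots.txt existed, so the file may need to be exempted from authentication to have any effect.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt body suggested above: block all crawlers from all paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # dev.example.com is a placeholder host, not taken from the thread.
    for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "https://dev.example.com/any/page.html"))
```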



I've noticed that Googlebot occasionally visits my sites with a faked browser user-agent string to avoid complying with robots.txt, often enough that I wrote a mod_rewrite rule to block the behavior. So I'd also add an exclusion rule for the Googlebot IP ranges:

# Googlebot's various address blocks (one /22 and several /24s)
66.249.64.0/22
66.249.69.0/24
66.249.73.0/24
66.249.79.0/24
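To check whether a client address from your access log falls inside the blocks listed above, Python's standard `ipaddress` module is enough. This is a sketch: the list is copied from this 2016 comment and Google's crawler addresses may have changed since, so treat it as an example rather than an authoritative list.

```python
import ipaddress

# Address blocks exactly as listed in the comment above (2016; may be out of date).
GOOGLEBOT_BLOCKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("66.249.64.0/22", "66.249.69.0/24", "66.249.73.0/24", "66.249.79.0/24")
]

def in_listed_blocks(ip: str) -> bool:
    """Return True if `ip` falls inside any of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is simply reported as not contained in these IPv4 networks.
    return any(addr in block for block in GOOGLEBOT_BLOCKS)
```

For example, `in_listed_blocks("66.249.66.1")` is true because 66.249.64.0/22 spans 66.249.64.0 to 66.249.67.255, while `66.249.70.1` is not covered by any listed block. Google also documents verifying Googlebot with a reverse DNS lookup (a host ending in googlebot.com or google.com) followed by a forward lookup, which does not depend on a fixed IP list.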


 
LVL 1

Author Comment

by:jblayney
ID: 41886558
Thanks for responding. You mean this?

Order Allow,Deny
Deny from 66.249.64.0/22
Deny from 66.249.69.0/24
Deny from 66.249.73.0/24
Deny from 66.249.79.0/24
Allow from all


 
LVL 1

Author Comment

by:jblayney
ID: 41886559
Phil,

Where do I check this part?
"In the web logs, do you still see Google accessing it with successful response codes?"
 
LVL 14

Assisted Solution

by:Phil Phillips
Phil Phillips earned 250 total points
ID: 41886654
It depends on how you have Apache configured to store its logs. On Linux, common default locations are /var/log/httpd (Red Hat/CentOS-style systems) or /var/log/apache2 (Debian/Ubuntu-style systems).
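To answer the original question (is Google still getting successful responses?), a small script can scan an access log for Googlebot requests and tally their status codes. This sketch assumes Apache's standard "combined" log format and matches on the user-agent string, which anyone can fake; the function names are illustrative, not from the thread.

```python
import re
from collections import Counter

# Apache "combined" log format:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_status_counts(lines):
    """Count HTTP status codes for requests whose user agent mentions Googlebot."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and "googlebot" in m.group("agent").lower():
            counts[m.group("status")] += 1
    return counts

def report(path):
    """Print the status-code counts for Googlebot requests in the log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for status, n in sorted(googlebot_status_counts(f).items()):
            print(status, n)
```

Call it with your own log path, e.g. `report("/var/log/apache2/access.log")` (the file is typically `access_log` under /var/log/httpd instead). Mostly 401 responses would mean htpasswd is rejecting the crawler; 200 responses would mean it is getting through and the configuration is not covering every URL.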
 
LVL 27

Accepted Solution

by:Dr. Klahn
Dr. Klahn earned 250 total points
ID: 41887051
thanks for responding, you mean this?

(... list of IP blocks)

That's the one. It can be done in the main config file or in an .htaccess file. If you don't want Googlebot in the site at all, it is more efficient to do it in iptables, which blocks the request before it reaches Apache.
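The iptables route can be scripted from the same list. As a sketch, the Python below only builds and prints `iptables` commands (it does not run them) that drop TCP traffic from each listed block to ports 80 and 443. The port list, the `INPUT` chain, inserting with `-I` so earlier ACCEPT rules don't take precedence, and the choice of `DROP` are all assumptions to adapt; applying the rules needs root, and they do not survive a reboot unless saved with your distribution's mechanism.

```python
import shlex

# Blocks from the list in this thread (2016; verify before use).
BLOCKS = ["66.249.64.0/22", "66.249.69.0/24", "66.249.73.0/24", "66.249.79.0/24"]

def iptables_commands(blocks, ports=(80, 443)):
    """Build iptables commands that drop TCP traffic from each block to the given ports."""
    port_list = ",".join(str(p) for p in ports)
    return [
        shlex.join([
            "iptables", "-I", "INPUT",  # insert at the top of the chain
            "-s", cidr,
            "-p", "tcp", "-m", "multiport", "--dports", port_list,
            "-j", "DROP",
        ])
        for cidr in blocks
    ]

if __name__ == "__main__":
    for cmd in iptables_commands(BLOCKS):
        print(cmd)
```

Printing rather than executing keeps the sketch safe to run; review the output before pasting it into a root shell.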
 
LVL 1

Author Closing Comment

by:jblayney
ID: 41888001
Thank you.