How does EE let Google index its solutions but not let users see them without logging in?

Hi,

I am in the process of building a website that I want Google to index, but I want users to be asked to log in if they click through on the link.

How does Experts Exchange do this? The only way I can think of so far is that they look at the user agent and, if it is Googlebot, check the IP range to make sure the request really comes from Google. If both conditions are met, they show the full page; otherwise the user is asked to log in.

The problem with this method is that I would need to know all of the IP ranges Google crawls from. So I guess my questions are:

Is this the method EE uses?

If not, what is a better method for doing this?

If it is, does anyone know what IP ranges Google's indexers use?

Thanks
Daniel
Asked by: danielparkerNZ
 
Julian Matz (Joint Chairperson) commented:
Another method would be to use the PHP get_browser() function
http://ie.php.net/manual/en/function.get-browser.php

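$useragent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// Pass true as the second argument to get an array rather than an object
$browser_info = get_browser($useragent, true);

$Crawler = $browser_info['crawler']; // flag: does browscap list this agent as a crawler?
$Name = $browser_info['browser'];    // agent name

^^
$Name would be 'Googlebot' for Google, or the corresponding name for MSN, etc., and $Crawler tells you whether the agent is listed as a crawler.

This function depends on the freely available browscap.ini, which has to be configured through the browscap setting in php.ini. Keep in mind that the user agent string is sent by the client and can be faked.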
 
gangwisch commented:
They most likely use the HTTP referer (HTTP_REFERER): if the referer is google.com, then display this HTML.
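
A minimal sketch of the referer check described above, not EE's actual code. The function name came_from_google() is hypothetical, and only google.com hosts are matched (country domains such as google.co.nz are not). The Referer header is supplied by the client, so it can be faked, and it describes where a visitor came from rather than who the crawler is.

```php
<?php
// Hypothetical helper: true if the referer URL's host is google.com or a subdomain of it.
function came_from_google($referer)
{
    // parse_url() returns null or false when no host can be extracted.
    $host = parse_url((string) $referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    // Anchor the match so "google.com.evil.example" does not pass.
    return preg_match('/(^|\.)google\.com$/i', $host) === 1;
}

// Typical use on a page:
// $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
// $showFullPage = came_from_google($referer);
```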
 
Julian Matz (Joint Chairperson) commented:
You do not need to know the IP range. You can do a hostname check: convert the IP to a hostname with a reverse DNS lookup and then check it against

*.googlebot.com
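
The hostname check could be sketched roughly as follows. This is an illustration under assumptions, not a definitive implementation: the function name is_verified_googlebot() is hypothetical, and the two resolver parameters exist only so the logic can be tried without network access. It adds a forward lookup after the reverse lookup, because a reverse DNS record on its own is controlled by whoever owns the IP.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check against *.googlebot.com.
// $reverse and $forward default to PHP's own resolvers and can be swapped for testing.
function is_verified_googlebot($ip, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Step 1: IP -> hostname. gethostbyaddr() returns the IP itself when no PTR record exists.
    $host = call_user_func($reverse, $ip);
    if (!is_string($host) || !preg_match('/\.googlebot\.com$/i', $host)) {
        return false;
    }
    // Step 2: hostname -> IPs. The original IP must be among them.
    // gethostbynamel() only returns IPv4 addresses, so IPv6 clients fail this check.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}

// Typical use on a page:
// $isGooglebot = is_verified_googlebot($_SERVER['REMOTE_ADDR']);
```

DNS lookups are slow, so in practice the result would probably be cached per IP for a while.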
 
Julian Matz (Joint Chairperson) commented:
This range belongs to Google:
66.249.64.0 - 66.249.95.255

I don't know how many other ranges, if any, they have, but your safest bet is to use the hostname. It should always be *.googlebot.com.
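
For completeness, a range check on the addresses quoted above might look like the sketch below. The function name ip_in_range() is hypothetical, it handles IPv4 only, and the range is just the one mentioned in this thread, which may be incomplete or out of date.

```php
<?php
// Hypothetical helper: true if $ip lies between $start and $end (inclusive), IPv4 only.
// Assumes 64-bit PHP, where ip2long() always returns a non-negative integer.
function ip_in_range($ip, $start, $end)
{
    $n  = ip2long($ip);
    $lo = ip2long($start);
    $hi = ip2long($end);
    // ip2long() returns false for anything that is not a valid IPv4 address.
    if ($n === false || $lo === false || $hi === false) {
        return false;
    }
    return $n >= $lo && $n <= $hi;
}

// Range quoted in the thread:
// $inGoogleRange = ip_in_range($_SERVER['REMOTE_ADDR'], '66.249.64.0', '66.249.95.255');
```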
 
John-Bayles commented:
It's simple, and it does not involve the IP address!
If it were to involve the IP address, what would happen if the site was indexed by MSN or Yahoo?

It would be done using PHP and cookies!

                        You click the link in google
                                         |
                      ---------------------------------
                                          |
                                   Goto Website
                                           |
                        Check for cookie saying user is active            
                            |                                  |
                 User Active                        User Not Active
                      |                                         |
                      |                               Show Question only      
                      |                                         |
                      |                        User Enters Username And Password
                        \                                       /
                          \                                   /
                          Show Question and Answers  

Well, anyway, this is the kind of structure I'd use. When the user clicks the link in Google, the site checks whether the user is active. If they are, it shows the question and answers. If not, it shows only the question; when the user logs in, it shows the question and answers and sets the cookie to say the user is active.
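
The flow above could be sketched like this. It is only an illustration: the function name sections_to_show() and the 'username' key are assumptions, and it uses a PHP session (which is itself tracked by a cookie) rather than a hand-rolled "active" cookie.

```php
<?php
// Hypothetical helper: decide which parts of the page to render.
// $session stands in for $_SESSION; the 'username' key is an assumed marker for "user is active".
function sections_to_show(array $session)
{
    if (isset($session['username'])) {
        return array('question', 'answers');
    }
    return array('question');
}

// In a real page this would be called after session_start():
// $sections = sections_to_show($_SESSION);
```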
 
John-Bayles commented:
Also: because Google's spiders are not active users when they scan the page, they cannot see the page's answers!
 
Julian Matz (Joint Chairperson) commented:
Hi John-Bayles,

<< because when googles spiders scan the page they are not active users they cannot see the page answers!
That was the author's question... How to let Google see and index the site properly.

It can be done by checking the hostname and automatically setting the user (Googlebot) active. Obviously, Googlebot cannot literally log in, so you can write some PHP: check the hostname and, if it is Google, automatically create an active session and keep it live for a certain period of time:

// Start the session first so $_SESSION is available for both checks below
session_start();

$IP = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
// Reverse DNS lookup; returns the IP unchanged if there is no hostname for it
$hostname = gethostbyaddr($IP);

// I'm not an expert with regular expressions, am just trying to show an example...
// Matches hostnames ending in .googlebot.com (eregi() was removed in PHP 7, so preg_match() is used)
if (is_string($hostname) && preg_match('/\.googlebot\.com$/i', $hostname)) {
 $_SESSION['username'] = 'Googlebot';
}

if (!isset($_SESSION['username'])) {
 // this redirects to login.php, but you can place here whatever code should be executed if there is no active session
 header("Location: login.php");
 exit;
}
 
danielparkerNZ (Author) commented:
It seems to me that EE likely uses both a reverse IP lookup and a user agent check.

After doing a lot of research on this, it seems to be cloaking, which is against Google's Terms of Service.

If it is cloaking, how is EE not breaking Google's TOS and not getting banned from the index?
 
Julian Matz (Joint Chairperson) commented:
How have you come to this conclusion?

Have you ever looked at Google's cache of the site? The cache looks pretty much like the site would if I wasn't logged in. The answers are there, but you have to do a lot of scrolling to see them... And of course, there are the Intellitext ads...
 
Julian Matz (Joint Chairperson) commented:
Ok, not all solutions have the answers without being logged in, but a lot of them do...
 
danielparkerNZ (Author) commented:
Hmm, I wasn't aware of that. Well, I guess they are not cloaking then. I hadn't checked the cache.
