danielparkerNZ

How does EE let Google index its solutions, but not let users see them without logging in?

Hi,

I am in the process of building a website that I want Google to index, but I want users to be asked to log in if they click through on the link.

How does Experts Exchange do this? The only way I can think of so far is that they look at the user agent and, if it is Googlebot, check the IP range to make sure the request really comes from Google. If both conditions are met, they show the full page; otherwise the user is asked to log in.

The problem with this method is that I would need to know all of the IP ranges Google crawls from. So I guess my questions are:

Is this the method EE uses?

If not, what is a better method for doing this?

If it is, does anyone know what IP ranges Google's indexers use?

Thanks
Daniel
gangwisch

They most likely use the HTTP referer (HTTP_REFERER), meaning that if the referer is google.com, they display the full HTML.
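
A minimal sketch of such a referer check, assuming PHP. The helper name came_from_google() and the $showFullPage variable are hypothetical. Keep in mind that the Referer header is supplied by the client, so it can be missing or forged, and it identifies visitors arriving from a results page, not necessarily the crawler itself.

```php
<?php
// Hypothetical helper: true when the Referer host is google.com or a subdomain of it.
function came_from_google(?string $referer): bool
{
    if ($referer === null || $referer === '') {
        return false;
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    // Anchor the match so "evilgoogle.com" or "google.com.example.org" do not pass.
    return preg_match('/(^|\.)google\.com$/i', $host) === 1;
}

$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : null;
$showFullPage = came_from_google($referer);
```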
SOLUTION
Julian Matz
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
It's simple, and it does not involve the IP address!
If it were to involve the IP address, what would happen if the site was indexed by MSN or Yahoo?

It would be done using PHP and cookies!

              You click the link in Google
                           |
                     Go to website
                           |
         Check for cookie saying user is active
              |                          |
         User active              User not active
              |                          |
              |                 Show question only
              |                          |
              |        User enters username and password
               \                        /
                \                      /
               Show question and answers

Well anyway, this is the kind of structure I'd use. When the user clicks the link in Google, the site checks whether the user is active. If they are, it shows the question and answers. If not, it shows only the question, and when the user logs in it shows the question and answers and sets the cookie to say the user is active!
Also: because Google's spiders are not active users when they scan the page, they cannot see the page's answers!
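
The flow above could be sketched roughly as follows, assuming PHP sessions (which are themselves cookie-backed) in place of a hand-rolled cookie. The function name render_page(), the 'active' session key, and the sample data are all hypothetical.

```php
<?php
// Hypothetical sketch: decide what to render based on whether the visitor has an active session.
function render_page(array $session, string $question, array $answers): string
{
    $html = '<h1>' . htmlspecialchars($question) . '</h1>';
    if (!empty($session['active'])) {
        // Active user: show the question and the answers.
        foreach ($answers as $answer) {
            $html .= '<p>' . htmlspecialchars($answer) . '</p>';
        }
    } else {
        // Not active: show the question only, plus a prompt to log in.
        $html .= '<p><a href="login.php">Log in to see the answers</a></p>';
    }
    return $html;
}

// In a real page the first argument would be $_SESSION after session_start(),
// and login.php would set $_SESSION['active'] = true after checking credentials.
$page = render_page(array('active' => true), 'Sample question?', array('Sample answer.'));
```

Under this flow a crawler never becomes active, so it only ever receives the question.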
Hi John-Bayles,

<< because when googles spiders scan the page they are not active users they cannot see the page answers!
That was the author's question... How to let Google see and index the site properly.

It can be done by checking the hostname and automatically setting the user (Googlebot) active. Obviously, Googlebot cannot literally log in, so you can write some PHP: check the hostname and, if it is Google, automatically create an active session and keep it live for a certain period of time:

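// Start the session first so that users who are already logged in are recognised
session_start();

// Reverse DNS lookup on the visitor's IP address
$IP = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
$hostname = ($IP !== '') ? gethostbyaddr($IP) : '';

// I'm not an expert with regular expressions, am just trying to show an example...
// This matches hostnames ending in .googlebot.com or .google.com
if (is_string($hostname) && preg_match('/\.(googlebot|google)\.com$/i', $hostname)) {
 $_SESSION['username'] = 'Googlebot';
}

if (!isset($_SESSION['username'])) {
 // this redirects to login.php, but you can place here whatever code should be executed if there is no active session
 header("Location: login.php");
 exit;
}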
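
A reverse lookup on its own can be fooled by anyone who controls the PTR record for their own IP address, so Google's published guidance is to confirm it with a forward lookup of the returned hostname. The following is a sketch of that forward-confirmed check, not a definitive implementation: the function name is_verified_googlebot() and its injectable $reverse and $forward parameters are hypothetical, added so the logic can be exercised without real DNS.

```php
<?php
// Hypothetical helper: forward-confirmed reverse DNS check for a crawler IP.
// $reverse and $forward default to PHP's gethostbyaddr() and gethostbynamel().
// Note: gethostbynamel() only returns IPv4 addresses, so IPv6 visitors would
// need a different forward lookup (for example dns_get_record() with DNS_AAAA).
function is_verified_googlebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse = $reverse ?: 'gethostbyaddr';
    $forward = $forward ?: 'gethostbynamel';

    if (filter_var($ip, FILTER_VALIDATE_IP) === false) {
        return false;
    }
    // Step 1: reverse lookup. gethostbyaddr() returns the IP itself when no PTR record exists.
    $hostname = $reverse($ip);
    if (!is_string($hostname) || $hostname === $ip) {
        return false;
    }
    // Step 2: the hostname must end in .googlebot.com or .google.com.
    if (preg_match('/\.(googlebot|google)\.com$/i', $hostname) !== 1) {
        return false;
    }
    // Step 3: the forward lookup must map the hostname back to the same IP.
    $addresses = $forward($hostname);
    return is_array($addresses) && in_array($ip, $addresses, true);
}

// Usage on a live request (performs two DNS lookups, so consider caching the result):
// if (is_verified_googlebot($_SERVER['REMOTE_ADDR'])) { $_SESSION['username'] = 'Googlebot'; }
```

This also addresses the worry about maintaining a list of IP ranges, since the check relies on DNS instead of a hard-coded list.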
ASKER CERTIFIED SOLUTION
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
danielparkerNZ

ASKER

It seems to me that EE likely uses both a reverse IP lookup and a check of the user agent.

After doing a lot of research on this, it seems to be cloaking, which is against Google's Terms of Service.

If it is cloaking, how is EE not breaking Google's TOS, and not getting banned from the index?
How have you come to this conclusion?

Have you ever looked at Google's cache of the site? The cache looks pretty much like the site would if I wasn't logged in. The answers are there, but you have to do a lot of scrolling to see them... And of course, there's the Intellitext ads...
OK, not all solutions show the answers without being logged in, but a lot of them do...
Hmm, I wasn't aware of that. Well, I guess they are not cloaking then. I hadn't checked the cache.