Solved

Parsing the HTTP REFERER variable to detect the type of referring site

Posted on 2004-07-31
Last Modified: 2008-03-06
Hello!

I collect the visits to my web site, storing all HTTP REFERER variables into a database.

From this variable, I would like to detect whether the referrer is:

- a newsgroup

- Google

- a search engine other than Google

- a direct access (URL typed in, or a browser bookmark)

- a web site other than my web site and other than a search engine

- my web site

Any clue?

Regards
Stephane
Question by:stephaneeybert
9 Comments
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11685161
I'm not sure what you are asking for.

The referrer is just a string containing the URL of the page where the link to your page was clicked. If the address was entered directly in the address bar or was clicked in the favourites, then the referrer is empty; otherwise it is the URL where the link was clicked. I don't believe there is any detail beyond that, so you would just have to parse the URL to get the site name.
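
A minimal sketch of that parsing step in JavaScript, assuming the referrer is an absolute URL and the standard URL class is available (browsers and Node.js). The helper name referrerHost is hypothetical.

```javascript
// Return the lowercased hostname of a referrer URL, or null when the
// referrer is empty, has no hostname, or is not a parseable absolute URL.
function referrerHost(referrer) {
  if (!referrer) return null; // empty referrer: typed URL or bookmark
  try {
    const host = new URL(referrer).hostname.toLowerCase();
    return host || null;
  } catch (err) {
    return null; // the URL constructor throws on strings it cannot parse
  }
}
```

Comparing hostnames rather than whole URLs avoids accidental matches against the path or query string.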

Cd&
 

Author Comment

by:stephaneeybert
ID: 11685879
Hello,

Thanks for the comment. In fact I know that. My question is about how to parse it and what to look for to get the information I want...

Cheers
Stephane
 

Author Comment

by:stephaneeybert
ID: 11685928
What logic should I put in the parsing to get the details I need...
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11685959
There are so many possible variations on URLs that parsing out specifics will almost require a regular expression for just about each instance that you are looking for. What is it you are trying to parse out? Why not just use the whole URL?

Cd&

 

Author Comment

by:stephaneeybert
ID: 11687003
I'm building a web site and I made a page to show visitor and visit statistics.
The page shows which browsers are being used, which operating systems, how many visitors and visits per month... Now I would like to complete the work by showing on the page where the visitors came from when visiting the web site. I would like to display how many came from:
- a newsgroup

- Google

- a search engine other than Google

- a direct access (URL typed in, or a browser bookmark)

- a web site other than my web site and other than a search engine

- my web site

And I'll display it with a graph (that part I know how to do).
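
The per-category counts behind such a graph could be sketched as follows. This assumes the stored referrers are available as an array of strings and that some classify function maps each one to a category name; both tallyByCategory and the placeholder classifier are hypothetical names used only for illustration.

```javascript
// Count how many referrers fall into each category.
// classify is any function mapping a referrer string to a category name.
function tallyByCategory(referrers, classify) {
  const counts = {};
  for (const referrer of referrers) {
    const category = classify(referrer);
    counts[category] = (counts[category] || 0) + 1;
  }
  return counts;
}

// Placeholder classifier for demonstration only: it merely separates
// empty referrers (direct access) from everything else.
const demoClassify = (referrer) => (referrer ? "other" : "direct");
const demoCounts = tallyByCategory(["", "http://example.net/", ""], demoClassify);
console.log(demoCounts);
```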

The only thing that is hard for me is how to parse the URLs with regular expressions to match them against the 6 options listed above.

Say, how to parse the URL to check whether it comes from a newsgroup and, if not, whether it comes from Google...

Regards
Stephane
 

Author Comment

by:stephaneeybert
ID: 11687006
Doing a regular expression for each option that I am looking for is fine with me. Only, I'm no good with regular expressions...
 
LVL 53

Accepted Solution

by:
COBOLdinosaur earned 125 total points
ID: 11688322
You don't understand: there are hundreds of thousands of newsgroups, maybe over a million.
There is nothing that indicates they are a newsgroup.

There are thousands of search engines, and there are search engines that search other search engines. There is nothing that indicates they are a search engine.

You would need to have a database with the names of all the newsgroups and all the search engines and test against that. Even if you keep it to a short list there is still a problem. Consider Yahoo. What does Yahoo.com tell you? They have a search engine. They also have newsgroups. They have email, which might contain links. They host virtual domains. Google is the same way. Google.com==search || ==gmail || ==newsgroup.
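
One way to cope with a single domain hosting several services is to key the list on the full hostname rather than on the domain. The sketch below assumes a small hand-maintained table; the hostnames and service labels are illustrative examples, not a complete or verified database, and serviceForHost is a hypothetical name.

```javascript
// Hypothetical table mapping full hostnames to the kind of service they
// host. The entries are illustrative; extend the list for the sites you track.
const SERVICE_BY_HOST = new Map([
  ["groups.google.com", "newsgroup"],
  ["mail.google.com", "webmail"],
  ["www.google.com", "search"],
  ["groups.yahoo.com", "newsgroup"],
  ["search.yahoo.com", "search"],
]);

// Look up the service type for a hostname; hosts not in the table
// are reported as "unknown".
function serviceForHost(host) {
  return SERVICE_BY_HOST.get(host.toLowerCase()) || "unknown";
}
```

Anything missing from the table stays "unknown", so the limitation described above still applies: only sites you have listed can be recognised.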

What I would suggest is that you look at the URLs of the specific sites you want to track; then you just have to do a simple substring search for them, and you won't need regexps:

In JavaScript I would do something like this:

// referrer is assumed to hold the referring URL (e.g. document.referrer)
var ref = referrer.toLowerCase();
if (ref == '')
    alert('no referrer (typed URL or bookmark)');
else if (ref.indexOf('google.com') != -1)
    alert('this is google');
else if (ref.indexOf('yahoo.com') != -1)
    alert('this is yahoo');
else if (ref.indexOf('yoursite.com') != -1)
    alert('this is yoursite');
else
    alert('this is another site');
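
Substring tests on the whole URL can misfire when a site name appears in the path or query string (for instance a hypothetical http://example.net/?q=google.com). The following sketch compares hostnames instead and covers the six categories from the question. The domain lists, the category names, and the function names are all illustrative assumptions; real lists would have to be maintained by hand.

```javascript
// Illustrative, incomplete lists; extend them with the sites you want to track.
const NEWSGROUP_HOSTS = ["groups.google.com"];
const GOOGLE_HOSTS = ["google.com"];
const OTHER_SEARCH_HOSTS = ["search.yahoo.com", "altavista.com"];

// True when host equals domain or is a subdomain of it.
function hostMatches(host, domain) {
  return host === domain || host.endsWith("." + domain);
}

// Classify a referrer string; ownDomain is the domain of your own site.
function classifyReferrer(referrer, ownDomain) {
  if (!referrer) return "direct"; // typed URL or bookmark
  let host;
  try {
    host = new URL(referrer).hostname;
  } catch (err) {
    return "other"; // not an absolute URL; left unclassified
  }
  const matchesAny = (domains) => domains.some((d) => hostMatches(host, d));
  if (hostMatches(host, ownDomain)) return "internal";
  // Newsgroup hosts are checked before Google because groups.google.com
  // would otherwise match the google.com entry.
  if (matchesAny(NEWSGROUP_HOSTS)) return "newsgroup";
  if (matchesAny(GOOGLE_HOSTS)) return "google";
  if (matchesAny(OTHER_SEARCH_HOSTS)) return "search";
  return "other";
}
```

For example, classifyReferrer("http://www.google.com/search?q=test", "example.org") falls into the "google" category, while a referrer on any unlisted host ends up in "other".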

Cd&
 

Author Comment

by:stephaneeybert
ID: 11688355
Yeah, I started doing a strstr() search for the Google case.

I'll do the same with the web site hostname for the internal hits.

Thanks anyway

Cheers

Steph
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11688560
Glad I could help.  Thanks for the A. :^)

Cd&
