Solved

Parsing the HTTP REFERER variable to detect the type of referring site

Posted on 2004-07-31
Last Modified: 2008-03-06
Hello!

I collect the visits to my web site, storing all HTTP REFERER variables into a database.

Now, from this variable, I would like to detect whether the referrer is:

- a newsgroup

- Google

- a search engine other than Google

- a direct access (url typed in, or a browser bookmark)

- a web site other than my web site and other than a search engine

- my web site

Any clue?

Regards
Stephane
Question by:stephaneeybert
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11685161
I'm not sure what you are asking for.

The referrer is just a string containing the URL of the page where the link to your page was clicked. If the address was entered directly in the address bar or was clicked in the favourites, then the referrer is empty; otherwise it is the URL where the link was clicked. I don't believe there is any detail beyond that, so you would just have to parse the URL to get the site name.
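
As a rough sketch of "parsing the URL to get the site name", assuming a JavaScript runtime that provides the standard URL class (current browsers and Node.js do). The function name referrerHost is made up for illustration and does not come from this thread:

```javascript
// Return the lowercase hostname of a referrer string,
// or '' when the referrer is empty or is not a parseable absolute URL.
function referrerHost(referrer) {
  if (!referrer) return '';
  try {
    return new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    // new URL() throws a TypeError on strings it cannot parse
    return '';
  }
}
```

An empty result covers both the direct-access case and malformed values, so the two would need to be told apart separately if that distinction matters.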

Cd&
 

Author Comment

by:stephaneeybert
ID: 11685879
Hello,

Thanks for the comment. In fact, I know that. My question is about how to parse it, and what to look for, to get the information I want...

Cheers
Stephane
 

Author Comment

by:stephaneeybert
ID: 11685928
What logic to put in the parsing to get the details I need...
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11685959
There are so many possible variations on URLs that parsing out specifics will almost require a regular expression for just about each instance that you are looking for. What is it you are trying to parse out? Why not just use the whole URL?

Cd&

 

Author Comment

by:stephaneeybert
ID: 11687003
I'm doing a web site and I made a page to show the visitors and visits statistics.
The page shows which browsers are being used, which operating systems, how many visitors and visits per month... Now I would like to complete the work by adding to the page where the visitors came from when visiting the web site. I would like to display how many came from:
- a newsgroup

- Google

- a search engine other than Google

- a direct access (url typed in, or a browser bookmark)

- a web site other than my web site and other than a search engine

- my web site

And I'll display it with a graph (that part I know how to do).

The only part that is hard for me is how to parse the URLs, with regular expressions, to match them against the 6 options listed above.

Say, how to parse the url to check if it comes from a newsgroup, and if not, if it comes from Google...

Regards
Stephane
 

Author Comment

by:stephaneeybert
ID: 11687006
Doing a regular expression for each option that I am looking for is fine with me. Only, I'm no good with regular expressions...
 
LVL 53

Accepted Solution

by:
COBOLdinosaur earned 125 total points
ID: 11688322
You don't understand: there are hundreds of thousands of newsgroups, maybe over a million.
Nothing in the URL indicates that a site is a newsgroup.

There are thousands of search engines, and there are search engines that search other search engines. Nothing in the URL indicates that a site is a search engine.

You would need to have a database with the names of all the newsgroups and all the search engines and test against that. Even if you keep it to a short list there is still a problem. Consider Yahoo. What does Yahoo.com tell you? They have a search engine. They also have newsgroups. They have email, which might contain links. They host virtual domains. Google is the same way. Google.com==search || ==gmail || ==newsgroup.

What I would suggest is that you look at the urls of specific sites you want to track and then you just have to do a simple substring search for them, and you won't need regexp:

In JavaScript I would do something like this:

// Lowercase once so every comparison is case-insensitive.
var ref = referrer.toLowerCase();
// One if / else-if chain, so exactly one branch runs.
if (ref == '')
    alert('no referrer: direct access (typed URL or bookmark)');
else if (ref.indexOf('google.com') != -1)
    alert('this is google');
else if (ref.indexOf('yahoo.com') != -1)
    alert('this is yahoo');
else if (ref.indexOf('yoursite.com') != -1)
    alert('this is yoursite');
else
    alert('this is another site');
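
The same idea can be sketched as a function that returns one of the six categories from the question, matching on the hostname rather than on the whole URL. This is only an illustration under assumptions: the lookup lists, the category labels, and the names hostMatches and classifyReferrer are hypothetical, and the lists would have to be filled with the sites you actually want to track. It also assumes a runtime with the standard URL class.

```javascript
// Hypothetical lookup lists; extend them with the sites you want to track.
const NEWSGROUP_HOSTS = ['groups.google.com', 'groups.yahoo.com'];
const SEARCH_HOSTS = ['search.yahoo.com', 'bing.com'];

// True when host equals domain or is a subdomain of it.
function hostMatches(host, domain) {
  return host === domain || host.endsWith('.' + domain);
}

// Classify a referrer string; ownDomain is your site's domain, e.g. 'yoursite.com'.
function classifyReferrer(referrer, ownDomain) {
  if (!referrer) return 'direct';            // typed URL or bookmark
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return 'other';                          // not a parseable absolute URL
  }
  if (hostMatches(host, ownDomain)) return 'internal';
  // Newsgroup hosts are tested before Google, because groups.google.com
  // would otherwise be counted as a Google search hit.
  if (NEWSGROUP_HOSTS.some(d => hostMatches(host, d))) return 'newsgroup';
  if (hostMatches(host, 'google.com')) return 'google';
  if (SEARCH_HOSTS.some(d => hostMatches(host, d))) return 'search';
  return 'other';
}
```

The order of the tests carries the logic: the more specific hosts come first. The sketch still counts every other google.com subdomain (mail, for instance) as Google and ignores country domains such as google.fr, which is the ambiguity described above.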

Cd&
 

Author Comment

by:stephaneeybert
ID: 11688355
Yeah, I started doing a strstr() search for the Google case.

I'll do the same with the web site hostname for the internal hits.
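
One caveat with a plain substring test for internal hits: it also matches when the site's hostname merely appears somewhere inside another site's URL, for example in a query string. A small sketch of the difference, written in JavaScript like the example in the accepted answer; the function names are made up for illustration and a runtime with the standard URL class is assumed:

```javascript
// Naive check: true whenever the domain text appears anywhere in the referrer.
function isInternalSubstring(referrer, ownDomain) {
  return referrer.toLowerCase().indexOf(ownDomain) !== -1;
}

// Stricter check: compare only the hostname part of the referrer.
function isInternalHost(referrer, ownDomain) {
  let host;
  try {
    host = new URL(referrer).hostname.toLowerCase();
  } catch (e) {
    return false; // empty or unparseable referrer
  }
  return host === ownDomain || host.endsWith('.' + ownDomain);
}
```

With a referrer such as http://example.org/out?to=yoursite.com, the substring version reports an internal hit while the hostname version does not.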

Thanks anyway

Cheers

Steph
 
LVL 53

Expert Comment

by:COBOLdinosaur
ID: 11688560
Glad I could help.  Thanks for the A. :^)

Cd&
