Solved

Help writing a few lines of Perl.

Posted on 1997-09-11
Last Modified: 2013-12-25
I'm trying to write a simple CGI script and need some help.

I need to split $ENV{'HTTP_USER_AGENT'} into two variables, $browser and $operatingsystem.
(For example $browser=Mozilla/3.01 and $operatingsystem=Linux)

Can someone help me?
Question by:pucko
10 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 1830264
I don't see a consistent way to do it that accounts for all the different ways
that different browsers can present that information (if they present it at all).
But something like this looks like it might work for many of them:

($browser,$operatingsystem) = ($ENV{'HTTP_USER_AGENT'} =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/);
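
A minimal sketch of how the pattern above could be wrapped and tried on two of the agent strings quoted in this thread. The sub name split_agent is hypothetical; the regex itself is the one from the comment, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the comment above to one
# user-agent string. Returns (browser, os) in list context, or an empty
# list when the pattern does not match.
sub split_agent {
    my ($ua) = @_;
    return $ua =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/;
}

for my $ua ('Mozilla/3.01 (Win95; I)', 'Mozilla/3.02GoldC-KIT (Win95; I)') {
    my ($browser, $os) = split_agent($ua);
    print "browser=[$browser] os=[$os]\n";
}
```

Note that the second capture keeps the surrounding parentheses, and that strings with a trailing proxy note (such as "via Squid Cache ...") are not split sensibly by this pattern, so it is only a starting point.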


 

Expert Comment

by:mgjv
ID: 1830265
Well, it's not that simple, unfortunately. Every browser sends (or should send) a string, and that string can look basically any way the browser vendor wants. Things like:

MSIE 3.02 spoofing as Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95) via Squid Cache version 1.0.20

Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22

Mozilla/3.02GoldC-KIT (Win95; I)

IBM WebExplorer DLL /v1.03

are all possible, as are much weirder permutations.

You can probably see it's not that simple to extract the real browser name, version and operating system (if supplied at all) from this garbled mess.

Most of the time, the operating system with modifications can be found inside the first pair of parentheses ().

If you try something like:

($browser, $OS) = $ENV{'HTTP_USER_AGENT'} =~ /^([^(]*)\(([^)]*)/;

This will put everything before the first ( into $browser, and everything from that bracket up to the first ) into $OS.

Of course, you are still left with all the garbage to get rid of.
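
As a rough sketch of the idea above: split at the first pair of parentheses, then break the inside into separate fields on ";". The sub name split_on_parens is hypothetical, and the sample string is one of those quoted earlier in this comment.

```perl
use strict;
use warnings;

# Hypothetical helper: the text before the first "(" is treated as the
# browser part; the text inside the first (...) is split on ";" into fields.
# If there are no parentheses at all, the whole string is returned as-is.
sub split_on_parens {
    my ($ua) = @_;
    my ($browser, $inside) = $ua =~ /^([^(]*)\(([^)]*)/
        or return ($ua);
    $browser =~ s/\s+$//;    # trim the space in front of "("
    return ($browser, split(/\s*;\s*/, $inside));
}

my ($browser, @fields) =
    split_on_parens('Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)');
print "browser: $browser\n";
print "field:   $_\n" for @fields;
```

The fields still need interpreting (which one is the OS, which one is the real browser), which is the "garbage" part mentioned above.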

The only way to really do this reliably is to check for all weirdness (strip off spoofing stuff, etc), and check the OS string for some keywords.

I have a little statistics script here that I run over the browser logs now and again, to see what the breakdown of browsers visiting our site is. I could probably post it here, although it might be slightly too large for that.

But remember, because of all the work that needs to be done to strip out this information, you might not want to run this on every access you get to your page, or your ISP might get very cross with you.
 

Expert Comment

by:mgjv
ID: 1830266
Heh,

I see that ozo was sort of typing the same as I was typing at the same time :)
 
LVL 1

Author Comment

by:pucko
ID: 1830267
OK! If I change my question to this:

The only operating systems I'm interested in are Win95, WinNT, Win3.X, Mac, Linux and SunOS. Is it possible to solve the problem now?
How?
 

Expert Comment

by:mgjv
ID: 1830268
Same answer. The environment variable HTTP_USER_AGENT is part of the CGI specification, and should be available to all CGI applications, regardless of operating system.

Having said that: There might be some broken web servers out there that don't bother setting it. I don't know if Mac does this, but perl should deal with the specifics of the operating system for you.

Or if you mean that you are only interested in browsers on those specific platforms: The answer stays the same. If you have a look at what exactly all those browsers send, you will see that there is no general standard. The only way to do it is to build an 'expert system'. This basically means that you have a whole bunch of if ... elsif ... else tests to find out what exactly it is.

If you're only interested in a very limited set of browsers/operating systems, then all you need to do is look for a few identifiers. Shouldn't be too hard. The same regex would work, then split up the fields further.

As long as browser manufacturers don't use a standard format like

browsername/version (OS; modifiers) proxyline

it's hard to do anything sensible.
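
To show how little code would be needed if agents did follow that layout, here is a sketch of a parser for it. The layout itself is only the wished-for format from this comment, not something browsers actually promise, and the sub name parse_standard and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical parser for "browsername/version (OS; modifiers) proxyline".
# Returns a hash reference on success, or nothing if the string does not
# follow that layout.
sub parse_standard {
    my ($ua) = @_;
    my ($name, $version, $os, $mods, $proxy) =
        $ua =~ m{^([^/\s]+)/(\S+)\s+\(([^;)]*)(?:;\s*([^)]*))?\)\s*(.*)}
        or return;
    my @modifiers = defined $mods ? split(/\s*;\s*/, $mods) : ();
    return {
        name      => $name,
        version   => $version,
        os        => $os,
        modifiers => \@modifiers,
        proxy     => $proxy,
    };
}

my $r = parse_standard('Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22');
print "$r->{name} $r->{version} on $r->{os}, proxy: $r->{proxy}\n" if $r;
```

Strings such as "IBM WebExplorer DLL /v1.03" simply do not match, which is exactly the problem described above.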

sorry it's not easier.
 
LVL 84

Expert Comment

by:ozo
ID: 1830269
Well, if you're limiting the question to HTTP_USER_AGENT strings containing
Win95, WinNT, Win3.X, Mac, Linux and SunOS,
then a solution might be:
($preos,$os,$postos) = $ENV{'HTTP_USER_AGENT'} =~
/(.*?)(Win95|WinNT|Win3\.\w+|Mac|Linux|SunOS)(.*)/
or ($preos,$os,$postos) = ($ENV{'HTTP_USER_AGENT'},'UNKNOWN','');
(The low-precedence "or" is needed here: with "||" the match would be evaluated in scalar context and return only true or false instead of the three captures.)
Or if you have an explicit list of HTTP_USER_AGENT strings which are relevant to you,
you may be able to do something with that.
 

Expert Comment

by:mgjv
ID: 1830270
MS IE has versions that identify the OS as

(compatible; MSIE 3.02; Update a; AK; Windows 95)
(compatible; MSIE 3.01; Windows NT)

Old Netscape browsers do it like:

Mozilla/1.22 (Windows; I; 16bit)
Mozilla/1.1N (Macintosh; I; 68K)

So you'll need to look for a few more things than just the string above. Like I said, it probably needs a bit of an expert system.
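
A small sketch of what such an "expert system" might look like as an ordered table of patterns, covering the variants quoted in this comment. The names @OS_RULES and guess_os are hypothetical, and treating a "Windows ... 16bit" field as Win3.X is an assumption made for the example, not something the agent string states.

```perl
use strict;
use warnings;

# Ordered list of [label, pattern] pairs; the first matching rule wins.
# The "Windows ... 16bit => Win3.X" rule is an assumption for this sketch.
my @OS_RULES = (
    [ 'Win95',  qr/Win95|Windows 95/ ],
    [ 'WinNT',  qr/WinNT|Windows NT/ ],
    [ 'Win3.X', qr/Win3\.|Windows;.*16bit/ ],
    [ 'Mac',    qr/Mac/ ],
    [ 'Linux',  qr/Linux/ ],
    [ 'SunOS',  qr/SunOS/ ],
);

# Hypothetical helper: looks only inside the first (...) group, where the
# OS usually sits, and returns a label or 'UNKNOWN'.
sub guess_os {
    my ($ua) = @_;
    my ($field) = $ua =~ /\(([^)]*)\)/;
    return 'UNKNOWN' unless defined $field;
    for my $rule (@OS_RULES) {
        my ($label, $re) = @$rule;
        return $label if $field =~ $re;
    }
    return 'UNKNOWN';
}

print guess_os('Mozilla/1.1N (Macintosh; I; 68K)'), "\n";
```

New variants can be handled by adding rows to the table rather than more if/elsif branches, but the order of the rows matters, since the first hit is returned.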
 
LVL 84

Expert Comment

by:ozo
ID: 1830271
And some browsers allow the user to set the string to anything they choose
(tinkering users might set any string in any browser),
which raises the question: what do you want the variables
$browser and $operatingsystem for?
E.g. if you're just setting defaults in a user menu,
and the user can explicitly select what they want anyway,
then maybe a half-assed job is sufficient?
Or if there are particular browser versions/operating systems
you wish to identify, then maybe you can explicitly list them?
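
A sketch of the "defaults in a user menu" idea: an explicit choice from the user wins, and the agent string is only used as a rough default. The names @KNOWN_OS and pick_os are hypothetical, and how the form value reaches the script is left out.

```perl
use strict;
use warnings;

# The operating systems the asker listed; used to validate the user's choice.
my @KNOWN_OS = qw(Win95 WinNT Win3.X Mac Linux SunOS);

# Hypothetical helper: $form_value is whatever the user selected (may be
# undef), $ua is the HTTP_USER_AGENT string (may be undef).
sub pick_os {
    my ($form_value, $ua) = @_;
    # An explicit, recognised choice from the user always wins.
    if (defined $form_value) {
        return $form_value if grep { $_ eq $form_value } @KNOWN_OS;
    }
    # Otherwise fall back to a rough guess from the agent string.
    my ($guess) = defined $ua ? $ua =~ /(Win95|WinNT|Mac|Linux|SunOS)/ : ();
    return defined $guess ? $guess : 'UNKNOWN';
}

print pick_os(undef, 'Mozilla/3.01 (Win95; I)'), "\n";
```

With this arrangement a wrong or spoofed agent string only produces a wrong default, which the user can correct.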

 
LVL 4

Expert Comment

by:unicorntech
ID: 1830272
I agree with ozo. The only sure-fire way to do this is to let the users enter the values themselves through some sort of form interface.

This should solve the problem, as most people will enter the correct info for you.

There are no "standards" for this type of info, so until there are you will have to scan each string with a chain of if/elsif tests (if this, or if this, or if this, etc.).

To answer the question: no one can really help, as there is no real way to do this unless you write a lot of code like that shown.

Or alternatively use the form input method.

Jason
 

Accepted Solution

by:
mgjv earned 20 total points
ID: 1830273
I just had a look at a script I once wrote, which I use here to do some stats on the browser log.

Mind you, it doesn't try to get an OS out of the string (in my opinion that is too much of a hassle), but it does give you the number of hits for every:

      user agent full string
      user agent without OS/proxy, etc. (full version)
      user agent by version (just numbers)
      user agent by type

I would never run this code in a CGI, because it is way too bulky in my opinion.

# NOTE: This code is based on someone else's code. Forgot who.

# NOTE 2: this is part of a bigger program, just two subroutines. Not meant to be run as is, but you can read through them and get the idea.
# ($GZCAT, _verbose() and %all_browsers are defined elsewhere in that program.)
sub read_log {
    my $agent_log = shift;
    my $rawagents = shift;
      my ($line, $Agent, $spoofer, $refscounter);

      if ( $agent_log =~ /\.gz$/ ) {
            open(AGENTLOG,"$GZCAT $agent_log |") ||
                  die "Can't open $agent_log: $!\n";
      }
      else {
            open(AGENTLOG,"$agent_log") ||
                  die "Can't open $agent_log: $!\n";
      }

      while (defined($line=<AGENTLOG>)) {

              (($. % 5000) == 0) && _verbose('.');
              $refscounter++;
              chomp $line;
            $line =~ s#\s+# #go;       # Fixes proxy info bug. Fix suggested by
                                                 # James Walter Martin III <jwm3@chubb.com>
            ($Agent) = $line =~ /^\S+\s+(.*)$/;

              $Agent = "Unknown" if (($Agent eq "-") || ($Agent eq ""));

              # Undo any URL encoding of user agent

              $Agent =~ tr/+/ /;
              $Agent =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

              # Despoofs WebTV spoofing as MSIE spoofing as Mozilla.
      
              if ($Agent =~ m#^\S+\s+(WebTV/\S+)#o) {
                  $Agent = "$1 spoofing as $Agent";
              }

              # despoofs people using pseudo-'standard' of 'compatible'

              if ($Agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#oi) {
                 $spoofer =  $1;
                 $spoofer =~ s#/#-#og;
                 $spoofer =~ s/\W+$//o;
                 $Agent="$spoofer spoofing as $Agent";
              }

              # Let's not let children play with dangerous toys...
              # (escape & first, so the entities added below are not escaped again)

              $Agent =~ s#\&#\&amp;#go;
              $Agent =~ s#<#\&lt;#go;
              $Agent =~ s#>#\&gt;#go;
              $Agent =~ s#"#\&quot;#go;

              $rawagents->{$Agent}++;
      }
      return($refscounter);
}      

sub process_agents {
      my ($rawagents, $agentgroup, $agentversion, $baseagent) = @_;

      my ($base, $longagent, $agent, $name, $version);

      $^W = 0;
      foreach $agent (keys (%$rawagents)) {
            $longagent=$agent;

            ($base)         =  $longagent =~ m#^([^\(\[]+)#o;
            $base          =~ s#\s+$##o;
            $base          =~ s#via proxy.*$##ogi;

            ($name,$version) = $base =~ m#^([^\d\/]+)[\s\/vV]+(\d[\.\d]+)#o;
            ($name) = $base =~ m#^([^\d\/]+)#o if (! $name);
            $name =~ s#[-_]# #go;
            $name =~ s#\s+$##o;
            $name =~ s#^(NCSA Mosaic).*#NCSA Mosaic#oi;
            $name =~ s#MSIE#Microsoft Internet Explorer#oi;
            $name =~ s#.*surfbot.*#SurfBot#oi;
            $agentgroup->{$name} += $rawagents->{$agent};
            $agentversion->{"$name $version"} += $rawagents->{$agent};
            $baseagent->{$base} += $rawagents->{$agent};

            $all_browsers{"$name $version"} = 1;
      }
      $^W = 1;
}
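
For anyone who only wants the name/version step, here is a cut-down, self-contained sketch of what process_agents does to a single string. The sub name name_and_version is hypothetical, the regexes are lightly simplified from the ones above, and it leaves out the hit counting and most of the name clean-ups.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (name, version) for one agent string.
# The version is undef when no "digit followed by digits/dots" is found.
sub name_and_version {
    my ($agent) = @_;
    # Everything before the first "(" or "[" is the base agent.
    my ($base) = $agent =~ /^([^(\[]+)/;
    return unless defined $base;
    $base =~ s/\s+$//;
    # Name is the leading run without digits or slashes; the version follows
    # after spaces, slashes or a "v".
    my ($name, $version) = $base =~ m#^([^\d/]+)[\s/vV]+(\d[.\d]+)#;
    ($name) = $base =~ m#^([^\d/]+)# unless defined $name;
    return unless defined $name;
    $name =~ s/\s+$//;
    $name =~ s/MSIE/Microsoft Internet Explorer/i;
    return ($name, $version);
}

for my $agent ('Mozilla/3.01 (Win95; I)', 'IBM WebExplorer DLL /v1.03') {
    my ($name, $version) = name_and_version($agent);
    print "$name / ", (defined $version ? $version : 'no version'), "\n";
}
```

Like the original, this expects the "spoofing as" prefix to have been added already (as read_log does) if MSIE is to be reported under its own name rather than as Mozilla.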