Help writing a few lines of Perl.

Posted on 1997-09-11
Last Modified: 2013-12-25
I'm trying to write a simple CGI script and need some help.

I need to split $ENV{'HTTP_USER_AGENT'} into two variables, $browser and $operatingsystem.
(For example, $browser=Mozilla/3.01 and $operatingsystem=Linux.)

Can someone help me?
Question by:pucko

Expert Comment

ID: 1830264
I don't see a consistent way to do it that accounts for all the different ways
that different browsers can present that information (if they present it at all).
But something like this looks like it might work for many of them:

($browser,$operatingsystem) = ($ENV{'HTTP_USER_AGENT'} =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/);


Expert Comment

ID: 1830265
Well, it's not that simple, unfortunately. Every browser sends (or should send) a string, which can look basically any way the browser vendor wants. Things like:

MSIE 3.02 spoofing as Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95) via Squid Cache version 1.0.20

Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22

Mozilla/3.02GoldC-KIT (Win95; I)

IBM WebExplorer DLL /v1.03

are all possible, as are much weirder permutations.

You can probably see it's not that simple to extract the real browser name, version and operating system (if supplied at all) from this garbled mess.

Most of the time, the operating system (with modifiers) can be found inside the first pair of brackets ().

If you try something like:

($browser, $OS) = $ENV{'HTTP_USER_AGENT'} =~ /^([^(]*)\(([^)]*)\)/;

This will put everything before the first ( into $browser, and everything from that bracket up to the first ) into $OS.

Of course, you are still left with all the garbage to get rid of.
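
A minimal sketch of the split described above, run against some of the sample strings quoted earlier. The helper name split_agent is hypothetical, and trimming the whitespace before the bracket is an added assumption, not part of the original suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: split a User-Agent string at the first bracketed group.
# Returns (browser, os); os is undef when the string has no "(...)" part.
sub split_agent {
    my ($ua) = @_;
    if ($ua =~ /^([^(]*)\(([^)]*)\)/) {
        my ($browser, $os) = ($1, $2);
        $browser =~ s/\s+$//;    # drop the space before the bracket
        return ($browser, $os);
    }
    return ($ua, undef);
}

for my $ua ('Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22',
            'Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)',
            'IBM WebExplorer DLL /v1.03') {
    my ($browser, $os) = split_agent($ua);
    printf "%-30s | %s\n", $browser, defined $os ? $os : '(none)';
}
```

For the MSIE string, the second field still holds the whole "compatible; MSIE 3.02; ..." list, which is exactly the leftover garbage that needs further picking apart.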

The only way to really do this reliably is to check for all weirdness (strip off spoofing stuff, etc), and check the OS string for some keywords.

I have a little statistics script here that I run over the browser logs now and again, to see the breakdown of browsers visiting our site. I could probably post it here, although it might be slightly too large for that.

But remember, because of all the work that needs to be done to strip out this information, you might not want to run this on every access you get to your page, or your ISP might get very cross with you.

Expert Comment

ID: 1830266

I see that ozo was sort of typing the same as I was typing at the same time :)


Author Comment

ID: 1830267
OK! If I change my question to this:

The only operating systems I'm interested in are Win95, WinNT, Win3.X, Mac, Linux and SunOS. Is it possible to solve the problem now?

Expert Comment

ID: 1830268
Same answer. The environment variable HTTP_USER_AGENT is part of the CGI specification, and should be available to all CGI applications, regardless of operating system.

Having said that: There might be some broken web servers out there that don't bother setting it. I don't know if Mac does this, but perl should deal with the specifics of the operating system for you.

Or if you mean that you are only interested in browsers on those specific platforms: the answer stays the same. If you have a look at what exactly all those browsers send, you will see that there is no general standard. The only way to do it is to build an 'expert system'. This basically means that you have a whole bunch of if ... elsif ... else tests to find out what exactly it is.

If you're only interested in a very limited set of browsers/operating systems, then all you need to do is look for a few identifiers. Shouldn't be too hard. The same regex would work, then split up the fields further.

As long as browser manufacturers don't use a standard format like

browsername/version (OS; modifiers) proxyline

it's hard to do anything sensible.

Sorry it's not easier.

Expert Comment

ID: 1830269
Well, if you're limiting the question to HTTP_USER_AGENT strings containing
Win95, WinNT, Win3.X, Mac, Linux and SunOS,
then a solution might be:
# "or" (not "||") keeps the match in list context, so the captures are returned
($preos,$os,$postos) = ( $ENV{'HTTP_USER_AGENT'} =~
/(.*?)(Win95|WinNT|Win3.X|Mac|Linux|SunOS)(.*)/)
or ($preos,$os,$postos) = ($ENV{'HTTP_USER_AGENT'},'UNKNOWN','');
Or if you have an explicit list of HTTP_USER_AGENT strings which are relevant to you,
you may be able to do something with that.

Expert Comment

ID: 1830270
MS IE has versions that identify the OS as

(compatible; MSIE 3.02; Update a; AK; Windows 95)
(compatible; MSIE 3.01; Windows NT)

Old Netscape browsers do it like:

Mozilla/1.22 (Windows; I; 16bit)
Mozilla/1.1N (Macintosh; I; 68K)

So you'll need to look for a few more things than just the string above. Like I said, it probably needs a bit of an expert system.
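
As a rough sketch of what such a small 'expert system' could look like for the six platforms asked about: an ordered list of patterns, where the first match wins. The name guess_os and the exact patterns are assumptions based only on the variants quoted in this thread (plus "Win16", which is assumed here as another Windows 3.x marker), so expect to extend the list after looking at your own logs.

```perl
use strict;
use warnings;

# Ordered [pattern, label] pairs; the first pattern that matches wins.
# These patterns are illustrative assumptions, not a complete catalogue.
my @os_rules = (
    [ qr/Win(?:dows\s*)?95/i,                            'Win95'  ],
    [ qr/Win(?:dows\s*)?NT/i,                            'WinNT'  ],
    [ qr/Win(?:dows\s*)?3\.\d|Win16|Windows;.*16bit/i,   'Win3.X' ],
    [ qr/Mac/i,                                          'Mac'    ],
    [ qr/Linux/i,                                        'Linux'  ],
    [ qr/SunOS/i,                                        'SunOS'  ],
);

# Hypothetical helper: map a User-Agent string to one of the labels above.
sub guess_os {
    my ($ua) = @_;
    return 'UNKNOWN' unless defined $ua;
    for my $rule (@os_rules) {
        my ($pattern, $label) = @$rule;
        return $label if $ua =~ $pattern;
    }
    return 'UNKNOWN';
}

print guess_os($ENV{'HTTP_USER_AGENT'}), "\n";
```

Keeping the rules in a list rather than a long if/elsif chain makes it easier to add a new identifier when an unfamiliar string turns up.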

Expert Comment

ID: 1830271
And some browsers allow the user to set the string to anything they choose.
(tinkering users might set any string in any browser)
Which raises the question: what do you want the variables
$browser and $operatingsystem for?
e.g. if you're just setting defaults in a user menu,
and the user can explicitly select what they want anyway,
then maybe a half-assed job is sufficient?
Or if there are particular browser versions/operating systems
you wish to identify, then maybe you can explicitly list them?
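
If an explicit list is good enough, a plain hash lookup may do. This is only a sketch: the table name, the helper lookup_agent and the browser labels are hypothetical, and the three keys are simply strings quoted elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical table: exact User-Agent string => [browser, operating system].
my %known_agents = (
    'Mozilla/3.02GoldC-KIT (Win95; I)' => [ 'Netscape 3.02 Gold', 'Win95'  ],
    'Mozilla/1.22 (Windows; I; 16bit)' => [ 'Netscape 1.22',      'Win3.X' ],
    'Mozilla/1.1N (Macintosh; I; 68K)' => [ 'Netscape 1.1N',      'Mac'    ],
);

# Returns (browser, os) for a listed string, or ('UNKNOWN', 'UNKNOWN') otherwise.
sub lookup_agent {
    my ($ua) = @_;
    my $entry = defined $ua ? $known_agents{$ua} : undef;
    return $entry ? @$entry : ('UNKNOWN', 'UNKNOWN');
}

my ($browser, $operatingsystem) = lookup_agent($ENV{'HTTP_USER_AGENT'});
print "$browser on $operatingsystem\n";
```

An exact-match table never guesses wrong, but it only recognises strings you have already seen, so anything else should fall through to a default the user can change.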


Expert Comment

ID: 1830272
I agree with ozo. The only sure-fire way to do this is to allow the user to input the variables themselves through some sort of form interface.

This should solve the problem as most people will input the correct info for you.

There are no "standards" for this type of info, so until there are, you will have to scan each string with a chain of tests (if this, else if that, and so on).

So, to answer the question: there is no really reliable way to do this unless you write a lot of code like that shown in the earlier comments.

Or, alternatively, use the form input method.


Accepted Solution

mgjv earned 20 total points
ID: 1830273
I just had a little look at a script I once wrote that I use here to do some stats on the browser log.

Mind you, it doesn't try to get an OS out of the string (in my opinion that is too much of a hassle) but it does give you a list of the following:

number of hits for every:

      user agent full string
      user agent without OS/proxy, etc. (full version)
      user agent by version (just numbers)
      user agent by type

I would never run this code in a CGI script, because in my opinion it is way too bulky.

# NOTE: This code is based on someone else's code. Forgot who.

# NOTE 2: This is part of a bigger program, just two subroutines. It is not meant to be run as is, but you can read through them and get the idea.
sub read_log {
    my $agent_log = shift;
    my $rawagents = shift;    # hash ref: agent string => number of hits
    my ($line, $Agent, $spoofer, $refscounter);

    # $GZCAT and _verbose() are defined elsewhere in the bigger program.
    if ( $agent_log =~ /\.gz$/ ) {
        open(AGENTLOG, "$GZCAT $agent_log |") ||
            die "Can't open $agent_log: $!\n";
    }
    else {
        open(AGENTLOG, "$agent_log") ||
            die "Can't open $agent_log: $!\n";
    }

    while (defined($line = <AGENTLOG>)) {

        (($. % 5000) == 0) && _verbose('.');
        chomp $line;
        $line =~ s#\s+# #go;       # Fixes proxy info bug. Fix suggested by
                                   # James Walter Martin III <>
        ($Agent) = $line =~ /^\S+\s+(.*)$/;

        $Agent = "Unknown" if (!defined($Agent) || ($Agent eq "-") || ($Agent eq ""));

        # Undo any URL encoding of user agent

        $Agent =~ tr/+/ /;
        $Agent =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        # Despoofs WebTV spoofing as MSIE spoofing as Mozilla.
        if ($Agent =~ m#^\S+\s+(WebTV/\S+)#o) {
            $Agent = "$1 spoofing as $Agent";
        }

        # despoofs people using pseudo-'standard' of 'compatible'

        if ($Agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#oi) {
            $spoofer =  $1;
            $spoofer =~ s#/#-#og;
            $spoofer =~ s/\W+$//o;
            $Agent = "$spoofer spoofing as $Agent";
        }

        # Lets not let children play with dangerous toys...
        # Escape & first, so the entities added for <, > and " are not escaped again.

        $Agent =~ s#\&#\&amp;#go;
        $Agent =~ s#<#\&lt;#go;
        $Agent =~ s#>#\&gt;#go;
        $Agent =~ s#"#\&quot;#go;

        # Assumption: the excerpt ends here; tallying into $rawagents matches
        # how process_agents reads the hash.
        $rawagents->{$Agent}++;
    }
    close(AGENTLOG);
}

sub process_agents {
    my ($rawagents, $agentgroup, $agentversion, $baseagent) = @_;

    my ($base, $longagent, $agent, $name, $version);

    $^W = 0;    # agents without a version number would otherwise warn
    foreach $agent (keys (%$rawagents)) {

        $longagent = $agent;    # assumption: the hash key is the full agent string

        ($base)  =  $longagent =~ m#^([^\(\[]+)#o;
        $base   =~ s#\s+$##o;
        $base   =~ s#via proxy.*$##ogi;

        ($name,$version) = $base =~ m#^([^\d\/]+)[\s\/vV]+(\d[\.\d]+)#o;
        ($name) = $base =~ m#^([^\d\/]+)#o if (! $name);
        $name =~ s#[-_]# #go;
        $name =~ s#\s+$##o;
        $name =~ s#^(NCSA Mosaic).*#NCSA Mosaic#oi;
        $name =~ s#MSIE#Microsoft Internet Explorer#oi;
        $name =~ s#.*surfbot.*#SurfBot#oi;
        $agentgroup->{$name} += $rawagents->{$agent};
        $agentversion->{"$name $version"} += $rawagents->{$agent};
        $baseagent->{$base} += $rawagents->{$agent};

        # %all_browsers is a global of the bigger program.
        $all_browsers{"$name $version"} = 1;
    }
    $^W = 1;
}

Question has a verified solution.
