Help writing a few lines of Perl.

Posted on 1997-09-11
Last Modified: 2013-12-25
I'm trying to write a simple CGI script and need some help.

I need to split $ENV{'HTTP_USER_AGENT'} into two variables, $browser and $operatingsystem.
(For example, $browser=Mozilla/3.01 and $operatingsystem=Linux.)

Can someone help me??
Question by:pucko

Expert Comment

ID: 1830264
I don't see a consistent way to do it which accounts for all the different ways
that different browsers can present that information (if they present it at all)
But something like this looks like it might work for many of them:

($browser,$operatingsystem) = ($ENV{'HTTP_USER_AGENT'} =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/);


Expert Comment

ID: 1830265
Well, it's not that simple, unfortunately. Every browser sends a string (or should), and that string can look basically any way the browser vendor wants. Things like:

MSIE 3.02 spoofing as Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95) via Squid Cache version 1.0.20

Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22

Mozilla/3.02GoldC-KIT (Win95; I)

IBM WebExplorer DLL /v1.03

are all possible, and much weirder permutations.

You can probably see it's not that simple to extract the real browser name, version and operating system (if supplied at all) from this garbled mess.

Most of the time, the operating system with modifications can be found between the two first brackets ().

If you try something like:

($browser, $OS) = $ENV{'HTTP_USER_AGENT'} =~ /^([^(]*)\(([^)]*)\)/;

This will put everything before the first ( into $browser, and everything from that bracket up to the first ) into $OS.

Of course, you are still left with all the garbage to get rid of.

The only way to really do this reliably is to check for all weirdness (strip off spoofing stuff, etc), and check the OS string for some keywords.

I have a little statistics script here that I run over the browser logs now and again, to see what the breakdown of browsers visiting our site is. I could probably post it here, although it might be slightly too large for that.

But remember, because of all the work that needs to be done to strip out this information, you might not want to run this on every access you get to your page, or your ISP might get very cross with you.
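
The bracket-splitting idea above could be sketched roughly like this. The helper name split_agent is made up for illustration, and the fallback for strings without brackets is an assumption, not something the thread settles on:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a User-Agent string into
# the text before the first "(" and the text inside the first "(...)".
sub split_agent {
    my ($ua) = @_;
    $ua = '' unless defined $ua;

    if ($ua =~ /^([^(]*)\(([^)]*)\)/) {
        my ($browser, $os) = ($1, $2);
        $browser =~ s/\s+$//;      # trim the space before "("
        return ($browser, $os);
    }
    return ($ua, '');              # no brackets at all: no OS information
}

for my $ua ('Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22',
            'IBM WebExplorer DLL /v1.03') {
    my ($browser, $os) = split_agent($ua);
    print "browser=[$browser] os=[$os]\n";
}
# prints:
#   browser=[Mozilla/3.01] os=[Win95; I]
#   browser=[IBM WebExplorer DLL /v1.03] os=[]
```

The bracketed part still needs keyword checks before it is usable as an operating system name.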

Expert Comment

ID: 1830266

I see that ozo was sort of typing the same as I was typing at the same time :)

Author Comment

ID: 1830267
OK! If I change my question to this:

The only operating systems I'm interested in are Win95, WinNT, Win3.X, Mac, Linux and SunOS. Is it possible to solve the problem now?

Expert Comment

ID: 1830268
Same answer. The environment variable HTTP_USER_AGENT is part of the CGI interface (the server fills it in from the browser's User-Agent header), and should be available to all CGI applications, regardless of operating system.

Having said that: There might be some broken web servers out there that don't bother setting it. I don't know if Mac does this, but perl should deal with the specifics of the operating system for you.

Or if you mean that you are only interested in browsers on those specific platforms: the answer stays the same. If you have a look at what exactly all those browsers send, you will see that there is no general standard. The only way to do it is to build an 'expert system'. This basically means that you have a whole bunch of if..elsif..else tests to find out what exactly it is.

If you're only interested in a very limited set of browsers/operating systems, then all you need to do is look for a few identifiers. Shouldn't be too hard. The same regex would work, then split up the fields further.

As long as browser manufacturers don't use a standard format like

browsername/version (OS; modifiers) proxyline

it's hard to do anything sensible.

sorry it's not easier.
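
For what it's worth, if agents did stick to the format above, one pattern would be enough. This is only a sketch of that wished-for case; parse_ideal_agent is a hypothetical name, and strings that don't fit simply come back as an empty list:

```perl
use strict;
use warnings;

# Hypothetical parser for the idealised format
#   browsername/version (OS; modifiers) proxyline
# Returns (name, version, os, modifiers, proxyline), or an empty list
# when the string does not have that shape.
sub parse_ideal_agent {
    my ($ua) = @_;
    return unless defined $ua;
    return unless $ua =~ m{^(\S+)/(\S+)\s+\(([^;)]*)(?:;\s*([^)]*))?\)\s*(.*)$};

    my ($name, $version, $os, $modifiers, $proxy) = ($1, $2, $3, $4, $5);
    $modifiers = '' unless defined $modifiers;   # "(OS)" with no modifiers
    return ($name, $version, $os, $modifiers, $proxy);
}

my @fields = parse_ideal_agent('Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22');
print join(' | ', @fields), "\n";
# prints: Mozilla | 3.01 | Win95 | I | via Squid Cache version 1.0.22
```

Feed it an MSIE string such as "Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)" and the "OS" field comes back as "compatible", which shows why a single pattern is not enough in practice.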

Expert Comment

ID: 1830269
Well, if you're limiting the question to HTTP_USER_AGENT strings containing
Win95, WinNT, Win3.X, Mac, Linux and SunOS,
then a solution might be:
# "or" rather than "||": "||" would force the match into scalar context
($preos,$os,$postos) = ($ENV{'HTTP_USER_AGENT'} =~
/(.*?)(Win95|WinNT|Win3\.X|Mac|Linux|SunOS)(.*)/)
or ($preos,$os,$postos) = ($ENV{'HTTP_USER_AGENT'},'UNKNOWN','');
Or if you have an explicit list of HTTP_USER_AGENT strings which are relevant to you,
you may be able to do something with that.

Expert Comment

ID: 1830270
MS IE has versions that identify the OS as

(compatible; MSIE 3.02; Update a; AK; Windows 95)
(compatible; MSIE 3.01; Windows NT)

Old Netscape browsers do it like:

Mozilla/1.22 (Windows; I; 16bit)
Mozilla/1.1N (Macintosh; I; 68K)

So you'll need to look for a few more things than just the strings in the regex above. Like I said, it probably needs a bit of an expert system.
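
A table of patterns is one way to keep such an 'expert system' manageable. The sketch below is only that: the labels are the ones from the question, while the patterns (including treating "16bit" as Win3.X) are guesses based on the sample strings in this thread and are certainly incomplete. guess_os is a made-up name.

```perl
use strict;
use warnings;

# Ordered [label, pattern] pairs; the first match wins.
# Patterns are assumptions drawn from the samples in this thread.
my @OS_RULES = (
    [ 'Win95',  qr/Win(?:dows)?\s*95/i ],
    [ 'WinNT',  qr/Win(?:dows)?\s*NT/i ],
    [ 'Win3.X', qr/Win(?:dows)?\s*3\.|Win16|16bit/i ],
    [ 'Mac',    qr/Mac/i ],      # loose: would also match e.g. "machine"
    [ 'Linux',  qr/Linux/i ],
    [ 'SunOS',  qr/SunOS/i ],
);

sub guess_os {
    my ($ua) = @_;
    return 'UNKNOWN' unless defined $ua;
    for my $rule (@OS_RULES) {
        my ($label, $pattern) = @$rule;
        return $label if $ua =~ $pattern;
    }
    return 'UNKNOWN';
}

print guess_os('Mozilla/2.0 (compatible; MSIE 3.01; Windows NT)'), "\n";  # WinNT
print guess_os('Mozilla/1.1N (Macintosh; I; 68K)'), "\n";                 # Mac
```

New spellings then only mean a new row in the table, not another elsif branch.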

Expert Comment

ID: 1830271
And some browsers allow the user to set the string to anything they choose.
(tinkering users might set any string in any browser)
Which raises the question: what do you want the variables
$browser and $operatingsystem for?
e.g. if you're just setting defaults in a user menu,
and the user can explicitly select what they want anyway,
then maybe a half-assed job is sufficient?
Or if there are particular browser versions/operating systems
you wish to identify, then maybe you can explicitly list them?


Expert Comment

ID: 1830272
I agree with ozo. The only sure-fire way to do this is to allow the user to input the variables themselves through some sort of form interface.

This should solve the problem as most people will input the correct info for you.

There are no "standards" for this type of info, so until one exists you will have to scan each string with a chain of if/elsif tests (if this, else if that, and so on).

To answer the question: no one can really help, as there is no real way to do this unless you write a lot of code as described above.

Or alternatively, use the form input method.
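
A rough sketch of combining the two approaches: whatever the user picks in the form wins, and a User-Agent guess is used only as the default. The function name, the list of accepted values and the 'UNKNOWN' fallback are all assumptions for illustration; reading the form field itself is left out.

```perl
use strict;
use warnings;

# Values we are prepared to accept (the list from the question).
my %KNOWN_OS = map { $_ => 1 } qw(Win95 WinNT Win3.X Mac Linux SunOS);

# $chosen:  what the user selected in the form (may be undef or empty)
# $guessed: whatever a User-Agent heuristic came up with (may be undef)
sub effective_os {
    my ($chosen, $guessed) = @_;
    return $chosen  if defined $chosen  && $KNOWN_OS{$chosen};
    return $guessed if defined $guessed && $KNOWN_OS{$guessed};
    return 'UNKNOWN';
}

print effective_os('Linux', 'Win95'), "\n";   # Linux (explicit choice wins)
print effective_os(undef,   'Mac'),   "\n";   # Mac   (falls back to the guess)
```

Checking both inputs against a fixed list also means nothing typed by the user or sent by the browser is used unvalidated.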


Accepted Solution

mgjv earned 20 total points
ID: 1830273
I just had a little look at a script I once wrote that I use here to do some stats on the browser log.

Mind you, it doesn't try to get an OS out of the string (in my opinion that is too much of a hassle) but it does give you a list of the following:

number of hits for every :

      user agent full string
      user agent without OS/proxy, etc. (full version)
      user agent by version (just numbers)
      user agent by type

I would never run this code in a CGI, because it is way too bulky in my opinion.

# NOTE: This code is based on someone else's code. Forgot who.

# NOTE 2: this is part of a bigger program, just two subroutines. Not meant to be run as is, but you can read through them and get the idea.
sub read_log {
    my $agent_log = shift;
    my $rawagents = shift;    # hash ref: agent string => hit count
    my ($line, $Agent, $spoofer, $refscounter);

    # $GZCAT and _verbose() come from the surrounding program
    if ( $agent_log =~ /\.gz$/ ) {
        open(AGENTLOG, "$GZCAT $agent_log |") ||
            die "Can't open $agent_log: $!\n";
    } else {
        open(AGENTLOG, "$agent_log") ||
            die "Can't open $agent_log: $!\n";
    }

    while (defined($line = <AGENTLOG>)) {

        (($. % 5000) == 0) && _verbose('.');
        chomp $line;
        $line =~ s#\s+# #go;    # Fixes proxy info bug. Fix suggested by
                                # James Walter Martin III <>
        ($Agent) = $line =~ /^\S+\s+(.*)$/;

        $Agent = "Unknown" if (!defined($Agent) || ($Agent eq "-") || ($Agent eq ""));

        # Undo any URL encoding of user agent

        $Agent =~ tr/+/ /;
        $Agent =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        # Despoofs WebTV spoofing as MSIE spoofing as Mozilla.
        if ($Agent =~ m#^\S+\s+(WebTV/\S+)#o) {
            $Agent = "$1 spoofing as $Agent";
        }

        # despoofs people using pseudo-'standard' of 'compatible'

        if ($Agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#oi) {
            $spoofer =  $1;
            $spoofer =~ s#/#-#og;
            $spoofer =~ s/\W+$//o;
            $Agent = "$spoofer spoofing as $Agent";
        }

        # Lets not let children play with dangerous toys...
        # (escape & first, so the entities added below are not escaped again)

        $Agent =~ s#\&#\&amp;#go;
        $Agent =~ s#<#\&lt;#go;
        $Agent =~ s#>#\&gt;#go;
        $Agent =~ s#"#\&quot;#go;

        # Tally the cleaned-up agent string; process_agents reads these counts
        $rawagents->{$Agent}++;
    }
    close(AGENTLOG);
}


sub process_agents {
    my ($rawagents, $agentgroup, $agentversion, $baseagent) = @_;

    my ($base, $longagent, $agent, $name, $version);

    $^W = 0;    # warnings off: $name and $version may be undefined below
    foreach $agent (keys (%$rawagents)) {

        # Everything before the first ( or [ is the "base" agent
        $longagent = $agent;
        ($base)  =  $longagent =~ m#^([^\(\[]+)#o;
        $base   =~ s#\s+$##o;
        $base   =~ s#via proxy.*$##ogi;

        # Split the base into a name and a numeric version, then tidy the name
        ($name,$version) = $base =~ m#^([^\d\/]+)[\s\/vV]+(\d[\.\d]+)#o;
        ($name) = $base =~ m#^([^\d\/]+)#o if (! $name);
        $name =~ s#[-_]# #go;
        $name =~ s#\s+$##o;
        $name =~ s#^(NCSA Mosaic).*#NCSA Mosaic#oi;
        $name =~ s#MSIE#Microsoft Internet Explorer#oi;
        $name =~ s#.*surfbot.*#SurfBot#oi;

        # Add this agent's hits to the per-type, per-version and per-base totals
        $agentgroup->{$name} += $rawagents->{$agent};
        $agentversion->{"$name $version"} += $rawagents->{$agent};
        $baseagent->{$base} += $rawagents->{$agent};

        $all_browsers{"$name $version"} = 1;    # %all_browsers is a global in the full program
    }
    $^W = 1;
}
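
Since the two subroutines above cannot be run on their own, here is a cut-down, self-contained sketch of the same despoofing and name/version steps for a single string. It is a simplification for illustration (name_and_version is a made-up name, and the WebTV, proxy and HTML-escaping steps are left out), not the original program:

```perl
use strict;
use warnings;

# Simplified, hypothetical condensation of the logic above:
# returns (name, version) for one User-Agent string.
sub name_and_version {
    my ($agent) = @_;

    # "Mozilla/2.0 (compatible; MSIE 3.02; ...)" is really MSIE 3.02
    if ($agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#i) {
        my $spoofer = $1;
        $spoofer =~ s/\W+$//;
        $agent = "$spoofer spoofing as $agent";
    }

    # Everything before the first ( or [ is the base agent
    my ($base) = $agent =~ m#^([^\(\[]+)#;
    $base = '' unless defined $base;
    $base =~ s/\s+$//;

    my ($name, $version) = $base =~ m#^([^\d/]+)[\s/vV]+(\d[.\d]+)#;
    ($name) = $base =~ m#^([^\d/]+)# unless defined $name;
    $name    = 'Unknown' unless defined $name;
    $version = ''        unless defined $version;
    $name =~ s/\s+$//;
    $name =~ s/MSIE/Microsoft Internet Explorer/i;
    return ($name, $version);
}

for my $ua ('Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)',
            'Mozilla/3.02GoldC-KIT (Win95; I)',
            'IBM WebExplorer DLL /v1.03') {
    my ($name, $version) = name_and_version($ua);
    print "$name $version\n";
}
# prints:
#   Microsoft Internet Explorer 3.02
#   Mozilla 3.02
#   IBM WebExplorer DLL 1.03
```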
