Help writing a few rows perl.

Posted on 1997-09-11
Last Modified: 2013-12-25
I'm trying to write a simpel cgi-script and need some help.

I need to split {'HTTP_USER_AGENT'} into two variabels $browser and $operatingsystem.
(For exampel $browser=Monzilla/3.01 and $operatingsystem=Linux )

Can someone help me??
Question by:pucko
LVL 84

Expert Comment

ID: 1830264
I don't see a consistent way to do it which accounts for all the different ways
that different browsers can present that information (if they present it at all)
But something like this looks like it might work for many of them:

($browser,$operatingsystem) = ($ENV{'HTTP_USER_AGENT'} =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/);


Expert Comment

ID: 1830265
Well, it's not that simple, unfortunately. Every browser sends a string (or should) which can look basically any way they want it. Things like:

MSIE 3.02 spoofing as Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95) via Squid Cache version 1.0.20

Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22

Mozilla/3.02GoldC-KIT (Win95; I)

IBM WebExplorer DLL /v1.03

are all possible, and much weirder permutations.

You can probably see it's not that simple to extract the real browser name, version and operating system (if supplied at all) from this garbled mess.

Most of the time, the operating system with modifications can be found between the two first brackets ().

If you try something like:

($browser, $OS) = /^([^\(])\(([^\))/;

This will put everything before the first ( into $browser, and everything from that bracket up to the first \) into $OS.

Of course, you are still left with all the garbage to get rid off.

The only way to really do this reliably is to check for all weirdness (strip off spoofing stuff, etc), and check the OS string for some keywords.

I have a little statistics script here that i run over the browser logs now and again, to see what the breakdown of browsers visiting our site is. I could probably post it here, although it might be slightly too large for that.

But remember, because of all the work that needs to be done to strip out this information, you might not want to run this on every access you get to your page, or your ISP might get very cross with you.

Expert Comment

ID: 1830266

I see that ozo was sort of typing the same as I was typing at the same time :)
Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.


Author Comment

ID: 1830267
OK! If I change my question to this:

The only operatingsystem I'm intrested in is: Win95, WinNT Win3.X Mac, Linux and SunOS. Is it possible to solve the problem now???

Expert Comment

ID: 1830268
Same answer. The environment variable REMOTE_USER is part of the CGI specification, and should be available to all CGI applications, regardless of operating system.

Having said that: There might be some broken web servers out there that don't bother setting it. I don't know if Mac does this, but perl should deal with the specifics of the operating system for you.

Or if you mean that you are only interested in browsers on those specific platforms: The answer stays the same. If you have a look at what exactly all those brwosers send, you will be able to see that there is no general standard. The only way to do it is to build an 'expert system'. This basically means that you have a whole bunch of if..elsif...else things to find out what exactly it is.

If you're only interested in a very limited set of browsers/operating systems, then all you need to do is look for a few identifiers. Shouldn't be too hard. The same regex would work, then split up the fields further.

As long as browser manufacturers don't use a standard format like

browsername/version (OS; modifiers) proxyline

it's hard to do anything sensible.

sorry it's not easier.
LVL 84

Expert Comment

ID: 1830269
Well, if you're limiting the question to HTTP_USER_AGENT strings containing
Win95, WinNT Win3.X Mac, Linux and SunOS.
then a solution might be:
($preos,$os,$postos)= ( $ENV{'HTTP_USER_AGENT'} =~
/(.*?)(Win95|WinNT|Win3.X|Mac|Linux|SunOS)(.*)/) || ($ENV{'HTTP_USER_AGENT'},'UNKNOWN','');
Or if you have an explicit list of HTTP_USER_AGENT strings which are relevant to you,
you may be able to do something with that.

Expert Comment

ID: 1830270
MS IE has versions that identify the OS as

(compatible; MSIE 3.02; Update a; AK; Windows 95)
(compatible; MSIE 3.01; Windows NT)

Old Netscape browsers do it like:

Mozilla/1.22 (Windows; I; 16bit)
Mozilla/1.1N (Macintosh; I; 68K)

So you'll need to look for a few more things than just the string above.. Like I said, probably needs a bit of an expert system.
LVL 84

Expert Comment

ID: 1830271
And some browsers allow the user to set the string to anything they choose.
(tinkering users might set any string in any browser)
Which raises the question of what do want the variables
$browser and $operatingsystem for?
e.g. if you're just setting defaults in a user menu,
and the user can explicitly select what they want anyway,
then maybe a half-assed job is sufficient?
Or if there are particular browser versions/operating systems
you wish to identify, then maybe you can explicitly list them?


Expert Comment

ID: 1830272
I agree with ozo. The only sure-fire way to do this is to allow the user to input the variables themselves through some sort of form interface.

This should solve the problem as most people will input the correct info for you.

There are no "standards" for this type of info so until it happens you will either have to scan each line and use else statements eg if this or if this or if this, etc....

To answer the question no-one can really help as there is no real way to do this unless you write a lot of code like shown.

Or alternatively use the form inputmethod....


Accepted Solution

mgjv earned 20 total points
ID: 1830273
I just had a little look at a script I once wrote that I use here to do some stats on the browser log.

Mind you, it doesn't try to get an OS out of the string (in my opinion that is too much of a hassle) but it does give you a list of the following:

number of hits for every :

      user agent full string
      user agent without OS/proxy, etc. (full version)
      user agent by version (just numbers)
      user agent by type

I would never run this code in a CGI, because it is way too bulky in my opinion,

# NOTE: This code is based on someone else's code. Forgot who.

# NOTE 2: this is part of a bigger program, just two subroutines. Not meant to be run as is, but you can read thropugh them and get the idea.
sub read_log {
    my $agent_log = shift;
    my $rawagents = shift;
      my ($line, $Agent, $spoofer, $refscounter);

      if ( $agent_log =~ /\.gz$/ ) {
            open(AGENTLOG,"$GZCAT $agent_log |") ||
                  die "Can't open $agent_log: $!\n";
      else {
            open(AGENTLOG,"$agent_log") ||
                  die "Can't open $agent_log: $!\n";

      while (defined($line=<AGENTLOG>)) {

              (($. % 5000) == 0) && _verbose('.');
              chomp $line;
            $line =~ s#\s+# #go;       # Fixes proxy info bug. Fix suggested by
                                                 # James Walter Martin III <>
            ($Agent) = $line =~ /^\S+\s+(.*)$/;

              $Agent = "Unknown" if (($Agent eq "-") || ($Agent eq ""));

              # Undo any URL encoding of user agent

              $Agent =~ tr/+/ /;
              $Agent =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

              # Despoofs WebTV spoofing as MSIE spoofing as Mozilla.
              if ($Agent =~ m#^\S+\s+(WebTV/\S+)#o) {
                  $Agent = "$1 spoofing as $Agent";

              # despoofs people using pseudo-'standard' of 'compatible'

              if ($Agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#oi) {
                 $spoofer =  $1;
                 $spoofer =~ s#/#-#og;
                 $spoofer =~ s/\W+$//o;
                 $Agent="$spoofer spoofing as $Agent";

              # Lets not let children play with dangerous toys...

              $Agent =~ s#<#\&lt;#go;
              $Agent =~ s#\&#\&amp;#go;
              $Agent =~ s#>#\&gt;#go;
              $Agent =~ s#"#\&quot;#go;


sub process_agents {
      my ($rawagents, $agentgroup, $agentversion, $baseagent) = @_;

      my ($base, $longagent, $agent, $name, $version);

      $^W = 0;
      foreach $agent (keys (%$rawagents)) {

            ($base)         =  $longagent =~ m#^([^\(\[]+)#o;
            $base          =~ s#\s+$##o;
            $base          =~ s#via proxy.*$##ogi;

            ($name,$version) = $base =~ m#^([^\d\/]+)[\s\/vV]+(\d[\.\d]+)#o;
            ($name) = $base =~ m#^([^\d\/]+)#o if (! $name);
            $name =~ s#[-_]# #go;
            $name =~ s#\s+$##o;
            $name =~ s#^(NCSA Mosaic).*#NCSA Mosaic#oi;
            $name =~ s#MSIE#Microsoft Internet Explorer#oi;
            $name =~ s#.*surfbot.*#SurfBot#oi;
            $agentgroup->{$name} += $rawagents->{$agent};
            $agentversion->{"$name $version"} += $rawagents->{$agent};
            $baseagent->{$base} += $rawagents->{$agent};

            $all_browsers{"$name $version"} = 1;
      $^W = 1;

Featured Post

Space-Age Communications Transitions to DevOps

ViaSat, a global provider of satellite and wireless communications, securely connects businesses, governments, and organizations to the Internet. Learn how ViaSat’s Network Solutions Engineer, drove the transition from a traditional network support to a DevOps-centric model.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Creating 2 files from output with Powershell 5 62
Bartender label printing - switch on and off graphics 3 71
Selecting Right Partition 6 80
Find unused columns in a table 12 67
Introduction This tutorial will give you a fast look what you can do with WhizBase. I expect you already know how to work with HTML at least, and that you understand the basics of the internet and how the internet works. WhizBase is a server-s…
Active Directory replication delay is the cause to many problems.  Here is a super easy script to force Active Directory replication to all sites with by using an elevated PowerShell command prompt, and a tool to verify your changes.
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

685 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question