

Help writing a few lines of Perl.

Posted on 1997-09-11
Medium Priority
Last Modified: 2013-12-25
I'm trying to write a simple CGI script and need some help.

I need to split $ENV{'HTTP_USER_AGENT'} into two variables, $browser and $operatingsystem.
(For example, $browser=Mozilla/3.01 and $operatingsystem=Linux)

Can someone help me?
Question by:pucko

Expert Comment

ID: 1830264
I don't see a consistent way to do it which accounts for all the different ways
that different browsers can present that information (if they present it at all)
But something like this looks like it might work for many of them:

($browser,$operatingsystem) = ($ENV{'HTTP_USER_AGENT'} =~ /([^(]+?)\s*(\(.*\)|\b[^(\s)]*)$/);


Expert Comment

ID: 1830265
Well, it's not that simple, unfortunately. Every browser sends a string (or should), and that string can look basically any way the browser wants. Things like:

MSIE 3.02 spoofing as Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95) via Squid Cache version 1.0.20

Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22

Mozilla/3.02GoldC-KIT (Win95; I)

IBM WebExplorer DLL /v1.03

are all possible, as are much weirder permutations.

You can probably see it's not that simple to extract the real browser name, version and operating system (if supplied at all) from this garbled mess.

Most of the time, the operating system, along with some modifiers, can be found inside the first pair of parentheses ().

If you try something like:

($browser, $OS) = /^([^(]*)\(([^)]*)/;

This will put everything before the first ( into $browser, and everything from that bracket up to the first ) into $OS.

Of course, you are still left with all the garbage to get rid of.
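
As a rough sketch of that split, the following applies a pattern of the same shape to some of the sample strings quoted above. The sub name split_agent is made up for illustration, and the whole thing rests on the assumption that the OS information sits inside the first pair of parentheses.

```perl
use strict;
use warnings;

# Hypothetical helper: split a User-Agent string at its first parenthesis.
# Returns (browser, os); os is undef when the string has no parentheses.
sub split_agent {
    my ($ua) = @_;
    if ($ua =~ /^([^(]*)\(([^)]*)/) {
        my ($browser, $os) = ($1, $2);
        $browser =~ s/\s+$//;       # trim the space before the bracket
        return ($browser, $os);
    }
    return ($ua, undef);            # no parentheses: no OS information
}

# Sample strings taken from this thread.
for my $ua (
    'Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22',
    'Mozilla/3.02GoldC-KIT (Win95; I)',
    'IBM WebExplorer DLL /v1.03',
) {
    my ($browser, $os) = split_agent($ua);
    printf "%-28s | %s\n", $browser, defined $os ? $os : '(none)';
}
```

Anything after the closing bracket (the proxy line, for instance) is simply dropped here, and the spoofing case is not handled at all.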

The only way to really do this reliably is to check for all weirdness (strip off spoofing stuff, etc), and check the OS string for some keywords.

I have a little statistics script here that I run over the browser logs now and again, to see what the breakdown of browsers visiting our site is. I could probably post it here, although it might be slightly too large for that.

But remember, because of all the work that needs to be done to strip out this information, you might not want to run this on every access you get to your page, or your ISP might get very cross with you.

Expert Comment

ID: 1830266

I see that ozo was sort of typing the same as I was typing at the same time :)


Author Comment

ID: 1830267
OK! If I change my question to this:

The only operating systems I'm interested in are Win95, WinNT, Win3.X, Mac, Linux and SunOS. Is it possible to solve the problem now?

Expert Comment

ID: 1830268
Same answer. The environment variable HTTP_USER_AGENT is part of the CGI specification, and should be available to all CGI applications, regardless of operating system.

Having said that: There might be some broken web servers out there that don't bother setting it. I don't know if Mac does this, but perl should deal with the specifics of the operating system for you.

Or if you mean that you are only interested in browsers on those specific platforms: the answer stays the same. If you have a look at exactly what all those browsers send, you will see that there is no general standard. The only way to do it is to build an 'expert system'. This basically means a whole bunch of if..elsif..else tests to find out exactly what it is.

If you're only interested in a very limited set of browsers/operating systems, then all you need to do is look for a few identifiers. Shouldn't be too hard. The same regex would work, then split up the fields further.

As long as browser manufacturers don't use a standard format like

browsername/version (OS; modifiers) proxyline

it's hard to do anything sensible.

sorry it's not easier.

Expert Comment

ID: 1830269
Well, if you're limiting the question to HTTP_USER_AGENT strings containing
Win95, WinNT, Win3.X, Mac, Linux and SunOS,
then a solution might be:
# "or" with a second assignment, because "||" would force the match into
# scalar context and return 1 instead of the captured fields
($preos,$os,$postos) = $ENV{'HTTP_USER_AGENT'} =~
/(.*?)(Win95|WinNT|Win3\.X|Mac|Linux|SunOS)(.*)/
or ($preos,$os,$postos) = ($ENV{'HTTP_USER_AGENT'},'UNKNOWN','');
Or if you have an explicit list of HTTP_USER_AGENT strings which are relevant to you,
you may be able to do something with that.

Expert Comment

ID: 1830270
MS IE has versions that identify the OS as

(compatible; MSIE 3.02; Update a; AK; Windows 95)
(compatible; MSIE 3.01; Windows NT)

Old Netscape browsers do it like:

Mozilla/1.22 (Windows; I; 16bit)
Mozilla/1.1N (Macintosh; I; 68K)

So you'll need to look for a few more things than just the strings above. Like I said, it probably needs a bit of an expert system.
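
A minimal sketch of such a keyword check might look like this. The sub name guess_os, the rule table and its patterns are all assumptions made for illustration, drawn only from the sample strings in this thread; a real list would need many more entries.

```perl
use strict;
use warnings;

# Ordered [pattern, label] pairs; the first match wins.
# Illustrative guesses only, not a complete catalogue. Short keywords
# such as "Mac" can also match unrelated text.
my @OS_RULES = (
    [ qr/Win(?:dows)?\s*95/i,        'Win95'  ],
    [ qr/Win(?:dows)?\s*NT/i,        'WinNT'  ],
    [ qr/Win(?:dows)?\s*3\.|16bit/i, 'Win3.X' ],
    [ qr/Mac/i,                      'Mac'    ],
    [ qr/Linux/i,                    'Linux'  ],
    [ qr/SunOS/i,                    'SunOS'  ],
);

# Hypothetical helper: map a User-Agent string to one of the labels above.
sub guess_os {
    my ($ua) = @_;
    return 'UNKNOWN' unless defined $ua;
    for my $rule (@OS_RULES) {
        my ($re, $label) = @$rule;
        return $label if $ua =~ $re;
    }
    return 'UNKNOWN';
}

# Sample strings taken from this thread.
for my $ua (
    'Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; Windows 95)',
    'Mozilla/2.0 (compatible; MSIE 3.01; Windows NT)',
    'Mozilla/1.22 (Windows; I; 16bit)',
    'Mozilla/1.1N (Macintosh; I; 68K)',
    'IBM WebExplorer DLL /v1.03',
) {
    printf "%-8s <- %s\n", guess_os($ua), $ua;
}
```

Keeping the rules in an ordered table rather than a long if..elsif chain makes it easier to add a new identifier when an unfamiliar string shows up in the logs.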

Expert Comment

ID: 1830271
And some browsers allow the user to set the string to anything they choose.
(tinkering users might set any string in any browser)
Which raises the question of what you want the variables
$browser and $operatingsystem for.
e.g. if you're just setting defaults in a user menu,
and the user can explicitly select what they want anyway,
then maybe a half-assed job is sufficient?
Or if there are particular browser versions/operating systems
you wish to identify, then maybe you can explicitly list them?


Expert Comment

ID: 1830272
I agree with ozo. The only sure-fire way to do this is to let the users enter the values themselves through some sort of form interface.

This should solve the problem, as most people will enter the correct information for you.

There are no "standards" for this type of information, so until there are you will have to scan each string with a chain of if/elsif tests (if it contains this, else if it contains that, and so on).

To answer the question: no one can really help much, as there is no real way to do this unless you write a lot of code like that shown above.

Alternatively, use the form input method.


Accepted Solution

mgjv earned 40 total points
ID: 1830273
I just had a little look at a script I once wrote that I use here to do some stats on the browser log.

Mind you, it doesn't try to get an OS out of the string (in my opinion that is too much of a hassle) but it does give you a list of the following:

number of hits for every :

      user agent full string
      user agent without OS/proxy, etc. (full version)
      user agent by version (just numbers)
      user agent by type

I would never run this code in a CGI, because it is way too bulky in my opinion.

# NOTE: This code is based on someone else's code. Forgot who.

# NOTE 2: this is part of a bigger program, just two subroutines. Not meant to be run as is, but you can read through them and get the idea.
sub read_log {
    my $agent_log = shift;
    my $rawagents = shift;
    my ($line, $Agent, $spoofer, $refscounter);

    # Compressed logs are read through the external command in $GZCAT.
    if ( $agent_log =~ /\.gz$/ ) {
        open(AGENTLOG, "$GZCAT $agent_log |") ||
            die "Can't open $agent_log: $!\n";
    }
    else {
        open(AGENTLOG, $agent_log) ||
            die "Can't open $agent_log: $!\n";
    }

    while (defined($line = <AGENTLOG>)) {

        (($. % 5000) == 0) && _verbose('.');    # progress indicator
        chomp $line;
        $line =~ s#\s+# #go;    # Fixes proxy info bug. Fix suggested by
                                # James Walter Martin III <jwm3@chubb.com>
        ($Agent) = $line =~ /^\S+\s+(.*)$/;

        $Agent = "Unknown"
            if (!defined($Agent) || ($Agent eq "-") || ($Agent eq ""));

        # Undo any URL encoding of user agent

        $Agent =~ tr/+/ /;
        $Agent =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        # Despoofs WebTV spoofing as MSIE spoofing as Mozilla.
        if ($Agent =~ m#^\S+\s+(WebTV/\S+)#o) {
            $Agent = "$1 spoofing as $Agent";
        }

        # despoofs people using pseudo-'standard' of 'compatible'

        if ($Agent =~ m#^Mozilla.*\(compatible; *([^;)]+)#oi) {
            $spoofer =  $1;
            $spoofer =~ s#/#-#og;
            $spoofer =~ s/\W+$//o;
            $Agent = "$spoofer spoofing as $Agent";
        }

        # Lets not let children play with dangerous toys...
        # (& is escaped first so the entities added below are not re-escaped)

        $Agent =~ s#\&#\&amp;#go;
        $Agent =~ s#<#\&lt;#go;
        $Agent =~ s#>#\&gt;#go;
        $Agent =~ s#"#\&quot;#go;

        # Tally the hit. This step is missing from the excerpt and is
        # assumed here: process_agents() reads counts keyed by agent string.
        $rawagents->{$Agent}++;
    }
    close(AGENTLOG);
}


sub process_agents {
    my ($rawagents, $agentgroup, $agentversion, $baseagent) = @_;

    my ($base, $agent, $name, $version);

    $^W = 0;    # silence warnings: $version is undefined for some agents
    foreach $agent (keys (%$rawagents)) {

        # Everything before the first ( or [ is the browser part.
        ($base) =  $agent =~ m#^([^\(\[]+)#o;
        $base   =~ s#\s+$##o;
        $base   =~ s#via proxy.*$##ogi;

        # Split into a name and a version number; fall back to name only.
        ($name,$version) = $base =~ m#^([^\d\/]+)[\s\/vV]+(\d[\.\d]+)#o;
        ($name) = $base =~ m#^([^\d\/]+)#o if (! $name);

        # Normalise a few well-known names.
        $name =~ s#[-_]# #go;
        $name =~ s#\s+$##o;
        $name =~ s#^(NCSA Mosaic).*#NCSA Mosaic#oi;
        $name =~ s#MSIE#Microsoft Internet Explorer#oi;
        $name =~ s#.*surfbot.*#SurfBot#oi;

        # Add this agent's hits to each level of grouping.
        $agentgroup->{$name} += $rawagents->{$agent};
        $agentversion->{"$name $version"} += $rawagents->{$agent};
        $baseagent->{$base} += $rawagents->{$agent};

        $all_browsers{"$name $version"} = 1;    # global from the bigger program
    }
    $^W = 1;
}
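
To try the name/version split on its own, outside the bigger program, something like the following stand-alone sketch could be used. The sub name name_and_version and the %hits hash are invented for this example; the patterns follow the ones in process_agents above, without the despoofing and name clean-up steps.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a User-Agent string to (name, version).
sub name_and_version {
    my ($agent) = @_;

    # Keep only the part before the first ( or [.
    my ($base) = $agent =~ m#^([^(\[]+)#;
    return ('Unknown', '') unless defined $base;
    $base =~ s#\s*via proxy.*$##i;
    $base =~ s#\s+$##;

    # "Name/1.23", "Name 1.23" and "Name /v1.23" all give name + version.
    my ($name, $version) = $base =~ m#^([^\d/]+)[\s/vV]+(\d[.\d]+)#;
    ($name) = $base =~ m#^([^\d/]+)# unless defined $name;
    $name    = 'Unknown' unless defined $name;
    $version = ''        unless defined $version;
    $name =~ s#\s+$##;
    return ($name, $version);
}

# Count hits per "name version", using sample strings from this thread.
my %hits;
for my $ua (
    'Mozilla/3.01 (Win95; I) via Squid Cache version 1.0.22',
    'Mozilla/3.02GoldC-KIT (Win95; I)',
    'IBM WebExplorer DLL /v1.03',
) {
    my ($name, $version) = name_and_version($ua);
    $hits{"$name $version"}++;
}
print "$_: $hits{$_}\n" for sort keys %hits;
```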
