Solved

Perl hash help with parsing apache logs

Posted on 2011-03-05
Last Modified: 2012-05-11
Hello,
I am reading an Apache log file and parsing the data.
I need to display, for each hostname, the number of accesses and the
percentage of the total accesses that host accounted for, as follows.

I am having a problem calculating the percentage of total accesses.

Thanks in advance

  Hits   %-age   Resource
 -----   -----   -----
     7       1   h10.163.23.98.static.ip.windstream.net
     6       1   ip98-179-8-48.om.om.cox.net
     4       1   ip98-168-193-160.om.om.cox.net
     3       1   ip68-110-22-151.om.om.cox.net
 
#reading in file
my ($file) = @ARGV;
open (LOG, $file);
 
 my ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent);
 
#hash for hits
my %Hits;

while ( my $line=<LOG>) {
   ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent) = $line =~
          m/^(\S+) - - \[(\S+ [\-|\+]\d{4})\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;

 #Counting number of hits per host, &Hnames is a subroutine that calls $host and does a reverse dns lookup 
$Hits{&Hnames}++

}
 
 
#------------------------------
       print "=" x 78,"\n";
       print "HOSTNAMES\n";
       print "=" x 78,"\n";
       printf "%6s %4s %s\n", "Hits", "%-age", "Resource";
       printf "%6s %4s %s\n", "-----", "-----","-----";

# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {
     
        my $num += $Hits{$key};
        my $perc = $Hits{$key}/$num;
      
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;
    
}

Question by:fac66
Accepted Solution

by:
wilcoxon earned 500 total points
ID: 35044677
This should do what you want.  If not, let me know where you are seeing an issue...

The problem is that you need the total number of hits prior to looping through the keys to do the output.
#reading in file
my ($file) = @ARGV;
open (LOG, $file);
 
 my ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent);
 
#hash for hits
my (%Hits, $ttl);

while ( my $line=<LOG>) {
   ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent) = $line =~
          m/^(\S+) - - \[(\S+ [\-|\+]\d{4})\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;

 #Counting number of hits per host, &Hnames is a subroutine that calls $host and does a reverse dns lookup 
$Hits{&Hnames}++;
$ttl++;

}
 
 
#------------------------------
       print "=" x 78,"\n";
       print "HOSTNAMES\n";
       print "=" x 78,"\n";
       printf "%6s %4s %s\n", "Hits", "%-age", "Resource";
       printf "%6s %4s %s\n", "-----", "-----","-----";

# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {
     
        my $perc = 100 * $Hits{$key}/$ttl;   # scale the fraction to a percentage
      
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;
    
}
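
The same idea as a standalone sketch: total the counts first, then make a second pass for the percentages. The helper name percent_rows and the sample hostnames are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn a hash of hit counts
# into report rows of [host, hits, percentage of total].
sub percent_rows {
    my ($hits) = @_;

    # First pass: the total must be known before any percentage is computed.
    my $total = 0;
    $total += $_ for values %$hits;

    # Second pass: rows sorted by hits, high to low (ties broken by name).
    return map  { [ $_, $hits->{$_}, $total ? 100 * $hits->{$_} / $total : 0 ] }
           sort { $hits->{$b} <=> $hits->{$a} or $a cmp $b } keys %$hits;
}

# Sample counts, made up for illustration.
my %hits = ( 'a.example.net' => 6, 'b.example.net' => 3, 'c.example.net' => 1 );
printf "%6d %6.2f %s\n", $_->[1], $_->[2], $_->[0] for percent_rows(\%hits);
```

Summing the hash values after the read loop gives the same total as incrementing $ttl per line, as long as every counted line also bumps a hash entry.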


Author Comment

by:fac66
ID: 35044817
Thanks for the response!

This is what I get:

  Hits   %-age   Resource
 -----   -----   -----
     7       0   h10.163.23.98.static.ip.windstream.net
     6       0   ip98-179-8-48.om.om.cox.net
     4       0   ip98-168-193-160.om.om.cox.net

Author Comment

by:fac66
ID: 35044850
Actually it does work:

=============================================================================
  Hits   %-age    Resource
 -----   -----    -----
     7   0.69   h10.163.23.98.static.ip.windstream.net
     6   0.59   ip98-179-8-48.om.om.cox.net
     4   0.40   ip98-168-193-160.om.om.cox.net
     3   0.30   ip68-110-22-151.om.om.cox.net

I had to change the printf to use a floating-point format:

printf "%6d %4.2f %5s\n", $Hits{ $key }, $perc, $key;
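
For anyone wondering why the integer format showed 0, here is a small sketch of the difference. The total of 1014 requests is an assumed figure for illustration; the thread does not state the real total.

```perl
use strict;
use warnings;

# Assumed numbers for illustration: 7 hits out of 1014 total requests.
my ($hits, $total) = (7, 1014);
my $perc = 100 * $hits / $total;         # roughly 0.69

my $as_int   = sprintf "%4d",   $perc;   # %d truncates the fraction away
my $as_float = sprintf "%5.2f", $perc;   # %.2f keeps two decimal places

print "integer format: '$as_int'\n";
print "float format:   '$as_float'\n";
```

Any host with less than 1% of the traffic prints as 0 under %d, which is why every row looked the same.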


Author Comment

by:fac66
ID: 35044868

One other question, please.

Is there a way I can get the same results but sorted alphabetically?
I would have to sort by the value rather than key.
What I have tried loses the hit counts. This is my current code:
# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {     
        my $perc = $Hits{$key}/$ttl;     
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;    
}

Expert Comment

by:ozo
ID: 35045231
foreach my $key ( sort keys %Hits ) {
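
A fuller sketch of that loop, assuming the hostnames are the hash keys as in the script above. The sample data and the case-insensitive comparison are additions for illustration, not something the thread specifies.

```perl
use strict;
use warnings;

# Made-up counts for illustration.
my %Hits = (
    'ip98.example.net' => 6,
    'h10.example.net'  => 7,
    'ip68.example.net' => 3,
);

# Total across all hosts, needed for the percentage column.
my $ttl = 0;
$ttl += $_ for values %Hits;

# Alphabetical by hostname. Only the key order changes; each count is
# still looked up through its key, so nothing is lost.
my @by_name = sort { lc($a) cmp lc($b) } keys %Hits;

for my $key (@by_name) {
    printf "%6d %6.2f %s\n", $Hits{$key}, 100 * $Hits{$key} / $ttl, $key;
}
```

The counts go missing only if you sort the values themselves, because a sorted list of values no longer tells you which host each one belongs to.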
Question has a verified solution.
