Solved

Perl hash help with parsing apache logs

Posted on 2011-03-05
Last Modified: 2012-05-11
Hello,
I am reading an Apache log file and parsing the data.
I need to display the number of accesses per hostname and the percentage
of the total accesses that each host accounted for, as follows.

I am having a problem calculating the percentage of total accesses.

Thanks in advance

   Hits   %-age    Resource
 -----      -----        -----
     7       1            h10.163.23.98.static.ip.windstream.net
     6       1            ip98-179-8-48.om.om.cox.net
     4       1            ip98-168-193-160.om.om.cox.net
     3       1            ip68-110-22-151.om.om.cox.net
 
#reading in file
my ($file) = @ARGV;
open (LOG, $file);
 
 my ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent);
 
#hash for hits
my %Hits;

while ( my $line=<LOG>) {
   ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent) = $line =~
          m/^(\S+) - - \[(\S+ [\-|\+]\d{4})\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;

 #Counting number of hits per host, &Hnames is a subroutine that calls $host and does a reverse dns lookup 
$Hits{&Hnames}++

}
 
 
#------------------------------
       print "=" x 78,"\n";
       print "HOSTNAMES\n";
       print "=" x 78,"\n";
       printf "%6s %4s %s\n", "Hits", "%-age", "Resource";
       printf "%6s %4s %s\n", "-----", "-----","-----";

# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {
     
        my $num += $Hits{$key};
        my $perc = $Hits{$key}/$num;
      
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;
    
}

Question by:fac66
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 35044677
This should do what you want.  If not, let me know where you are seeing an issue...

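The problem is that you need the total number of hits before looping through the keys to do the output. In your loop, "my $num" creates a fresh variable on every iteration, so $num always equals $Hits{$key} and the ratio is always 1.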
#reading in file
my ($file) = @ARGV;
open (LOG, $file);
 
 my ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent);
 
#hash for hits
my (%Hits, $ttl);

while ( my $line=<LOG>) {
   ($host,$date,$method,$urls,$httpver,$status,$size,$referrer,$agent) = $line =~
          m/^(\S+) - - \[(\S+ [\-|\+]\d{4})\] "(\S+) (\S+) ([^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;

 #Counting number of hits per host, &Hnames is a subroutine that calls $host and does a reverse dns lookup 
$Hits{&Hnames}++;
$ttl++;

}
 
 
#------------------------------
       print "=" x 78,"\n";
       print "HOSTNAMES\n";
       print "=" x 78,"\n";
       printf "%6s %4s %s\n", "Hits", "%-age", "Resource";
       printf "%6s %4s %s\n", "-----", "-----","-----";

# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {
     
        my $perc = $Hits{$key}/$ttl;
      
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;
    
}
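
For reference, the count-then-report idea can be tried without a log file. The following is only a minimal sketch: the hostnames in @hosts are made-up sample data (an assumption, not taken from the log), and multiplying by 100 is one way to turn the ratio into a percentage.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the hostnames parsed from the log.
my @hosts = qw(a.example.net b.example.net a.example.net c.example.net a.example.net);

my (%Hits, $ttl);
for my $host (@hosts) {
    $Hits{$host}++;   # per-host count
    $ttl++;           # running total across all hosts
}

# Build the report rows: hits high -> low, ties broken by hostname.
my @report;
for my $key ( sort { $Hits{$b} <=> $Hits{$a} or $a cmp $b } keys %Hits ) {
    my $perc = 100 * $Hits{$key} / $ttl;   # share of all hits, in percent
    push @report, sprintf("%6d %6.1f %s", $Hits{$key}, $perc, $key);
}
print "$_\n" for @report;
```

The total has to be complete before the first row is printed, which is why the counting loop and the reporting loop are kept separate.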

 

Author Comment

by:fac66
ID: 35044817
Thanks for the response!

This is what I get:

Hits   %-age   Resource
 -----   -----     -----
     7     0      h10.163.23.98.static.ip.windstream.net
     6     0      ip98-179-8-48.om.om.cox.net
     4     0      ip98-168-193-160.om.om.cox.net
 

Author Comment

by:fac66
ID: 35044850
Actually it does work:

=============================================================================
  Hits   %-age    Resource
 -----   -----    -----
     7   0.69   h10.163.23.98.static.ip.windstream.net
     6   0.59   ip98-179-8-48.om.om.cox.net
     4   0.40   ip98-168-193-160.om.om.cox.net
     3   0.30   ip68-110-22-151.om.om.cox.net

I had to change the printf to use a floating-point format.

printf "%6d %4.2f %5s\n", $Hits{ $key }, $perc, $key;
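
A small sketch of why the integer format printed 0, using made-up numbers (7 hits out of 1000 requests is an assumption for illustration only). It also shows one possible way to print a literal percent sign with %%.

```perl
use strict;
use warnings;

# Hypothetical numbers: 7 hits out of 1000 total requests.
my ($hits, $ttl) = (7, 1000);

my $fraction = $hits / $ttl;                          # a value between 0 and 1
my $as_int   = sprintf("%4d", $fraction);             # %d drops the fractional part
my $as_pct   = sprintf("%5.2f%%", 100 * $fraction);   # %% prints a literal percent sign

print "integer format: $as_int\n";
print "percent format: $as_pct\n";
```

Any ratio below 1 prints as 0 under %d, so a floating-point conversion such as %.2f is needed whether or not the value is scaled by 100.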

 

Author Comment

by:fac66
ID: 35044868

One other question please..

Is there a way I can get the same results but sorting alphabetically?
I would have to sort by the value rather than key.
What I have tried results in losing the hits count.
# Sorting on hits high -> low
foreach my $key ( sort { $Hits{ $b } <=> $Hits{ $a } } (keys %Hits) ) {     
        my $perc = $Hits{$key}/$ttl;     
        printf "%6d %4d %5s\n", $Hits{ $key }, $perc, $key;    
}

 
LVL 84

Expert Comment

by:ozo
ID: 35045231
foreach my $key ( sort keys %Hits ) {
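
Put into a complete loop, this might look like the sketch below. The %Hits contents are made-up sample data, and the case-insensitive comparison via lc is an optional extra that is not part of the one-line answer above. Because the hash is still looked up by $key inside the loop, the hit counts are not lost.

```perl
use strict;
use warnings;

# Hypothetical counts standing in for the %Hits hash built from the log.
my %Hits = (
    'ip98.example.net'  => 6,
    'h10.example.net'   => 7,
    'Alpha.example.net' => 4,
);

# Total of all hits, needed for the percentage column.
my $ttl = 0;
$ttl += $_ for values %Hits;

# Sort on the hostnames (the hash keys), ignoring case.
my @rows;
foreach my $key ( sort { lc($a) cmp lc($b) } keys %Hits ) {
    my $perc = 100 * $Hits{$key} / $ttl;
    push @rows, sprintf("%6d %5.1f %s", $Hits{$key}, $perc, $key);
}
print "$_\n" for @rows;
```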