Solved

Parse the log file to calculate the average time

Posted on 2008-10-29
7
677 Views
Last Modified: 2008-10-30
Hi,

   I have a log file and I do
grep "doMatch\(\)" info.log | tail -5
with the output like
Case 1:
===================
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
====================
or
Case 2:
----------
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
========================
   I want to extract out the event start and end time and calculate the average time span for the event.
In case 1, there are two events,
 Event1 started at 2008-10-29 17:56:28,288 and ended at 2008-10-29 17:59:39,496. So this event spends 191.208 seconds.
 Event 2 started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.

  For average of two events, it is ( 191.208 + 198.281 ) / 2 = 194.745 seconds

In case 2, there are one event,
 This event started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.
 So the average time is  198.281 seconds

  I need the help to write a script to parse the log files and and calculate the average time in case 1 or case 2.
This script will
1. parse the output from "grep "doMatch\(\)" info.log | tail -5" and
  if there are 4 or 5 lines from the output, then calculate the average time with two events, (between START - END)
  if there are 3 or 2 lines from the output, then calculate the time difference between START - END
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
2. convert those timestamp into ms format and calculate the difference and average time.
3. Print out the result.

The output of this script will sho
1. The end time of the last event, in this example, it is 2008-10-29 18:06:57,863
2. The average time, 194.745 (or 198.281) seconds in my example.
such as
-----------
Time calculated : 2008-10-29 18:06:57,863
Average time    : 194.745 seconds
-----------

    The challenge part is converting the time format (2008-10-29 18:06:57,863) into ms format then do the calculation.

    It is really appreciated that some experts can help me out on this.
0
Comment
Question by:wesly_chen
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
7 Comments
 
LVL 39

Expert Comment

by:Adam314
ID: 22834583

#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
 
my @pairs;
my $last;
while(<>) {
	next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
	$last = $1;
	my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;
	my $se = $9;
	next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));
	if($se eq 'END') {
		push @{$pairs[-1]}, $tl;
	}
	else {
		push @pairs, [$tl];
	}
}
 
my ($sum, $count);
$sum+=$_->[1] - $_->[0], $count++ foreach (@pairs);
my $average = $sum/$count;
 
print "Time calculated : $last\n";
printf "Average time    : %.3f seconds\n", $average;

Open in new window

0
 
LVL 38

Author Comment

by:wesly_chen
ID: 22834740
Thanks Adam314 for the script.

With the input
-----------
2008-10-29 19:10:40,017 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:14:12,467 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:18:13,016 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:21:34,862 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:25:33,930 [INFO ] AM_thread -- START doMatch()
----------

You script fails with the following error messages:
-------------
Use of uninitialized value in subtraction (-) at /home/om/scripts/t_avg.pl line 23, <> line 5.
Time calculated : 2008-10-29 19:25:33,930
Average time    : -408435973.211 seconds
-------------
0
 
LVL 39

Expert Comment

by:Adam314
ID: 22836732
With 5 lines, I expected it to always begin with END.  Here is an update that will work with your data.
...
my ($sum, $count);
foreach (@pairs) {
	next unless ($_->[0] and $_->[1]);
	$sum+=$_->[1] - $_->[0], $count++ 
}
my $average = $sum/$count;
...

Open in new window

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 38

Author Comment

by:wesly_chen
ID: 22837004
Thanks for the update.

One of criteria doesn't match is
-----
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
-----
So if the output of "grep "doMatch\(\)" info.log | tail -5" have only one line or
---------
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
-----------
Then the script should print out error messages.

Now I got the following error message for 1 line input:
---------
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Illegal division by zero at /home/om/scripts/match_avg.pl line 27, <> line 1.
0
 
LVL 39

Expert Comment

by:Adam314
ID: 22837518
forgot about that part... here is an updated version again
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
 
my @pairs;
my $last;
while(<>) {
        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
        $last = $1;
        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;
        my $se = $9;
        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));
        if($se eq 'END') {
                push @{$pairs[-1]}, $tl;
        }
        else {
                push @pairs, [$tl];
        }
}
my ($sum, $count);
foreach (@pairs) {
        next unless ($_->[0] and $_->[1]);
        $sum+=$_->[1] - $_->[0], $count++ 
}
if(!$count) {
	print "There are less than two timestamp to calculate the average time\n";
}
else {
	my $average = $sum/$count;
	
	print "Time calculated : $last\n";
	printf "Average time    : %.3f seconds\n", $average;
}

Open in new window

0
 
LVL 38

Author Comment

by:wesly_chen
ID: 22837738
Thanks a lot.
Could you give me some explanation about your scripts?
0
 
LVL 39

Accepted Solution

by:
Adam314 earned 500 total points
ID: 22837844

#!/usr/bin/perl
use strict;       #Require variable declarations
use warnings;     #warn about possible bugs
use Time::Local;  #include this module (converts sec,min,hour,month day,month,year into unix time)
 
my @pairs;
my $last;
while(<>) {  #while there is a line to read
        Look for pattern yyyy-mm-dd hh:mm:ss.hhh, skip line if it doesn't match
        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
        $last = $1;  #save time to variable $last
        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;  #save local time in unix format
        my $se = $9;  #save start/end in variable $se
        #skip this line if it is an END and there was no START
        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));  
        if($se eq 'END') {
                #If an END, save with matching START
                push @{$pairs[-1]}, $tl;
        }
        else {
                #If a START, save a new entry
                push @pairs, [$tl];
        }
}
my ($sum, $count);
foreach (@pairs) {  #Loop through all START/END pairs
        #Skip unless there is a START and END
        next unless ($_->[0] and $_->[1]);
        $sum+=$_->[1] - $_->[0], $count++; #add this difference to $sum and $count
}
if(!$count) {  #if no pairs, display error
        print "There are less than two timestamp to calculate the average time\n";
}
else { #otherwise, display results
        my $average = $sum/$count;
        
        print "Time calculated : $last\n";
        printf "Average time    : %.3f seconds\n", $average;
}

Open in new window

0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Unix/bash: scripted arithmetic 13 103
Perl script to delete older files 6 100
read an xml file in perl 2 86
need a help to fix one if else shell script issue 6 33
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded (http://www.mongodb.org/downloads),  installed and started the daemon, and had a console window open. After an hour or two of playing at the command …
In the distant past (last year) I hacked together a little toy that would allow a couple of Manager types to query, preview, and extract data from a number of MongoDB instances, to their tool of choice: Excel (http://dilbert.com/strips/comic/2007-08…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question