Parse the log file to calculate the average time

Hi,

   I have a log file and I do
grep "doMatch\(\)" info.log | tail -5
with the output like
Case 1:
===================
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
====================
or
Case 2:
----------
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
========================
   I want to extract out the event start and end time and calculate the average time span for the event.
In case 1, there are two events,
 Event1 started at 2008-10-29 17:56:28,288 and ended at 2008-10-29 17:59:39,496. So this event spends 191.208 seconds.
 Event 2 started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.

  For average of two events, it is ( 191.208 + 198.281 ) / 2 = 194.745 seconds

In case 2, there are one event,
 This event started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.
 So the average time is  198.281 seconds

  I need the help to write a script to parse the log files and and calculate the average time in case 1 or case 2.
This script will
1. parse the output from "grep "doMatch\(\)" info.log | tail -5" and
  if there are 4 or 5 lines from the output, then calculate the average time with two events, (between START - END)
  if there are 3 or 2 lines from the output, then calculate the time difference between START - END
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
2. convert those timestamp into ms format and calculate the difference and average time.
3. Print out the result.

The output of this script will sho
1. The end time of the last event, in this example, it is 2008-10-29 18:06:57,863
2. The average time, 194.745 (or 198.281) seconds in my example.
such as
-----------
Time calculated : 2008-10-29 18:06:57,863
Average time    : 194.745 seconds
-----------

    The challenge part is converting the time format (2008-10-29 18:06:57,863) into ms format then do the calculation.

    It is really appreciated that some experts can help me out on this.
LVL 38
wesly_chenAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Adam314Commented:

#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
 
my @pairs;
my $last;
while(<>) {
	next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
	$last = $1;
	my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;
	my $se = $9;
	next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));
	if($se eq 'END') {
		push @{$pairs[-1]}, $tl;
	}
	else {
		push @pairs, [$tl];
	}
}
 
my ($sum, $count);
$sum+=$_->[1] - $_->[0], $count++ foreach (@pairs);
my $average = $sum/$count;
 
print "Time calculated : $last\n";
printf "Average time    : %.3f seconds\n", $average;

Open in new window

0
wesly_chenAuthor Commented:
Thanks Adam314 for the script.

With the input
-----------
2008-10-29 19:10:40,017 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:14:12,467 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:18:13,016 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:21:34,862 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:25:33,930 [INFO ] AM_thread -- START doMatch()
----------

You script fails with the following error messages:
-------------
Use of uninitialized value in subtraction (-) at /home/om/scripts/t_avg.pl line 23, <> line 5.
Time calculated : 2008-10-29 19:25:33,930
Average time    : -408435973.211 seconds
-------------
0
Adam314Commented:
With 5 lines, I expected it to always begin with END.  Here is an update that will work with your data.
...
my ($sum, $count);
foreach (@pairs) {
	next unless ($_->[0] and $_->[1]);
	$sum+=$_->[1] - $_->[0], $count++ 
}
my $average = $sum/$count;
...

Open in new window

0
Cloud Class® Course: Microsoft Exchange Server

The MCTS: Microsoft Exchange Server 2010 certification validates your skills in supporting the maintenance and administration of the Exchange servers in an enterprise environment. Learn everything you need to know with this course.

wesly_chenAuthor Commented:
Thanks for the update.

One of criteria doesn't match is
-----
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
-----
So if the output of "grep "doMatch\(\)" info.log | tail -5" have only one line or
---------
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
-----------
Then the script should print out error messages.

Now I got the following error message for 1 line input:
---------
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Illegal division by zero at /home/om/scripts/match_avg.pl line 27, <> line 1.
0
Adam314Commented:
forgot about that part... here is an updated version again
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local;
 
my @pairs;
my $last;
while(<>) {
        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
        $last = $1;
        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;
        my $se = $9;
        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));
        if($se eq 'END') {
                push @{$pairs[-1]}, $tl;
        }
        else {
                push @pairs, [$tl];
        }
}
my ($sum, $count);
foreach (@pairs) {
        next unless ($_->[0] and $_->[1]);
        $sum+=$_->[1] - $_->[0], $count++ 
}
if(!$count) {
	print "There are less than two timestamp to calculate the average time\n";
}
else {
	my $average = $sum/$count;
	
	print "Time calculated : $last\n";
	printf "Average time    : %.3f seconds\n", $average;
}

Open in new window

0
wesly_chenAuthor Commented:
Thanks a lot.
Could you give me some explanation about your scripts?
0
Adam314Commented:

#!/usr/bin/perl
use strict;       #Require variable declarations
use warnings;     #warn about possible bugs
use Time::Local;  #include this module (converts sec,min,hour,month day,month,year into unix time)
 
my @pairs;
my $last;
while(<>) {  #while there is a line to read
        Look for pattern yyyy-mm-dd hh:mm:ss.hhh, skip line if it doesn't match
        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;
        $last = $1;  #save time to variable $last
        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;  #save local time in unix format
        my $se = $9;  #save start/end in variable $se
        #skip this line if it is an END and there was no START
        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));  
        if($se eq 'END') {
                #If an END, save with matching START
                push @{$pairs[-1]}, $tl;
        }
        else {
                #If a START, save a new entry
                push @pairs, [$tl];
        }
}
my ($sum, $count);
foreach (@pairs) {  #Loop through all START/END pairs
        #Skip unless there is a START and END
        next unless ($_->[0] and $_->[1]);
        $sum+=$_->[1] - $_->[0], $count++; #add this difference to $sum and $count
}
if(!$count) {  #if no pairs, display error
        print "There are less than two timestamp to calculate the average time\n";
}
else { #otherwise, display results
        my $average = $sum/$count;
        
        print "Time calculated : $last\n";
        printf "Average time    : %.3f seconds\n", $average;
}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.