Solved

Parse the log file to calculate the average time

Posted on 2008-10-29
7
666 Views
Last Modified: 2008-10-30
Hi,

   I have a log file and I do
grep "doMatch\(\)" info.log | tail -5
with the output like
Case 1:
===================
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
====================
or
Case 2:
----------
2008-10-29 17:59:39,496 [INFO ] AM_thread -- END doMatch()
2008-10-29 18:03:39,582 [INFO ] AM_thread -- START doMatch()
2008-10-29 18:06:57,863 [INFO ] AM_thread -- END doMatch()
========================
   I want to extract out the event start and end time and calculate the average time span for the event.
In case 1, there are two events,
 Event1 started at 2008-10-29 17:56:28,288 and ended at 2008-10-29 17:59:39,496. So this event spends 191.208 seconds.
 Event 2 started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.

  For average of two events, it is ( 191.208 + 198.281 ) / 2 = 194.745 seconds

In case 2, there are one event,
 This event started at 2008-10-29 18:03:39,582 and ended at 2008-10-29 18:06:57,863. So event 2 spends 198.281 seconds.
 So the average time is  198.281 seconds

  I need the help to write a script to parse the log files and and calculate the average time in case 1 or case 2.
This script will
1. parse the output from "grep "doMatch\(\)" info.log | tail -5" and
  if there are 4 or 5 lines from the output, then calculate the average time with two events, (between START - END)
  if there are 3 or 2 lines from the output, then calculate the time difference between START - END
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
2. convert those timestamp into ms format and calculate the difference and average time.
3. Print out the result.

The output of this script will sho
1. The end time of the last event, in this example, it is 2008-10-29 18:06:57,863
2. The average time, 194.745 (or 198.281) seconds in my example.
such as
-----------
Time calculated : 2008-10-29 18:06:57,863
Average time    : 194.745 seconds
-----------

    The challenge part is converting the time format (2008-10-29 18:06:57,863) into ms format then do the calculation.

    It is really appreciated that some experts can help me out on this.
0
Comment
Question by:wesly_chen
  • 4
  • 3
7 Comments
 
LVL 39

Expert Comment

by:Adam314
ID: 22834583

#!/usr/bin/perl

use strict;

use warnings;

use Time::Local;
 

my @pairs;

my $last;

while(<>) {

	next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;

	$last = $1;

	my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;

	my $se = $9;

	next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));

	if($se eq 'END') {

		push @{$pairs[-1]}, $tl;

	}

	else {

		push @pairs, [$tl];

	}

}
 

my ($sum, $count);

$sum+=$_->[1] - $_->[0], $count++ foreach (@pairs);

my $average = $sum/$count;
 

print "Time calculated : $last\n";

printf "Average time    : %.3f seconds\n", $average;

Open in new window

0
 
LVL 38

Author Comment

by:wesly_chen
ID: 22834740
Thanks Adam314 for the script.

With the input
-----------
2008-10-29 19:10:40,017 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:14:12,467 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:18:13,016 [INFO ] AM_thread -- START doMatch()
2008-10-29 19:21:34,862 [INFO ] AM_thread -- END doMatch()
2008-10-29 19:25:33,930 [INFO ] AM_thread -- START doMatch()
----------

You script fails with the following error messages:
-------------
Use of uninitialized value in subtraction (-) at /home/om/scripts/t_avg.pl line 23, <> line 5.
Time calculated : 2008-10-29 19:25:33,930
Average time    : -408435973.211 seconds
-------------
0
 
LVL 39

Expert Comment

by:Adam314
ID: 22836732
With 5 lines, I expected it to always begin with END.  Here is an update that will work with your data.
...

my ($sum, $count);

foreach (@pairs) {

	next unless ($_->[0] and $_->[1]);

	$sum+=$_->[1] - $_->[0], $count++ 

}

my $average = $sum/$count;

...

Open in new window

0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 38

Author Comment

by:wesly_chen
ID: 22837004
Thanks for the update.

One of criteria doesn't match is
-----
  if there are less than 2 lines, error out with message "There are less than two timestamp to calculate the average time"
-----
So if the output of "grep "doMatch\(\)" info.log | tail -5" have only one line or
---------
2008-10-29 17:52:28,606 [INFO ] AM_thread -- END doMatch()
2008-10-29 17:56:28,288 [INFO ] AM_thread -- START doMatch()
-----------
Then the script should print out error messages.

Now I got the following error message for 1 line input:
---------
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Use of uninitialized value in division (/) at /home/om/scripts/match_avg.pl line 27, <> line 1.
Illegal division by zero at /home/om/scripts/match_avg.pl line 27, <> line 1.
0
 
LVL 39

Expert Comment

by:Adam314
ID: 22837518
forgot about that part... here is an updated version again
#!/usr/bin/perl

use strict;

use warnings;

use Time::Local;

 

my @pairs;

my $last;

while(<>) {

        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;

        $last = $1;

        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;

        my $se = $9;

        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));

        if($se eq 'END') {

                push @{$pairs[-1]}, $tl;

        }

        else {

                push @pairs, [$tl];

        }

}

my ($sum, $count);

foreach (@pairs) {

        next unless ($_->[0] and $_->[1]);

        $sum+=$_->[1] - $_->[0], $count++ 

}

if(!$count) {

	print "There are less than two timestamp to calculate the average time\n";

}

else {

	my $average = $sum/$count;

	

	print "Time calculated : $last\n";

	printf "Average time    : %.3f seconds\n", $average;

}

Open in new window

0
 
LVL 38

Author Comment

by:wesly_chen
ID: 22837738
Thanks a lot.
Could you give me some explanation about your scripts?
0
 
LVL 39

Accepted Solution

by:
Adam314 earned 500 total points
ID: 22837844

#!/usr/bin/perl

use strict;       #Require variable declarations

use warnings;     #warn about possible bugs

use Time::Local;  #include this module (converts sec,min,hour,month day,month,year into unix time)

 

my @pairs;

my $last;

while(<>) {  #while there is a line to read

        Look for pattern yyyy-mm-dd hh:mm:ss.hhh, skip line if it doesn't match

        next unless /((\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2}),(\d+)).*(START|END)/;

        $last = $1;  #save time to variable $last

        my $tl = timelocal($7,$6,$5, $4,$3-1,$2-1900) + $8/1000;  #save local time in unix format

        my $se = $9;  #save start/end in variable $se

        #skip this line if it is an END and there was no START

        next if (($se eq 'END') and ($#pairs<0 or $#{$pairs[-1]}==1));  

        if($se eq 'END') {

                #If an END, save with matching START

                push @{$pairs[-1]}, $tl;

        }

        else {

                #If a START, save a new entry

                push @pairs, [$tl];

        }

}

my ($sum, $count);

foreach (@pairs) {  #Loop through all START/END pairs

        #Skip unless there is a START and END

        next unless ($_->[0] and $_->[1]);

        $sum+=$_->[1] - $_->[0], $count++; #add this difference to $sum and $count

}

if(!$count) {  #if no pairs, display error

        print "There are less than two timestamp to calculate the average time\n";

}

else { #otherwise, display results

        my $average = $sum/$count;

        

        print "Time calculated : $last\n";

        printf "Average time    : %.3f seconds\n", $average;

}

Open in new window

0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

The following is a collection of cases for strange behaviour when using advanced techniques in DOS batch files. You should have some basic experience in batch "programming", as I'm assuming some knowledge and not further explain the basics. For some…
Active Directory replication delay is the cause to many problems.  Here is a super easy script to force Active Directory replication to all sites with by using an elevated PowerShell command prompt, and a tool to verify your changes.
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now