uluttrell
asked on
Calculate Statistics From Log file
This is not a homework assignment.
I have log files for multiple servers that are kept in a repository. The log file's filename format is application.servername.timestampinYYYYMMDD.log.
Each of the log files in the directory has the following format:
YYYYMMDD Time-TimeZone Servername Application PID; MeasuredStatistic (XX/YY) Value for TimeStamp
20030817 000216010-0400 servername application 4567;MeasuredStatisticOne(12/18) 5217
20030817 000216110-0400 servername application 4567;MeasuredStatisticTwo(12/14) 419276
20030817 000216110-0400 servername application 4567;MeasuredStatisticThree(12/31) 57912
20030817 000216110-0400 servername application 4567;MeasuredStatisticFour(12/12) 72
20030817 000216110-0400 servername application 4567;MeasuredStatisticFive(12/13) 1451718
The log files roll over at midnight. Each statistic is measured at a random interval.
I am trying to write a perl script that will do the following:
For Each server
* determine the number of times that a given statistic appears on a day for a server. Use that number to calculate the average for the MeasuredStatistic for the day.
* sum the days to determine the totals and averages for the week for a server.
* determine the average for all statistics for all servers and export to a csv file to be used in an Excel spreadsheet.
* determine totals for all servers and export to a csv file to be used in an Excel spreadsheet.
I have written the following code, but it is not producing the desired results.
=====Begin code.pl
#! /usr/bin/perl
%module_count = ();
%module_sum = ();
while (<>) {
chomp;
next if (/^\s*$/);
my ($date, $time, $host, $server, $pid, $metric, $value) = split(/\s/);
$module_count{$module}++;
$module_sum{$module} += $percent;
}
foreach $module (sort keys %module_count) {
printf "%s %dx average is %d%%\n",
$module,
$module_count{$module},
$module_sum{$module} / $module_count{$module};
=====End code.pl
How would I script this properly in perl?
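Two things stand out in the script above: it reads the fields into $metric and $value but accumulates under $module and $percent, which are never set, and split(/\s/) cannot separate the PID from the statistic name because the two are joined by a semicolon. Below is a minimal sketch of the per-server, per-day count and average, assuming every line follows the layout of the samples. The helper names parse_line and daily_averages, the regex, and the hash layout are illustrative choices and not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one log line into named fields.
# Assumed layout: "YYYYMMDD time-zone host application pid;Statistic(XX/YY) value".
sub parse_line {
    my ($line) = @_;
    my ($date, $host, $metric, $value) = $line =~ m{
        ^(\d{8}) \s+            # date, YYYYMMDD
        \S+ \s+                 # time with zone offset (unused here)
        (\S+) \s+               # server name
        \S+ \s+                 # application (unused here)
        \d+ ;                   # PID, joined to the statistic by a semicolon
        (\w+) \s* \( [^)]* \)   # statistic name, then the XX/YY part
        \s+ (\d+) \s* \z        # value
    }x or return;
    return { date => $date, host => $host, metric => $metric, value => $value };
}

# Hypothetical helper: count and average every statistic per server per day.
sub daily_averages {
    my @lines = @_;
    my %stats;    # "host,date,metric" => { count => n, sum => total }
    for my $line (@lines) {
        my $rec = parse_line($line) or next;    # skip blank or malformed lines
        my $key = join ',', @{$rec}{qw(host date metric)};
        $stats{$key}{count}++;
        $stats{$key}{sum} += $rec->{value};
    }
    $_->{average} = $_->{sum} / $_->{count} for values %stats;
    return \%stats;
}

# Sample input: the first line is from the question, the second is a made-up
# later reading of the same statistic so that the average has two samples.
my @sample = (
    '20030817 000216010-0400 servername application 4567;MeasuredStatisticOne(12/18) 5217',
    '20030817 000516010-0400 servername application 4567;MeasuredStatisticOne(12/18) 5219',
);
my $stats = daily_averages(@sample);
print "server,date,statistic,count,average\n";
for my $key (sort keys %$stats) {
    printf "%s,%d,%.2f\n", $key, $stats->{$key}{count}, $stats->{$key}{average};
}
```

To run it over real files, the sample array could be replaced by daily_averages(<>), and the comma-separated output redirected to a .csv file for Excel.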
Just came back from a vacation.
Yeah, it should be $date, not $data. Sorry for the typo. "use strict" catches typos like this and makes debugging easier, so do put "use strict" in your programs. It'll help a lot.
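As a small illustration of that point, the sketch below compiles a snippet that misspells $date as $data and captures the error that strict raises. The snippet contents and the variable names are made up for the demonstration; the string eval is only there so the error can be printed instead of stopping the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a small snippet at run time so the strict error can be captured
# instead of aborting this script. The snippet misspells $date as $data.
my $snippet = <<'END_SNIPPET';
use strict;
my $date = '20030817';
my %seen;
$seen{$data}++;   # typo: $data was never declared
1;
END_SNIPPET

my $ok    = eval $snippet;
my $error = $@;

if ($ok) {
    print "compiled cleanly\n";
}
else {
    print "strict refused to compile the snippet:\n$error";
}
```

Without "use strict", the same typo would silently create a new global $data and the counts would end up under the wrong key.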
ASKER
Thank you both for the comment and pointers. I will follow up with how they work.
Hi, uluttrell, I just realized my code has a bug. The
use Time::Local;
$sunday = timelocal(0,0,0,2,7,1998); # 1998/8/2, 0 am (a Sunday; months are 0-based, days are 1-based)
$week = 7 * 24 * 3600;
my ($year, $month, $day) = $date =~ /(\d{4,4})(\d{2,2})(\d{2,2})/;
$now = timelocal(0,0,0,$day,$month-1,$year); # timelocal takes the day of month as 1..31
$server_count{$server}->{weekstat}->{int(($now-$sunday)/$week)}++;
The previous method looked cute but it's not working correctly.
ASKER
Hi, inq123, Thanks for the correction. Would you please post the final version?
This time I made sure everything works perfectly. Now use this log file, save as test.log:
20030817 000216010-0400 servername application 4567;MeasuredStatisticOne(12/18) 5217
20030815 000216110-0400 servername application 4567;MeasuredStatisticTwo(12/14) 419276
20030817 000216110-0400 servername1 application 4567;MeasuredStatisticThree(12/31) 57912
20030816 000216110-0400 servername1 application 4567;MeasuredStatisticFour(12/12) 72
20030817 000216110-0400 servername application 4567;MeasuredStatisticFive(12/13) 1451718
Then save this script as test.pl:
#! /usr/bin/perl
use strict;
use Time::Local;
my $sunday = timelocal(0,0,0,3,7,2003); # 2003/8/3, 0 am (a Sunday; months are 0-based, days are 1-based)
my $daylength = 24 * 3600;
my $week = 7 * $daylength;
my (%server_count, %all_count);
while (<>) {
next if (/^\s*$/);
chomp;
my ($date, $time, $host, $server, $tmp, $value) = split(/\s/);
my ($pid, $metric) = split(/;/, $tmp);
$server_count{$host}->{daystat}->{$date}++; # records each server's number of times per day
my ($year, $month, $day) = $date =~ /(\d{4,4})(\d{2,2})(\d{2,2})/;
my $now = timelocal(0,0,0,$day,$month-1,$year); # day of month must be 1..31, so no $day-1 (it dies on the 1st)
$server_count{$host}->{weekstat}->{int(($now-$sunday)/$week)}++; # records server's total for each week, convenient
$all_count{daystat}->{$date}++; # records all servers' per day stat. not really needed, but convenient as we don't have to add servers up
$all_count{weekstat}->{int(($now-$sunday)/$week)}++; # same as above
}
foreach my $host (keys %server_count)
{
foreach my $weekno (keys %{$server_count{$host}->{weekstat}})
{
my $start = localtime($sunday + $weekno * $week);
print "for server $host, weekly count for $start - " . localtime($sunday + ($weekno+1) * $week) . " is $server_count{$host}->{weekstat}->{$weekno}\n";
}
}
Finally, launch the script with perl test.pl < test.log and you'll see that everything works.
This log file would be an even better test: my old method would have worked on the log file above, but not on this one, because the month changes in it:
20030817 000216010-0400 servername application 4567;MeasuredStatisticOne(12/18) 5217
20030815 000216110-0400 servername application 4567;MeasuredStatisticTwo(12/14) 419276
20030817 000216110-0400 servername1 application 4567;MeasuredStatisticThree(12/31) 57912
20030816 000216110-0400 servername1 application 4567;MeasuredStatisticFour(12/12) 72
20030917 000216110-0400 servername application 4567;MeasuredStatisticFive(12/13) 1451718
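To see why the month change matters, here is a minimal sketch of the week bucketing on its own, using the dates from the log above. It assumes weeks are counted from Sunday, 3 August 2003, and it uses timegm rather than timelocal so that daylight-saving shifts cannot move a date across a week boundary. The helper name week_index and the constants are hypothetical and not taken from the script in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my $DAY  = 24 * 3600;
my $WEEK = 7 * $DAY;

# Reference Sunday: 3 August 2003 (month index 7 because months count from 0).
my $SUNDAY = timegm(0, 0, 0, 3, 7, 2003);

# Hypothetical helper: number of whole weeks between the reference Sunday
# and a YYYYMMDD date string that falls on or after it.
sub week_index {
    my ($date) = @_;
    my ($year, $month, $day) = $date =~ /^(\d{4})(\d{2})(\d{2})$/
        or die "unexpected date format: $date\n";
    # timegm wants the day of month as 1..31 and the month as 0..11.
    my $midnight = timegm(0, 0, 0, $day, $month - 1, $year);
    return int(($midnight - $SUNDAY) / $WEEK);
}

for my $date (qw(20030815 20030816 20030817 20030917)) {
    printf "%s falls in week %d\n", $date, week_index($date);
}
```

Because the calculation works on seconds since the epoch rather than on the day-of-month digits, a date in September lands in the right week just like one in August.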
ASKER
Thanks so much inq123. It works great and I can tweak it further. I appreciate all of your help :)
ASKER
Hi inq123,
I rethought the problem and decided that I could cat all the stat files for a week into a single file.
Now, instead of the average for the servers, I would like the average for each of the measuredStatistics. How would I do this in perl?
I will assign more points for this because this is a variation on my original submittal.
I could certainly write the program, but please explain the measuredStatistics, or give me an equation for calculating the average, as I do not quite understand the format and meaning of each number. I also do not know what you want to average the measured stats over: each day, week, or metric? Please give me some specifics.
ASKER
The measuredStatistic is a general term. I apologize for being vague.
The application on each server measures 8 distinct statistics. These statistics are written to the server's log file at random intervals. Each time a statistic appears in the file, its value is the running total up to that time of day, i.e. if the line reads
20030817 000216010-0400 servername application 4567;AccumulatedConnections(12/18) 5217
the total AccumulatedConnections for the day ending at 000216010 is 5217.
With the files rolling over at midnight, the latest time stamp for any one of the 8 statistics shows the total for that particular statistic for that day.
Does this help any?
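One way to read the description above, sketched under loudly stated assumptions: the value with the latest time stamp for a statistic on a given server and day is taken as that day's total, and the "average" for a statistic is the mean of those daily totals over every server and day in the input. The thread never settles on a formula, so this is only one interpretation. The helper name average_daily_totals is hypothetical, and the second and third sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep, for every statistic/host/date, the value carried
# by the latest time stamp, then average those daily totals per statistic.
sub average_daily_totals {
    my @lines = @_;
    my %latest;    # {metric}{host}{date} = [time, value]
    for my $line (@lines) {
        my ($date, $time, $host, $metric, $value) = $line =~
            /^(\d{8})\s+(\d+)[-+]\d{4}\s+(\S+)\s+\S+\s+\d+;(\w+)\s*\([^)]*\)\s+(\d+)\s*$/
            or next;    # skip blank or malformed lines
        my $seen = $latest{$metric}{$host}{$date};
        if (!$seen || $time >= $seen->[0]) {
            $latest{$metric}{$host}{$date} = [$time, $value];
        }
    }
    my %average;
    for my $metric (keys %latest) {
        my ($sum, $days) = (0, 0);
        for my $host (keys %{ $latest{$metric} }) {
            for my $date (keys %{ $latest{$metric}{$host} }) {
                $sum += $latest{$metric}{$host}{$date}[1];
                $days++;
            }
        }
        $average{$metric} = $sum / $days;
    }
    return \%average;
}

# The first line is from this thread; the other two are invented readings.
my $avg = average_daily_totals(
    '20030817 000216010-0400 servername application 4567;AccumulatedConnections(12/18) 5217',
    '20030817 235900000-0400 servername application 4567;AccumulatedConnections(12/18) 9000',
    '20030818 235900000-0400 servername application 4567;AccumulatedConnections(12/18) 11000',
);
for my $metric (sort keys %$avg) {
    printf "%s,%.2f\n", $metric, $avg->{$metric};
}
```

If the average should instead be taken per server, the loop over hosts could write into a nested hash keyed by host rather than summing across them.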
That still does not explain what the average for a measuredStatistic is. I cannot fully understand how such a log would work. If the stats are written at random intervals and the files roll over at midnight (what does "rolled over" mean?), how can you guarantee that the last entry shows the total for that day? (Why did you say the latest time stamp shows the total for that day? What does the time stamp have to do with any stat?)
I think I'm almost more confused now than before. But if you simply give me an equation for calculating the average using the example log file you gave out, then I can write out the script for you.
ASKER
Rolled over means that a new log file is created at midnight. The new log file is server.log. This file is a symlink to the current day's log file.
I'll keep trying some suggestions.
I did not mean to confuse you more.
I understand. I just meant that I'm still at a loss as to how I should calculate the average. Let me know if I can help you.
ASKER
When I attempt to run the code, I get the following errors for the following lines:
Global symbol "$data" requires explicit package name at
$server_count{$server}->{w
Global symbol "$data" requires explicit package name at
$all_count{weekstat}->{int