perl or shell delete rows with -99.0000 then average monthly data and create a third column

I have a file with 3 columns.

1. Column 2 needs to be deleted.
2. All rows with -99.0000 need to be deleted.
3. Data in the original column 3 should be converted into monthly averages and will be in column 2 since the original column 2 will be deleted. The number of rows will vary in each average due to deleted rows and different days per month. Data should be averaged to the nearest hundredth.
4. Column 1 should reflect the year and month only since the day will not longer be included for a monthly average.
5. A new column 3 will be created will 12 month binned averages. So the third column won't start until month 12 since 12 data points are needed and rows 1-11 will only have two columns of data and the 3rd column of data begins from row 12.

See attached input file.

Assuming the file perl file is called irradianceavg.pl or the shell script file is called irradianceavg.sh and creates an output file called irradianceavg.txt and I want to called the the script by either

perl /path/irradianceavg.pl > /path/irradianceavg.txt

or

sh /path/irradianceavg.sh > /path/irradianceavg.txt

I want to use the input file found at /path/irradiance.txt
irradiance.txt
libertyforall2Asked:
Who is Participating?
 
ozoConnect With a Mentor Commented:
if the latter:

#!/usr/bin/perl -an
BEGIN{@ARGV=("/path/irradiance.txt");$"="+"}
next if /;/ || $F[2] == -99;
($m)=/(\d{4})/;
push(@s1,$s1),push(@s0,$s0),@s1==12&&(($v,$s)=(eval"(@s1)/(@s0)"," %.2f",shift @s0,shift@s1)),(printf"$p %.2f$s\n",$s1/$s0,$v),$s1=$s0=0 if $s0 && $p!=$m;
$p=$m and ++$s0, $s1+=$F[2];
}continue{
 redo if s/.+// && eof;
0
 
farzanjCommented:
What is binned average?  Would it average latest twelve row?
0
 
farzanjConnect With a Mentor Commented:
Here's something to get you started.  You will have to explain what needs to be done after this

#!/usr/bin/perl

use warnings;
use strict;

my @dat = grep {/\d+\s+[\d.]+\s+\d/} <>;
my @vals;
foreach my $row (@dat)
{
    next if ($row =~ /-99\.0000/);
    $row =~ s/^\s+//;
    my @col = split(/\s+/, $row);
    push (@vals,"$col[0] $col[2]");
}
$" = "\n";
print "@vals", "\n";

Open in new window

0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
Pierre FrançoisConnect With a Mentor Senior consultantCommented:
This shell script computes the first two columns. For column 3, let me think a little bit.

#!/bin/bash

sed '1,/^;\$ Data:/d' irradiance.txt \
| tr -s ' ' : \
| cut -d : -f 2,4 \
| grep -v :-99 \
| sed '/^\(....\)../s//\1/' \
| awk -F : '{if ($1 == prevm) { c++; s += $2 } else { if (c > 0) { printf ("%s %5.2f\n", prevm, s/c)} ; c = 1; prevm = $1; s = $2}} END {printf ("%s %5.2f\n", prevm, s/c) }'

Open in new window

0
 
Pierre FrançoisSenior consultantCommented:
OK, I think this shell script does the job:

#!/bin/bash

sed '1,/^;\$ Data:/d' irradiance.txt \
| tr -s ' ' : \
| cut -d : -f 2,4 \
| grep -v :-99 \
| sed '/^\(....\)../s//\1/' \
| awk -F : '{
	if ($1 == prevm) { 
		c++; s += $2 
	} else { 
		if (c > 0) {
			printf ("%s %5.2f\n", prevm, s/c)
		}; 
		c = 1; 
		prevm = $1; 
		s = $2
	}
} 
END	{printf ("%s %5.2f\n", prevm, s/c) 
}' \
| awk -F ' ' '{
	if (++j > 12) {
		j = 1
	};
	if (++i < 12) {
		last12 [j] = $2; 
		tot += $2; 
		printf ("%s %5.2f\n", $1, $2);
	} else {
		tot -= last12 [j]; 
		last12 [j] = $2;
		tot += $2;
		printf ("%s %5.2f %5.2f\n", $1, $2, tot/12);
	}
}'

Open in new window

0
 
ozoCommented:
#!/usr/bin/perl -an                                                                              
BEGIN{@ARGV=("/path/irradiance.txt")}
next if /;/ || $F[2] == -99;                                                                    
($m)=/(\d{4})/;
printf "%s %.2f\n",$p,$s1/$s0 and $s1=$s0=0 if $s0 && $p!=$m;
$p=$m and ++$s0, $s1+=$F[2];
}continue{
 redo if s/.+// && eof;
0
 
ozoCommented:
Should the 12 month binned averages be the average of the last 12 monthly averages, or the average of the days in the last 12 months?

If the former:

#!/usr/bin/perl -an
BEGIN{@ARGV=("/path/irradiance.txt");$"="+"}
next if /;/ || $F[2] == -99;
($m)=/(\d{4})/;
$m[++$#m]=$s1/$s0,@m==12?(($v,$s)=((eval"@m")/12," %.2f",shift @m)):"",printf "$p %.2f$s\n",$m[-1],$v and $s1=$s0=0 if $s0 && $p!=$m;
$p=$m and ++$s0, $s1+=$F[2];
}continue{
 redo if s/.+// && eof;
0
 
Pierre FrançoisSenior consultantCommented:
@libertyforall2: Just in case you are doubting my script doesn't do the job, I attach hereby the script and its output.
irradianceavg.txt
irradianceavg.sh
0
 
libertyforall2Author Commented:
Works well.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.