Solved

Select highest value in each column and delete all other values in a column of a file using shell or perl

Posted on 2011-09-02
Last Modified: 2012-05-12
Ok. I only want to do one thing then make an output file based on the results.

If I have files in a directory with whole numbers or numbers rounded to the nearest hundredth, I want to locate the highest value in each column and delete all other values leaving me with a file that has only one row of data.

Sample input file

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23

Sample output file

/path2/sampleoutputfile.txt

08-14-2011 00:00:00 0 30 0 23

Each input file would contribute one row, and a single output file would hold all of those rows in chronological order.
Question by:libertyforall2
 
LVL 17

Expert Comment

by:Kent Dyer
 
LVL 26

Expert Comment

by:wilcoxon
This should do what you want in perl...
#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/path1';
my $out = '/path2/sampleoutputfile.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my $row;
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            $row = [$yr, $mon, $day, $td, $_];
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;

 

Author Comment

by:libertyforall2
I got these error messages.

[rhuff@huina ~/scripts]$ perl fcstvaluesso2.pl
Can't open perl script "fcstvaluesso2.pl": No such file or directory
[rhuff@huina ~/scripts]$ perl fcsthvalueso2.pl
Global symbol "$td" requires explicit package name at fcsthvalueso2.pl line 28.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
Oops - make the changes below and it should work:

my @row; # line 20
@row = ($yr, $mon, $day, $ts, $_); # line 28
 

Author Comment

by:libertyforall2
I'm using this script

#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/share/huina/rhuff/forecastfiles/so2b';
my $out = '/share/huina/rhuff/forecastfiles/so2c.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my @row; 
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            @row = ($yr, $mon, $day, $ts, $_); 
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;


I am getting this error message and output. It produces a blank file as well.

[rhuff@huina ~/scripts]$ perl so2highest.pl
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
 

Author Closing Comment

by:libertyforall2
Still getting error messages, but the command-line output was sufficient to create the file by copy and paste.
 
LVL 26

Expert Comment

by:wilcoxon
Oops - the empty file is due to another minor error (on line 45 this time) - it should be:

print OUT join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n"; # missing OUT

I have no idea why you're getting warnings.  At least when I run it against a directory containing just the sample file you list, I get no warnings.  Are there files in the $in_dir directory that are not in the specified format?  If so, is there a name consistency that would allow only selecting the valid files?  Do some of the files have empty lines?
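
For comparison, here is a minimal shell/awk sketch of the other reading of the question. The Perl script above keeps the whole input line that holds the single largest value, whereas the sample output in the question (0 30 0 23) takes the maximum of each column independently. The sketch also skips blank or too-short lines. In the Perl script, a file with no line containing a value above 0 (for example an empty file) leaves @row empty, which is one way the "uninitialized value" warnings could arise. The function names column_max and summarize_dir are made up for illustration. The sketch assumes the date is MM-DD-YYYY in field 1, the time is in field 2, and each file carries a single timestamp (the first usable line's is kept).

```shell
#!/bin/sh
# Sketch only: column_max and summarize_dir are hypothetical helper names.

# Print one row for a file: date and time from the first usable line,
# followed by the maximum of each value column (fields 3 and up).
column_max() {
    awk '
        NF < 3 { next }                          # skip blank or too-short lines
        !seen  { stamp = $1 " " $2; seen = 1 }   # remember the first timestamp
        {
            for (i = 3; i <= NF; i++)
                if (!(i in max) || $i + 0 > max[i] + 0)
                    max[i] = $i                  # keep the original text of the value
            if (NF > cols) cols = NF
        }
        END {
            if (seen) {                          # print nothing for unusable files
                row = stamp
                for (i = 3; i <= cols; i++) row = row " " max[i]
                print row
            }
        }
    ' "$1"
}

# Print one row per regular file in directory $1, in chronological order.
# A sortable YYYYMMDD HH:MM:SS key is prepended, sorted on, then cut off again.
summarize_dir() {
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            column_max "$f"
        fi
    done |
    awk '{ split($1, d, "-"); print d[3] d[1] d[2] " " $2 "\t" $0 }' |
    LC_ALL=C sort |
    cut -f2-
}
```

With the paths from the question, it could be run as: summarize_dir /path1 > /path2/sampleoutputfile.txt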