
Select highest value in each column and delete all other values in a column of a file using shell or perl

Posted on 2011-09-02
Last Modified: 2012-05-12
OK. I only want to do one thing and then make an output file based on the results.

I have files in a directory that contain whole numbers or numbers rounded to the nearest hundredth. In each file, I want to locate the highest value in each column and delete all other values, leaving a file that has only one row of data.

Sample input file:

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23

Sample output file:

/path2/sampleoutputfile.txt

08-14-2011 00:00:00 0 30 0 23

There would be one row for each input file, and all of the output rows would go into a single file in chronological order.
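The per-file step described above (keep the largest value in each column, one output row per file) can be sketched in Perl as follows. This is a minimal sketch, not code from the thread: the function name `column_maxima` is hypothetical, it assumes each line is a date, a time, and then whitespace-separated numbers, and it keeps the date and time of the first data line (every sample line has the same timestamp, so the question does not say which one to keep).

```perl
use strict;
use warnings;

# Hypothetical helper: reduce one file's lines to a single row holding the
# largest value seen in each numeric column.
# Assumed line layout: "MM-DD-YYYY HH:MM:SS v1 v2 v3 ..."
sub column_maxima {
    my @lines = @_;
    my ($date, $time, @max);
    for my $line (@lines) {
        my ($dt, $ts, @vals) = split ' ', $line;   # ' ' also ignores leading whitespace
        next unless defined $ts && @vals;          # skip blank or short lines
        ($date, $time) = ($dt, $ts) unless defined $date;   # keep the first timestamp
        for my $i (0 .. $#vals) {
            $max[$i] = $vals[$i] if !defined $max[$i] || $vals[$i] > $max[$i];
        }
    }
    return unless defined $date;                   # no data lines in this file
    return join ' ', $date, $time, @max;
}

my @sample = (
    '08-14-2011 00:00:00 0 1 0 14',
    '08-14-2011 00:00:00 0 10 0 7',
    '08-14-2011 00:00:00 0 8 0 3',
    '08-14-2011 00:00:00 0 2 0 3',
    '08-14-2011 00:00:00 0 7 0 3',
    '08-14-2011 00:00:00 0 30 0 3',
    '08-14-2011 00:00:00 0 10 0 1',
    '08-14-2011 00:00:00 0 6 0 23',
);
print column_maxima(@sample), "\n";   # prints "08-14-2011 00:00:00 0 30 0 23"
```

Comparing each column independently is what produces `0 30 0 23`: the 30 and the 23 come from different input lines.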
Question by:libertyforall2
7 Comments

Expert Comment

by:Kent Dyer
ID: 36476837

Expert Comment

by:wilcoxon
ID: 36476932
This should do what you want in Perl:
#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/path1';
my $out = '/path2/sampleoutputfile.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my $row;
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            $row = [$yr, $mon, $day, $td, $_];
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;
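A note on the logic as posted: `$curr` tracks the largest single value seen so far, so the loop keeps the whole line containing the file's overall maximum rather than the maximum of each column. The sketch below reproduces only that selection step (the name `row_with_largest_value` is hypothetical) so it can be checked against the sample file. It picks the `0 30 0 3` line, while the expected output in the question is `0 30 0 23`.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Mirrors the selection loop in the script above: remember the line whose
# largest value is greater than every value seen so far in the file.
sub row_with_largest_value {
    my @lines = @_;
    my $curr = 0;
    my $best;
    for my $line (@lines) {
        my ($dt, $ts, @vals) = split ' ', $line;
        next unless @vals;               # guard against blank lines
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            $best = $line;
        }
    }
    return $best;
}

my @sample = (
    '08-14-2011 00:00:00 0 1 0 14',
    '08-14-2011 00:00:00 0 10 0 7',
    '08-14-2011 00:00:00 0 8 0 3',
    '08-14-2011 00:00:00 0 2 0 3',
    '08-14-2011 00:00:00 0 7 0 3',
    '08-14-2011 00:00:00 0 30 0 3',
    '08-14-2011 00:00:00 0 10 0 1',
    '08-14-2011 00:00:00 0 6 0 23',
);
print row_with_largest_value(@sample), "\n";   # prints "08-14-2011 00:00:00 0 30 0 3"
```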


Author Comment

by:libertyforall2
ID: 36476992
I got these error messages.

[rhuff@huina ~/scripts]$ perl fcstvaluesso2.pl
Can't open perl script "fcstvaluesso2.pl": No such file or directory
[rhuff@huina ~/scripts]$ perl fcsthvalueso2.pl
Global symbol "$td" requires explicit package name at fcsthvalueso2.pl line 28.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line

Accepted Solution

by: wilcoxon (earned 500 total points)
ID: 36477132
Oops. Make the changes below and it should work:

my @row; # line 20
@row = ($yr, $mon, $day, $ts, $_); # line 28
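On why the original lines failed under `use strict`: Perl treats `$row[0]` as an element of the array `@row`, not of the scalar `$row`, so with only `my $row;` declared, every `$row[...]` produced the "Global symbol "@row"" error (and `$td` was a typo for `$ts`). The two changes above switch to a real array. Keeping the array reference and dereferencing it with `->` would also satisfy `strict`; a small illustration of both forms, with made-up values:

```perl
use strict;
use warnings;

my $line = '08-14-2011 00:00:00 0 30 0 23';

# Form 1 (the accepted fix): a real array, indexed as $row[0].
my @row = ('2011', '08', '14', '00:00:00', $line);

# Form 2: an array reference stored in a scalar, indexed with ->.
# Writing $row_ref[0] here instead would look for an undeclared @row_ref
# and fail to compile under "use strict".
my $row_ref = ['2011', '08', '14', '00:00:00', $line];

print "$row[0] $row_ref->[0]\n";   # prints "2011 2011"
```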

Author Comment

by:libertyforall2
ID: 36477223
I'm using this script:

#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/share/huina/rhuff/forecastfiles/so2b';
my $out = '/share/huina/rhuff/forecastfiles/so2c.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my @row; 
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            @row = ($yr, $mon, $day, $ts, $_); 
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;



I am getting the messages below, and the script produces a blank file as well.

[rhuff@huina ~/scripts]$ perl so2highest.pl
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
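One plausible cause of these warnings (an assumption, since the input files were not posted): `$curr` starts at 0 and is only replaced when a line's largest value is strictly greater, so a file that is empty, or whose values are all 0, never fills `@row`. The hash lookups on lines 32, 33 and 35 then use undefined keys, which matches the warnings above. A sketch of a guard that skips such files; the helper name `pick_row` is hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Returns (year, month, day, time, line) for the selected line, or an empty
# list when no line qualified. Starting from undef (instead of 0) lets an
# all-zero file still select a row.
sub pick_row {
    my @lines = @_;
    my ($curr, @row);
    for my $line (@lines) {
        my ($dt, $ts, @vals) = split ' ', $line;
        next unless @vals;                       # skip blank or short lines
        my $max = max @vals;
        if (!defined $curr || $max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            @row = ($yr, $mon, $day, $ts, $line);
        }
    }
    return @row;
}

# Hypothetical use inside the per-file loop:
#   chomp(my @lines = <IN>);
#   my @row = pick_row(@lines);
#   next unless @row;   # no undef hash keys, so no "uninitialized" warnings
```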

Author Closing Comment

by:libertyforall2
ID: 36477238
I'm still getting error messages, but the command-line output was sufficient to create the file by copying and pasting.

Expert Comment

by:wilcoxon
ID: 36477782
Oops. The empty file is due to another minor error (on line 45 this time). It should be:

print OUT join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n"; # missing OUT

I have no idea why you're getting warnings. When I run it against a directory containing just the sample file you listed, I get no warnings. Are there files in the $in_dir directory that are not in the specified format? If so, is there a naming convention that would allow selecting only the valid files? Do some of the files have empty lines?
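Following up on those questions, one way to make the script tolerant of stray files and blank lines is to accept only lines that match the expected `MM-DD-YYYY HH:MM:SS` layout, and to sort the final rows on a `YYYY-MM-DD HH:MM:SS` string key instead of nested hashes. A sketch under those assumptions: the helper names `is_data_line`, `chrono_key` and `sort_rows` are hypothetical, and it assumes every value column is an unsigned whole number or decimal such as `12.34`.

```perl
use strict;
use warnings;

# A data line: MM-DD-YYYY, HH:MM:SS, then one or more unsigned numbers.
sub is_data_line {
    my ($line) = @_;
    return $line =~ m{^\s*\d{2}-\d{2}-\d{4}\s+\d{2}:\d{2}:\d{2}(?:\s+\d+(?:\.\d+)?)+\s*$};
}

# Reorder MM-DD-YYYY into YYYY-MM-DD so a plain string sort is chronological.
sub chrono_key {
    my ($line) = @_;
    my ($mon, $day, $yr, $time) = $line =~ m{^\s*(\d{2})-(\d{2})-(\d{4})\s+(\S+)};
    return "$yr-$mon-$day $time";
}

# Drop anything that is not a data line, then order the rest oldest first.
sub sort_rows {
    my @rows = grep { is_data_line($_) } @_;
    return sort { chrono_key($a) cmp chrono_key($b) } @rows;
}

# Printed to STDOUT here; the script above would print to its OUT handle.
print "$_\n" for sort_rows(
    '12-31-2011 06:00:00 0 5 0 2',
    'not a data line',
    '',
    '08-14-2011 00:00:00 0 30 0 23',
);
```

Because the key puts the year first and every field is zero-padded, `cmp` orders rows across month and year boundaries without any numeric parsing.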