Solved

Select highest value in each column and delete all other values in a column of a file using shell or perl

Posted on 2011-09-02
Last Modified: 2012-05-12
OK. I only want to do one thing, then make an output file based on the results.

If I have files in a directory containing whole numbers or numbers rounded to the nearest hundredth, I want to find the highest value in each column and delete all other values, leaving a file with only one row of data.

Sample input file:

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23

Sample output file:

/path2/sampleoutputfile.txt

08-14-2011 00:00:00 0 30 0 23

There would be one row for each input file, and a single output file containing all output rows in chronological order.
Question by:libertyforall2
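
The requirement above (one row per input file holding each column's maximum, with all rows collected in chronological order) can be sketched in shell with awk. This is a minimal sketch, not a tested production script. It assumes the first two fields are a date in MM-DD-YYYY form and a time, that every later field is numeric, and that the timestamp of the first data row is the one to keep (in the sample, every row carries the same timestamp). The function names column_max and all_column_max are made up for illustration.

```shell
#!/bin/sh
# column_max FILE
# Print one line for FILE: the date and time from its first data row,
# then the largest value found in each remaining column.
column_max() {
    awk '
        NF < 3 { next }                        # skip blank or short lines
        !seen  { stamp = $1 " " $2; seen = 1 } # keep the first timestamp
        {
            for (i = 3; i <= NF; i++)
                if (!(i in max) || $i + 0 > max[i] + 0)
                    max[i] = $i                # store the original text
            if (NF > ncol) ncol = NF
        }
        END {
            if (!seen) exit                    # nothing usable in this file
            line = stamp
            for (i = 3; i <= ncol; i++) line = line " " max[i]
            print line
        }
    ' "$1"
}

# all_column_max DIR
# Run column_max on every regular file in DIR and sort the rows by
# year, month, day, then time (dates are assumed to be MM-DD-YYYY).
all_column_max() {
    for f in "$1"/*; do
        [ -f "$f" ] && column_max "$f"
    done | sort -t ' ' -k1.7,1.10n -k1.1,1.2n -k1.4,1.5n -k2,2
}
```

With the paths from the question, the call would be: all_column_max /path1 > /path2/sampleoutputfile.txt. The maximum is stored as the original field text rather than as a converted number, so values such as 12.30 are written back exactly as they appeared in the input.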
LVL 17

Expert Comment

by:Kent Dyer
ID: 36476837
 
LVL 26

Expert Comment

by:wilcoxon
ID: 36476932
This should do what you want in perl...
#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/path1';
my $out = '/path2/sampleoutputfile.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my $row;
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            $row = [$yr, $mon, $day, $td, $_];
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;

 

Author Comment

by:libertyforall2
ID: 36476992
I got these error messages.

[rhuff@huina ~/scripts]$ perl fcstvaluesso2.pl
Can't open perl script "fcstvaluesso2.pl": No such file or directory
[rhuff@huina ~/scripts]$ perl fcsthvalueso2.pl
Global symbol "$td" requires explicit package name at fcsthvalueso2.pl line 28.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 32.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 33.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line 35.
Global symbol "@row" requires explicit package name at fcsthvalueso2.pl line

 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 36477132
Oops - make the changes below and it should work:

my @row; # line 20
@row = ($yr, $mon, $day, $ts, $_); # line 28
 

Author Comment

by:libertyforall2
ID: 36477223
I'm using this script

#!/usr/local/bin/perl

use strict;
use warnings;
use List::Util qw(max);

# change these as needed
my $in_dir = '/share/huina/rhuff/forecastfiles/so2b';
my $out = '/share/huina/rhuff/forecastfiles/so2c.txt';

opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

my %data;

foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    my $curr = 0;
    my @row; 
    while (<IN>) {
        chomp;
        my ($dt, $ts, @vals) = split /\s+/;
        my $max = max @vals;
        if ($max > $curr) {
            $curr = $max;
            my ($mon, $day, $yr) = split /-/, $dt;
            @row = ($yr, $mon, $day, $ts, $_); 
        }
    }
    close IN;
    if (exists $data{$row[0]}{$row[1]}{$row[2]}{$row[3]}) {
        push @{$data{$row[0]}{$row[1]}{$row[2]}{$row[3]}}, $row[4];
    } else {
        $data{$row[0]}{$row[1]}{$row[2]}{$row[3]} = [$row[4]];
    }
}

# output each row from input files into output file in chronological order
open OUT, '>', $out or die "could not write $out: $!";
foreach my $yr (sort { $a <=> $b } keys %data) {
    foreach my $mon (sort { $a <=> $b } keys %{$data{$yr}}) {
        foreach my $day (sort { $a <=> $b } keys %{$data{$yr}{$mon}}) {
            foreach my $ts (sort keys %{$data{$yr}{$mon}{$day}}) {
                print join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n";
            }
        }
    }
}
close OUT;


I am getting this error message and output. It produces a blank file as well.

[rhuff@huina ~/scripts]$ perl so2highest.pl
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 35.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 32.
Use of uninitialized value in exists at so2highest.pl line 32.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
Use of uninitialized value in hash element at so2highest.pl line 33.
 

Author Closing Comment

by:libertyforall2
ID: 36477238
Still getting error messages, but the command-line output was sufficient to create the file by copy and paste.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 36477782
Oops - the empty file is due to another minor error (on line 45 this time) - it should be:

print OUT join("\n", @{$data{$yr}{$mon}{$day}{$ts}}), "\n"; # missing OUT

I have no idea why you're getting warnings.  At least when I run it against a directory containing just the sample file you list, I get no warnings.  Are there files in the $in_dir directory that are not in the specified format?  If so, is there a name consistency that would allow only selecting the valid files?  Do some of the files have empty lines?
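
The warning pattern reported earlier (four warnings at line 32, then four at line 35 or 33, repeated three times) is consistent with @row being left empty for three input files. In the script that happens when no line of a file has a value greater than the initial $curr of 0, for example an empty file or one containing only zeros. As a hedged way to check this, the shell sketch below lists such files. It assumes the same whitespace-separated layout as the sample file, and the function name no_positive_values is made up for illustration.

```shell
#!/bin/sh
# no_positive_values DIR
# List regular files in DIR in which no line has a value greater than 0
# in columns 3 and up. Empty files are listed too.
no_positive_values() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        awk '
            { for (i = 3; i <= NF; i++) if ($i + 0 > 0) found = 1 }
            END { exit !found }   # status 0 only if a positive value was seen
        ' "$f" || printf '%s\n' "$f"
    done
}
```

Running no_positive_values on the input directory and getting three file names back would support this explanation. If it prints nothing, the cause lies elsewhere.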