libertyforall2 asked:

Delete some rows and columns and create output file using shell script or perl script

I have a file (attached) that looks like the sample below. It may also contain decimal values such as 0.00 in addition to whole numbers. I simply want to:

1. Keep only rows 10-19 (in other words, ignore the header row and the first 8 rows of data, keep the next 8 rows of data, and ignore any rows after that). In this case the first row of data to be kept would be 08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0

2. Keep only columns 1 (date stamp), 11 (corresponds to header S00000004), 12 (corresponds to header S00000034), 14 (corresponds to header S00000075), and 17 (corresponds to header S00000045). Ignoring all other columns, the new line would look like this:

08/14/2011 0 1 0 14

3. Add a hardcoded time stamp of 00:00:00 after the date and change the date format to use hyphens instead of slashes. The new line would look like this:

08-14-2011 00:00:00 0 1 0 14

4. Output the results to a separate file and perform that function on ALL files in a directory. Let's call the directory /path2/so2, with one modified output file for each input file. The output files would have the same names as the input files but live under /path1 instead of the input directory.

Sample input file:

/path2/samplefile.txt

     JDAY  YR  MO DA1 HR1 DA2 HR2 S00000037 S00000021 S00000002 S00000004 S00000034 S00000035 S00000075 S00000038 S00000044 S00000045 S00000046
08/13/2011 11 8 13 2 13 3 0 0 0 0 2 0 21 0 0 0 0
08/13/2011 11 8 13 5 13 6 0 0 0 0 6 0 0 0 0 0 0
08/13/2011 11 8 13 8 13 9 0 0 0 0 1 0 0 0 0 0 0
08/13/2011 11 8 13 11 13 12 0 0 0 0 49 0 0 0 0 0 0
08/13/2011 11 8 13 14 13 15 0 0 0 0 15 0 0 0 0 1 0
08/13/2011 11 8 13 17 13 18 0 0 0 0 11 0 0 0 0 6 0
08/13/2011 11 8 13 20 13 21 0 0 0 0 56 0 0 0 0 1 0
08/13/2011 11 8 13 23 14 0 0 0 0 0 13 0 0 0 0 9 0
08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0
08/14/2011 11 8 14 5 14 6 0 0 0 0 10 0 14 0 0 16 0
08/14/2011 11 8 14 8 14 9 0 0 0 0 8 0 1 0 0 7 0
08/14/2011 11 8 14 11 14 12 0 0 0 0 2 0 0 0 0 0 0
08/14/2011 11 8 14 14 14 15 0 0 0 0 7 0 0 0 0 0 0
08/14/2011 11 8 14 17 14 18 0 0 0 0 30 0 0 0 0 0 0
08/14/2011 11 8 14 20 14 21 0 0 0 0 10 0 0 0 0 1 0
08/14/2011 11 8 14 23 15 0 0 0 0 0 6 0 0 0 0 23 0
08/15/2011 11 8 15 2 15 3 0 0 0 0 5 0 0 0 0 3 0
08/15/2011 11 8 15 5 15 6 0 0 0 0 13 0 0 0 0 1 0
08/15/2011 11 8 15 8 15 9 0 0 0 0 1 0 0 0 0 0 0
08/15/2011 11 8 15 10 15 11 0 0 0 0 23 0 0 0 0 0 0

The output file should look like this:

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23
wilcoxon

This should do what you want.  Let me know if there are any issues...

You say rows 10-19 but also say to keep 8 rows (10-19 would be 10 rows). The script uses 10-19; set $max_line to 17 if you only want 8.

I set a bunch of vars at the top of the script you can alter to change dirs, lines to keep, cols to keep, etc.
#!/usr/local/bin/perl

use strict;
use warnings;

# change these to suit
my $in_dir = '/path2/so2';  # directory holding the input files
my $out_dir = '/path1';     # directory for the output files (must exist)
my $min_line = 10;
my $max_line = 19;
my @cols = (0, 10, 11, 13, 16); # 0-offset rather than 1-offset

# get all files in $in_dir
opendir my $dh, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir $dh;
closedir $dh;

# loop over files
foreach my $fil (@files) {
    open my $in, '<', "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    open my $out, '>', "$out_dir/$fil" or die "could not write $out_dir/$fil: $!";
    while (<$in>) {
        last if ($. > $max_line);   # past the last line to keep
        next if ($. < $min_line);   # header and leading rows
        chomp;
        # get only the cols we want (split ' ' ignores leading whitespace)
        my @vals = (split ' ')[@cols];
        # add the hard-coded timestamp
        splice @vals, 1, 0, '00:00:00';
        print {$out} join(' ', @vals), "\n";
    }
    close $out;
    close $in;   # closing also resets $. for the next file
}
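
Since the question also allows a shell script, here is a rough awk-based sketch of the same row and column selection, mainly to make the row-count question concrete. The function name select_rows_cols and its argument order are made up for illustration, and it assumes the header is line 1 of the file. Like the Perl script, it leaves the date untouched.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread):
#   select_rows_cols FIRST LAST COLS FILE
# prints lines FIRST..LAST of FILE, keeping only the 1-based,
# comma-separated columns listed in COLS.
select_rows_cols() {
    awk -v first="$1" -v last="$2" -v cols="$3" '
        BEGIN { n = split(cols, c, ",") }   # c[1..n] = wanted column numbers
        NR > last  { exit }                 # nothing left to keep
        NR < first { next }                 # header and leading rows
        {
            out = $(c[1] + 0)
            for (i = 2; i <= n; i++) out = out " " $(c[i] + 0)
            print out
        }
    ' "$4"
}
```

With the sample file from the question, select_rows_cols 10 19 1,11,12,14,17 samplefile.txt prints 10 rows, while 10 17 prints only the 8 rows dated 08/14/2011.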


libertyforall2 (ASKER)

Almost. It left the date stamp with slashes instead of hyphens. The date should look like 08-27-2011 instead of 08/27/2011.
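
One way to get the hyphens, sketched here as a shell/awk variant rather than a patch to the Perl script: replace the slashes in the first field before printing. In the Perl script the analogous change would presumably be a tr{/}{-} applied to $vals[0]. The function names convert_file and convert_dir are hypothetical, the directories come from the question, and the default 10-17 row window assumes the 8-row reading of the question. Note that taking columns 1, 11, 12, 14, and 17 from the first kept sample row gives 08-14-2011 00:00:00 0 1 14 3, which differs from the hand-typed sample output in the question.

```shell
#!/bin/sh
# Hypothetical sketch; function names are made up for illustration.

# convert_file FILE FIRST LAST
# Keep lines FIRST..LAST, columns 1, 11, 12, 14 and 17, turn the slashes
# in the date into hyphens, and insert the fixed time 00:00:00.
convert_file() {
    awk -v first="$2" -v last="$3" '
        NR > last  { exit }
        NR < first { next }
        {
            gsub("/", "-", $1)      # 08/14/2011 -> 08-14-2011
            print $1, "00:00:00", $11, $12, $14, $17
        }
    ' "$1"
}

# convert_dir IN_DIR OUT_DIR [FIRST [LAST]]
# Convert every regular file in IN_DIR into a same-named file in OUT_DIR.
# OUT_DIR must already exist; the row window defaults to 10-17 (assumption).
convert_dir() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        convert_file "$f" "${3:-10}" "${4:-17}" > "$2/$(basename "$f")"
    done
}
```

For the layout in the question this would be called as convert_dir /path2/so2 /path1, or with 10 19 appended if all ten rows are wanted.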
ASKER CERTIFIED SOLUTION
wilcoxon

This solution is only available to members of Experts Exchange.