  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 294

Delete some rows and columns and create output file using shell script or perl script

I have attached a file; it looks like the sample below. It may also contain decimal values such as 0.00 in addition to whole numbers. I simply want to:

1. Keep only rows 10-19 (in other words, ignore the header row and the first 8 rows of data, keep the next 8 rows of data, and ignore any rows after that). In this case the first row of data to be kept would be 08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0

2. Keep only columns 1 (date stamp), 11 (corresponds to header S00000004), 12 (corresponds to header S00000034), 14 (corresponds to header S00000075), and 17 (corresponds to header S00000045). Ignoring all other columns, the new line would look like this:

08/14/2011 0 1 0 14

3. Add a hardcoded time stamp of 00:00:00 after the date and change the date format to use hyphens instead of slashes. The new line would look like this:

08-14-2011 00:00:00 0 1 0 14

4. Output the results to a separate file, and do this for ALL files in a directory. Let's call the input directory /path2/so2 and output one modified file for each input file. The output files would have the same names as the input files but a different path, /path1, instead of the input directory.

Sample input file:

/path2/samplefile.txt

     JDAY  YR  MO DA1 HR1 DA2 HR2 S00000037 S00000021 S00000002 S00000004 S00000034 S00000035 S00000075 S00000038 S00000044 S00000045 S00000046
08/13/2011 11 8 13 2 13 3 0 0 0 0 2 0 21 0 0 0 0
08/13/2011 11 8 13 5 13 6 0 0 0 0 6 0 0 0 0 0 0
08/13/2011 11 8 13 8 13 9 0 0 0 0 1 0 0 0 0 0 0
08/13/2011 11 8 13 11 13 12 0 0 0 0 49 0 0 0 0 0 0
08/13/2011 11 8 13 14 13 15 0 0 0 0 15 0 0 0 0 1 0
08/13/2011 11 8 13 17 13 18 0 0 0 0 11 0 0 0 0 6 0
08/13/2011 11 8 13 20 13 21 0 0 0 0 56 0 0 0 0 1 0
08/13/2011 11 8 13 23 14 0 0 0 0 0 13 0 0 0 0 9 0
08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0
08/14/2011 11 8 14 5 14 6 0 0 0 0 10 0 14 0 0 16 0
08/14/2011 11 8 14 8 14 9 0 0 0 0 8 0 1 0 0 7 0
08/14/2011 11 8 14 11 14 12 0 0 0 0 2 0 0 0 0 0 0
08/14/2011 11 8 14 14 14 15 0 0 0 0 7 0 0 0 0 0 0
08/14/2011 11 8 14 17 14 18 0 0 0 0 30 0 0 0 0 0 0
08/14/2011 11 8 14 20 14 21 0 0 0 0 10 0 0 0 0 1 0
08/14/2011 11 8 14 23 15 0 0 0 0 0 6 0 0 0 0 23 0
08/15/2011 11 8 15 2 15 3 0 0 0 0 5 0 0 0 0 3 0
08/15/2011 11 8 15 5 15 6 0 0 0 0 13 0 0 0 0 1 0
08/15/2011 11 8 15 8 15 9 0 0 0 0 1 0 0 0 0 0 0
08/15/2011 11 8 15 10 15 11 0 0 0 0 23 0 0 0 0 0 0

The output file should look like this:

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23
Asked: libertyforall2
1 Solution
 
wilcoxon Commented:
This should do what you want.  Let me know if there are any issues...

You say rows 10-19 but also say to keep 8 rows (10-19 would be 10 rows).
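
The two readings of the request give different line ranges. A minimal sketch of the arithmetic, assuming line 1 of the file is the header; the variable names are illustrative only and do not appear in the script:

```perl
use strict;
use warnings;

# Assumed layout: 1 header line, then 8 data rows to skip, then 8 rows to keep.
my ($header_rows, $skip_rows, $keep_rows) = (1, 8, 8);

my $first_kept = $header_rows + $skip_rows + 1;   # 10
my $last_kept  = $first_kept + $keep_rows - 1;    # 17

# Number of lines in the literal range 10-19 (inclusive).
my $literal_count = 19 - 10 + 1;                  # 10

print "keep 8 rows: lines $first_kept-$last_kept\n";
print "lines 10-19: $literal_count rows\n";
```

If only 8 rows are wanted, setting $max_line to 17 in the script would match that reading.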

I set a bunch of vars at the top of the script you can alter to change dirs, lines to keep, cols to keep, etc.
#!/usr/local/bin/perl

use strict;
use warnings;

# change these to suit
my $in_dir = '/path2/so2';   # directory holding the input files
my $out_dir = '/path1';      # directory for the modified copies
my $min_line = 10;           # first line to keep (line 1 is the header)
my $max_line = 19;           # last line to keep
my @cols = (0, 10, 11, 13, 16); # 0-offset rather than 1-offset

# get all files in $in_dir
opendir my $dh, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir $dh;
closedir $dh;

# loop over files
foreach my $fil (@files) {
    open my $in, '<', "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    open my $out, '>', "$out_dir/$fil" or die "could not write $out_dir/$fil: $!";
    while (<$in>) {
        # $. is the current line number of the input file
        last if ($. > $max_line);
        next if ($. < $min_line);
        chomp;
        # get only the cols we want
        my @vals = (split /\s+/)[@cols];
        # add the hard-coded timestamp
        splice @vals, 1, 0, '00:00:00';
        print $out join(' ', @vals), "\n";
    }
    close $out;
    close $in;
}
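
As a quick check of the slice and splice steps, here is a small sketch run against the first kept row from the sample file. The 1-based column list and its conversion via map are illustrative additions, not part of the script above.

```perl
use strict;
use warnings;

# Columns as numbered in the question (1-based): date, S00000004,
# S00000034, S00000075, S00000045.
my @cols_1based = (1, 11, 12, 14, 17);

# Perl list indices start at 0, so subtract 1 from each.
my @cols = map { $_ - 1 } @cols_1based;     # (0, 10, 11, 13, 16)

# First row to keep, copied from the sample input.
my $line = '08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0';

# List slice: pick only the wanted fields out of the split result.
my @vals = (split ' ', $line)[@cols];

# Insert the fixed time as a new second element (offset 1, remove 0).
splice @vals, 1, 0, '00:00:00';

my $result = join ' ', @vals;
print "$result\n";    # 08/14/2011 00:00:00 0 1 14 3
```

The date still has slashes at this stage. The values printed are those of columns 11, 12, 14 and 17; the "0 1 0 14" shown in the question's example corresponds to columns 11 through 14 of the same row.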

 
libertyforall2 (Author) Commented:
Almost. It left the date stamp with slashes instead of hyphens; the date should look like 08-27-2011 instead of 08/27/2011.
 
wilcoxon Commented:
Sorry.  Forgot to do that.
#!/usr/local/bin/perl

use strict;
use warnings;

# change these to suit
my $in_dir = '/path2/so2';   # directory holding the input files
my $out_dir = '/path1';      # directory for the modified copies
my $min_line = 10;           # first line to keep (line 1 is the header)
my $max_line = 19;           # last line to keep
my @cols = (0, 10, 11, 13, 16); # 0-offset rather than 1-offset

# get all files in $in_dir
opendir my $dh, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir $dh;
closedir $dh;

# loop over files
foreach my $fil (@files) {
    open my $in, '<', "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    open my $out, '>', "$out_dir/$fil" or die "could not write $out_dir/$fil: $!";
    while (<$in>) {
        # $. is the current line number of the input file
        last if ($. > $max_line);
        next if ($. < $min_line);
        chomp;
        # get only the cols we want
        my @vals = (split /\s+/)[@cols];
        # change / to - in the date
        $vals[0] =~ s{/}{-}g;
        # add the hard-coded timestamp
        splice @vals, 1, 0, '00:00:00';
        print $out join(' ', @vals), "\n";
    }
    close $out;
    close $in;
}
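
To try the per-line logic without reading or writing any files, it can be pulled into a subroutine. This is only a sketch: transform_line is a hypothetical name, and the second input row is made up to show that decimal values such as 0.00 pass through unchanged.

```perl
use strict;
use warnings;

my @cols = (0, 10, 11, 13, 16);    # same 0-offset columns as the script

# Hypothetical helper: turn one input row into one output row.
sub transform_line {
    my ($line) = @_;
    my @vals = (split ' ', $line)[@cols];
    $vals[0] =~ s{/}{-}g;              # 08/14/2011 -> 08-14-2011
    splice @vals, 1, 0, '00:00:00';    # hard-coded time after the date
    return join ' ', @vals;
}

# Row taken from the sample input.
my $from_sample = transform_line('08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0');
print "$from_sample\n";    # 08-14-2011 00:00:00 0 1 14 3

# Made-up row with decimal values, for illustration only.
my $with_decimals = transform_line('08/14/2011 11 8 14 5 14 6 0.00 0 0 0.00 10 0 14.50 0 0 16 0');
print "$with_decimals\n";  # 08-14-2011 00:00:00 0.00 10 14.50 16
```

Because the fields are treated as plain strings, nothing is rounded or reformatted along the way.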

 
libertyforall2 (Author) Commented:
Great!