Solved

Delete some rows and columns and create output file using shell script or perl script

Posted on 2011-09-02
4
285 Views
Last Modified: 2012-05-12
I have a file attached it looks like this below. It may also contain points with 0.00 in addition to just whole numbers. I simply want to

1) keep only the rows 10 -19 (in other words ignore the header row and first 8 rows of data, keep the next 8 rows of data, and ignore any rows after that) In this case the first row of data to be kept would be 08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0

2. keep only columns 1 (date stamp), 11 (corresponds to header S00000004), 12 (corresponds to header S000000034), 14 (corresponds to header S000000075), & 17 (corresponds to header S000000045). The new line ignoring all other rows would look like this

08/14/2011 0 1 0 14

3. Add a hardcoded stampe of 00:00:00 after the day and change the date format to hyphens instead of slashes. The new line would look like this

08-14-2011 00:00:00 0 1 0 14

4. output results to a separate file and perform that function on ALL files in a directory. Lets call the directory /path2/so2 and output a modified file for each input file. The output files would have the same name as the input files but just have a different path at /path1 instead of the input files

sampleinputfile

/path2/samplefile.txt

     JDAY  YR  MO DA1 HR1 DA2 HR2 S00000037 S00000021 S00000002 S00000004 S00000034 S00000035 S00000075 S00000038 S00000044 S00000045 S00000046
08/13/2011 11 8 13 2 13 3 0 0 0 0 2 0 21 0 0 0 0
08/13/2011 11 8 13 5 13 6 0 0 0 0 6 0 0 0 0 0 0
08/13/2011 11 8 13 8 13 9 0 0 0 0 1 0 0 0 0 0 0
08/13/2011 11 8 13 11 13 12 0 0 0 0 49 0 0 0 0 0 0
08/13/2011 11 8 13 14 13 15 0 0 0 0 15 0 0 0 0 1 0
08/13/2011 11 8 13 17 13 18 0 0 0 0 11 0 0 0 0 6 0
08/13/2011 11 8 13 20 13 21 0 0 0 0 56 0 0 0 0 1 0
08/13/2011 11 8 13 23 14 0 0 0 0 0 13 0 0 0 0 9 0
08/14/2011 11 8 14 2 14 3 0 0 0 0 1 0 14 0 0 3 0
08/14/2011 11 8 14 5 14 6 0 0 0 0 10 0 14 0 0 16 0
08/14/2011 11 8 14 8 14 9 0 0 0 0 8 0 1 0 0 7 0
08/14/2011 11 8 14 11 14 12 0 0 0 0 2 0 0 0 0 0 0
08/14/2011 11 8 14 14 14 15 0 0 0 0 7 0 0 0 0 0 0
08/14/2011 11 8 14 17 14 18 0 0 0 0 30 0 0 0 0 0 0
08/14/2011 11 8 14 20 14 21 0 0 0 0 10 0 0 0 0 1 0
08/14/2011 11 8 14 23 15 0 0 0 0 0 6 0 0 0 0 23 0
08/15/2011 11 8 15 2 15 3 0 0 0 0 5 0 0 0 0 3 0
08/15/2011 11 8 15 5 15 6 0 0 0 0 13 0 0 0 0 1 0
08/15/2011 11 8 15 8 15 9 0 0 0 0 1 0 0 0 0 0 0
08/15/2011 11 8 15 10 15 11 0 0 0 0 23 0 0 0 0 0 0

outputfile should look like this

/path1/samplefile.txt

08-14-2011 00:00:00 0 1 0 14
08-14-2011 00:00:00 0 10 0 7
08-14-2011 00:00:00 0 8 0 3
08-14-2011 00:00:00 0 2 0 3
08-14-2011 00:00:00 0 7 0 3
08-14-2011 00:00:00 0 30 0 3
08-14-2011 00:00:00 0 10 0 1
08-14-2011 00:00:00 0 6 0 23
samplefile.txt
0
Comment
Question by:libertyforall2
  • 2
  • 2
4 Comments
 
LVL 26

Expert Comment

by:wilcoxon
ID: 36476331
This should do what you want.  Let me know if there are any issues...

You say lines 10-19 but say 8 lines (10-19 would be 10 lines).

I set a bunch of vars at the top of the script you can alter to change dirs, lines to keep, cols to keep, etc.
#!/usr/local/bin/perl

use strict;
use warnings;

# change these to suit
my $in_dir = '/path1';
my $out_dir = '/path2/so2';
my $min_line = 10;
my $max_line = 19;
my @cols = (0, 10, 11, 13, 16); # 0-offset rather than 1-offset

# get all files in $in_dir
opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

# loop over files
foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    open OUT, '>', "$out_dir/$fil" or die "could not write $out_dir/$fil: $!";
    while (<IN>) {
        last if ($. > $max_line);
        next if ($. < $min_line);
        chomp;
        # get only the cols we want
        my @vals = (split /\s+/)[@cols];
        # add the hard-coded timestamp
        splice @vals, 1, 0, '00:00:00';
        print OUT join(' ', @vals), "\n";
    }
    close OUT;
    close IN;
}

Open in new window

0
 

Author Comment

by:libertyforall2
ID: 36476373
Almost. It left the time stamp with slashes instead of hyphens time should look like 08-27-2011 instead of 08/27/2011
0
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 36476895
Sorry.  Forgot to do that.
#!/usr/local/bin/perl

use strict;
use warnings;

# change these to suit
my $in_dir = '/path1';
my $out_dir = '/path2/so2';
my $min_line = 10;
my $max_line = 19;
my @cols = (0, 10, 11, 13, 16); # 0-offset rather than 1-offset

# get all files in $in_dir
opendir DIR, $in_dir or die "could not open dir $in_dir: $!";
my @files = grep { -f "$in_dir/$_" } readdir DIR;
closedir DIR;

# loop over files
foreach my $fil (@files) {
    open IN, "$in_dir/$fil" or die "could not open $in_dir/$fil: $!";
    open OUT, '>', "$out_dir/$fil" or die "could not write $out_dir/$fil: $!";
    while (<IN>) {
        last if ($. > $max_line);
        next if ($. < $min_line);
        chomp;
        # get only the cols we want
        my @vals = (split /\s+/)[@cols];
        # change / to - in timestamp
        $vals[0] =~ s{/}{-}g;
        # add the hard-coded timestamp
        splice @vals, 1, 0, '00:00:00';
        print OUT join(' ', @vals), "\n";
    }
    close OUT;
    close IN;
}

Open in new window

0
 

Author Closing Comment

by:libertyforall2
ID: 36477030
Great!
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Checking the Alert Log in AWS RDS Oracle can be a pain through their user interface.  I made a script to download the Alert Log, look for errors, and email me the trace files.  In this article I'll describe what I did and share my script.
Set OWA language and time zone in Exchange for individuals, all users or per database.
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question