Solved

Use Perl to find all files in a directory starting with a given word, then extract the date and two data points from each file

Posted on 2011-03-14
Last Modified: 2012-05-11
I have a series of files in a directory whose names start with kilauea. They hold emissions values for two vent locations. Each file contains 3 lines of text (1 header line and two lines of data) and looks like this:

SHIPDATE,STATION,BEGIN,END,DELTADAYS,NOBS,MEAN,STDEV
2011-02-03 04:00:13,summit,2011-01-21 20:05:00,2011-01-31 20:16:00,10,6,686,191
2011-02-03 04:00:13,east_rift,2011-01-22 00:50:49,2011-01-22 00:50:49,0,1,312,0

I have to create a single output file that takes the SHIPDATE, STATION, and MEAN fields and produces a single line for each input file. For the file above, the output line would look like the one below. The summit station mean would be column 2, and the east_rift mean (or whatever station is named in row 3) would be column 3 in the output line. It's important NOT to use the actual names of the stations but rather the position of the data line in the file, since the names of the vents may change.

2011-02-03 686 312

A similar line would then be created for each file and written to a single master file called emissionsmaster.txt. I would then be able to import emissionsmaster.txt into graphing programs or Excel to generate charts showing fluctuations in emission rates based on archived data. Thanks.

Question by:libertyforall2
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35132679
Based on your description, this should work...

I notice the file has a timestamp on the date, but your sample output has only the date.  The code I provided uses only the date, so a line will end up with more than 2 values if multiple files share the same date.  If you want to include the time, simply remove the s{}{} substitution with the comment about removing the time.
#!/usr/local/bin/perl

use strict;
use warnings;

my $dir = shift || '.';
opendir DIR, $dir or die "could not open dir $dir: $!";
my @files = grep m{^kilauea}, readdir DIR;
closedir DIR;

my %data;
foreach my $fil (@files) {
    open IN, $fil or die "could not open $file: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
        $vals[0] =~ s{\s.*$}{}; # remove time
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
}

open OUT, '>emissionsmaster.txt' or die "could not write emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT "$dt @{$data{$dt}}\n";
}
close OUT;
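
For the layout described in the question (one output line per input file, with the means taken by row position rather than by station name), the following is a minimal sketch and not part of the script above. The helper name summarize_lines is hypothetical. It assumes the first line of each file is the header, that the header contains a MEAN column, and that every following non-empty line is a data row.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reduce the lines of one
# kilauea file to a single "date mean mean ..." string. Stations are
# taken in row order; their names are never inspected.
sub summarize_lines {
    my @lines = @_;
    chomp @lines;

    # Locate the MEAN column from the header instead of hard-coding it.
    my @header = split /,/, shift @lines;
    my ($mean_col) = grep { $header[$_] eq 'MEAN' } 0 .. $#header;
    die "no MEAN column in header\n" unless defined $mean_col;

    my ($date, @means);
    for my $line (@lines) {
        next unless length $line;           # ignore blank lines
        my @vals = split /,/, $line;
        # Keep only the date part of SHIPDATE from the first data row.
        ($date) = split ' ', $vals[0] unless defined $date;
        push @means, $vals[$mean_col];
    }
    return join ' ', $date, @means;
}

# Demonstration with the sample file from the question.
print summarize_lines(
    "SHIPDATE,STATION,BEGIN,END,DELTADAYS,NOBS,MEAN,STDEV\n",
    "2011-02-03 04:00:13,summit,2011-01-21 20:05:00,2011-01-31 20:16:00,10,6,686,191\n",
    "2011-02-03 04:00:13,east_rift,2011-01-22 00:50:49,2011-01-22 00:50:49,0,1,312,0\n",
), "\n";    # prints "2011-02-03 686 312"
```

To use it on real files, pass the lines read from each file handle to summarize_lines and print the result to emissionsmaster.txt. Note that readdir returns bare file names, so when the directory is not the current one the path has to be opened as "$dir/$name".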

 

Author Comment

by:libertyforall2
ID: 35404719
Still doesn't seem to work
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35406298
In what way doesn't it work?  More information would help me to help you.
 

Author Comment

by:libertyforall2
ID: 35836650
OK. It was my error. I needed commas between some of the columns as in the format below.

2011-05-15 02:00:00, 375, 250
2011-05-15 02:00:00, 375, 250
2011-05-15 14:00:00, 375, 250
2011-05-16 02:00:00, 375, 250
2011-05-16 14:00:00, 375, 250
2011-05-17 02:00:00, 375, 250
2011-05-17 14:00:00, 375, 250
2011-05-18 02:00:00, 375, 250
2011-05-18 14:00:00, 375, 250
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35836711
This should produce that output (datetime and commas in between fields)...
#!/usr/local/bin/perl

use strict;
use warnings;

my $dir = shift || '.';
opendir DIR, $dir or die "could not open dir $dir: $!";
my @files = grep m{^kilauea}, readdir DIR;
closedir DIR;

my %data;
foreach my $fil (@files) {
    open IN, $fil or die "could not open $file: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
#        $vals[0] =~ s{\s.*$}{}; # remove time
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
}

open OUT, '>emissionsmaster.txt' or die "could not write emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT join(', ', $dt, @{$data{$dt}}), "\n";
}
close OUT;

 

Author Comment

by:libertyforall2
ID: 35883801
I used the above script and got this message.


uila% perl emissions.pl
Global symbol "$file" requires explicit package name at emissions.pl line 13.
Global symbol "$file" requires explicit package name at emissions.pl line 13.
Execution of emissions.pl aborted due to compilation errors.
uila%

There is another change I wish to make, however. I want to look at only a single file; let's call it file1.txt. It will write the line to the same output file, emissionsmaster.txt, but:

1. It should simply append the line at the bottom of the file if the file already contains lines.
2. It should change the time stamp to show today's date, with a time of 14:00:00 if the Perl script is executed after 14:00:00, or 02:00:00 if it is executed before 14:00:00.
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35897535
This should correct the typo I made and handle a single file and #1.

For #2, just to verify: it should always change the time stamp to today's date, and set the time to either 02:00 or 14:00 depending on whether the script runs before 14:00 (giving 02:00) or after it (giving 14:00).  So 2011-05-15 13:23:52 would be updated to 2011-06-02 02:00:00, correct?

Does this also mean that you no longer need the entries sorted by time stamp?
#!/usr/local/bin/perl

use strict;
use warnings;

#my $dir = shift || '.';
#opendir DIR, $dir or die "could not open dir $dir: $!";
#my @files = grep m{^kilauea}, readdir DIR;
#closedir DIR;

my $fil = shift or die "Usage: $0 filename\n";

my %data;
#foreach my $fil (@files) {
    open IN, $fil or die "could not open $fil: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
#        $vals[0] =~ s{\s.*$}{}; # remove time
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
#}

open OUT, '>>emissionsmaster.txt' or die "could not append emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT join(', ', $dt, @{$data{$dt}}), "\n";
}
close OUT;

 

Author Comment

by:libertyforall2
ID: 35912414
Yes and yes. I've written some code that allows a single file to be read. It would read a single file that looks like the initial file and then write the output line to a file that accumulates additional lines based on time and date. I'll test out the code.
 

Author Comment

by:libertyforall2
ID: 35912426
Where within the script do I actually change the $dir variable to tell it the actual directory path?
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35916885
Since you asked it be changed to only read a single file, I changed the script to be called as:

script.pl /path/to/file

(i.e., the filename passed in should contain the path to the file).  If emissionsmaster.txt is not in the current directory, you can change line 27 to be "open OUT, '>>/path/to/emissionsmaster.txt' ..." to make sure it writes to the correct location.

If you always want to read from the same directory and just supply the raw filename, then you could change line 11 to be:

my $dir = '/path/to/files';
my $fil = shift or die "Usage: $0 filename\n";
$fil = "$dir/$fil";

in which case you could also change line 27 to this (if it is the same directory where the files should be read from):

open OUT, ">>$dir/emissionsmaster.txt" or die "could not append emissionsmaster.txt: $!";
 

Author Comment

by:libertyforall2
ID: 35962632

#!/usr/local/bin/perl

use strict;
use warnings;

#my $dir = shift || '.';
#opendir DIR, $dir or die "could not open dir $dir: $!";
#my @files = grep m{^kilauea}, readdir DIR;
#closedir DIR;

my $dir = '/share/huina/rhuff/scripts/emmissions';
my $fil = shift or die "Usage: $0 filename\n";
$fil = "$dir/$fil";

my %data;
#foreach my $fil (@files) {
    open IN, $fil or die "could not open $fil: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
#        $vals[0] =~ s{\s.*$}{}; # remove time
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
#}

open OUT, '>>/share/huina/rhuff/scripts/emmissions/emissionsmaster.txt' or die "could not append emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT join(', ', $dt, @{$data{$dt}}), "\n";
}
close OUT;




I used the script above and got the message below when I tried to execute the perl script

[rhuff@huina emmissions]$ perl updatemissionsfile.pl
Can't open perl script "updatemissionsfile.pl": No such file or directory
[rhuff@huina emmissions]$ perl updateemissionsfile.pl
Usage: updateemissionsfile.pl filename
[rhuff@huina emmissions]$
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35963453
Yes.  As I said in my last comment, you need to give it the filename to read on the command line so you need to call it as:

perl updateemissionsfile.pl input_filename

If this is not what you want, how do you want it to work?  Should it read the (arbitrary) first file from the dir that it was reading all files from or something else?
 

Author Comment

by:libertyforall2
ID: 35963562
OK. I ran the script. It created the file; however, the time stamp was the same one that was in the input file. It didn't create a time stamp based on my request. If the script is run before noon, it should use yesterday's date at 14:00:00; if it is run at noon or later, it should use today's date with a time stamp of 02:00:00. The file it read produced 2011-06-03 04:00:18, 569, 716 for the output when it should have read
2011-06-13 02:00:00, 569, 716
 
LVL 26

Expert Comment

by:wilcoxon
ID: 35963650
Okay.  Here's code revised to update the dates...

I didn't test it (I'm currently having PC problems) so let me know if it doesn't work and I'll correct it.
#!/usr/local/bin/perl

use strict;
use warnings;

#my $dir = shift || '.';
#opendir DIR, $dir or die "could not open dir $dir: $!";
#my @files = grep m{^kilauea}, readdir DIR;
#closedir DIR;

my $dir = '/share/huina/rhuff/scripts/emmissions';
my $fil = shift or die "Usage: $0 filename\n";
$fil = "$dir/$fil";

my %data;
#foreach my $fil (@files) {
    open IN, $fil or die "could not open $fil: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
        # get current date/time
        my @dt = (localtime)[5,4,3,2,1,0];
        $dt[0] += 1900;
        $dt[1]++;
        # change date/time based on current date/time
        if ($dt[3] < 12) {
            @dt[3,4,5] = (14, 0, 0);
            $dt[2]--;
            if ($dt[2] < 1) {
                $dt[1]--;
                if ($dt[1] < 1) {
                    $dt[0]--;
                    $dt[1] = 12;
                }
                $dt[2] = $days[$dt[1]];
            }
        } else {
            @dt[3,4,5] = (2, 0, 0);
        }
        $vals[0] = sprintf '%d-%02d-%02d %02d:%02d:%02d', @dt;
        # go back to regularly scheduled code
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
#}

open OUT, '>>/share/huina/rhuff/scripts/emmissions/emissionsmaster.txt' or die "could not append emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT join(', ', $dt, @{$data{$dt}}), "\n";
}
close OUT;

 

Author Comment

by:libertyforall2
ID: 35969604
I got this error message.

[rhuff@huina emmissions]$ perl updateemissionsfile.pl kilauea.txt
Global symbol "@days" requires explicit package name at updateemissionsfile.pl line 36.
 
LVL 26

Accepted Solution

by:
wilcoxon earned 500 total points
ID: 35970777
Oops.  Sorry.  It does help if I define @days.
#!/usr/local/bin/perl

use strict;
use warnings;

my @days = (undef, 31, undef, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

#my $dir = shift || '.';
#opendir DIR, $dir or die "could not open dir $dir: $!";
#my @files = grep m{^kilauea}, readdir DIR;
#closedir DIR;

my $dir = '/share/huina/rhuff/scripts/emmissions';
my $fil = shift or die "Usage: $0 filename\n";
$fil = "$dir/$fil";

my %data;
#foreach my $fil (@files) {
    open IN, $fil or die "could not open $fil: $!";
    while (<IN>) {
        chomp;
        next if m{^SHIPDATE}; # skip header row
        my @vals = split /,/;
        # get current date/time
        my @dt = (localtime)[5,4,3,2,1,0];
        $dt[0] += 1900;
        $dt[1]++;
        # set Feb days
        if ($dt[0] % 4 == 0 and ($dt[0] % 100 or $dt[0] % 400 == 0)) {
            $days[2] = 29;
        } else {
            $days[2] = 28;
        }
        # change date/time based on current date/time
        if ($dt[3] < 12) {
            @dt[3,4,5] = (14, 0, 0);
            $dt[2]--;
            if ($dt[2] < 1) {
                $dt[1]--;
                if ($dt[1] < 1) {
                    $dt[0]--;
                    $dt[1] = 12;
                }
                $dt[2] = $days[$dt[1]];
            }
        } else {
            @dt[3,4,5] = (2, 0, 0);
        }
        $vals[0] = sprintf '%d-%02d-%02d %02d:%02d:%02d', @dt;
        # go back to regularly scheduled code
        $data{$vals[0]} = [] unless exists($data{$vals[0]});
        push @{$data{$vals[0]}}, $vals[-2];
    }
    close IN;
#}

open OUT, '>>/share/huina/rhuff/scripts/emmissions/emissionsmaster.txt' or die "could not append emissionsmaster.txt: $!";
foreach my $dt (sort keys %data) {
    print OUT join(', ', $dt, @{$data{$dt}}), "\n";
}
close OUT;
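
The month-length table and leap-year test above can also be left to the C library. The following is a minimal sketch, not part of the accepted answer. The helper name slot_timestamp is hypothetical, and it relies on the documented behavior of mktime, which normalizes out-of-range fields so that day 0 of a month becomes the last day of the previous month. It follows the rule requested earlier in the thread: before noon gives yesterday's date at 14:00:00, noon or later gives today's date at 02:00:00.

```perl
use strict;
use warnings;
use POSIX qw(mktime strftime);

# Hypothetical helper (not from the thread): map a moment, given as
# epoch seconds, to the time-stamp label requested in the thread.
sub slot_timestamp {
    my ($now) = @_;
    my ($hour, $mday, $mon, $year) = (localtime $now)[2, 3, 4, 5];

    # Before noon: yesterday at 14:00:00. Noon or later: today at 02:00:00.
    my ($slot_hour, $slot_mday) = $hour < 12 ? (14, $mday - 1) : (2, $mday);

    # mktime normalizes the fields, so an mday of 0 rolls back to the
    # last day of the previous month (and the previous year in January).
    # Noon is used only to pick the calendar day, which keeps the result
    # away from daylight-saving transitions.
    my $noon = mktime(0, 0, 12, $slot_mday, $mon, $year);

    return strftime('%Y-%m-%d', localtime $noon)
         . sprintf(' %02d:00:00', $slot_hour);
}

# 2011-03-01 09:30 local time is before noon, so the label rolls back
# to the last day of February.
print slot_timestamp(mktime(0, 30, 9, 1, 2, 111)), "\n";   # prints "2011-02-28 14:00:00"

# Label for the moment the script runs.
print slot_timestamp(time), "\n";
```

In the script above, the result of slot_timestamp(time) could be assigned to $vals[0] in place of the manual date arithmetic. Because the label does not depend on the input line, it could also be computed once before the loop.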

 

Author Closing Comment

by:libertyforall2
ID: 35986478
Works great! Thanks.