Solved

Perl script dropping a data point in output file

Posted on 2010-11-08
Last Modified: 2012-05-10
Ok. I am using the command and Perl script (forecastso2kapolei12.pl) below to convert a time stamp in a file from UTC to local time, which is exactly 10 hours earlier. The script also takes only the specified row of data and a single column of data from that row. In the example below, I am looking at row 5 and only want the time stamp, the hour, and the single data point from column 10 of row 5. The script also stacks the output lines when there is more than one input file.

The output line in the output file should look like this:
11-07-2010 5 0.00
but instead looks like this:
11-07-2010 0.00

The data point that is being dropped is the hour. The hour is in column 7 and should be converted to 10 hours earlier. In the attached sample file the hour in column 7 is 15, so the output file should show 5. Of course, the date needs to be changed as well if the local time falls on the previous day.




perl forecastso2kapolei12.pl /share/huina/rhuff/hysplit/hysplit/kilauea/forecastdata/so2/hysplit.haw.horiz.so2.*.txt >/share/huina/rhuff/validation/forecast/so2/hysplit.haw.horiz.so2.kapolei.hour12.txt

#!/usr/local/bin/perl

$| = 1;

while (my $file = shift(@ARGV)) {
     my @rows = (5);
     my @cols = (8);
     open IN, $file or die "Can't open $file: $!";
     <IN>; #skip header
     my $rndx = 0;
     while (<IN>)
     {
           next if $.<$rows[$rndx];
           @data = split;
           $data[0] =~ s/\//-/g;
           print "$data[0] ";
           print $data[$_-1] foreach @cols;
           print "\n";
           $rndx++;
           last if $rndx>$#rows;
     }
     close IN;
}
hysplit.haw.horiz.so2.2010110712.txt
Question by:libertyforall2

Expert Comment

by:FishMonger
Personally, I think you need to dump that script and start over.

Are you looking for someone to write/modify your script for you, or are you looking for guidance and help in troubleshooting?

Author Comment

by:libertyforall2
Well, I could add 7 to the columns, which would make it (7, 8), but I would still need the script to change the time stamp as the file is processed.

Expert Comment

by:FishMonger

#!/usr/local/bin/perl

use strict;
use warnings;
use Date::Calc qw(Date_to_Time Time_to_Date);

my $base_dir = '/share/huina/rhuff/hysplit/hysplit/kilauea/forecastdata/so2/';
my $outfile = '/share/huina/rhuff/validation/forecast/so2/hysplit.haw.horiz.so2.kapolei.hour12.txt';

open my $output_fh, '>', $outfile or die "failed to open/create <$outfile> $!";

my $ten_hours = 60*60*10;

foreach my $file ( <$base_dir/hysplit.haw.horiz.so2.*.txt> ) {
    open my $fh, '<', $file or die "failed to open <$file> $!";
    <$fh> for 1..4;
    my $line = <$fh>;  # getting row 5
    close $fh;

    my ($date, $hr, $data) = (split(/\s/, $line))[0,6,7];
    $date =~ s~/~-~g;

    my ($mo, $day, $yr) = split(/-/, $date);
    my $time            = Date_to_Time($yr, $mo, $day, $hr, 0, 0);
    ($yr,$mo,$day,$hr)  = Time_to_Date($time - $ten_hours);

    print $output_fh "$mo-$day-$yr $hr $data\n";
}

Expert Comment

by:FishMonger
print $output_fh "$mo-$day-$yr $hr $data\n";

could be changed to
print $output_fh "$date $hr $data\n";

Author Comment

by:libertyforall2
Ok. This is the output I got.


[rhuff@huina ~/scripts]$ perl forecastso2kapolei12.pl
Can't locate Date/Calc.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at forecastso2kapolei12.pl line 5.
BEGIN failed--compilation aborted at forecastso2kapolei12.pl line 5.
[rhuff@huina ~/scripts]$


Also, I see you changed the column numbers from 7,8 to 6,7. I am assuming 0 and 6 are the time stamp columns and 7 is the data column I want pulled. If so, I can simply change the 7 should I need to pull data for a different location. Once the error above is resolved, my question is which part of the code I need to modify if I want a different line of data. In this case it is line 5, but let's assume I want line 9, 13, or 17 instead. What would I need to change in the code below?

my $line = <$fh>;  # getting row 5
    close $fh;

Expert Comment

by:FishMonger
You need to install the Date::Calc module.

If you need to change the desired line number to 13, you'd change the number of lines to skip, which is this line:

<$fh> for 1..4;

changes to:
<$fh> for 1..12;

We could make that a passed-in parameter.

my $lines_to_skip = $ARGV[0] || 4;  # defaults to 4 (row 5) if not passed to the script

<$fh> for 1..$lines_to_skip;
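
The parameter idea above could also be wrapped in a small helper that takes the row number itself rather than the number of lines to skip. This is only a sketch: the name read_row and the convention that the header counts as line 1 are assumptions for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: return line number $row of $file (1-based, with
# the header counted as line 1), or undef if the file is too short.
sub read_row {
    my ($file, $row) = @_;
    open my $fh, '<', $file or die "failed to open <$file> $!";
    my $line;
    while (my $current = <$fh>) {
        # $. holds the line number of the last line read
        if ($. == $row) {
            $line = $current;
            last;
        }
    }
    close $fh;
    return $line;
}

# The row number comes from the command line and defaults to row 5.
my $row = $ARGV[0] || 5;
```

With this in place, the loop over the input files would call read_row($file, $row) instead of skipping lines by hand, and running the script with 9, 13, or 17 as its first argument would select those rows.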

Author Comment

by:libertyforall2
Ok. Where can I find the Date::Calc module?

Expert Comment

by:FishMonger

Author Comment

by:libertyforall2
Ok. I really do apologize for being such a newbie. The module is a .pm file, but the install instructions seem to indicate it should come in some type of tar archive. Can I use wget to grab the file and then install the module? Would it be possible to install the module with just two lines, wget http://cpansearch.perl.org/src/STBEY/Date-Calc-6.3/lib/Date/Calc.pm and some other command? Please help!! Also, will the module work if it is installed in any directory?

Expert Comment

by:FishMonger
The module needs to be compiled/built.

Use wget to download the tar file.
wget http://search.cpan.org/CPAN/authors/id/S/ST/STBEY/Date-Calc-6.3.tar.gz

"untar" it
tar -xzf Date-Calc-6.3.tar.gz

cd into the directory that it created and issue these commands

perl Makefile.PL
make
make install

If you did not receive any errors, then you're ready to use it in your scripts.
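
One way to confirm that the install worked, before running the real script, is to try loading the module at run time. The snippet below is a minimal sketch of that check; the variable name $have_date_calc is just an illustration.

```perl
use strict;
use warnings;

# Try to load Date::Calc at run time, so a missing module is reported
# here instead of aborting the script at compile time.
my $have_date_calc = eval { require Date::Calc; 1 } ? 1 : 0;

print $have_date_calc
    ? "Date::Calc is installed\n"
    : "Date::Calc is missing: $@";
```

If the module is missing, the message printed is the same "Can't locate Date/Calc.pm in @INC" text shown earlier in this thread.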

Expert Comment

by:FishMonger
Another method to install it would be to use the cpan utility.

run this command:
perl -MCPAN -e shell

The first time you use it, it will walk you through a bunch of configuration questions.  Once you're at the cpan shell prompt, simply issue the install command.
cpan[1]> install Date::Calc

Author Comment

by:libertyforall2
[rhuff@huina ~/Date-Calc-6.3]$ perl Makefile.PL

*************************************************************
****** BEWARE: Use "make install UNINST=1" to install! ******
*************************************************************

WARNING: META_MERGE is not a known parameter.
WARNING: LICENSE is not a known parameter.
Checking if your kit is complete...
Looks good
Warning: prerequisite Bit::Vector 7.1 not found.
Warning: prerequisite Carp::Clan 6.04 not found.
'LICENSE' is not a known MakeMaker parameter name.
'META_MERGE' is not a known MakeMaker parameter name.
Writing Makefile for Date::Calc

Expert Comment

by:FishMonger
Do you have root access on the box?

If not, it would be easier to have the person who does have root access install the module and its dependencies.

If you want to install it yourself, without root privileges, it will be more complex.  To start with, you'll need to use the PREFIX directive to install the module into a directory other than one of Perl's @INC module directories.

One of the cpan configuration questions that you had to answer was related to prerequisites.  You'll want to set it to "follow" if you're running under root privileges.

The CPAN module can detect when a module which you are trying to build
depends on prerequisites. If this happens, it can build the
prerequisites for you automatically ('follow'), ask you for
confirmation ('ask'), or just ignore them ('ignore'). Please set your
policy to one of the three values.

 <prerequisites_policy>
Policy on building prerequisites (follow, ask or ignore)? [follow]
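
For the non-root route, a script also has to be told where the privately installed modules live, because that directory is not in @INC by default. A minimal sketch using the lib pragma follows. The directory shown is purely hypothetical; the real one depends on the PREFIX given to perl Makefile.PL.

```perl
use strict;
use warnings;

# Hypothetical private module directory (an assumption for this
# example); substitute the directory the module was installed into.
use lib '/home/rhuff/perl5/lib/perl5';

# The private directory is now searched first, so a later
# "use Date::Calc;" would look there before the system directories.
print "$_\n" for @INC;
```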

Author Comment

by:libertyforall2
Ok. I'm going to have to have someone else install this. In the meantime, it would save time to do the data parsing as a two-step process. I could go ahead and combine the data into a single file using the original script, with column 7 added for the hour, so that the file is ready for the time stamp to be changed. If I did that and had the new file in the format 11-07-2010 5 0.00, how could I then convert the time stamp with the module and zero-pad hours less than 10 in the semi-parsed file?
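
While Date::Calc is unavailable, the same 10-hour shift can be sketched with modules that ship with Perl. The example below uses Time::Local's timegm together with the built-in gmtime. It is not the Date::Calc approach from earlier in the thread, and the function name utc_to_hst is made up for this example.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Shift a UTC date (MM-DD-YYYY) and hour back 10 hours to HST.
# Returns the local date in the same format and a zero-padded hour.
sub utc_to_hst {
    my ($date, $hr) = @_;
    my ($mo, $day, $yr) = split /-/, $date;

    # timegm expects a 0-based month
    my $epoch = timegm(0, 0, $hr, $day, $mo - 1, $yr) - 10 * 60 * 60;

    # gmtime returns a 0-based month and the year minus 1900
    my ($lhr, $lday, $lmo, $lyr) = (gmtime $epoch)[2, 3, 4, 5];
    return (sprintf('%02d-%02d-%04d', $lmo + 1, $lday, $lyr + 1900),
            sprintf('%02d', $lhr));
}

my ($date, $hr) = utc_to_hst('11-07-2010', 15);
print "$date $hr\n";    # prints 11-07-2010 05
```

Because the arithmetic is done on epoch seconds, a UTC hour earlier than 10 rolls the date back to the previous day automatically.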

Author Comment

by:libertyforall2
When I went back and tried to add the hour column, the script simply ran column 7 together with column 8 in the last column of the output file. If I want to prepare the data before the module is installed, how would I need to change the script below, and what would I need to do to convert the new output file once I successfully generate it in the format 11-07-2010 5 0.00 as stated above? I am close to resolving this, so once this is answered I hope to be done. Thanks.

#!/usr/local/bin/perl

$| = 1;

while (my $file = shift(@ARGV)) {
     my @rows = (5);
     my @cols = (7,8);
     open IN, $file or die "Can't open $file: $!";
     <IN>; #skip header
     my $rndx = 0;
     while (<IN>)
     {
           next if $.<$rows[$rndx];
           @data = split;
           $data[0] =~ s/\//-/g;
           print "$data[0] ";
           print $data[$_-1] foreach @cols;
           print "\n";
           $rndx++;
           last if $rndx>$#rows;
     }
     close IN;
}
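
The run-together columns come from the statement print $data[$_-1] foreach @cols;, which prints each selected field with nothing in between. A minimal sketch of one fix is to slice out the wanted fields and join them with spaces. The sample line is a shortened copy of a data row from this thread; the rest is illustration only.

```perl
use strict;
use warnings;

my @cols = (7, 8);    # 1-based column numbers, as in the script above
my $line = "11/07/2010 10 11 7 23 8 0 0.00 0.00\n";

my @data = split ' ', $line;
(my $date = $data[0]) =~ s{/}{-}g;

# Turn the 1-based column numbers into 0-based indexes, slice out the
# fields, and join everything with single spaces.
my $out = join ' ', $date, @data[ map { $_ - 1 } @cols ];
print "$out\n";    # prints 11-07-2010 0 0.00
```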

Expert Comment

by:FishMonger
Don't use your original script; it's poorly written.

Just comment out the lines of my script that are related to the Date::Calc module and enable them when the module is installed.

#!/usr/local/bin/perl

use strict;
use warnings;
#use Date::Calc qw(Date_to_Time Time_to_Date);

my $base_dir = '/share/huina/rhuff/hysplit/hysplit/kilauea/forecastdata/so2/';
my $outfile = '/share/huina/rhuff/validation/forecast/so2/hysplit.haw.horiz.so2.kapolei.hour12.txt';

open my $output_fh, '>', $outfile or die "failed to open/create <$outfile> $!";

my $ten_hours = 60*60*10;

foreach my $file ( <$base_dir/hysplit.haw.horiz.so2.*.txt> ) {
    open my $fh, '<', $file or die "failed to open <$file> $!";
    <$fh> for 1..4;
    my $line = <$fh>;  # getting row 5
    close $fh;

    my ($date, $hr, $data) = (split(/\s/, $line))[0,6,7];
    $date =~ s~/~-~g;

    my ($mo, $day, $yr) = split(/-/, $date);
    #my $time            = Date_to_Time($yr, $mo, $day, $hr, 0, 0);
    #($yr,$mo,$day,$hr)  = Time_to_Date($time - $ten_hours);

    print $output_fh "$mo-$day-$yr $hr $data\n";
}

Author Comment

by:libertyforall2
Ok. Should my ($date, $hr, $data) = (split(/\s/, $line))[0,6,7]; be changed to my ($date, $hr, $data) = (split(/\s/, $line))[0,7,8]; since I think you said somewhere that my hour2 was actually column 7? Also, should $hr = sprintf("%02d", $hr); be added directly above print $output_fh "$mo-$day-$yr $hr $data\n";? Is the script below correct? Thanks.

[code]
#!/usr/local/bin/perl

use strict;
use warnings;
#use Date::Calc qw(Date_to_Time Time_to_Date);

my $base_dir = '/share/huina/rhuff/hysplit/hysplit/kilauea/forecastdata/so2/';
my $outfile = '/share/huina/rhuff/validation/forecast/so2/hysplit.haw.horiz.so2.kapolei.hour12.txt';

open my $output_fh, '>', $outfile or die "failed to open/create <$outfile> $!";

my $ten_hours = 60*60*10;

foreach my $file ( <$base_dir/hysplit.haw.horiz.so2.*.txt> ) {
    open my $fh, '<', $file or die "failed to open <$file> $!";
    <$fh> for 1..4;
    my $line = <$fh>;  # getting row 5
    close $fh;
   
    my ($date, $hr, $data) = (split(/\s/, $line))[0,7,8];
    $date =~ s~/~-~g;
   
    my ($mo, $day, $yr) = split(/-/, $date);    
    #my $time            = Date_to_Time($yr, $mo, $day, $hr, 0, 0);
    #($yr,$mo,$day,$hr)  = Time_to_Date($time - $ten_hours);
   
    $hr = sprintf("%02d", $hr);
    print $output_fh "$mo-$day-$yr $hr $data\n";
}
[/code]

Expert Comment

by:FishMonger
That line is doing an array slice.  Arrays are 0 indexed, so the 1st column is index 0 and the 7th and 8th columns are indexes 6 and 7 respectively.  So, the index numbers in that split function need to be 1 less than the column numbers that you want to extract.

Yes, the sprintf line needs to be just prior to the print statement.
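
To make the off-by-one relationship concrete, here is a small sketch that splits a shortened copy of the header line from the data file and prints each label with its column number and array index. It uses split ' ' rather than split(/\s/, ...) so that the leading space and the double spaces in the header do not produce empty fields; that swap is a choice made for this example, not a change to the script above.

```perl
use strict;
use warnings;

my $header = " JDAY  YR  MO DA1 HR1 DA2 HR2 S00000037 S00000021";

# split ' ' skips leading whitespace and treats runs of spaces as one
# separator, so the first label lands at index 0.
my @names = split ' ', $header;

# Column N of the file is index N-1 of the array.
printf "column %d = index %d = %s\n", $_ + 1, $_, $names[$_] for 0 .. $#names;

# The slice used in the script picks columns 1, 7 and 8.
my @picked = @names[0, 6, 7];
print "@picked\n";    # prints JDAY HR2 S00000037
```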

Expert Comment

by:FishMonger
This is the output I get from that script.  Is this what you expect?

D:\perl>forecastso2kapolei12.pl
11-07-2010 15 0.00

Author Comment

by:libertyforall2
Hour 15 is the UTC hour for line 2. HST is 10 hours behind UTC, and since HST does not observe daylight saving time, the offset will always be -10 hours. The final output line has the correct columns, but the hour 15 should read 05. It also looks as though the script read line 2 instead of line 5, which has a UTC hour of 0.

Looking at the file again, I am realizing something. Column 1 is the same date as DA1. Since the data is averaged over an hour, the time stamp is for the beginning hour (DA1, HR1), while DA2 and HR2 mark the end of the hour. If you look at line 5, 11/07/2010 10 11 7 23 8 0 0.00, you see that day 7 is the beginning hour and day 8 is the ending hour. This means that we do not need to convert the date, only the hour, because for rows with an hour2 of 0 the time stamp already shows the correct date once we go back 10 hours. Since we are only looking at rows with an hour2 of 0 or 12, we only need to change hour 0 to hour 14 and hour 12 to 02. Nothing else needs to be changed, aside from being able to switch the row from 5 to 9, 13, or 17 when necessary and to change the column selected for data.

 JDAY  YR  MO DA1 HR1 DA2 HR2 S00000037 S00000021 S00000002 S00000004 S00000034 S00000035 S00000075 S00000038 S00000044 S00000045 S00000046
11/07/2010 10 11 7 14 7 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11/07/2010 10 11 7 17 7 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11/07/2010 10 11 7 20 7 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
11/07/2010 10 11 7 23 8 0 0.00 0.00 0.00 0.00 0.06 0.00 0.00 0.00 0.00 0.00 0.00

Author Comment

by:libertyforall2
     At this point, it looks like we can simplify the script. All input files will have an hour2 of either 0 or 12 in row 5, and the same holds for rows 9, 13, and 17. So we simply select column 1 (or zero?) for the date and column (6 or 7?) for hour2, change the hour to 14 if it is 0 and to 02 if it is 12, then add the selected column of data (in this case 7, or 8?). Finally, if there is more than one input file, stack all of the selected output rows in the output file.
     Looking at the script, it appears column 0 is actually the first column. In any case, I don't think we need the time module.

Expert Comment

by:FishMonger
If you don't care about the accuracy of the timestamp, which is what your explanation boils down to, then why change it?

Author Comment

by:libertyforall2
The time stamp will be accurate if we simply change the hour. The date will not need to be changed since we are only looking at certain rows of data. All we need to do to the time stamp is subtract 10 hours from hour2, changing 0 to 14 and 12 to 02.

Author Comment

by:libertyforall2
I have to change the hour in order to match it up to another file. I need the hours to match. The other file is in the same format I am trying to get this file into and I will compare the two in a plot. Since HST has no daylight saving time and the hours I need to look at just happen to fall into a certain range, the date itself does not need to be changed only the hour.

Expert Comment

by:TRW-Consulting
It's starting to look like you are only interested in lines where the hour is 0 or 12.  Is that the case?  If so, then why do the calculation at all? Just use what's below.  If I'm misunderstanding, please provide a file showing what you want the output to be.
#!/usr/bin/perl

use strict;

while (my $file = shift(@ARGV)) {

    open IN, $file or die "failed to open <$file> $!";

    while (<IN>) {

      my ($date, $hr, $data) = (split(/\s/, $_))[0,6,7];

      if ($hr eq "0") {
        print "$date 14 $data\n";
      } elsif ($hr eq "12") {
        print "$date 02 $data\n";
      }

    }
}

Accepted Solution

by:FishMonger (earned 500 total points)
If the hour is 10, do you want the adjusted hour to be 00 or 24?

If you want it to be 24, then change the $hr assignment (line 21 of the script below) to this:
$hr = $hr < 11 ? $hr + 14 : sprintf("%02d", $hr - 10);

#!/usr/local/bin/perl

use strict;
use warnings;

my $base_dir = '/share/huina/rhuff/hysplit/hysplit/kilauea/forecastdata/so2/';
my $outfile = '/share/huina/rhuff/validation/forecast/so2/hysplit.haw.horiz.so2.kapolei.hour12.txt';

open my $output_fh, '>', $outfile or die "failed to open/create <$outfile> $!";

my $ten_hours = 60*60*10;

foreach my $file ( <$base_dir/hysplit.haw.horiz.so2.*.txt> ) {
    open my $fh, '<', $file or die "failed to open <$file> $!";
    <$fh> for 1..6;
    my $line = <$fh>;  # skipping 6 lines reads line 7; use 1..4 to get row 5
    close $fh;

    my ($date, $hr, $data) = (split(/\s/, $line))[0,6,7];
    $date =~ s~/~-~g;
    $hr = $hr < 10 ? $hr + 14 : sprintf("%02d", $hr - 10);

    print $output_fh "$date $hr $data\n";
}
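
The hour adjustment can also be written with modulo arithmetic instead of a ternary. The sketch below is an alternative formulation, not the line from the script above, and the function name utc_hour_to_hst is made up for the example. It always returns a zero-padded hour between 00 and 23, so a UTC hour of 10 comes out as 00 rather than 24.

```perl
use strict;
use warnings;

# Subtract 10 hours on a 24-hour clock. Adding 14 and taking the
# remainder modulo 24 avoids negative intermediate values.
sub utc_hour_to_hst {
    my ($utc_hour) = @_;
    return sprintf '%02d', ($utc_hour + 14) % 24;
}

print "$_ -> ", utc_hour_to_hst($_), "\n" for (0, 10, 12, 15);
```

For the two hours that matter in this thread, 0 maps to 14 and 12 maps to 02.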

Author Comment

by:libertyforall2
        I am going to create 4 separate scripts, one for each row number. Script 1 would look only at line 5, which could be either hour 0 or 12; script 2 would look only at line 9, which goes out 24 hours but would also be either 0 or 12; and so on. So I just need to be able to choose the line and the column of data, so I can modify the script for each time frame and location (column of data).
        I will only end up with hours of 0 or 12. If the hour is 0 it should be changed to 14, and if it is 12 it should be changed to 02; nothing else needs to be changed. Each script will pick only one line and one column of data, but the input will contain both types of files, so row 5 will have both 0 and 12 hours, since I am pulling the data from two types of files that are 12 hours apart. It looks like the line $hr = $hr < 10 ? $hr + 14 : sprintf("%02d", $hr - 10); does this. I will run it and see what happens.

Author Comment

by:libertyforall2
Ok. I ran the script and it looks like it read the wrong row of data: the hours were 08 & 20. I changed the script back to <$fh> for 1..4; and it appears to be working correctly. I am going to check the data just to make sure it read the correct row and column, and I will close this question once it works.

Author Comment

by:libertyforall2
It worked. <$fh> for 1..4; for row 5! Yeah!!!!!!!!!!!! Finally!!!!!!!!!!!!

Author Closing Comment

by:libertyforall2
FANTASTIC!!!!!!!!!!