How to generate multiple files from a perl script

The below perl script reads a database table generates output to a csv file.

use strict;
use warnings;
use DBI;
use Date::Manip;

Date_Init('TZ=US/Central','Internal=1');

$| = 1;

my $nodata = 0;
my ($db, $user, $pw, $wkdir, $firstday, $lastday)  = @ARGV;

# Display parameter values passed by the appworx

print("\n");
print("Database Instance  : $db \n");
print("Connected as User  : $user \n");
print("Output Directory is : $wkdir \n");
print("First Day : $firstday \n");
print("Last Day : $lastday \n\n");

#Check to see if required parameters are received by appworx
unless($db && $user && $pw && $wkdir && $firstday && $lastday) {
	
  die("\n\n  - ERROR:  The Required number of arguments were not received...\n\n");
  
}

# Connectivity check to the database else display failed connectivity
my $dbh = DBI->connect("dbi:Oracle:$db", $user, $pw, { AutoCommit => 0, RaiseError => 1 }) or die("Cannot connect to Database '$db' " . DBI->errstr());

# Capture counts for distinct partitions for given firstday and lastday dates
my $partition_key = join '', $firstday ,'_',$lastday;
print("Partition Key : $partition_key \n\n");

my $sthcounts = $dbh->prepare(qq(select count(distinct r.execution_id) from REGION r where r.partition_key = ? AND r.status = 'N'));
# Values of partition key will be passed on by appworx will be parameterized in the script
$sthcounts -> execute($partition_key);

# Get value of rowcount returned from Count of partition keys from REGION table
my $rowcount = ($sthcounts-> fetchrow_array());
$sthcounts -> finish();

# Check if single partition exists for a given firstday/lastday parameters, ROW COUNT RETURNED will be 1 and then proceed with extract out columns from the table to a csv file
# IF multiple partitions exist OR no data exists for a given firstday/last day set - the PROCESS WILL ABORT
if ((!defined $rowcount) || ($rowcount eq 0)) {

  die ("ERROR: NO DATA EXISTS FOR GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n");

} else {
  if ($rowcount gt 1) 
  {

    die ("ERROR: UNREPORTED DATA EXISTS FOR THE GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n");

  } else 
    {

      #Assign variable to the partition key and use this variable in subsequent prepare statements and to extract columns to outfile
      my $genextract = $dbh->prepare(qq(SELECT  col1, col2, sum(col3)
                              					FROM REGION WHERE PARTITION_KEY = ? AND STATUS = 'N'))
                   		 or die("Could not get data from table '$_'".DBI->errstr);

      $genextract->execute($partition_key);

      my @allrows;
      my @columns;
      my $getfilename = join '', $partition_key,'.csv';
      my $rc  = $genextract->fetchrow_hashref() or $nodata = 1;
               my @row = @{$genextract ->{NAME}};

      foreach my $col (@row) 
      {
          my $name = $col;
          $name =~ s/\s+//g;
          my $data = $rc->{$name};
          push @allrows, $data;
          push @columns, $name;
      }
      my $dumpdir = "$wkdir/Regionload_$getfilename";

      open (OUT, ">$dumpdir") or die(" - Could not open $dumpdir for export. $!");

      # List out all the columns/rows with header information and generating a csv outfile..
       $" = ",";
      print OUT "@columns\n";
      print OUT "@allrows\n";

      while (my $row = $genextract->fetchrow_hashref()) {
      
        foreach my $col (@columns) {
        
           print OUT $$row{$col},",";
           
        }
        print OUT "\n";
      }
      close(OUT);

      $genextract-> finish();

      # Updates the STATUS column to R(Reported) for the partition key after REC LOAD csv outfile is generated
      print "Updating data set status to R (reported) for partition key $partition_key. \n\n";
      my $update_status = $dbh->prepare(qq(UPDATE REGION SET STATUS = 'R', LSTIME = SYSDATE WHERE PARTITION_KEY = ? AND STATUS = 'N'))
                                    or die("Could not get data from table '$_'".DBI->errstr);

      $update_status -> execute($partition_key);
      $update_status -> finish();

      # File output process completion Notification
      print "The Regionload_$getfilename file is generated to output directory.\n\n";
  }

}
# Disconnect from the database
$dbh->disconnect();

Open in new window




How do I modify the script to extract different  files using different data sets from the same source table (sample data sets below)? Each dataset would be a separate select statement so that the file created is separate

dataset1: select col4, col5, col6 from region where partition_key = ?

dataset 2:  select col5, col6, sum(col7) from region where partition_key = ? group by col5, col6
steve2312Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

wilcoxonCommented:
Your script does a lot of things the hard way.  I've rewritten your script and factored out the dataset creation to make it simpler.  However, I do not have an Oracle db to test against so debugging will be left to you.

use strict;
use warnings;
use DBI;
use Date::Manip;

Date_Init('TZ=US/Central','Internal=1');

$| = 1;

my $nodata = 0;
my ($db, $user, $pw, $wkdir, $firstday, $lastday)  = @ARGV;

# Display parameter values passed by the appworx

print "\nDatabase Instance  : $db\nConnected as User  : $user\n",
      "Output Directory is : $wkdir\nFirst Day : $firstday\n",
      "Last Day : $lastday\n\n";

#Check to see if required parameters are received by appworx
die("\n\n  - ERROR:  The Required number of arguments were not received...\n\n")
    unless $lastday;

# Connectivity check to the database else display failed connectivity
my $dbh = DBI->connect("dbi:Oracle:$db", $user, $pw, { AutoCommit => 0, RaiseError => 1 })
    or die("Cannot connect to Database '$db' " . DBI->errstr());

# Capture counts for distinct partitions for given firstday and lastday dates
my $partition_key = "${firstday}_$lastday";
print("Partition Key : $partition_key \n\n");

# Get value of rowcount returned from Count of partition keys from REGION table
my ($rowcount) = $dbh->selectrow_array("select count(distinct r.execution_id) from REGION r where r.partition_key = '$partition_key' AND r.status = 'N'");

# Check if single partition exists for a given firstday/lastday parameters, ROW COUNT RETURNED will be 1 and then proceed with extract out columns from the table to a csv file
# IF multiple partitions exist OR no data exists for a given firstday/last day set - the PROCESS WILL ABORT

die "ERROR: NO DATA EXISTS FOR GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    unless $rowcount;

die "ERROR: UNREPORTED DATA EXISTS FOR THE GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    if ($rowcount > 1);

do_dataset('Regionload', [qw(col1 col2 sum(col3))], status => 'N', group_by => [qw(col1 col2)]);
do_dataset('dataset1', [qw(col4 col5 col6)]);
do_dataset('dataset2', [qw(col5 col6 sum(col7))], group_by => [qw(col5 col6)]);

# Disconnect from the database
$dbh->disconnect();

sub do_dataset {
    my ($file, $cols, %opt) = @_;
    # do some pre-processing of options
    my $col = join ',', @$cols;
    my $group = '';
    if (exists $opt{group_by}) {
        $group = join ', ', @{$opt{group_by}};
        delete $opt{group_by};
    }
    # XXX - only works for *char cols this code
    my $xtra = %opt ? ('and ' . join(' and ', map { "$_ = '$opt{$_}'" } keys %opt)) : '';

    # get data
    my $allrows = $dbh->selectall_arrayref("SELECT $col FROM REGION WHERE PARTITION_KEY = '$partition_key' $xtra $group")
        or die("Could not get data from table REGION ".DBI->errstr);

    # dump the data out to a file
    my $dumpdir = "$wkdir/${file}_$partition_key.csv";
    open OUT, '>', $dumpdir or die " - Could not open $dumpdir for export. $!";
    print OUT $col, "\n"; # manually list column names from input
    print OUT map { print join(',', @$_), "\n" } @$allrows;
    close OUT;

    # Updates the STATUS column to R(Reported) for the partition key after REC LOAD csv outfile is generated
    print "Updating data set status to R (reported) for partition key $partition_key. \n\n";
    $dbh->do("UPDATE REGION SET STATUS = 'R', LSTIME = SYSDATE WHERE PARTITION_KEY = '$partition_key' $xtra")
        or die "Could not get data from table REGION ".DBI->errstr;

    # File output process completion Notification
    print "The $dumpdir file is generated to output directory.\n\n";
}

Open in new window


If you have any questions about this, just ask.
0
steve2312Author Commented:
Thanks for taking time to rework it.

A few questions(and this could be due to my ignorance)

1.  Would these generate different file (filenames).  The reason I ask is the each dataset is different (quarterly, semi-annual / yearly ).   So wanted to be sure each file is generated separately and goes to the same outdirectory.

2. "XXX - only works for *char cols this code" (Line 59) - I am not understanding the comment mentioned here.

3. You have the looping constructs after the disconnect from db.   I always thought disconnect would be best after all processing is complete.

I may have additional questions as I get deeper into testing.  Thanks once again for the pointers!
0
wilcoxonCommented:
  1. Yes.  line 67 generates the filesnames based on the first argument passed into do_dataset.
  2. For any "extra" conditions passed in to do_dataset, as written, it will quote the value (eg "status => N" becomes "and status = 'N'" but "someval => 1" would also become "and someval = '1'").
  3. do_dataset is a subroutine/function.  It is executed when called and not where defined so the loops within do_dataset are executed before the disconnect is called (on lines 43-45 to be precise).

I'll be happy to answer any further questions you have.
0
Amazon Web Services

Are you thinking about creating an Amazon Web Services account for your business? Not sure where to start? In this course you’ll get an overview of the history of AWS and take a tour of their user interface.

steve2312Author Commented:
Thanks, for the dataset initializations (Lines 43 - 45), you have them as 'Regionload', 'dataset1', 'dataset2'.  

1. Why are these are not referred anywhere else in the rest of script?
2.  Columns listed inside qw brackets (lines 43 - 45) ) would only be columns  from the select statement  - correct?  We don't add any where clause or group by columns/conditions in here - correct?
3.  In the files created, how could we append the below to filenames?
    a. MonYY for monthly file generation (example RegionLoad would be RegionLoad_Jan18 for January 2018 monthly run)
    b. YYYY  for annual file generation (example: RegionLoad filename ==>  RegionLoad_2017 for 2017 file)
0
wilcoxonCommented:
  1. Yes, I defined the datasets as Regionload, dataset1, and dataset2 since you already had the first named but didn't give your other desired ones names.  The dataset/file name is passed in on the do_dataset call on lines 43-45, defined as the filename on line 67, opened on line 68 (and mostly referred to by filehandle OUT after that), and used in the print on line 79.
  2. The columns for the select is the arrayref as second argument to do_dataset.  Everything else gets shoved into the %opt hash within do_dataset - if it is group_by, it is used as group_by - if it is anything else, it is added to the where clause (the only example of this is line 43 with "status => 'N'" which would append "and status = 'N'" to the where clause).
  3. Do you still want the $partition_key as part of the filename or just the "dataset" name and date (either MonYY or YYYY)?  The best way to achieve this depends on which you actually want.
0
steve2312Author Commented:
Thanks for detailed explanation.  

For Point 3.  I want the dataset name along with MonYY or YYYY (depending on type of report requested is monthly or annual).
0
wilcoxonCommented:
For 3, one other question - where do you get the date from?  Is it hard-coded in the script, calculated from the current date, passed in on the command line, or something else?
0
steve2312Author Commented:
The date will be passed from external job scheduling tool.

The date parameters passed are in format of "firstday, lastday"  and in a YYYMMDD format.

(example: if dates passed are "20170101, 20170131" then value of  Jan 17 would be appended to dataset1 file name)
(example: if dates chosen are"20160101, 20161231" then value of 2016 would be appended to dataset1 file name)
0
wilcoxonCommented:
Here's a revised version that does a few more error checks and handles significantly differing filenames.  Do your sql queries ever use functions other than aggregate functions?  If not, I had a thought for a simple way to auto-generate group by from column list (instead of needing to explicitly specify it).

use strict;
use warnings;
use DBI;
use Date::Manip;

Date_Init('TZ=US/Central','Internal=1');

$| = 1;

my $nodata = 0;
my ($db, $user, $pw, $wkdir, $firstday, $lastday)  = @ARGV;

# Display parameter values passed by the appworx

print "\nDatabase Instance  : $db\nConnected as User  : $user\n",
      "Output Directory is : $wkdir\nFirst Day : $firstday\n",
      "Last Day : $lastday\n\n";

#Check to see if required parameters are received by appworx
die("\n\n  - ERROR:  The Required number of arguments were not received...\n\n")
    unless $lastday;

my ($fyr, $fmon, $fday, $lyr, $lmon, $lday);
if ($firstday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($fyr, $fmon, $fday) = ($1, $2, $3);
} else {
    die " - ERROR: firstday ($firstday) is not a date\n";
}
if ($lastday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($lyr, $lmon, $lday) = ($1, $2, $3);
} else {
    die " - ERROR: lastday ($lastday) is not a date\n";
}

# Connectivity check to the database else display failed connectivity
my $dbh = DBI->connect("dbi:Oracle:$db", $user, $pw, { AutoCommit => 0, RaiseError => 1 })
    or die("Cannot connect to Database '$db' " . DBI->errstr());

# Capture counts for distinct partitions for given firstday and lastday dates
my $partition_key = "${firstday}_$lastday";
print("Partition Key : $partition_key \n\n");

# Get value of rowcount returned from Count of partition keys from REGION table
my ($rowcount) = $dbh->selectrow_array("select count(distinct r.execution_id) from REGION r where r.partition_key = '$partition_key' AND r.status = 'N'");

# Check if single partition exists for a given firstday/lastday parameters, ROW COUNT RETURNED will be 1 and then proceed with extract out columns from the table to a csv file
# IF multiple partitions exist OR no data exists for a given firstday/last day set - the PROCESS WILL ABORT

die "ERROR: NO DATA EXISTS FOR GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    unless $rowcount;

die "ERROR: UNREPORTED DATA EXISTS FOR THE GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    if ($rowcount > 1);

do_dataset("Regionload_$partition_key", [qw(col1 col2 sum(col3))], status => "'N'", group_by => [qw(col1 col2)]);
# quick hack that only handles monthly and yearly
my $date;
if ($fyr == $lyr and $fmon == $lmon) {
    $date = month_name($fmon) . substr($fyr,0,2);
} elsif ($fyr == $lyr) {
    $date = $fyr;
} else {
    die " - ERROR: don't know what to do if years don't match\n";
}
do_dataset("dataset1_$date", [qw(col4 col5 col6)]);
do_dataset("dataset2_$partition_key", [qw(col5 col6 sum(col7))], group_by => [qw(col5 col6)]);

# Disconnect from the database
$dbh->disconnect();

sub do_dataset {
    my ($file, $cols, %opt) = @_;
    # do some pre-processing of options
    my $col = join ',', @$cols;
    my $group = '';
    if (exists $opt{group_by}) {
        $group = join ', ', @{$opt{group_by}};
        delete $opt{group_by};
    }
    # XXX - changed to use passed-in value
    #       will work for numbers - requires passed in string value to be quoted
    my $xtra = %opt ? ('and ' . join(' and ', map { "$_ = $opt{$_}" } keys %opt)) : '';

    # get data
    my $allrows = $dbh->selectall_arrayref("SELECT $col FROM REGION WHERE PARTITION_KEY = '$partition_key' $xtra $group")
        or die("Could not get data from table REGION ".DBI->errstr);

    # dump the data out to a file
    my $dumpdir = "$wkdir/$file.csv";
    open OUT, '>', $dumpdir or die " - Could not open $dumpdir for export. $!";
    print OUT $col, "\n"; # manually list column names from input
    print OUT map { print join(',', @$_), "\n" } @$allrows;
    close OUT;

    # Updates the STATUS column to R(Reported) for the partition key after REC LOAD csv outfile is generated
    print "Updating data set status to R (reported) for partition key $partition_key. \n\n";
    $dbh->do("UPDATE REGION SET STATUS = 'R', LSTIME = SYSDATE WHERE PARTITION_KEY = '$partition_key' $xtra")
        or die "Could not get data from table REGION ".DBI->errstr;

    # File output process completion Notification
    print "The $dumpdir file is generated to output directory.\n\n";
}

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
steve2312Author Commented:
Thank you wilcoxon.   The sql queries I am using to retrieve data from table would at the most be  a select on a table with agg functions like sum(totals).
0
steve2312Author Commented:
A few questions on your code posted

1. What does lines 58 - 63 on do (the monthly and yearly hack)
2. What does line 82 (my $xtra = %opt ? ('and ' . join(' and ', map { "$_ = $opt{$_}" } keys %opt)) ) do?


Meanwhile. I've added my other datasets to give you an idea of data extracted.

use strict;
use warnings;
use DBI;
use Date::Manip;

Date_Init('TZ=US/Central','Internal=1');

$| = 1;

my $nodata = 0;
my ($db, $user, $pw, $wkdir, $firstday, $lastday)  = @ARGV;

# Display parameter values passed by the appworx

print("\n");
print("Database Instance  : $db \n");
print("Connected as User  : $user \n");
print("Output Directory is : $wkdir \n");
print("First Day : $firstday \n");
print("Last Day : $lastday \n\n");


#Check to see if required parameters are received by appworx
die("\n\n  - ERROR:  The Required number of arguments were not received...\n\n") unless $lastday;

my ($fyr, $fmon, $fday, $lyr, $lmon, $lday);
if ($firstday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($fyr, $fmon, $fday) = ($1, $2, $3);
} else {
    die " - ERROR: firstday ($firstday) is not a date\n";
}
if ($lastday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($lyr, $lmon, $lday) = ($1, $2, $3);
} else {
    die " - ERROR: lastday ($lastday) is not a date\n";
}

# Connectivity check to the database else display failed connectivity
my $dbh = DBI->connect("dbi:Oracle:$db", $user, $pw, { AutoCommit => 0, RaiseError => 1 }) or die("Cannot connect to Database '$db' " . DBI->errstr());


# Capture counts for distinct partitions for given firstday and lastday dates
my $partition_key = "${firstday}_$lastday";
print("Partition Key : $partition_key \n\n");

# Get value of rowcount returned from Count of partition keys from REGION table
my ($rowcount) = $dbh->selectrow_array("select count(distinct r.execution_id) from RP_REC_LOAD_EXTRACT r where r.partition_key = '$partition_key' AND r.status = 'N'");

# Check if single partition exists for a given firstday/lastday parameters, ROW COUNT RETURNED will be 1 and then proceed with extract out columns from the table to a csv file
# IF multiple partitions exist OR no data exists for a given firstday/last day set - the PROCESS WILL ABORT

die "ERROR: NO DATA EXISTS FOR GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    unless $rowcount;

die "ERROR: UNREPORTED DATA EXISTS FOR THE GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    if ($rowcount > 1);

do_dataset('Detail_ESIID', [qw(ESIID, REPCODE, REPNAME, TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE, ORIGIN, INTV1, INTV2, INTV3, INTV4, INTV5, INTV6, INTV7, INTV8, INTV9, INTV10, INTV11, INTV12, INTV13, INTV14, INTV15,
														 INTV16, INTV17, INTV18, INTV19, INTV20, INTV21, INTV22, INTV23, INTV24, INTV25, INTV26, INTV27, INTV28, INTV29, INTV30, INTV31, INTV32, INTV33, INTV34, INTV35, INTV36, INTV37, INTV38, INTV39, INTV40, INTV41, INTV42, INTV43, INTV44, INTV45,
														 INTV46, INTV47, INTV48, INTV49, INTV50, INTV51, INTV52, INTV53, INTV54, INTV55, INTV56, INTV57, INTV58, INTV59, INTV60, INTV61, INTV62, INTV63, INTV64, INTV65, INTV66, INTV67, INTV68, INTV69, INTV70, INTV71, INTV72, INTV73, INTV74, INTV75,
														 INTV76, INTV77, INTV78, INTV79, INTV80, INTV81, INTV82, INTV83, INTV84, INTV85, INTV86, INTV87, INTV88, INTV89, INTV90, INTV91, INTV92, INTV93, INTV94, INTV95, INTV96, INTV97, INTV98, INTV99, INTV100, DAILY_TOTAL)], status => 'N');

# quick hack that only handles monthly and yearly
my $date;
if ($fyr == $lyr and $fmon == $lmon) {
    $date = month_name($fmon) . substr($fyr,0,2);
} elsif ($fyr == $lyr) {
    $date = $fyr;
} else {
    die " - ERROR: don't know what to do if years don't match\n";
}
do_dataset('Monthly_ESIID', [qw(ESIID, REPCODE, REPNAME, SUM(TOTAL) MONTHLY_TOTAL)] , group_by => [qw(ESIID, REPCODE, REPNAME)]);
do_dataset('Daily_REP', [qw(REPNAME, REPCODE, TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE, SUBTRACT_TOTAL, ADJ_TOTAL)];
do_dataset('Monthly_REP', [qw(REPNAME, REPCODE, TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE, SUBTRACT_TOTAL, ADJ_TOTAL)], group_by => [qw(REPCODE, REPNAME)]);
do_dataset('RecLoad', [qw(TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE, DUNSNUMBER, REPNAME, ADJ_TOTAL AS TOTALS)], group_by => [qw(STARTIME, DUNSNUMBER, REPNAME)]);

# Disconnect from the database
$dbh->disconnect();

sub do_dataset {
    my ($file, $cols, %opt) = @_;
    # do some pre-processing of options
    my $col = join ',', @$cols;
    my $group = '';
    if (exists $opt{group_by}) {
        $group = join ', ', @{$opt{group_by}};
        delete $opt{group_by};
    }
    # XXX - changed to use passed-in value
    #       will work for numbers - requires passed in string value to be quoted
    my $xtra = %opt ? ('and ' . join(' and ', map { "$_ = $opt{$_}" } keys %opt)) : '';

    # get data
    my $allrows = $dbh->selectall_arrayref("SELECT $col FROM RP_REC_LOAD_EXTRACT WHERE PARTITION_KEY = '$partition_key' $xtra $group")
        or die("Could not get data from table REGION ".DBI->errstr);

    # dump the data out to a file
    my $dumpdir = "$wkdir/$file.csv";
    open OUT, '>', $dumpdir or die " - Could not open $dumpdir for export. $!";
    print OUT $col, "\n"; # manually list column names from input
    print OUT map { print join(',', @$_), "\n" } @$allrows;
    close OUT;

    # Updates the STATUS column to R(Reported) for the partition key after REC LOAD csv outfile is generated
    print "Updating data set status to R (reported) for partition key $partition_key. \n\n";
    $dbh->do("UPDATE RP_REC_LOAD_EXTRACT SET STATUS = 'R', LSTIME = SYSDATE WHERE PARTITION_KEY = '$partition_key' $xtra")
        or die "Could not get data from table REGION ".DBI->errstr;

    # File output process completion Notification
    print "The $dumpdir file is generated to output directory.\n\n";
}

Open in new window

0
wilcoxonCommented:
  1. The date hack takes the date pieces (split out on lines 27-36) and compares them to see if they appear to be a monthly or yearly report.  To be more precise, it checks to see if the month for first/last are the same (does not check days) and, if not, checks to see if the years are the same (does not check months or days).
  2. line 82 (now 91 in your latest code) checks to see if there is anything in the %opt hash.  If there is, it sets $xtra to be a concatenated series of and statements based on what is in %opt (not used a whole lot - in the older example, there was only status => 'N' which became "and status = 'N'" - it doesn't look like there is anything like that in your new datasets).  If there is nothing in %opt, then it just sets $xtra to ''.  ?: is basically an if-else statement just for assignments.
0
steve2312Author Commented:
There seems to be an issue with the "status = 'N'.  An error stating:-  "N" : invalid identifier.  Though the line number listing the error is 92 ( my$allrows = $dbh->selectall_arrayref("SELECT $col FROM RP_REC_LOAD_EXTRACT WHERE PARTITION_KEY = '$partition_key' $xtra $group")

I am getting a warning on lines 58 - 62 (my latest code) , the error states "possible attempt to separate words with commas", though that is a warning only. This is as per updated code snippet below.


use strict;
use warnings;
use DBI;
use Date::Manip;

Date_Init('TZ=US/Central','Internal=1');

$| = 1;

my $nodata = 0;
my ($db, $user, $pw, $wkdir, $firstday, $lastday)  = @ARGV;

# Display parameter values passed by the appworx

print("\n");
print("Database Instance  : $db \n");
print("Connected as User  : $user \n");
print("Output Directory is : $wkdir \n");
print("First Day : $firstday \n");
print("Last Day : $lastday \n\n");


#Check to see if required parameters are received by appworx
die("\n\n  - ERROR:  The Required number of arguments were not received...\n\n") unless $lastday;

my ($fyr, $fmon, $fday, $lyr, $lmon, $lday);
if ($firstday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($fyr, $fmon, $fday) = ($1, $2, $3);
} else {
    die " - ERROR: firstday ($firstday) is not a date\n";
}
if ($lastday =~ /^(\d\d\d\d)(\d\d)(\d\d)$/) {
    ($lyr, $lmon, $lday) = ($1, $2, $3);
} else {
    die " - ERROR: lastday ($lastday) is not a date\n";
}

# Connectivity check to the database else display failed connectivity
my $dbh = DBI->connect("dbi:Oracle:$db", $user, $pw, { AutoCommit => 0, RaiseError => 1 }) or die("Cannot connect to Database '$db' " . DBI->errstr());


# Capture counts for distinct partitions for given firstday and lastday dates
my $partition_key = "${firstday}_$lastday";
print("Partition Key : $partition_key \n\n");

# Get value of rowcount returned from Count of partition keys from REGION table
my ($rowcount) = $dbh->selectrow_array("select count(distinct r.execution_id) from REGION r where r.partition_key = '$partition_key' AND r.status = 'N'");

# Check if single partition exists for a given firstday/lastday parameters, ROW COUNT RETURNED will be 1 and then proceed with extract out columns from the table to a csv file
# IF multiple partitions exist OR no data exists for a given firstday/last day set - the PROCESS WILL ABORT

die "ERROR: NO DATA EXISTS FOR GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    unless $rowcount;

die "ERROR: UNREPORTED DATA EXISTS FOR THE GIVEN FIRSTDAY LASTDAY SET OF DATES - ABORTING!!!\n\n"
    if ($rowcount > 1);

do_dataset('Detail_ESIID', [qw(ESIID REPCODE REPNAME TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE ORIGIN INTV1 INTV2 INTV3 INTV4 INTV5 INTV6 INTV7 INTV8 INTV9 INTV10 INTV11 INTV12 INTV13 INTV14 INTV15 INTV16 INTV17 INTV18 INTV19 INTV20 INTV21 INTV22 INTV23 INTV24 INTV25 INTV26 INTV27 INTV28 INTV29 INTV30 INTV31 INTV32 INTV33 INTV34 INTV35 INTV36 INTV37 INTV38 INTV39 INTV40 INTV41 INTV42 INTV43 INTV44 INTV45 INTV46 INTV47 INTV48 INTV49 INTV50 INTV51 INTV52 INTV53 INTV54 INTV55 INTV56 INTV57 INTV58 INTV59 INTV60 INTV61 INTV62 INTV63 INTV64 INTV65 INTV66 INTV67 INTV68 INTV69 INTV70 INTV71 INTV72 INTV73 INTV74 INTV75
 INTV76 INTV77 INTV78 INTV79 INTV80 INTV81 INTV82 INTV83 INTV84 INTV85 INTV86 INTV87 INTV88 INTV89 INTV90 INTV91 INTV92 INTV93 INTV94 INTV95 INTV96 INTV97 INTV98 INTV99 INTV100 DAILY_TOTAL)], status => 'N');

# quick hack that only handles monthly and yearly
my $date;
if ($fyr == $lyr and $fmon == $lmon) {
    $date = month_name($fmon) . substr($fyr,0,2);
} elsif ($fyr == $lyr) {
    $date = $fyr;
} else {
    die " - ERROR: don't know what to do if years don't match\n";
}
do_dataset('Monthly_ESIID', [qw(ESIID REPCODE REPNAME SUM(TOTAL) MONTHLY_TOTAL)] , group_by => [qw(ESIID REPCODE REPNAME)]);
do_dataset('Daily_REP', [qw(REPNAME REPCODE TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE SUBTRACT_TOTAL ADJ_TOTAL)]);
do_dataset('Monthly_REP', [qw(REPNAME REPCODE TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE SUBTRACT_TOTAL ADJ_TOTAL)], group_by => [qw(REPCODE REPNAME)]);
do_dataset('RecLoad', [qw(TO_CHAR(STARTTIME,'MM/DD/YYYY') TRADEDATE DUNSNUMBER REPNAME ADJ_TOTAL)], group_by => [qw(STARTIME DUNSNUMBER REPNAME)]);

# Disconnect from the database
$dbh->disconnect();

sub do_dataset {
    my ($file, $cols, %opt) = @_;
    # do some pre-processing of options
    my $col = join ',', @$cols;
    my $group = '';
    if (exists $opt{group_by}) {
        $group = join ', ', @{$opt{group_by}};
        delete $opt{group_by};
    }
    # XXX - changed to use passed-in value
    #       will work for numbers - requires passed in string value to be quoted
    my $xtra = %opt ? ('and ' . join(' and ', map { "$_ = $opt{$_}" } keys %opt)) : '';

    # get data
    my $allrows = $dbh->selectall_arrayref("SELECT $col FROM REGION WHERE PARTITION_KEY = '$partition_key' $xtra $group")
        or die("Could not get data from table REGION ".DBI->errstr);

    # dump the data out to a file
    my $dumpdir = "$wkdir/$file.csv";
    open OUT, '>', $dumpdir or die " - Could not open $dumpdir for export. $!";
    print OUT $col, "\n"; # manually list column names from input
    print OUT map { print join(',', @$_), "\n" } @$allrows;
    close OUT;

    # Updates the STATUS column to R(Reported) for the partition key after REC LOAD csv outfile is generated
    print "Updating data set status to R (reported) for partition key $partition_key. \n\n";
    $dbh->do("UPDATE REGION SET STATUS = 'R', LSTIME = SYSDATE WHERE PARTITION_KEY = '$partition_key' $xtra")
        or die "Could not get data from table REGION ".DBI->errstr;

    # File output process completion Notification
    print "The $dumpdir file is generated to output directory.\n\n";
}

Open in new window

0
wilcoxonCommented:
The warning can be ignored (as long as there are no spaces in the TO_CHAR(...) - it's just a warning because there is a common to separate the arguments.

The error is caused by the changes I made to allow non-string values.  You just need to change status => 'N' to status => "'N'" (eg double-quote, single-quote, N, single-quote, double-quote).
0
steve2312Author Commented:
That did not seem to work, possibly due due to the  "" at the beginning of select count ??  

Below is the error

String found where operator expected at rp_rec_load_rpt.pl line 79, near ""select count(distinct r.execution_id) from REGION r where r.partition_key = '$partition_key
 and r.status = "'N'"
        (Missing operator before 'N'?)
String found where operator expected at rp_rec_load_rpt.pl line 79, near "'N'"""
        (Missing operator before ""?)

Open in new window

0
steve2312Author Commented:
I was able to resolve the error, thanks for the pointer. I can write out the firstdataset but the script also generates an error "use of uninitialized value in join or string pointing to below line"

 print OUT map { print join(',', @$_), "\n" } @$allrows;

Open in new window

0
wilcoxonCommented:
To get rid of the error, this should do it:
 print OUT map { print join(',', map { defined($_) ? $_ : '' } @$_), "\n" } @$allrows;

Open in new window

0
steve2312Author Commented:
Thanks the error is resolved. Appreciate your help and patience with the questions.
0
wilcoxonCommented:
You're welcome.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Perl

From novice to tech pro — start learning today.