We help IT Professionals succeed at work.

How do I solve this Perl Spreadsheet Parseexcel problem

sunny82
sunny82 asked
on
Hi all,
 
For the attached excel sheet, I am going through all the columns and printing out the contents into an array and then to a file. But in my excel sheet, the data rows are arranged like this
 
File1.txt       /ms/user/s/sayantag/        1       Header1       Header2       Header3       Header4       Header5
File1.txt       /ms/user/s/sayantag/        1       Data1       Data2       Data3       Data4       Data5
File1.txt       /ms/user/s/sayantag/        1       Data6       Data7       Data8       Data9       Data10
File1.txt       /ms/user/s/sayantag/        1       Data11       Data12       Data13       Data14       Data15
 
In the output array and hence into the file, the data gets populated one after another like Data1Data2Data3Data4Data5Data6Data7Data8Data9Data10Data11Data12Data13Data14Data15.
 
Where am I going wrong? Should I put the contents of an entire row of data into another array and then print it out into the file?

Pls help asap with the code

The code can be run with--

perl Process_MF.pl 1 3 5 1 4 2 2
 
#!/usr/bin/perl5.10 -w
 

 
use warnings;
use Spreadsheet::ParseExcel;
use strict;
use IO::Scalar;
use Data::Dumper;
 
my @etl_val = @ARGV;
 
print "@etl_val" . "\n" ;
 
my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("/ms/user/s/sayantag/Demo_MF2.xls");
my $worksheet;
my %order_of_flat_file;
my @data_arr;
my $data;
my $etl;
my $path;
my @fin_arr;
 
for  $worksheet ( $workbook->worksheets() ) {
   my ( $row_min, $row_max ) = $worksheet->row_range();
   my ( $col_min, $col_max ) = $worksheet->col_range();
 
   my $row = $row_min + 2;
   while( $row <=$row_max ) {
        print "Row before if statement is $row\n";   
     if ($worksheet->get_cell( $row, 2 )->value ne "ETL") {
        print "Row after if statement is $row\n";
               
     my $etl_cell  = $worksheet->get_cell( $row, 2 );
        if(defined($etl_cell)) {
        $etl = $etl_cell->value;}
 
     my $path_cell = $worksheet->get_cell( $row, 1 );
        if(defined($path_cell)) {
        $path = $path_cell->value;
        $path =~ s/\s$//;
        print "PATH: $path" . "\n";}
 
     my $path_cell0 = $worksheet->get_cell( $row, 0 );
        if(defined($path_cell0)) {
        $path.= $path_cell0->value;
        print $path_cell0->value . "\n";
        print "PATH: $path" . "\n";
        print "ETL: $etl" . "\n";
        print "PATH: $path" . "\n";
       
        }
 
     for my $col ( $col_min+3 .. $col_max ) {
        print "ROW : $row" ."\n";
        print "COLUMN: $col" . "\n";
               
        my $cell = $worksheet->get_cell( $row, $col );
        if (defined($cell) ) {
          my $cell_val = $cell->value;
          print "Cell value is $cell_val\n";
          push(@data_arr, split("\t", $cell_val));
#          @data_arr = join ("\n", @data_arr);
#          print "The first array element is $data_arr[0]\n";
          print "Content of '", $path, "' will be:\n", @data_arr, "\n";
#          foreach my $val(@data_arr) {$val=~ s/\t{1,}/\|/g;}
         } 
        
         }
        
        $row++;    
       }
        else { print "The array looks like @data_arr\n";
              open("FILEHANDLE", '>', $path) or die "Cannot open the file ", $path, ": $!";
              print FILEHANDLE @data_arr ;
              close ("FILEHANDLE");
              @data_arr = ();
              $row = $row+2;
              print "Value of row before calling next is $row\n";
              next;
             
            }
     }
   }
 
#print "The array is @data_arr" . "\n";

Open in new window

Demo-MF2.xls
Comment
Watch Question

I gather you are saying the problem is with the print statements for @data_arr (both to the file and to the screen)?

If so, you need to do something to create "space" between the elements of the array.

print FILEHANDLE "@data_arr"; # print with single space between elements
print FILEHANDLE join("\t", @data_arr); # print with tab between elemennts
print FILEHANDLE join(',', @data_arr); # separate elements with commas

Author

Commented:
IF I do ---

print FILEHANDLE join("\t", @data_arr); # print with tab between elemennts
print FILEHANDLE join(',', @data_arr); # separate elements with commas

the elements are getting printed as Data1 Data2 Data 3........Data15 in the file depending on if it is a space or a tab. But that is not what I need. If you see my attached excel spreadsheet, the data values are all in separate rows. Like Data1 to Data5 are in separate cells in a single row. Similarly Data6 to Data10 and Data11 to Data15. Like --

File1.txt       /ms/user/s/sayantag/        1       Header1       Header2       Header3       Header4       Header5
File1.txt       /ms/user/s/sayantag/        1       Data1       Data2       Data3       Data4       Data5
File1.txt       /ms/user/s/sayantag/        1       Data6       Data7       Data8       Data9       Data10
File1.txt       /ms/user/s/sayantag/        1       Data11       Data12       Data13       Data14       Data15

So in the file I want the data to be in the exact same way just separated by pipes.

So in my output File1.txt, I want the data to be in separate rows like in the excel sheet like this--

Data1|Data2|Data3|Data4|Data5
Data6|Data7|Data8|Data9|Data10
Data11|Data12|Data13|Data14|Data15

How do I do that? Thanks for your help, really appreciate it and eagerly waiting for your answer.

I've got to run now but I'll take a closer look at your code later tonight.

Generally, you need to output a line at a time.  If you know there will always be 5 data items per line, then you can do the easy fix:

my @tmp = @data_arr;
while (@tmp) {
     print FILEHANDLE join('|', splice(@tmp, 0, 5)), "\n";
}

Author

Commented:
I dont know if there will be 5 data items per line. There can be any no. of data items in a single excel spreadsheet row. Pls help me when you have time as I am really stuck and dont know what to do . As I said, In the output file File1.txt, I need the data rows in the same way as in the excel sheet. Like

Data1|Data2|Data3|Data4|Data5
Data6|Data7|Data8|Data9|Data10
Data11|Data12|Data13|Data14|Data15

At present my code goes through every row, finds out the etl and the path value and then calculates cell value for every column in that row and puts that value into the array. Then it does a $row++ to come to the next row and if it finds the header ETL, outputs the value into the file and empties the array and calls next to go to the next row iteration.

I am getting the output
Data1Data2Data3Data4Data5Data6Data7Data8Data9Data10Data11Data12Data13Data14Data15

while the output will be as above. Hope this will help.

Thanks.
Programmer
Commented:
You need to have a line end at the end of your rows.

You are putting all the cells between the ETLs into an array. I think you would be better off putting each row into a scalar, then put the rows into an array. Then print your array. That way, you can add your pipes and line end to each row.

btw, your line 63 indicates that each cell has tabs in it and you want to split the parts of the cell into separate elements in the array. I don't see any indication of tabs in the cells in your attached worksheet, so I'm going to assume that is incorrect.

Add line 54:
my $data_row;

Change line 63
push(@data_arr, split("\t", $cell_val));  
to
$data_row .= $cell_val.'|';

Before line 72 add:
chop $data_row; # remove last pipe
$data_row .= "\n"; # Add line end to end of row. Be sure to use double quotes
push @data_arr, $data_row;  

Hope that helps.

Author

Commented:
Thanks a lot. It really worked. Really appreciate all your help.

Author

Commented:
Hi,

I got the solution yesterday for this problem. But wanted to ask a related problem. Attached is the excel sheet I want to parse. It is working fine if all the data fields are there, then it puts a pipe between every data field and outputs it into the file. But if a data field is blank it should also put a pipe in its place.

For e.g.
if in my excel sheet,

File4.txt      /ms/user/s/sayantag/       4      Header13      Header14      Header15                              
File4.txt      /ms/user/s/sayantag/      4      Data26      Data27      Data30                              
File4.txt      /ms/user/s/sayantag/      4      Data28      Data29      Data31                              
The output for the code is Data26|Data27|Data30
                                          Data28|Data29|Data31
which is correct.
But if the excel file has any blank data row like
File4.txt      /ms/user/s/sayantag/       4      Header13      Header14      Header15                              
File4.txt      /ms/user/s/sayantag/      4      Data26            Data30                              
File4.txt      /ms/user/s/sayantag/      4      Data28      Data29      Data31      

The output in File4.txt should put a pipe in its place

i.e. the output should be         Data26||Data30
                                               Data28|Data29|Data30
                                         Data28|Data29

How can I do that?
Pls help with the code asap.

Demo-MF4.xls
Frederick McIntyreProgrammer

Commented:
Sorry I didn't get back sooner. My hard drive crashed yesterday, and I've been reinstalling everything from scratch!

I believe you need to remove line 60:
        if (defined($cell) ) {  
and the end of the block, line 68
}

What line 60 is saying is, Add this cell only if there is somethng in it.