PERL Conversion of XLS to CSV replaces decimals with 'GENERAL'

Webspeeder
Webspeeder used Ask the Experts™
on
Hey all.

I have some perl code (below) that, when it converts a decimal field from the XLS file to the CSV file, it converts the decimal to the word "general".

Any ideas why this happens? I added decimal values to a string field and it came through fine so it must have something to do with the way PERL handles decimals?
#!/usr/bin/perl
# xls2csv.pl - parses an excel spreadsheet and writes it back out as a csv file.
# receives 2 arguement - 
#   $ARGV[0] - Excel filename
#   $ARGV[1] - New Filename
# example: xls2csv.pl setex.xls

use strict; 
# make sure all arguements were provided
if(!(defined $ARGV[0])) { 
    print "Usage: ", $0 , " Excel_File\n"; 
    exit; 
} 

if(!(defined $ARGV[1])) { 
    print "Usage: ", $1 , " Starting Line Number\n"; 
    exit; 
} 

#if(!(defined $ARGV[2])) { 
#    print "Usage: ", $2 , " Need to define input directory\n"; 
#    exit; 
#} 

# Package declarations
use Time::localtime;
use File::Copy;
use Spreadsheet::ParseExcel; 

# Initialize Objects #####################################################
my $oExcel = new Spreadsheet::ParseExcel; 
my $tm = localtime;
my $xcount;
my $ycount;
my $oWkS;
my $oWkC;
my $filename;

# datafile path $ARGV[2] should be either blank or ./, depending on how this script is ran
# If it is ran from within a program, send blank. If it is ran from command line in current directory, send ./
my $datapath=$ARGV[2];
#"./";

# Parse(Read) in the spreadsheet
my $oBook = $oExcel->Parse($datapath.$ARGV[0]); 
my $Rcv = 0;

# Split filename up
my @filesplit = split /\./,$ARGV[0];

# export the file into a csv file format
#my $Sheet = $Book->Worksheets(1);
#$oWkS = $oBook->{Worksheet}[0]
#$oWkS = $oBook->{Worksheet}[0];
#foreach $oWkS (@{$oBook->{Worksheet}}) { 
   $oWkS = $oBook->{Worksheet}[0];
   # check to see if the worksheet has any data, if yes then process
   if (defined $oWkS->{MaxRow}) 
   {
      $filename = $datapath . @filesplit[0] . ".csv";
      open(OUT, ">" . $filename) || die "Could not open ".$filename."\n";    
      
      for(my $xcount = $oWkS->{MinRow}; 
         defined $oWkS->{MaxRow} && $xcount <= $oWkS->{MaxRow}; $xcount++) { 
         if ( $xcount <= $ARGV[1] ) {
             next;
         }
         #for(my $ycount = $oWkS->{MinCol} ; 
         for(my $ycount = 0 ; 
            defined $oWkS->{MaxCol} && $ycount <= $oWkS->{MaxCol} ; $ycount++) { 
                $oWkC = $oWkS->{Cells}[$xcount][$ycount]; 
                print OUT $oWkC->Value, if($oWkC); 
                print OUT "|", if($ycount < $oWkS->{MaxCol});
         }  # end columns for
         print OUT "\n";
      } #end rows for
      close(OUT);
   } # end if
#} # end foreach worksheet

# make a copy of the file for loading
#my $loadfile=$datapath . $ARGV[0] . $ARGV[1] . ".csv";
#copy($filename, $loadfile);

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Commented:
Hi Speeder,

Done quite a bit of work like this myself. I personally use the ->unformatted method instead if the ->Value method, but it's been so long since I made that decision I can't recall why exactly I made that choice. But I seem to recall encountering similar problems to those that you are experiencing now.

Line 72:                 print OUT $oWkC->Unformatted, if($oWkC);

Good luck.

Author

Commented:
Thanks for the help.

When I run the program, I get this error:


Can't locate object method "Unformatted" via package "Spreadsheet::ParseExcel::Cell" at /xls2csv.pl line 73.

I added your line as you wrote it. I removed the original.

Commented:
Sorry about that. I was following your syntax in the code you posted here and noticed the ->Value was capitalised, so working from memory, I mistakenly assumed that logically the author of the package would capitalise other methods in the package.

The correction should have read:


Line 72:                 print OUT $oWkC->unformatted(), if($oWkC);

Good luck.
CompTIA Security+

Learn the essential functions of CompTIA Security+, which establishes the core knowledge required of any cybersecurity role and leads professionals into intermediate-level cybersecurity jobs.

Author

Commented:
Still didn't work.


Can't locate object method "unformatted" via package "Spreadsheet::ParseExcel::Cell" at xls2csv.pl line 73.

Any more ideas?


Author

Commented:
I think the problem is that I need the following modules added to the perl installation, except the first.

Spreadsheet::ParseExcel      Read information from an Excel file.               0.57
Spreadsheet::ParseExcel::Cell      A class for Cell data and formatting.               0.57
Spreadsheet::ParseExcel::Dump      A class for dumping Excel records.               0.57
Spreadsheet::ParseExcel::FmtDefault      A class for Cell formats.               0.57
Spreadsheet::ParseExcel::FmtJapan      A class for Cell formats.               0.57
Spreadsheet::ParseExcel::FmtJapan2      A class for Cell formats.               0.57
Spreadsheet::ParseExcel::FmtUnicode      A class for Cell formats.               0.57
Spreadsheet::ParseExcel::Font      A class for Cell fonts.               0.57
Spreadsheet::ParseExcel::Format      A class for Cell formats.               0.57
Spreadsheet::ParseExcel::SaveParser      Rewrite an existing Excel file.               0.57
Spreadsheet::ParseExcel::SaveParser::Workbook      A class for SaveParser Workbooks.               0.57
Spreadsheet::ParseExcel::SaveParser::Worksheet      A class for SaveParser Worksheets.               0.57
Spreadsheet::ParseExcel::Utility      Utility functions for Spreadsheet::ParseExcel.               0.57
Spreadsheet::ParseExcel::Workbook      A class for Workbooks.               0.57
Spreadsheet::ParseExcel::Worksheet      A class for Worksheets.  

Commented:
OK, I see another problem with the method, try this as a replacement statement block in your loop that captures the cell data:
for (  my $ycount = 0; defined $oWkS->{MaxCol} && $ycount <= $oWkS->{MaxCol}; $ycount++ ) { 
                my $cell = oWkS->get_cell($xcount, $ycount);
                print OUT $cell->unformatted(), if($cell->unformatted()); 
                print OUT "|", if($ycount < $oWkS->{MaxCol});
         }

Open in new window

Commented:
Speeder, those sub-modules should all be included in the main package... unless there's some reason your sys-admin would have cack-handedly shoehorn'd in only the parent package.

ie. if they did anything at all like "cpan> install Spreadsheet::ParseExcel" then all should be groovy.

Author

Commented:
no, he wouldn't have done any like that so it should be good. I'll try your code change.

Author

Commented:
Now I get the error on the get_cell call.


Can't locate object method "get_cell" via package "oWkS" (perhaps you forgot to load "oWkS"?) at xls2csv.pl line 71.

Commented:
Sorry, small typo there:

my $cell = oWkS->get_cell($xcount, $ycount);

should be:

my $cell = $oWkS->get_cell($xcount, $ycount);

missed the $ (dollar) symbol of your variable name for the worksheet.

Author

Commented:
Still error.


Can't locate object method "get_cell" via package "Spreadsheet::ParseExcel::Worksheet" at xls2csv.pl line 70.

#!/usr/bin/perl
# xls2csv.pl - parses an excel spreadsheet and writes it back out as a csv file.
# receives 2 arguement - 
#   $ARGV[0] - Excel filename
#   $ARGV[1] - New Filename
# example: xls2csv.pl setex.xls

use strict; 
# make sure all arguements were provided
if(!(defined $ARGV[0])) { 
    print "Usage: ", $0 , " Excel_File\n"; 
    exit; 
} 

if(!(defined $ARGV[1])) { 
    print "Usage: ", $1 , " Starting Line Number\n"; 
    exit; 
} 

#if(!(defined $ARGV[2])) { 
#    print "Usage: ", $2 , " Need to define input directory\n"; 
#    exit; 
#} 

# Package declarations
use Time::localtime;
use File::Copy;
use Spreadsheet::ParseExcel; 

# Initialize Objects #####################################################
my $oExcel = new Spreadsheet::ParseExcel; 
my $tm = localtime;
my $xcount;
my $ycount;
my $oWkS;
my $oWkC;
my $filename;

# datafile path $ARGV[2] should be either blank or ./, depending on how this script is ran
# If it is ran from within a program, send blank. If it is ran from command line in current directory, send ./
my $datapath=$ARGV[2];
#"./";

# Parse(Read) in the spreadsheet
my $oBook = $oExcel->Parse($datapath.$ARGV[0]); 
my $Rcv = 0;

# Split filename up
my @filesplit = split /\./,$ARGV[0];

# export the file into a csv file format
#my $Sheet = $Book->Worksheets(1);
#$oWkS = $oBook->{Worksheet}[0]
#$oWkS = $oBook->{Worksheet}[0];
#foreach $oWkS (@{$oBook->{Worksheet}}) { 
   $oWkS = $oBook->{Worksheet}[0];
   # check to see if the worksheet has any data, if yes then process
   if (defined $oWkS->{MaxRow}) 
   {
      $filename = $datapath . @filesplit[0] . ".csv";
      open(OUT, ">" . $filename) || die "Could not open ".$filename."\n";    
      
      for(my $xcount = $oWkS->{MinRow}; 
         defined $oWkS->{MaxRow} && $xcount <= $oWkS->{MaxRow}; $xcount++) { 
         if ( $xcount <= $ARGV[1] ) {
             next;
         }
         #for(my $ycount = $oWkS->{MinCol} ; 
         for (  my $ycount = 0; defined $oWkS->{MaxCol} && $ycount <= $oWkS->{MaxCol}; $ycount++ ) { 
                my $cell = $oWkS->get_cell($xcount, $ycount);
                print OUT $cell->unformatted(), if($cell->unformatted()); 
                print OUT "|", if($ycount < $oWkS->{MaxCol});
         }
         
         #for(my $ycount = 0 ; 
         #   defined $oWkS->{MaxCol} && $ycount <= $oWkS->{MaxCol} ; $ycount++) { 
         #       $oWkC = $oWkS->{Cells}[$xcount][$ycount]; 
         #       #print OUT $oWkC->Unformatted, if($oWkC); 
         #       print OUT $oWkC->Value, if($oWkC);
         #       print OUT "|", if($ycount < $oWkS->{MaxCol});
         #}  # end columns for
         print OUT "\n";
      } #end rows for
      close(OUT);
   } # end if
#} # end foreach worksheet

# make a copy of the file for loading
#my $loadfile=$datapath . $ARGV[0] . $ARGV[1] . ".csv";
#copy($filename, $loadfile);

Open in new window

Commented:
OK, I've gone through the script from top to bottom and squatted everything that goes wrong to save this back-n-forth process.

A few problems existed... like your column and row markers where inter-placed and the sheet reference was also referenced weirdly.

The attached script does what you need and can easily be used as a foundation for whatever it is your going to do with it afterwards. I'm sure you can take it form here.

By the way you're not really making a CSV file, but you know that - right?
If your looking for more information on the 'official' format of CSV files try this: http://www.rfc-editor.org/rfc/rfc4180.txt

Anyhow, good luck with your project.
#!/usr/bin/perl -w
##################################################################################
# xls2csv.pl - parses an excel spreadsheet and writes it back out as a csv file. #
##################################################################################
use strict; 
use warnings;

# Ensure All Arguements Provided
################################
if ( !(defined $ARGV[0]) ) { 
    print "Usage: ", $0 , " Excel_File\n"; 
    exit; 
} 

if ( !(defined $ARGV[1]) ) { 
    print "Usage: ", $1 , " Starting Line Number\n"; 
    exit; 
} 

# Package Declarations
######################
use Time::localtime;
use File::Copy;
use Spreadsheet::ParseExcel;

# Initialize Objects 
####################
my $oExcel = new Spreadsheet::ParseExcel;
my $tm = localtime;

# Datafile path $ARGV[2] should be either blank or './'
#######################################################
my $datapath;
if ( !(defined $ARGV[2]) ) { $datapath = './' } else { $datapath = $ARGV[2] };

# Parse(Read) in the spreadsheet
################################
my $oBook = $oExcel->Parse($datapath.$ARGV[0]);

# Split filename to change extension later
##########################################
my @filesplit = split /\./,$ARGV[0];

# Assume Data stored in 1st Sheet
#################################
my $oWkS = $oBook->worksheet(0);

# Check to see if the worksheet has any data
############################################
if ( defined $oWkS->{MaxRow} ) {

    # Gather Worksheet Dimensions
    #############################
    my %DIMENSIONS; 
    $DIMENSIONS{minRow} = $oWkS->{MinRow};
    $DIMENSIONS{maxRow} = $oWkS->{MaxRow};
    $DIMENSIONS{minCol} = $oWkS->{MinCol};
    $DIMENSIONS{maxCol} = $oWkS->{MaxCol};

    # Define New Output Filename
    ############################
    my $filename = $datapath . $filesplit[0] . ".csv";

    # Open Filehandle ready for output 
    ##################################
    open(OUT, ">" . $filename) || die "Could not open ".$filename."\n";    

    # Loop through each Row 
    #######################
    for ( my $row = $DIMENSIONS{minRow}; $row <= $DIMENSIONS{maxRow}; $row++ ) { 
        
        # Skip row if start row is higher
        #################################
        if ( $row <= $ARGV[1] ) {
            next;
        }

        # Loop through each Column
        ##########################
        for ( my $column = $DIMENSIONS{minCol}; $column <= $DIMENSIONS{maxCol}; $column++ ) { 
        
            # Gather Data from Cell
            #######################
            my $cell = $oWkS->get_cell($row, $column);
            if ( $cell && $cell->unformatted() ) {

                # Output data to CSV file
                #########################
                print OUT $cell->unformatted();

                # Seperate Cell with Delimter if not end column
                ###############################################
                if ( $column < $DIMENSIONS{maxCol} ) {
                    print OUT "|";
                }

                # New Line if end Column 
                ########################
                else {  
                    print OUT "\n";
                }
            }
        }
    }

    # Close File Handle
    ###################
    close(OUT);
}
exit;

Open in new window

Author

Commented:
Thanks for your help. I'll take it from here.

CSV file, no not technically. I wanted to avoid conflicts with the comma and any potential commas in the incoming text so I made the delimiter a pipe.

Thanks again for your help.

Author

Commented:
Thank you for commenting the code.

Commented:
Just noticed it would work better like this if the excel sheet would contain empty cells, you should still print your delimiters...
# Gather Data from Cell
            #######################
            if ( $cell && $cell->unformatted() ) {

                # Output data to CSV file
                #########################
                print OUT $cell->unformatted();
            }

            # Seperate Cell with Delimter if not end column
            ###############################################
            if ( $column < $DIMENSIONS{maxCol} ) {
                print OUT "|";
            }

            # New Line if end Column 
            ########################
            else {  
                print OUT "\n";
            }

Open in new window

Author

Commented:
There is still an issue with the get_cell but like I said, I'll take it from here. I think it must be a syntax issue. I need to learn these different modules.

I copied the base code and made as few enhancements as I needed since I don't work too often in Perl.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial