Link to home
Start Free TrialLog in
Avatar of Tech_20
Tech_20

asked on

Perl script to process a .csv file

I want to create a Perl script that reads a file based on file name input from the user, processes each line, detects the category in the 4th column, sums each amount in the 3rd column based on if the category in the 4th column is the same and prints a total for each categorical amount. I have made it up to this point (see code and attached .csv file below). Please help. Thanks.

____


#!/usr/bin/perl
use strict;
use warnings;

my $file = <STDIN> or die "Need to get CSV file on the command line\n";

my $sum = 0;
open(my $data, '<', $file) or die "Could not open '$file' $!\n";

while (my $line = <$data>) {
  chomp $line;
  my @fields = split "," , $line;
     # while the category field is the same {
         $sum += $fields[2];
    }
}
print "$sum\n";
Avatar of JesterToo
JesterToo
Flag of United States of America image

Well, this script seems to do that.  It tries to determine iv the CSV file has a header row and skips over it.  It does NOT strip off quotes from the "category" field so they are significant when comparing values from row to row.

It's not pretty but seems to be functional without crashing (didn't put much error checking/handling in it :-)

#!/usr/bin/perl
use strict;
use warnings;

my $argCount = $#ARGV + 1;
if ($argCount < 1) {
   die "\nNeed to get CSV file on the command line\n";
}

my $file = $ARGV[0];
my $sum = 0;
my $category = "";
my $firstrow = 1;       # true
my $firsttime = 1;      # true

open(my $data, '<', $file) or die "Could not open '$file' $!\n";

while (my $line = <$data>) {
  chomp $line;
  my @fields = split "," , $line;

  # assume first row is a header if the amount field is not numeric...
  if ($firstrow) {
     $firstrow = 0;
     if (isNumeric($fields[3]) == 0) {
        next;
     }
  }

  if ($firsttime) {
      $category = $fields[3];
      $firsttime = 0;
  }

  # while the category field is the same {
  if ($fields[3] eq $category) {
     $sum += $fields[2];
  } else {
     print $category, " Total = ", "$sum\n";
     $category = $fields[3];
     $sum = $fields[2];
  }
}
print $category, " Total = ", "$sum\n";

sub isNumeric {
   my ($value) = @_;
   no warnings;
   return 1 if ($value + 0) eq $value;
   return 0;
}

Open in new window

By the way...

I didn't find any csv file attached to your question so I made up my own data for testing.  If you find problems with the script against your data file then please post your data file with any followup questions you may have.
Avatar of Tech_20
Tech_20

ASKER

Hello JesterToo,

Thanks for the post. I tried it with and without headers and the results both times read "Total = 0". The column selections the Perl script are accurate (assuming the usual start at 0). Currently, Experts Exchange is experiencing file upload issues through their provider (from their notice on the website) so I'll plan to upload the .csv file ASAP.
Avatar of Tech_20

ASKER

Hello JesterToo,

Here's the .csv file. Thanks.
PerlFinanceInfo.csv
ASKER CERTIFIED SOLUTION
Avatar of JesterToo
JesterToo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Tech_20

ASKER

Hello JesterToo,

I'll finally have a chance to test it soon and give you the results. Thanks.
Avatar of Tech_20

ASKER

Hello JesterToo,

I tried the new script with the code block recognizing the dollar sign and the results still read "Total = 0".
Using the sample file you posted above, here is what I got when I ran the code...

User generated image
Avatar of Tech_20

ASKER

Thanks for the screenshot. There must something in my execution. I'm using OS X terminal and using the standard syntax "perl <filename.pl> PerlFinanceInfo.csv" and still get  "Total = 0". I just noticed when I typed in the "ls" command it didn't show the PerlFinanceInfo.csv file while it did show other .csv files. I'll plan to use another .csv file and test it.
Avatar of Tech_20

ASKER

Hello JesterToo,

I created another .csv file, copied the same data, ensured it was visible in the same folder using the “ls” command and then ran your program. The following output shows it calculated but prints differently (see below). Notice the space before the word, “Total”.

 Total = 1029.8
 Total = 101.11
 Total = 131.23
Category 3 Total = 46.32

Since we’re apparently both using your code. I’m concluding the discrepancy is due to me using Terminal on Mac OS X as opposed to you using Windows command line. Do you have any other suggestions? Thanks.
Did your printout not include the category names for the totals?

I don't currently have a working Mac to test with (waiting on a part to repair it).  Seems like the discrepancies are all in the "print" statements.
There are 2 print statements as you can see... one for each category change event and one for the very last one.  I can only guess at this point that the difference is in how OSX and Windows is handling the "\n" or that your version of Perl does something differently than mine.

Which version of Perl are you using?  I used Strawberry Perl for Windows and I think they also have an OSX version as well... might e interesting if you installed that one (it's free) to see if it still prints wonky.

You could also experiment with altering the print statements to see if you can find how to get it to work better for you.

The part for my Mac will likely be here at the end of the week, so I'm kinda delayed in doing much more til then.
Try replacing both of the print statements with this one:

printf("%s Total = %.2f\n", $category, $sum);  

Open in new window


It will print the amount ($sum) as a formatted number.

Be advised that the script I provided does not handle amount fields that might contain negative sign nor commas... if you need that I have another script that does.
Correction... it DOES handle negative numbers if the minus sign is leading such as:  -123.45 or $-123.45... it does NOT handle trailing minus signs such as 123.45- or $123,45-.  The trailing minus sign is often used when printing columns of numbers such as in a report so that they stand out better and preserve number alignment.
Avatar of Tech_20

ASKER

Thanks for the reply. I am using Perl 5, version 18, subversion 2 (v5.18.2) built for darwin-thread-multi-2level.

I tried your AWK print code and received virtually the same output (see below). I didn't see Strawberry Perl for OS X using a search engine but I will plan to try other print statements and check for other Perl versions and other potential 3rd party apps/utilities.

 Total = 1029.80
 Total = 101.11
 Total = 131.23
Category 3 Total = 46.32
Sorry about the mis-lead on Strawberry Perl for OS X.

I just spun-up a VM and insalled macOS Sierra in it... perl version 5.18.2.  I got the same results you got.

This appears to me to be a bug in that version of perl (at least, the OS X version)... but, I'm not sufficiently well versed in Unix or Perl to say that for sure.

The Strawberry for Windows version I used is 5.24.1.  I'm going to try this script in one of my Linux VM's to compare results and I'll try to upgrade the mac version to something newer.
Well, this just got weirder...

It seems the macos version isn't printing the content of $category in the first print statement but it does in the second one.

The linux (ubuntu 16.04) version of perl is 5.22.1.  It fails to print the content of $category in both print statements.  I am going to attempt to install a different version of perl in macos but with 3 different outputs using same script and same data I don't know what this might prove!

This is the site I'm going to use for new version of perl:   https://learn.perl.org/installing/osx.html
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Tech_20

ASKER

Hello JesterToo,

The tr (translate) command and corresponding options did the job and resolved the data file issue! I tested the new file with both scripts (Perl and AWK print statements) and they both printed the same output. Thanks also for taking the time to test the script on different operating systems.