Could this code miss some values?

The code below counts how often each value occurs in my data.

Could it miss any values, or is Perl meticulous in how it counts?

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $file = "master sorted 002.vim";
open my $fh, '<', $file or die "Failed to open '$file' <$!>";

my %count;
while (my $line = <$fh>) {
    my $field = (split /,/, $line)[8];
    $count{"$field"}++;
}
close $fh;

open my $out_fh, '>', 'output.txt' or die "failed to open output.txt";
while (my ($key, $value) = each %count) {
    print $out_fh "$key $value\n";
}
close $out_fh;
Asked by Europa MacDonald (Chief slayer of dragons)
 
ozo commented:
It won't miss any values assuming your definition of "values" corresponds to the items it is counting.
 
Europa MacDonald (author) commented:
The values that seem to be missing are similar to the other values it has counted.
 
ozo commented:
What values seem to be missing, and what are the lines containing those values?
 
Europa MacDonald (author) commented:
Part of the sorted data is below. The gaps are the missing values.

20013      43200
20130      43200
20103      
20031      
13200      
13020      48600
13002      48600
12300      
12030      48600
12003      48600
10320      48600

I don't see how Perl could have missed some of those.

I also have the problem that when I insert estimated values (48,600), I get a much higher total than I should.

Maybe I could ask for some code to go in and look for the missing values?
 
ozo commented:
Those lines have no commas, and you are only counting things between the 8th and 9th comma on a line.
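
To illustrate the point, here is a minimal sketch of what the list slice in the script returns. The two sample lines are made up for the demonstration and are not taken from the real file.

```perl
use strict;
use warnings;

# Made-up sample lines: one with ten comma-separated fields, one with no commas.
my $with_commas = "a,b,c,d,e,f,g,h,20013,j\n";
my $no_commas   = "20103      \n";

# Index 8 is the ninth field, i.e. the text between the 8th and 9th comma.
my $ninth  = (split /,/, $with_commas)[8];
my $absent = (split /,/, $no_commas)[8];    # no ninth field, so this is undef

print "ninth field: $ninth\n";
print "no commas:   ", (defined $absent ? $absent : 'undef'), "\n";
```

A line without enough commas yields undef here, so it can never be counted under the value you expect.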
 
Europa MacDonald (author) commented:
No, sorry, that last sample list is part of the result of the sorting.

I was just showing the values that have not been counted.
 
ozo commented:
What are the lines in "master sorted 002.vim" where those values appear?
If they are not between the 8th and 9th comma, they will not be counted.
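
One way to check a 20 million row file without reading it by eye is to report every line that has fewer than nine fields. The sketch below is only an illustration: the sub name short_lines and the in-memory sample data are my own assumptions, and for the real file you would open it the same way the script in the question does.

```perl
use strict;
use warnings;

# Return [line number, field count] for every line with fewer than nine fields.
sub short_lines {
    my ($fh) = @_;
    my @short;
    while (my $line = <$fh>) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields so they are counted too.
        my @fields = split /,/, $line, -1;
        push @short, [$., scalar @fields] if @fields < 9;
    }
    return @short;
}

# Demo on made-up data held in memory instead of the real file.
my $sample = "a,b,c,d,e,f,g,h,20013,j\n20103\na,b,c\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @short = short_lines($fh);
close $fh;

print "line $_->[0] has $_->[1] field(s)\n" for @short;
```

With the sample data this reports lines 2 and 3. Any line it reports from the real file is one the counting script cannot count as intended.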
 
Europa MacDonald (author) commented:
My main list has 20 million rows.

I'm pretty certain the commas are all where they should be. I will double-check now.
 
Europa MacDonald (author) commented:
Just checked manually. Everything in the list is as it should be (commas etc.), and the values are there.
 
Europa MacDonald (author) commented:
I think I have found the error. Thanks for the advice.
 
ozo commented:
The output of the script is not sorted; the each operator goes through a hash in no predictable order.
If there is no ninth comma on a line, $field will include the newline at the end of the line,
so when you print "$key $value\n", $value will be on the following line.
That means that for
20103      
20031    
the 20031 would be the $value for the $key of 20103.
But such lines should always come in pairs, so it doesn't make sense that there would be an odd number of lines in a row with only one item.
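
A possible way to avoid both effects is to chomp each line before splitting and to sort the keys before printing. This is only a sketch under assumptions: the sample lines are made up, the values are assumed to be numeric, and the ninth field is assumed to be the last one on the line.

```perl
use strict;
use warnings;

# Made-up lines in which the ninth field is the last one on the line.
my @lines = (
    "a,b,c,d,e,f,g,h,20103\n",
    "a,b,c,d,e,f,g,h,20031\n",
    "a,b,c,d,e,f,g,h,20103\n",
);

my %count;
for my $line (@lines) {
    chomp $line;                     # drop the trailing newline before splitting
    my $field = (split /,/, $line)[8];
    next unless defined $field;      # skip lines with fewer than nine fields
    $count{$field}++;
}

# Sort the keys numerically so the report order is the same on every run.
my @report = map { "$_ $count{$_}" } sort { $a <=> $b } keys %count;
print "$_\n" for @report;
```

Without the chomp, each key here would end in a newline and its count would be printed on the next line, which is the pattern described above.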
Question has a verified solution.