Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 249
  • Last Modified:

collecting elements of data

ok
now the output I am getting is this :
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

now I have to group the average values, and count how many of those values fall into each group, and show it as a graph, somehow

for example, the first averages there are 571.50, 551.50, 551.17, and 531.17
so for this set of data the groups would be
500 - 510  0
511 - 520  0
521 - 530  0
531 - 540  1
541 - 550  0
551 - 560  2
561 - 570  0
571 - 580  1
581 - 590  0
591 - 600  0

then if I could show that as a bar chart, that would be great. I could do that in MS excel if it is easier.

the thing is though, my data sets use quite a range of data, so if possible, I would like to be able to change the groupings.
0
Europa MacDonald
Asked:
Europa MacDonald
  • 22
  • 10
  • 5
  • +1
2 Solutions
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
I have to search through this output data and open a new file and put the new data into that
0
 
Adam314Commented:
When your groups are uniformly spaced like this, you can determine which group a number is in by subtracting the minimum, then dividing by the group size, rounding down.

Here is some code showing this calculation.  You would put lines 5-8 in your existing program.  The rest is there to demonstrate how it works.
my @averages = qw(571.50 551.50 551.17 531.17);
 
my @groups = (0)x10;
foreach (@averages) {
	my $group = ($_ - 501)/10;   #This calculates the group
	$group = 0 if $group < 0;    #In case you have a number below the expected min
	$group = 9 if $group > 9;    #In case you have a number above the expected max
	$groups[$group]++;
}
 
print join(", ", @groups) . "\n";

Open in new window

0
 
ozoCommented:
#if you don't know the range beforehand, you might do something like
my($min,$max)=@averages[0,0];
my %groups;
for( @averages ){
  $groups{int $_/10}++;
  $min=$_ if $_<$min;
  $max=$_ if $_>$max;
}
for( $min/10..$max/10 ){
 printf"%d - %d  %d\n",$_*10,($_+1)*10,$groups{$_}
}
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Fairlight2cxCommented:
As for graphing, look at getting and using the GD::Graph module:

http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
i will know the ranges.

some of them would be 1-1000, 501-550, and probably 1 - 70

I have millions of lines of data to sort through, so if the code could read each line into an array, and I could manually change the range, and output the results to another file  ?
0
 
Adam314Commented:
>>if the code could read each line into an array, and I could manually change the range, and output the results to another file
Not really sure what you want the code to do.  Do you want to manually change the range?
Or should the range be selected from one of your predefined ranges (1-1000, 501-550, 1-70) based on the numbers?  
Should a range be automatically chosen based on the numbers?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
well I could choose the range before I start if you make it obvious in the code.

I want the code to read a line of numbers, decide which numbers fall into which groups, and tally the amount of numbers which has fallen into each group.

Does that make better sense ?
0
 
Adam314Commented:
Here is some sample code.  You would set $Group_Min, $Group_Max and $Group_Count based on how you want your groups.

For any number, to determine which group it is in:
1) Subtract the $Group_Min
2) Divide (1) by $Group_Count
3) Clip (2) to the range 0..$Group_Count-1

my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
 
my @averages = qw(571.50 551.50 551.17 531.17);
 
my @groups = (0)x$Group_Count;
foreach (@averages) {
        my $group = ($_ - $Group_Min)/$Group_Count;   #This calculates the group
        $group = 0 if $group < 0;    #In case you have a number below the expected min
        $group = $Group_Count-1 if $group >= $Group_Count;    #In case you have a number above the expected max
        $groups[$group]++;
}
 
print join(", ", @groups) . "\n";

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
ok

the my @averages = qw(571.50 551.50 551.17 531.17);

would this not be taken from a text file ?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
so would that look like this in code to open data file and write to another file ?

use strict;
use warnings;

 

 
##### Open files

open(my $IN,"outvim2.vim") or die "Could not open input: $!\n";
open(my $OUT,">groupsort.vim") or die "Could not open output: $!\n";


my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
 
my @averages = qw(571.50 551.50 551.17 531.17);
 
my @groups = (0)x$Group_Count;
foreach (@averages) {
        my $group = ($_ - $Group_Min)/$Group_Count;   #This calculates the group
        $group = 0 if $group < 0;    #In case you have a number below the expected min
        $group = $Group_Count-1 if $group >= $Group_Count;    #In case you have a number above the expected max
        $groups[$group]++;
}
 
print OUT join(", ", @groups) . "\n"
0
 
Adam314Commented:
Based on your other posts, I would think the averages would come from other processing.  If you had a variable like $average that contained the average, and you wanted to count the groups for that:

my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
 
while(<>) {    #This is in your existing program - loops over lines of a file
    ...
    Calculate $average 
    
    ##### To count the groups for this:
    my $ThisGroup = ($average - $Group_Min) / $Group_Count;
    $ThisGroup = 0 if $ThisGroup < 0;
    $ThisGroup = $Group_Count - 1 if $ThisGroup >= $Group_Count;
    $groups[$ThisGroup]++;
    
    ##### You could then print @groups, or do whatever
}

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
Any advice on this code ? it is returning loads of errors


use strict;
use warnings;

 

 
##### Open files

      open(my $IN,"outvim2.vim") or die "Could not open input: $!\n";
      open(my $OUT,">groupsort.vim") or die "Could not open output: $!\n";


      my $Group_Min = 100;
      my $Group_Max = 600;
      my $Group_Count = 10;
 
      while(<>) {    #This is in your existing program - loops over lines of a file
    ...
    Calculate $average
   
##### To count the groups for this

    my $ThisGroup = ($average - $Group_Min) / $Group_Count;
    $ThisGroup = 0 if $ThisGroup < 0;
    $ThisGroup = $Group_Count - 1 if $ThisGroup >= $Group_Count;
    $groups[$ThisGroup]++;
   
##### You could then print @groups, or do whatever


      }
 
            print $OUT (@groups) "\n";

            printf (@groups1);
            printf $OUT (@groups1;



      

0
 
Adam314Commented:
Is this going to be it's own program, or a piece of an existing program?  I thought it was going to be part of an existing program.

If it's on it's own, where does it get the average?  What format is that file?

If it's part of another program, post that other program.
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
oh I see, Im sorry, I didnt explain this entirely.

I am using another program to sort data. Then with the sorted data, I want to run this code as a program on its own. The reason is that the first program I am running takes a lot of time, so i thought it would be better to separate out the tasks needing done to different programs. So yes, I need this code to run and process as an individual program.

>If it's on it's own, where does it get the average?
there is a list of 10 million lines, with the average written there. One to each line sorted from the previous data, so I need this code to read each line (which is an average of other data) and group that average value.

>What format is that file?
text files , I am using Vim

I need to know how many of the average values fall into each grouping
0
 
Adam314Commented:
Post a sample of the file that is the input to this program.
0
 
Adam314Commented:
Is it this, from your original question:
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

So what average should be used?  The avg1?  The avg2?  Both?

What format do you want the output?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
yes, that is the format of the input file.

I am not wanting to calculate an average. I want to count how many averages fall within:
with this format :

group 1 1-100:      (total number of values in this group (avg1)) : (total number of values in this group (avg2))
group 2 101 -200:  (total number of values in this group (avg1)) : (total number of values in this group (avg2))
....
group 10 901 - 1000:  (total number of values in this group (avg1)) : (total number of values in this group (avg2))



I need to see if there is a pattern to the average values
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
thankyou for your patience, describing this is more tricky than I thought
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
is this too much guys ?
0
 
Adam314Commented:
No need to delete.  In order to get the code to do what you need, we need to know precisely what you need.  This back and forth is not uncommon.

Save this to a file.  Call with the name of your input on command line.  For example, if you save this to script.pl, and your input in invim.vim, call like:
    script.pl invim.vim

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
 
my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
 
my (@groups1, @groups2);   #Store number in group of avg1 and avg2
 
@$_ = (0)x$Group_Count foreach (\@groups1, \@groups2);
 
while(<>) {
	next unless /avg1,([\d\.]+).*avg2,([\d\.]+)/;
	my ($avg1, $avg2) = ($1, $2);
	print "avg1=$avg1     avg2=$avg2\n";
	$groups1[Clip(($avg1 - $Group_Min)/$Group_Count)]++;
	$groups2[Clip(($avg2 - $Group_Min)/$Group_Count)]++;
}
 
print "Average 1 groups: " , join("  ", @groups1) . "\n";
print "Average 2 groups: " , join("  ", @groups2) . "\n";
 
 
sub Clip {
	return 0 if $_[0] < 0;
	return $Group_Count-1 if $_[0]>=$Group_Count;
	return $_[0];
}

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
its ok, it doesnt need deleted
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
Ok that last script is heading in the right direction for me. Thankyou very much.

Would it be possible to actually define the groups in the script. By that I mean could it be written so that the groups are defined as in maybe
1-100,101-200, 201-300, and so on, and then count how many values fall into each group.

Then, for example, the output could be like this :
group 1 - 3
group 2 - 5
group 3 - 1
and so on through all the groups ?

If the groups were actually defined like this, then I could change them as required.

for example, the code reads a line from the input file,
the first average is 530, so that increments the group 501-600 by one,
then the code moves to the next line and does the same, but maybe finds the first average to be 304
so the group 301-400 gets incremented by one

I hope this is clearer
0
 
ozoCommented:
$[=1;
my @groups = split/,/,"1-100,101-200, 201-300";
s/(\d+)\D+(\d+)/sub{\$_[0]>=$1&&\$_[0]<=$2}/ and $_=eval for @groups;        
my @averages = qw(571.50 551.50 551.17 531.17);
my @group;
for my $a (@averages){
   $group[$_]++ for grep{$groups[$_]($a)}$[..$#groups;
}
print "group $_ - $group[$_]\n" for $[..$#group;
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
we have a problem

this:
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
is just an example of 10-20 million lines of data.
this doesnt look right to me: my @averages = qw(571.50 551.50 551.17 531.17); I dont need these defined.

I will gather some more and post
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
right, that last code from Adam was almost there.

this is the data, I have millions of lines of it:

232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

I need to group the values and count how many values in that group. the output should look like this:

Group 1: (501-510) : 0
Group 2: (511-520) : 0
Group 3: (521-530) : 2
Group 4: (531-540) : 0
Group 5: (541-550) : 0
Group 6: (551-560) : 4
Group 7: (561-570) : 0
Group 8: (571-580) : 2
Group 9: (571-590) : 0
Group 10: (581-600) : 0

total count : 8

Now as I have several data groups to choose from, having the groups defined in the code would allow me to redefine the groups myself.
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
that was for avg1, the same is needed for avg, maybe side by side or in another file
0
 
ozoCommented:
my $groups="1-100,101-200, 201-300,500-550,550-600";
my $g0=0;
$groups =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$group{@{[++$g0]}}if \$_ >= $1 && \$_<=$2;/g;
$groups=eval"sub{$groups}";
while( <DATA> ){
 &$groups for /avg[12],([\d.]+)/g;
}
print "group $_ - $group{$_}\n" for sort keys %group;
__DATA__
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
ozo, where does that code go in terms of putting it in a file ?
0
 
ozoCommented:
my $groups="501-510,511-520,521-530,531-540,541-550,551-560,561-5790,571-580,571-590,581-600";
my $g0=0;
$groups =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$group{"@{[++$g0]}: ($1-$2)"}if \$_ >= $1 && \$_<=$2;/g;
$groups=eval"sub{$groups;\$total++}";
our $total;
our %group;
while( <DATA> ){
 &$groups for /avg1,([\d.]+)/g;
}
print "Group $_ $group{$_}\n" for sort {$a<=>$b} keys %group;
print "\ntotal count : $total\n"
__DATA__
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
I have to read the data from another text file
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
hey guys, can I just say this is not an academic dishonesty issue, this is not an assignment. I am post graduate, doing a completely unrelated project. I have taken the initiative to use and hopefully learn perl. I dont have to use it and this stuff isnt part of any assessment.
0
 
ozoCommented:
to read from @ARGV, change <DATA> to <>
0
 
Adam314Commented:
The grouping logic is based on what ozo gave, but modified to allow for separate groups for avg1 and avg2.  This gets it's data from an input file, like what I posted before.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
 
##### Define your groups ranges here
my $groups1="501-510,511-520,521-530,531-540,541-550,551-560,561-570,571-580,581-590,591-600";
my $groups2="101-110,111-120,121-130,131-140,141-150,151-160,161-170,171-180,181-190,191-200";
 
##### Convert string to function
my (%groups1, %groups2, $total);
my $g0=0;
$groups1 =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$groups1{"@{[++$g0]}: ($1-$2)"}if \$_[0] >= $1 && \$_[0]<=$2;\n/g;
$g0=0;
$groups2 =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$groups2{"@{[++$g0]}: ($1-$2)"}if \$_[0] >= $1 && \$_[0]<=$2;\n/g;
 
{
no strict;
$groups1=eval"sub{$groups1;\$total++}";
$groups2=eval"sub{$groups2}";
}
 
while(<>) {
	next unless /avg1,([\d\.]+).*avg2,([\d\.]+)/;
	my ($avg1, $avg2) = ($1, $2);
	$groups1->($avg1);
	$groups2->($avg2);
}
 
print "Average1 grouping:\n";
foreach (sort keys %groups1) {
	print "  $_ = $groups1{$_}\n";
}
 
print "Average2 grouping:\n";
foreach (sort keys %groups2) {
	print "  $_ = $groups2{$_}\n";
}
 
print "Total: $total\n";

Open in new window

0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
this is what Im getting
untitled.JPG
0
 
Adam314Commented:
That means that there were no averages found in the ranges specified.  Are the ranges in the code (line 7 and 8) appropriate for that file?  Does the file you processed look like your example?  Can you attach a sample of the input file you used.
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
hold it, wrong data group :-) sorry
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
ok, that works fine for my data set.

who gets the points ?
0
 
Europa MacDonaldChief slayer of dragonsAuthor Commented:
thanks again
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 22
  • 10
  • 5
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now