Solved

collecting elements of data

Posted on 2009-07-12
Last Modified: 2012-05-07
ok
now the output I am getting is this :
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

now I have to group the average values, and count how many of those values fall into each group, and show it as a graph, somehow

for example, the first averages there are 571.50, 551.50, 551.17, and 531.17
so for this set of data the groups would be
501 - 510  0
511 - 520  0
521 - 530  0
531 - 540  1
541 - 550  0
551 - 560  2
561 - 570  0
571 - 580  1
581 - 590  0
591 - 600  0

then if I could show that as a bar chart, that would be great. I could do that in MS excel if it is easier.

the thing is though, my data sets use quite a range of data, so if possible, I would like to be able to change the groupings.
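
A minimal sketch of the Excel route, assuming equal-width groups: it tallies values into groups and prints comma-separated rows that Excel can open and chart. The helper names bin_counts and csv_rows and the parameters (minimum, width, number of groups) are hypothetical, chosen for illustration; changing the three numbers in the call changes the groupings.

```perl
use strict;
use warnings;

# Hypothetical helper: tally values into $count equal-width groups starting at $min.
sub bin_counts {
    my ($values, $min, $width, $count) = @_;
    my @bins = (0) x $count;
    for my $v (@$values) {
        my $i = int(($v - $min) / $width);   # which group, rounded toward zero
        $i = 0          if $i < 0;           # values below $min go into the first group
        $i = $count - 1 if $i >= $count;     # values above the range go into the last group
        $bins[$i]++;
    }
    return \@bins;
}

# Hypothetical helper: one "label,count" line per group.
# The label uses "to" rather than "-" so Excel does not read "1-10" as a date.
sub csv_rows {
    my ($bins, $min, $width) = @_;
    return map {
        my $lo = $min + $_ * $width;
        my $hi = $lo + $width - 1;
        "$lo to $hi,$bins->[$_]";
    } 0 .. $#$bins;
}

my @averages = qw(571.50 551.50 551.17 531.17);
my $bins = bin_counts(\@averages, 501, 10, 10);
print "range,count\n";
print "$_\n" for csv_rows($bins, 501, 10);
```

Redirecting the output to a file ending in .csv gives something Excel can open directly; selecting the two columns and inserting a column chart would then give the bar chart.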
Question by:MichaelGlancy
38 Comments
 

Author Comment

by:MichaelGlancy
ID: 24836584
I have to search through this output data and open a new file and put the new data into that
 
LVL 39

Expert Comment

by:Adam314
ID: 24836708
When your groups are uniformly spaced like this, you can determine which group a number is in by subtracting the minimum, then dividing by the group size, rounding down.

Here is some code showing this calculation.  You would put the four statements inside the foreach loop in your existing program.  The rest is there to demonstrate how it works.
my @averages = qw(571.50 551.50 551.17 531.17);

my @groups = (0) x 10;
foreach (@averages) {
	my $group = int(($_ - 501) / 10);   # This calculates the group, rounding down
	$group = 0 if $group < 0;    # In case you have a number below the expected min
	$group = 9 if $group > 9;    # In case you have a number above the expected max
	$groups[$group]++;
}

print join(", ", @groups) . "\n";
 
LVL 84

Expert Comment

by:ozo
ID: 24836965
#if you don't know the range beforehand, you might do something like
my($min,$max)=@averages[0,0];
my %groups;
for( @averages ){
  $groups{int $_/10}++;
  $min=$_ if $_<$min;
  $max=$_ if $_>$max;
}
for( int($min/10)..int($max/10) ){
 # groups that received no values are not in the hash, so print them as 0
 printf "%d - %d  %d\n",$_*10,($_+1)*10,$groups{$_}||0;
}
 
LVL 7

Expert Comment

by:Fairlight2cx
ID: 24837387
As for graphing, look at getting and using the GD::Graph module:

http://search.cpan.org/~bwarfield/GDGraph-1.44/Graph.pm
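
If installing GD::Graph is not an option, a plain-text bar chart may be enough to see the shape of the distribution. The sketch below is only an illustration in core Perl, not part of GD::Graph; the helper name text_bars and the 40-character width are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: one row of '#' characters per group,
# scaled so that the largest count is $max_width characters wide.
sub text_bars {
    my ($labels, $counts, $max_width) = @_;
    my ($peak) = sort { $b <=> $a } @$counts;
    $peak ||= 1;   # avoid dividing by zero when every count is 0
    return map {
        my $len = int($counts->[$_] * $max_width / $peak + 0.5);
        sprintf "%-9s %s %d", $labels->[$_], '#' x $len, $counts->[$_];
    } 0 .. $#$counts;
}

my @labels = map { sprintf "%d-%d", 501 + 10 * $_, 510 + 10 * $_ } 0 .. 9;
my @counts = (0, 0, 0, 1, 0, 2, 0, 1, 0, 0);
print "$_\n" for text_bars(\@labels, \@counts, 40);
```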
 

Author Comment

by:MichaelGlancy
ID: 24840076
i will know the ranges.

some of them would be 1-1000, 501-550, and probably 1 - 70

I have millions of lines of data to sort through, so if the code could read each line into an array, and I could manually change the range, and output the results to another file  ?
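
With millions of lines it is usually easier on memory to handle one line at a time than to read the whole file into an array first. The following is a rough sketch of that pattern, assuming the line layout shown in the question (avg1,VALUE ... avg2,VALUE); the helper name each_average and the callback style are illustrative choices, not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh line by line and call $on_avg->(avg1, avg2)
# for every line that carries both fields. Returns the number of lines used.
sub each_average {
    my ($fh, $on_avg) = @_;
    my $used = 0;
    while (my $line = <$fh>) {
        next unless $line =~ /avg1,([\d.]+).*avg2,([\d.]+)/;
        my ($avg1, $avg2) = ($1, $2);   # copy before the callback runs its own code
        $on_avg->($avg1, $avg2);
        $used++;
    }
    return $used;
}

# Demo on an in-memory filehandle; a real run would open the data file instead.
my $sample = join "\n",
    "232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40",
    "232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40",
    "";
open my $fh, '<', \$sample or die "Could not open in-memory sample: $!\n";
my %tally;   # keyed by the lower bound of a 10-wide group
my $used = each_average($fh, sub { $tally{ 10 * int($_[0] / 10) }++ });
close $fh;

print "$used lines used\n";
print "$_: $tally{$_}\n" for sort { $a <=> $b } keys %tally;
```

Only the running tallies are kept in memory, so the size of the input file does not matter much.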
 
LVL 39

Expert Comment

by:Adam314
ID: 24843123
>>if the code could read each line into an array, and I could manually change the range, and output the results to another file
Not really sure what you want the code to do.  Do you want to manually change the range?
Or should the range be selected from one of your predefined ranges (1-1000, 501-550, 1-70) based on the numbers?  
Should a range be automatically chosen based on the numbers?
 

Author Comment

by:MichaelGlancy
ID: 24843146
well I could choose the range before I start if you make it obvious in the code.

I want the code to read a line of numbers, decide which numbers fall into which groups, and tally the amount of numbers which has fallen into each group.

Does that make better sense ?
 
LVL 39

Expert Comment

by:Adam314
ID: 24843262
Here is some sample code.  You would set $Group_Min, $Group_Max and $Group_Count based on how you want your groups.

For any number, to determine which group it is in:
1) Subtract the $Group_Min
2) Divide (1) by the group size, which is ($Group_Max - $Group_Min + 1) / $Group_Count, and round down
3) Clip (2) to the range 0..$Group_Count-1

my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
my $Group_Size = ($Group_Max - $Group_Min + 1) / $Group_Count;   # Width of each group (5 here)

my @averages = qw(571.50 551.50 551.17 531.17);

my @groups = (0) x $Group_Count;
foreach (@averages) {
        my $group = int(($_ - $Group_Min) / $Group_Size);   # This calculates the group
        $group = 0 if $group < 0;    # In case you have a number below the expected min
        $group = $Group_Count-1 if $group >= $Group_Count;    # In case you have a number above the expected max
        $groups[$group]++;
}

print join(", ", @groups) . "\n";
 

Author Comment

by:MichaelGlancy
ID: 24843478
ok

the my @averages = qw(571.50 551.50 551.17 531.17);

would this not be taken from a text file ?
 

Author Comment

by:MichaelGlancy
ID: 24843547
so would that look like this in code to open data file and write to another file ?

use strict;
use warnings;

 

 
##### Open files

open(my $IN,"outvim2.vim") or die "Could not open input: $!\n";
open(my $OUT,">groupsort.vim") or die "Could not open output: $!\n";


my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
 
my @averages = qw(571.50 551.50 551.17 531.17);
 
my @groups = (0)x$Group_Count;
foreach (@averages) {
        my $group = ($_ - $Group_Min)/$Group_Count;   #This calculates the group
        $group = 0 if $group < 0;    #In case you have a number below the expected min
        $group = $Group_Count-1 if $group >= $Group_Count;    #In case you have a number above the expected max
        $groups[$group]++;
}
 
print OUT join(", ", @groups) . "\n"
 
LVL 39

Expert Comment

by:Adam314
ID: 24843670
Based on your other posts, I would think the averages would come from other processing.  If you had a variable like $average that contained the average, and you wanted to count the groups for that:

my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
my $Group_Size = ($Group_Max - $Group_Min + 1) / $Group_Count;   # Width of each group
my @groups = (0) x $Group_Count;

while(<>) {    #This is in your existing program - loops over lines of a file
    ...
    # Calculate $average

    ##### To count the groups for this:
    my $ThisGroup = int(($average - $Group_Min) / $Group_Size);
    $ThisGroup = 0 if $ThisGroup < 0;
    $ThisGroup = $Group_Count - 1 if $ThisGroup >= $Group_Count;
    $groups[$ThisGroup]++;

    ##### You could then print @groups, or do whatever
}
 

Author Comment

by:MichaelGlancy
ID: 24843901
Any advice on this code ? it is returning loads of errors


use strict;
use warnings;

 

 
##### Open files

      open(my $IN,"outvim2.vim") or die "Could not open input: $!\n";
      open(my $OUT,">groupsort.vim") or die "Could not open output: $!\n";


      my $Group_Min = 100;
      my $Group_Max = 600;
      my $Group_Count = 10;
 
      while(<>) {    #This is in your existing program - loops over lines of a file
    ...
    Calculate $average
   
##### To count the groups for this

    my $ThisGroup = ($average - $Group_Min) / $Group_Count;
    $ThisGroup = 0 if $ThisGroup < 0;
    $ThisGroup = $Group_Count - 1 if $ThisGroup >= $Group_Count;
    $groups[$ThisGroup]++;
   
##### You could then print @groups, or do whatever


      }
 
            print $OUT (@groups) "\n";

            printf (@groups1);
            printf $OUT (@groups1;



      

 
LVL 39

Expert Comment

by:Adam314
ID: 24844005
Is this going to be its own program, or a piece of an existing program?  I thought it was going to be part of an existing program.

If it's on its own, where does it get the average?  What format is that file?

If it's part of another program, post that other program.
 

Author Comment

by:MichaelGlancy
ID: 24844383
Oh I see, I'm sorry, I didn't explain this entirely.

I am using another program to sort data. Then with the sorted data, I want to run this code as a program on its own. The reason is that the first program I am running takes a lot of time, so I thought it would be better to separate out the tasks needing done to different programs. So yes, I need this code to run and process as an individual program.

>If it's on its own, where does it get the average?
there is a list of 10 million lines, with the average written there. One to each line sorted from the previous data, so I need this code to read each line (which is an average of other data) and group that average value.

>What format is that file?
text files , I am using Vim

I need to know how many of the average values fall into each grouping
 
LVL 39

Expert Comment

by:Adam314
ID: 24844434
Post a sample of the file that is the input to this program.
 
LVL 39

Expert Comment

by:Adam314
ID: 24844446
Is it this, from your original question:
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

So what average should be used?  The avg1?  The avg2?  Both?

What format do you want the output?
 

Author Comment

by:MichaelGlancy
ID: 24844523
yes, that is the format of the input file.

I am not wanting to calculate an average. I want to count how many averages fall within each group, with the output in this format:

group 1 1-100:      (total number of values in this group (avg1)) : (total number of values in this group (avg2))
group 2 101 -200:  (total number of values in this group (avg1)) : (total number of values in this group (avg2))
....
group 10 901 - 1000:  (total number of values in this group (avg1)) : (total number of values in this group (avg2))



I need to see if there is a pattern to the average values
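
A small sketch of the side-by-side layout described above, assuming ten groups of width 100 starting at 1. The helper name side_by_side is hypothetical. Each group here covers low up to (but not including) the next group's low, so a value such as 100.5 lands in group 1.

```perl
use strict;
use warnings;

# Hypothetical helper: one "group N low-high: count1 : count2" line per group.
sub side_by_side {
    my ($min, $width, $counts1, $counts2) = @_;
    return map {
        my $lo = $min + $_ * $width;
        sprintf "group %d %d-%d: %d : %d",
            $_ + 1, $lo, $lo + $width - 1, $counts1->[$_], $counts2->[$_];
    } 0 .. $#$counts1;
}

my @avg1 = qw(571.50 551.50 551.17 531.17);
my @avg2 = qw(124.40 122.40 144.00 143.6);
my ($min, $width, $n) = (1, 100, 10);

my @c1 = (0) x $n;
my @c2 = (0) x $n;
# Assumes every value lies between 1 and 1000; add clipping if that may not hold.
$c1[ int(($_ - $min) / $width) ]++ for @avg1;
$c2[ int(($_ - $min) / $width) ]++ for @avg2;

print "$_\n" for side_by_side($min, $width, \@c1, \@c2);
```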
 

Author Comment

by:MichaelGlancy
ID: 24844547
thank you for your patience, describing this is more tricky than I thought
 

Author Comment

by:MichaelGlancy
ID: 24844908
is this too much guys ?

 
LVL 39

Expert Comment

by:Adam314
ID: 24844987
No need to delete.  In order to get the code to do what you need, we need to know precisely what you need.  This back and forth is not uncommon.

Save this to a file.  Call it with the name of your input file on the command line.  For example, if you save this to script.pl, and your input is in invim.vim, call it like:
    perl script.pl invim.vim

#!/usr/bin/perl
use strict;
use warnings;

my $Group_Min = 501;
my $Group_Max = 550;
my $Group_Count = 10;
my $Group_Size = ($Group_Max - $Group_Min + 1) / $Group_Count;   # Width of each group

my (@groups1, @groups2);   #Store number in group of avg1 and avg2

@$_ = (0)x$Group_Count foreach (\@groups1, \@groups2);

while(<>) {
	next unless /avg1,([\d\.]+).*avg2,([\d\.]+)/;
	my ($avg1, $avg2) = ($1, $2);
	print "avg1=$avg1     avg2=$avg2\n";
	$groups1[Clip(($avg1 - $Group_Min)/$Group_Size)]++;
	$groups2[Clip(($avg2 - $Group_Min)/$Group_Size)]++;
}

print "Average 1 groups: " , join("  ", @groups1) . "\n";
print "Average 2 groups: " , join("  ", @groups2) . "\n";

# Round down, then clip to a valid group index
sub Clip {
	return 0 if $_[0] < 0;
	return $Group_Count-1 if $_[0]>=$Group_Count;
	return int($_[0]);
}
 

Author Comment

by:MichaelGlancy
ID: 24863470
it's ok, it doesn't need deleted
 

Author Comment

by:MichaelGlancy
ID: 24863805
Ok that last script is heading in the right direction for me. Thank you very much.

Would it be possible to actually define the groups in the script? By that I mean, could it be written so that the groups are defined as, maybe,
1-100, 101-200, 201-300, and so on, and then count how many values fall into each group?

Then, for example, the output could be like this :
group 1 - 3
group 2 - 5
group 3 - 1
and so on through all the groups ?

If the groups were actually defined like this, then I could change them as required.

for example, the code reads a line from the input file,
the first average is 530, so that increments the group 501-600 by one,
then the code moves to the next line and does the same, but maybe finds the first average to be 304
so the group 301-400 gets incremented by one

I hope this is clearer
 
LVL 84

Expert Comment

by:ozo
ID: 24864261
my @ranges = split /\s*,\s*/, "1-100,101-200, 201-300";
# Turn each "low-high" string into a sub that tests whether a value is in that range
my @groups = map { my ($lo,$hi) = /(\d+)\D+(\d+)/; sub { $_[0]>=$lo && $_[0]<=$hi } } @ranges;
my @averages = qw(571.50 551.50 551.17 531.17);
my @group = (0) x @groups;
for my $avg (@averages){
   $group[$_]++ for grep { $groups[$_]->($avg) } 0..$#groups;
}
print "group @{[$_+1]} - $group[$_]\n" for 0..$#group;
 

Author Comment

by:MichaelGlancy
ID: 24864284
we have a problem

this:
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
is just an example of 10-20 million lines of data.
this doesn't look right to me: my @averages = qw(571.50 551.50 551.17 531.17); I don't need these defined.

I will gather some more and post
 

Author Comment

by:MichaelGlancy
ID: 24864332
right, that last code from Adam was almost there.

this is the data, I have millions of lines of it:

232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6

I need to group the values and count how many values in that group. the output should look like this:

Group 1: (501-510) : 0
Group 2: (511-520) : 0
Group 3: (521-530) : 0
Group 4: (531-540) : 2
Group 5: (541-550) : 0
Group 6: (551-560) : 4
Group 7: (561-570) : 0
Group 8: (571-580) : 2
Group 9: (581-590) : 0
Group 10: (591-600) : 0

total count : 8

Now as I have several data groups to choose from, having the groups defined in the code would allow me to redefine the groups myself.
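
One way to get a report in this shape, including the groups with a zero count, is to keep the groups as an editable list of [low, high] pairs. This is a sketch under assumptions rather than the code from the answers: the helper name count_ranges is hypothetical, and a value counts as inside a group when low <= value < high + 1, so that 510.5 is not lost between 501-510 and 511-520.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many values fall in each [low, high] pair.
# Returns the per-group counts and the total number of values that landed in a group.
sub count_ranges {
    my ($ranges, $values) = @_;
    my @counts = (0) x @$ranges;
    my $total  = 0;
    for my $v (@$values) {
        for my $i (0 .. $#$ranges) {
            my ($lo, $hi) = @{ $ranges->[$i] };
            next unless $v >= $lo && $v < $hi + 1;
            $counts[$i]++;
            $total++;
            last;   # a value belongs to at most one group
        }
    }
    return (\@counts, $total);
}

# Edit this list to redefine the groups: here 501-510, 511-520, ... 591-600.
my @ranges = map { [ 501 + 10 * $_, 510 + 10 * $_ ] } 0 .. 9;

# The avg1 values of the eight sample lines, passed as a list for brevity.
my @avg1 = (qw(571.50 551.50 551.17 531.17)) x 2;

my ($counts, $total) = count_ranges(\@ranges, \@avg1);
printf "Group %d: (%d-%d) : %d\n", $_ + 1, @{ $ranges[$_] }, $counts->[$_] for 0 .. $#ranges;
print "\ntotal count : $total\n";
```

Values that fall outside every group are skipped and do not add to the total.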
 

Author Comment

by:MichaelGlancy
ID: 24864338
that was for avg1, the same is needed for avg2, maybe side by side or in another file
 
LVL 84

Expert Comment

by:ozo
ID: 24864460
my $groups="1-100,101-200, 201-300,500-550,550-600";
my $g0=0;
$groups =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$group{@{[++$g0]}}if \$_ >= $1 && \$_<=$2;/g;
$groups=eval"sub{$groups}";
while( <DATA> ){
 &$groups for /avg[12],([\d.]+)/g;
}
print "group $_ - $group{$_}\n" for sort keys %group;
__DATA__
    232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
    232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
    234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
    134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
 

Author Comment

by:MichaelGlancy
ID: 24864485
ozo, where does that code go in terms of putting it in a file ?
 
LVL 84

Assisted Solution

by:ozo
ozo earned 200 total points
ID: 24864594
my $groups="501-510,511-520,521-530,531-540,541-550,551-560,561-570,571-580,581-590,591-600";
my $g0=0;
$groups =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$group{"@{[++$g0]}: ($1-$2)"}if \$_ >= $1 && \$_<=$2;/g;
$groups=eval"sub{$groups;\$total++}";
our $total;
our %group;
while( <DATA> ){
 &$groups for /avg1,([\d.]+)/g;
}
print "Group $_ $group{$_}\n" for sort {$a<=>$b} keys %group;
print "\ntotal count : $total\n"
__DATA__
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40
232,336,454,716,727,844,avg1,551.50,0011100210,104,118,262,11,117,avg2,122.40
234,239,316,768,796,954,avg1,551.17,0021000201,5,77,452,28,158,avg2,144.00
134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6
 

Author Comment

by:MichaelGlancy
ID: 24864599
I have to read the data from another text file
 

Author Comment

by:MichaelGlancy
ID: 24864947
hey guys, can I just say this is not an academic dishonesty issue, this is not an assignment. I am post graduate, doing a completely unrelated project. I have taken the initiative to use and hopefully learn perl. I don't have to use it and this stuff isn't part of any assessment.
 
LVL 84

Expert Comment

by:ozo
ID: 24864977
To read from the file(s) named on the command line (@ARGV) instead, change <DATA> to <>
 
LVL 39

Accepted Solution

by:
Adam314 earned 300 total points
ID: 24865111
The grouping logic is based on what ozo gave, but modified to allow for separate groups for avg1 and avg2.  This gets its data from an input file, like what I posted before.

#!/usr/bin/perl
use strict;
use warnings;

##### Define your groups ranges here
my $groups1="501-510,511-520,521-530,531-540,541-550,551-560,561-570,571-580,581-590,591-600";
my $groups2="101-110,111-120,121-130,131-140,141-150,151-160,161-170,171-180,181-190,191-200";

##### Convert string to function
my (%groups1, %groups2);
my $total = 0;
my $g0=0;
$groups1 =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$groups1{"@{[++$g0]}: ($1-$2)"}if \$_[0] >= $1 && \$_[0]<=$2;\n/g;
$g0=0;
$groups2 =~ s/(\d+)\D*?-\D*(\d+)\D*/++\$groups2{"@{[++$g0]}: ($1-$2)"}if \$_[0] >= $1 && \$_[0]<=$2;\n/g;

{
no strict;
$groups1=eval"sub{$groups1;\$total++}";
$groups2=eval"sub{$groups2}";
}

while(<>) {
	next unless /avg1,([\d\.]+).*avg2,([\d\.]+)/;
	my ($avg1, $avg2) = ($1, $2);
	$groups1->($avg1);
	$groups2->($avg2);
}

##### Sort keys such as "10: (591-600)" by their leading group number
sub by_group_number { ($a =~ /^(\d+)/)[0] <=> ($b =~ /^(\d+)/)[0] }

print "Average1 grouping:\n";
foreach (sort by_group_number keys %groups1) {
	print "  $_ = $groups1{$_}\n";
}

print "Average2 grouping:\n";
foreach (sort by_group_number keys %groups2) {
	print "  $_ = $groups2{$_}\n";
}

print "Total: $total\n";
 

Author Comment

by:MichaelGlancy
ID: 24865170
this is what Im getting
untitled.JPG
 
LVL 39

Expert Comment

by:Adam314
ID: 24865187
That means that there were no averages found in the ranges specified.  Are the ranges in the code ($groups1 and $groups2) appropriate for that file?  Does the file you processed look like your example?  Can you attach a sample of the input file you used?
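
A quick way to check whether the ranges suit a file is to look at the smallest and largest averages it actually contains. The sketch below assumes the avg1/avg2 line layout from the question; the helper name average_extents is hypothetical, and the demo reads from an in-memory sample rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: smallest and largest avg1 and avg2 values seen on $fh.
# Returns a hash reference such as { avg1 => [min, max], avg2 => [min, max] }.
sub average_extents {
    my ($fh) = @_;
    my %ext;
    while (my $line = <$fh>) {
        while ($line =~ /(avg[12]),([\d.]+)/g) {
            my ($name, $v) = ($1, $2);
            my $e = $ext{$name} ||= [$v, $v];   # first value seen starts as both min and max
            $e->[0] = $v if $v < $e->[0];
            $e->[1] = $v if $v > $e->[1];
        }
    }
    return \%ext;
}

# Demo on two of the sample lines; a real run would open the data file instead.
my $sample = join "", map { "$_\n" }
    "232,356,434,767,786,854,avg1,571.50,0011100210,124,78,333,19,68,avg2,124.40",
    "134,256,512,716,717,852,avg1,531.17,0110010210,122,256,204,1,135,avg2,143.6";
open my $fh, '<', \$sample or die "Could not open in-memory sample: $!\n";
my $ext = average_extents($fh);
close $fh;

print "$_: $ext->{$_}[0] .. $ext->{$_}[1]\n" for sort keys %$ext;
```

If the printed extents fall outside the group definitions, the grouping script would report nothing for that file.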
 

Author Comment

by:MichaelGlancy
ID: 24865188
hold it, wrong data group :-) sorry
 

Author Comment

by:MichaelGlancy
ID: 24865222
ok, that works fine for my data set.

who gets the points ?
 

Author Closing Comment

by:MichaelGlancy
ID: 31602673
thanks again