What's the strategy for writing this simple program in Perl?

Posted on 1998-11-07
I have a file which contains lots of numbers. I want to find the frequency of each range. Can somebody help me with the strategy for writing this program?
- The file contains numbers, say 1-1000, so the min and max are known in advance.
- The user inputs the width, e.g. 10, so there should be 1000/10 = 100 classes (width = user input variable).
The program should find the frequency of the numbers between 0-199, 200-299, 300-399 and so on up to 1000, so the result should be
class       Lower limit    upper limit  frequency      Prop
0           0.000           199.000      120         0.0987

Thanks a lot for your help.
Question by:aja101498

Expert Comment

How do you get 0-199, 200-299, 300-399 out of 1-1000?

Author Comment

Lower limit = 0
Upper limit = 1000 (both lower and upper limits are known in advance)

Divide n (1000) by the width (10). The class width is chosen by the user and n is known in advance. 1000/10 = 100, so you get the size of each class:

class 1 0-199 (100),
class 2 200-299 (100),
class 3 300-399 (100)
and so forth.
Thanks

Accepted Solution

b2pi earned 50 total points
Assuming that the numbers are in a file, one per line: (And note the almost total lack of error checking)

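my($Lower) = 1;      # smallest value expected in the file
my($Upper) = 1000;   # largest value expected in the file
my($Width) = 200;    # width of each bucket (class)
my(%Buckets);        # bucket number => { count, high, low }
my($buck);
my($filename) = $ARGV[0];   # assumption: the data file is named on the command line

open(FIL, "<$filename") || die "Unable to open $filename: $!";
while(<FIL>) {
    chomp;                                  # strip the trailing newline
    $buck = int(($_ - $Lower)/$Width);      # bucket this number belongs to
    $Buckets{$buck}->{count}++;             # one more number in this bucket
    # remember the largest number seen in this bucket
    if (!defined($Buckets{$buck}->{high}) or ($_ > $Buckets{$buck}->{high})) {
        $Buckets{$buck}->{high} = $_;
    }
    # remember the smallest number seen in this bucket
    if (!defined($Buckets{$buck}->{low}) or ($_ < $Buckets{$buck}->{low})) {
        $Buckets{$buck}->{low} = $_;
    }
}
close(FIL);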

Does all that make sense?

Author Comment

: b2pi,

Can you please comment it. I want to understand it and then I will post the grade. Thanks a million.

Author Comment

b2pi

Which variable holds the frequency, and which holds the classes? Is it $buck?

Yes the numbers are in a file, one per line. Thanks

Expert Comment

The count of the number of items held in a particular bucket, if that bucket number is n, is in $Buckets{n}->{count}

All the code is  doing is running through the file, one line at a time.  It reads the number from the file, and

1.) increments the counter for the appropriate bucket.
2.) updates the high or low values, as appropriate.
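
The counting step described above can be tried on its own, without a file. This is only a minimal sketch: the sub name count_buckets and the sample values are made up for illustration and are not part of the accepted solution.

```perl
use strict;
use warnings;

# Count how many values fall into each bucket of width $width,
# starting at $lower. Returns a hash reference: bucket number => count.
# (count_buckets is a hypothetical helper name, not from the thread.)
sub count_buckets {
    my ($lower, $width, @values) = @_;
    my %count;
    for my $value (@values) {
        my $bucket = int(($value - $lower) / $width);   # same formula as the solution
        $count{$bucket}++;
    }
    return \%count;
}

# Made-up sample data standing in for the lines of the file.
my $count = count_buckets(0, 100, 5, 42, 99, 100, 250, 999);
for my $bucket (sort { $a <=> $b } keys %$count) {
    print "bucket $bucket: $count->{$bucket}\n";
}
```

Buckets that never receive a number simply do not appear in the hash, so a report that must list empty classes has to loop over the bucket numbers rather than over the hash keys.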

Author Comment

Sorry. You know, my friend, I know very little about Perl; that's why I am asking this question, so please bear with me. I appreciate your help. Thanks.

Frequency:
It is how many numbers there are between 0.000 and 199.000. Sometimes there are 50 numbers in one range or class and sometimes there are 0 numbers in a class.
Let us say that in the range or class 0-199 there are only the following numbers:

20 X the number 0
10 X the number 88
100 X the number 120
1 X the number 7

then
class       Lower limit    upper limit  frequency    average
0           0.000           199.000      131        131/3

So in other words it is the total number of all the numbers between 0.000 - 199.000

X: times
The first column, class, is just the serial number of the classes. It starts from 0 and goes up to 9 if n=1000 and width=10.

Thanks

Expert Comment

OK, so I'm using the word 'count', and you're using the word 'frequency'. (BTW, I've assumed throughout that your numbers are always positive, i.e. that $Lower is > 0)

Also, it appears that I misunderstood your definitions of upper and lower, and that the min/max code I threw in there is unnecessary.

To figure out the limits, the salient piece of code is the

int(($_ - $Lower)/$Width)

$_ is the number that was just read, $Lower is the lower bound of all your numbers.
So the lower limit and upper limit of a bucket n would be the smallest and largest numbers, x and y, respectively, for which int((x - $Lower)/$Width) = int((y - $Lower)/$Width) = n

Thus, the lower limit will be (I don't need to prove I can ignore the int.. trust me, I can :))
n*$Width + $Lower
and the upper limit will be
(n+1)*$Width + $Lower - epsilon
(where epsilon is at least as large as the smallest number on your architecture such that, for any number z, z - epsilon does not equal z.... It looks like you've implicitly used 1.0)
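
A number x lands in bucket n when n*width + lower <= x < (n+1)*width + lower, so the limits of a bucket can be computed directly from its number. The following is a sketch only: bucket_limits is a hypothetical helper, and the epsilon of 1 assumes whole-number data, as the table in the question seems to.

```perl
use strict;
use warnings;

# Limits of bucket $n. The lower limit is inclusive; the upper limit is
# the start of the next bucket minus $epsilon.
# (bucket_limits is a hypothetical helper name, not from the thread.)
sub bucket_limits {
    my ($n, $lower, $width, $epsilon) = @_;
    my $low  = $n * $width + $lower;
    my $high = ($n + 1) * $width + $lower - $epsilon;
    return ($low, $high);
}

# Example: lower bound 0, width 100, epsilon 1 (assumed values).
for my $n (0 .. 2) {
    my ($low, $high) = bucket_limits($n, 0, 100, 1);
    printf "%d  %.3f  %.3f\n", $n, $low, $high;
}
```

With these assumed values the three classes come out as 0-99, 100-199 and 200-299.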

Author Comment

If I understood you well, the program should work as it is.
But the count variable doesn't have a $ in it like all Perl variables, so how do I print it?

What should I print to get the following?
class       Lower limit    upper limit  frequency    average
0           0.000           199.000      131        131/3
1           200.000         299.000       ?         ?
2           300.000         399.000       ?          ?
and so forth. Thanks

Expert Comment

You mean like

print "$Buckets{4}->{count}\n";
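
To get the whole table rather than a single count, one option is to loop over every class number and print a row for each, including the empty ones. This is a rough sketch, not the accepted solution: build_report is a hypothetical helper, the sample values are made up, the upper limit uses an epsilon of 1, and "prop" is assumed to mean the share of all counted numbers that fall in the class.

```perl
use strict;
use warnings;

# Build one row per class: [class, lower limit, upper limit, frequency, proportion].
# Empty classes are included with a frequency of 0.
# (build_report is a hypothetical helper name, not from the thread.)
sub build_report {
    my ($lower, $upper, $width, @values) = @_;
    my (%count, @rows);
    my $total = 0;
    for my $value (@values) {
        next if $value < $lower || $value > $upper;   # ignore out-of-range values
        $count{ int(($value - $lower) / $width) }++;
        $total++;
    }
    my $last = int(($upper - $lower) / $width);       # number of the last class
    for my $n (0 .. $last) {
        my $freq = $count{$n} || 0;
        my $low  = $n * $width + $lower;
        my $high = $low + $width - 1;                 # epsilon of 1 (assumption)
        $high = $upper if $high > $upper;
        push @rows, [$n, $low, $high, $freq, $total ? $freq / $total : 0];
    }
    return @rows;
}

# Made-up sample values standing in for the lines of the data file.
my @rows = build_report(0, 999, 100, 0, 7, 88, 120, 120, 250, 999);
printf "%-6s %12s %12s %10s %8s\n",
    'class', 'lower limit', 'upper limit', 'frequency', 'prop';
printf "%-6d %12.3f %12.3f %10d %8.4f\n", @$_ for @rows;
```

To use it with a real file, the values could be read with a loop like the one in the accepted solution and passed in as the list.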

Author Comment

Merci !