Link to home
Start Free TrialLog in
Avatar of deadite
deaditeFlag for United States of America

asked on

Sort SMALL numbers in bins

I have alot extremely small numbers that I need to sort and add into bins to keep count, which is going to be used for a histogram.  I have a loop that is chosing 20 numbers from an array of 25 numbers (it can select the same number multiple times).  The array stores doubles ranging from 0.0 to 1.0.  The numbers i'm multiplying look somehting like this:
2.580645e-01
3.225806e-01
9.677419e-02
6.451613e-02
2.580645e-01

Multiplying these numbers are giving me results that look like this:
9.536743e-03
5.293956e-09
5.651273e-22
I think the smallest I saw was Xe-24.  I'm going to end up with a ton of these, about 5^20 worth of numbers!

As I am multiplying, how can I sort these results into MAXBIN number of bins to be used for a histogram?  I'm using 1000 for MAXBIN, for example here is part of my code (Not exactly the best ):

#define STEPS 20
#define MAXN 5
#define MAXBIN 1000

int counter[STEPS];
double pathTotal;
int c;
for (c0 = 0; c0 < MAXN; c0++) {
  counter[0] = c0;
  for (c1 = 0; c1 < MAXN; c1++) {
    counter[1] = c1;
...
    for (c19 = 0; c19 < MAXN; c19++) {
    counter[19] = c19;
    pathTotal = 1;
    for (c = 0; c < STEPS; c++) {
        pathTotal = pathTotal * arr[counter[c]][counter[c+1]];
        /* INSERT CODE TO ADD TO BIN FOR COUNT */
}



Avatar of PaulCaswell
PaulCaswell
Flag of United Kingdom of Great Britain and Northern Ireland image

Hi deadite,

Sorry but I dont understand what you are trying to achieve here. Could you explain a little more please?

Paul
Avatar of deadite

ASKER

Here is some quick background info:
In C, I have created a 5x5 transition matrix that contains conditional probabilities of state changes (Markov Chain).  Each row is a state. Each column represents the probability of moving to that next state. Each row will add up to 1.0  For example, using the following matrix, starting from state S0, the transition probability of moving to state S2 then state S4 is (0.4 * 0.1)

       S0   S1 S2  S3  S4
S0{ 0.3 0.1 0.4 0.1 0.1 }
S1{ 0.6 0.1 0.0 0.2 0.1 }
S2{ 0.9 0.0 0.1 0.0 0.1 }
S3{ 0.2 0.3 0.2 0.1 0.2 }
S4{ 0.6 0.1 0.1 0.1 0.1 }

From the previous code, that is looping through every possible path in this matrix for 20 steps.  For instance, here are the first few possible states:
00000000000000000000  (this is .3^20)
00000000000000000001 (.3^19 * .1)
00000000000000000002 (.3^19 * .4)
00000000000000000003 (.3^19 * .1)
00000000000000000004 (.3^19 * .1)
00000000000000000010 (.3^18 *.1 *.6)
...
44444444444444444444  (this is .1^20)

I already have all of this code producing those numbers (which is in the loop i pasted above).  As you can tell, this is going to produce some tiny numbers from 0e0 to Xe-25 probably.  Anyways I need to sort these outputs into 1000 bins as I calculate them.  For instance, let's say I create the following bins:

0
.1
.2
.3
.....
.9
.01
.02
.03
...
.1e-25

Basically, something similar to this but for 1000 bins.  It will compare the pathTotal and add a count to the bin that is closest to it.  Better?

If needed, let me know if I need to up the points.... I'm fresh out and will have to buy them.

Thanks
Avatar of Infinity08
Some questions :

1) Do you need to store the actual values in the bins, or do you just need the count ?
2) Will your "bin partitioning" be as in your example ? If not, what will the "bin borders" be ? Note that it's best to have some kind of logic behind this, because it makes the calculations a lot easier.


Note that I used "border" in the previous point. It will probably be handier to define borders for the bins instead of basing your bin choice on "closest to". Ie. for your example, the bins would be something like :

    ]0.9 .. 1.0]
    ]0.8 .. 0.9]
    ]0.7 .. 0.8]
    .....
    ]0.1 .. 0.2]
    ]0.09 .. 0.1]
    ]0.08 .. 0.09]
    ]0.07 .. 0.08]
    .....
    ]0.01 .. 0.02]
    ]0.009 .. 0.01]
    .....

But you don't get to 1000 bins this way, so maybe you'll need to split some bins up.

A suggestion to determine which bin to put a value in, is to use a binary tree with a bin on each node, ie. :

    if (value < current_node) {
      // go left
    }
    else if (value > current_node) {
      // go right
    }
    else {
      // found the bin
    }

Untill you arrive at the correct bin.


Just a question : what will this histogram be used for ?
Avatar of deadite

ASKER

The bins just need to keep count, not store the value... but the bins will pretty much represent the same value.  It doesn't  have to be 1000 bins, but it needs to be sufficient enough to spread the values out a little bit.  The bins will just be output to a file then imported to excel for the histogram.  Basically, the program is just producing some data to be manipulated in Excel.

I can use C a little bit, but by no means am a C programmer.  The more code you can provide for adding to the the results to the bins the better.  I'm fine with your binary tree idea, but can you expand the code a bit more?  

Also, if you want me to raise the points, let me know what you think it should be raised to.

Thanks

ASKER CERTIFIED SOLUTION
Avatar of PaulCaswell
PaulCaswell
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
>> I'm fine with your binary tree idea, but can you expand the code a bit more?  
Maybe to get things working first, Paul's idea is more suitable.
You can always expand the code with an optimised search algorithm to get a better performance.
Avatar of deadite

ASKER

Thanks for the replies, I'm finally getting back to working on this code.  I actually came up with something similar to Paul's code... so I'm going to give that a try...