choelt
asked on
Algorithm to sort numbers into equal groups
I have a process where we are manually trying to sort containers of varying weights into 4 bins of nearest equal total weights. Here is a sample of typical data with 13 items:
1.68
1.76
2.34
3.88
4.04
5.12
6.10
7.00
7.02
7.24
7.68
8.20
8.62
Now I need to determine which items to group into bins. I need 4 bins with the closest matching total weights possible. So in this case the total weight is 70.68. So I need 4 bins containing as close to 17.67/ea. possible. There will always be 4 bins no matter how many items. The total items varies usually between as few as 6 to upwards of 40. Also let me point out that it doesn't matter how many items are in each bin. One bin could have 5 items while another has only 2. We're just trying to match total weights as close as possible.
I'm a VB developer, but can understand most all basic logic as long as it is in somewhat readable form. (why/how explanations help me considerably) I have been working on some basic sorting using arrays, but I'm not able to come up with anything that consistently calculates a best case scenario for different data sets. I need help with an algorithm that will work much more consistently than the manual system we are using now which is for the most part trial and error.
@choelt,
I think you can solve this with a knapsack algorithm, optimized with a greedy (big-in-first) start.
You might use the variance of these four-bin combinations to pick your winner: =sum((avg-binsum)^2), or the simpler =sum(abs(avg-binsum)).
ozo is correct that this problem is NP-hard.
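The thread's environment is VB, but a quick Python sketch of that scoring (the function names are mine, purely illustrative) may make it concrete:

```python
def spread(bin_sums):
    """Score a set of four bin sums: lower is better.
    This is the sum-of-squared-deviations form, sum((avg-binsum)^2)."""
    avg = sum(bin_sums) / len(bin_sums)
    return sum((avg - s) ** 2 for s in bin_sums)

def spread_abs(bin_sums):
    """The simpler absolute-deviation form, sum(abs(avg-binsum))."""
    avg = sum(bin_sums) / len(bin_sums)
    return sum(abs(avg - s) for s in bin_sums)
```

Whichever candidate set of four bins minimizes this score is the winner; a perfectly balanced split scores zero.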
@choelt
Given the 40 item limit, there is another approach that I've used in a different NP problem...simple iteration with masking.
Since you must look at all combinations of items, you can represent the inclusion/exclusion of each item with the binary digits of a loop iteration variable.
There is some trimming of this iteration approach. Since your combination must approach 1/4 of the sum of all the items, you can reasonably exclude the smallest items from your main iteration loop. In the example you posted, the bottom 5 (of 13) items are never going to be used in one of the 'big-in-first' combinations; these are always going to be filler items. You will iterate over these small items, but for a different purpose. This will usually reduce the problem size to something that we can represent with a Long integer for the 'big item' loop.
A second performance boost will be the representation of these values as integers, rather than floating point values. Multiply your values by 100.
You might do a space/time trade-off and represent your best-fit (4 bin) combinations with 1-4 arrays.
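As a sketch of the masking idea in Python (the thread is VB, so treat this as illustrative; the window bounds here are example numbers around the quarter-sum target of 1767), using the question's 13 weights multiplied by 100:

```python
weights = [168, 176, 234, 388, 404, 512, 610, 700, 702, 724, 768, 820, 862]
n = len(weights)  # 13 items -> (2^13) - 1 = 8191 nonempty combinations

def subset_sum(mask):
    """Sum the items whose bit is set: bit i of the loop
    variable means 'include item i'."""
    return sum(w for i, w in enumerate(weights) if mask & (1 << i))

# Example window around the quarter-sum target (7068 / 4 = 1767)
TARGET_LOW, TARGET_HIGH = 1725, 1809

candidates = []
# Iterate big masks first, stopping before masks made only of the
# 5 smallest items (any mask below 2^5 uses only bits 0..4).
for mask in range((1 << n) - 1, (1 << 5) - 1, -1):
    s = subset_sum(mask)
    if TARGET_LOW <= s <= TARGET_HIGH:
        candidates.append((mask, s))
```

The bit pattern of the loop variable is the use/don't-use indicator; no separate bookkeeping of which items are in the current combination is needed.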
==========================
EE reference material:
I solved a similar problem in the interpreted VBA (MS-Access) environment:
https://www.experts-exchange.com/questions/20908880/Find-any-entries-debits-and-credits-within-an-account-that-cancel-each-other-out-add-up-to-zero.html
There were different constraints than this problem, but it is illustrative of a complete enumeration approach optimization. In your case, we aren't looking for exact sums but rather the item-sums surrounding the 1/4 sum. I might suggest that the equivalent to my byte array needs to be
From (AllItemSum/4 - SmallestItemsSum/2) To (AllItemSum/4 + SmallestItemsSum/2)
The SmallestItemsSum is the sum of the smallest items that will not be used in the big-items iteration. I had originally thought that I might be able to use some form of standard deviation for this array dimension, but couldn't come up with a simple way.
I hope that you are working in a compiled environment and won't have to externalize your iteration/enumeration looping.
__________________________
Some time later, I participated in a similar enumeration question in the Delphi forum:
https://www.experts-exchange.com/questions/21354398/Algorithm-Combination.html
I was curious enough to ask for an implementation of this enumeration in Delphi
https://www.experts-exchange.com/questions/21355411/Seeking-fastest-bit-counting-and-combinatorial-enumeration.html
ozo and I both participated in another discussion from a pharmacy environment:
https://www.experts-exchange.com/questions/22752392/Procedure-to-optimize-mixes.html
ASKER
aikimark
Thanks for the post. WOW! The VBA Access environment problem is a lot of information to digest. I'll need time to study it to get my bearings. It never fails to amaze me that the problems that seem simple at first glance really are quite complex.
I have both VB.NET and VB6 at my disposal, so yes, it will be a compiled program. I'm digging out some of my dusty algorithm books purchased over the years to study up on NP problems and dynamic algorithms. I have looked into the knapsack solution, but I'm not sure what you are referring to with the greedy (big-in-first) start. Could you explain in more detail using as much "plain English" as possible? I don't have a degree in computer science, but I don't mind learning. Keep in mind that I'm trying to tackle this in my "spare time". LOL
I'm on the same page in terms of using the sum of variance of all 4 groups to pick the winner. Maybe standard deviation also. The light bulb works; if I could only find the switch!
@choelt
I gave a presentation of the Access/VBA problem to one of the local user groups and one of the CompSci Departments. One of the CS grad students mentioned that my solution was an example of a class of NP solutions, something I didn't know before my presentation. Sorry to lead with that solution, but it was complicated by the interpreted nature of the VBA runtime environment.
Upon some reflection, I might use the square root of the 1/4 sum as a limit of the lower and upper ranges of my array.
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
In the posted example, with the values made integers, this would be an array dimensioned
From (1767 - SQRT(1767)) To (1767 + SQRT(1767))
From (1725) To (1809)
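In Python, the arithmetic for those bounds looks like this (a transcription of the formulas above, not the author's code):

```python
import math

weights = [168, 176, 234, 388, 404, 512, 610, 700, 702, 724, 768, 820, 862]

total = sum(weights)                 # 7068
target = total // 4                  # 1767, the per-bin goal
width = int(math.sqrt(target))       # int(42.03...) = 42
low, high = target - width, target + width
print(low, high)                     # 1725 1809
```

So the candidate array only needs 85 slots (1725 through 1809) rather than one for every possible subset sum.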
========================
An explanation of the knapsack problem.
* Simple restatement -- what is the best way to pack a knapsack to get as much stuff into it as possible?
* Greedy packing -- start packing the largest items first and then fit the smaller items around them.
* Masking -- Using the binary representation of an integer value as a use/don't use indicator.
Example:
13 items = (2^13) - 1 combinations = 8191 combinations
Let's say that the bottom 5 items cannot possibly add up to our target, so:
For lngItemLoop = 8191 To 2^5 Step -1    '2^5 = 32
    CurrentIterationSum = SumItems(lngItemLoop)
    If CurrentIterationSum >= lowDim And CurrentIterationSum <= highDim Then
        'look for the next bin's sum, excluding the items used to make the prior bins' sums
        'there will be a secondary 2^5 loop to try to get close to the exact (AllSum/4) value
    End If
Next
The SumItems() function adds the items where there is a 1 at that position in the value of lngItemLoop. The subsequent bin enumerations can use a mask of the current loop value against the remaining (not already used) mask, which is a very fast operation.
In the Access/VBA example, I had sorted by the oldest items to the most recent items, trying to retire the oldest balances first. In your case, you would use the weight of the item as your sort key. (which you presented in your list)
You can improve the performance of this by precalculating the SumItem values in an array. Then the lngItemLoop is merely indexing into an array.
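A sketch of that precomputation and the mask test in Python (the structure is my own assumption, not the author's actual code):

```python
weights = [168, 176, 234, 388, 404, 512, 610, 700, 702, 724, 768, 820, 862]
n = len(weights)

# Precompute every subset sum once: sums[mask] is built from the same
# mask with its lowest bit removed, so each entry costs one addition.
sums = [0] * (1 << n)
for mask in range(1, 1 << n):
    low_bit = mask & -mask                       # isolate the lowest set bit
    sums[mask] = sums[mask ^ low_bit] + weights[low_bit.bit_length() - 1]

def disjoint(mask_a, mask_b):
    """True when two candidate bins share no items -- one bitwise AND."""
    return (mask_a & mask_b) == 0
```

With the table in place, the main loop is just an array lookup, and checking a new bin against the items already committed to prior bins is a single AND, as described above.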
You only need to evaluate a combination's variance once you have all four bin sums calculated.
Don't feel bad about the size and complexity of this. Very few non-CompSci folks understand the nature of NP. I have been working in IT for decades, and this Access/VBA question was the FIRST one that presented this level of complexity.
Good luck.
ASKER
aikimark
Thanks for the details in your last post, but I still need a little more information. I've looked at the VBA Access sample, but I'm not able to really make heads or tails out of the big picture of what you're trying to do with the numbers. I firmly believe in trying to figure out a problem on your own; if you're handed the answer, you don't really learn how to problem solve. I think I can figure out how to code it, but I just need to see the big picture here. I have some questions that may help me clarify exactly what I'm trying to accomplish.
Are you calling lowDim and highDim the 1725 & 1809 values respectively? So add big items first, and if it falls between 1725 & 1809 then iterate through the 5 small items to determine the combination which is closest to 1767 (AllItemSum/4). At the point that I determine the combination that is closest to 1767, what is the next step? Do I store this in a 2-dimensional array, with a decimal representing the binary equivalency of the selections and the subset sumtotal representing my group 1? From your last post it looks like I just return to the main loop and do it all over again with a new big-in-first start. Are you saying to mask against the previous selections so that none of those values are included in this next iteration? If this is the case, how do you know there isn't a better combination down the line which includes some of the bigs with a mix and match of mids/smalls?
In the main loop it would start off by passing all 13 values to the SumItems subroutine, "1111111111111 binary", correct? If so, why even calculate for values where you are including more than (N-3) items, since I need 4 different subsets at a minimum?
Where did you come up with the following for lower/upper limits?
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
I would also think that standard deviation could be used here. I would think it would depend on how dispersed the data is.
I seem to have a basic feel for what you are trying to do, but I'm not seeing how I'm supposed to be able to evaluate all the different group combinations that are in the ballpark and then compare variance among the ones that are closest to sumtotals of 1767 with unique selections. The only way I can see it is if I come up with 4 groups in this manner and then calculate variance and store the result. Continue on until I have 4 more groups and again store the calculated variance. At the end, choose the set of 4 with the smallest variance. If this is the case, would another method be more accurate, as follows:
Calculate the sumtotal for every combination and store the sumtotal and the binary mask representing its selections in a 2-dimensional array. At the end of this loop we would have a distribution of sumtotals along with their respective selections. Then work outward from the midpoint of selection sets totaling 1767 until finding 4 that have unique selection sets, using mask comparisons.
Maybe I'm not seeing it at all. If at all possible, could you revisit this and try to provide me with a more literal explanation of the entire process? I'm a big picture kind of guy. One other thing: you stated in your previous post that precalculating SumItems would speed up the process. Is this due to avoiding the overhead of calling the subroutine and returning?
I would appreciate as much attention as you could give me here as I feel I'm on the verge of understanding.
Thanks, Cory
@Cory
I need to play with this some more, both in my head and with some code. At this point, I was thinking about the following:
1. Use a big integer variable as my iteration (looping) variable to count down from (2^itemcount) - 1 to the lower limit.
2. Use the bitmask of the iteration variable (1 bits) to index into the array of weights.
3. If the sum of the indexed items is within some range (surrounding our target), add the iteration variable to a data structure (collection or array of collections/lists).
4. Sort the list/array by proximity to the actual 1/4 sum value.
Here comes the tough part...
5. Starting at the top of the sorted list/array, find the next three combinations that have a non-intersecting bit mask pattern.
6. Stop looking when you find the first set of four items.
===============
Notes:
* I'm not sure if the Big Integer variables in VB.Net have native boolean operators available like the Long Integer variables do in VB6.
* The bit mask is the 0/1 pattern of the binary value of the (big) integer variable
* If we use an array of collection/list items, we don't have to do the sorting step. We fill our 'sorted' array starting with the items at our exact target value (1/4 sum), then the items immediately above/below the target, then the items at target +/- 2, and so on.
* Since we added the items in numerically descending order, this should give us the big-in-first values at the top of each list set.
* Although I suggested the +/- sqrt(targetvalue), I might change my mind about the lowerlimit value.
=======================
"Are you calling lowDim and highDim the 1725 & 1809 ..."
Yes.
I don't want to have a double iteration for the calculations. The summation function can have an upper limit parameter, allowing it to stop summing as soon as the limit has been met/exceeded.
"Are you saying to mask against the previous selections so that none of those values are included in this next iteration?"
Mostly yes. We have to be clear what we mean by iteration. I expect there to be a FindSum() function that will have two parameters:
parmMask - bitmask pattern of current iteration variable and previous FindSum() results for the current iteration.
parmStartPosn - The index returned by the previous FindSum() invocation
If any of the FindSum() functions returns a -1 within some iteration, then our code will 'back up' one level, increment the parmStartPosn, and do another round of FindSum() until the top level returns a -1 or we 'find' all 4 sum items. This is a lot like searching a tree or a shortest-path algorithm.
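A rough Python sketch of that backtracking (FindSum-style) search; the names and packaging are mine, and `candidates` stands in for the proximity-sorted list described above:

```python
def find_four_bins(candidates, full_mask):
    """candidates: (mask, sum) pairs sorted by proximity to the 1/4 target.
    Depth-first search for the first four masks that don't intersect
    and that together use every item (their union equals full_mask)."""
    def search(chosen, used, start):
        if len(chosen) == 4:
            return chosen if used == full_mask else None  # all items used?
        for i in range(start, len(candidates)):
            mask, _ = candidates[i]
            if mask & used:          # intersects an earlier bin: skip it
                continue
            found = search(chosen + [mask], used | mask, i + 1)
            if found:
                return found
        return None                  # plays the role of -1: back up a level
    return search([], 0, 0)
```

The `return None` corresponds to FindSum() returning -1: the caller advances its start position and tries the next candidate, which is exactly the 'back up one level' step described above.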
I might 'package' the FindSum() code into a recursive routine, but the packaging isn't important right now. Just in case, do you know what recursive routines are?
While I'm thinking and playing, try to find an instance of the 40 item set that is representative of what you are likely to encounter and post it in this thread.
ASKER
aikimark
Yes, I've coded some recursive routines over the years in some special situations. I'm not an expert programmer by any means, but I do understand the basics of recursive programming. Calling a function from within itself up to n levels deep.
I don't have a sample of data that large with me, but I'm working on that. I'll post the largest sample set I can get my hands on ASAP.
Thanks for continuing with this.
No problem. I like problem solving.
Although I stated that the code might be able to stop with the first set of four sum combinations, I can think of cases where the first set of four might be one exact match and three combinations near the end of our 'sorted' list. The variance of this set-of-four might be greater than a set with no exact matches.
The eventual 'winner' set-of-four will be one where all the bits (items) are used and the variance of the set is the lowest.
========================
To get an idea of the magnitude of this problem consider:
* Simply iterating from 1 to (2^40 -1) is 512 times slower than just iterating to (2^31 -1)
* For each iteration, we need to do some item value (weight) summation
* For each iteration, we need to do a range check and, potentially, add the item to a collection/list
* Once the initial iteration is completed, we have roughly 4*(N^2) more iterations to find our four-bin optimal set, where N is the number of candidate sums found in the first iteration.
If N=30000 Then we need about 3,600,000,000 more find-and-compare iterations.
=======================
In our case, we pass some level/depth data to the recursive routine, letting it calculate whether there is something found (and return it) or keep looking. With each depth increase, we pass a decremented value of the depth parameter passed to us.
We only need to calculate the variance once we have a set of four sums.
ASKER
aikimark
Here are 2 samples that may help. I ranked them and multiplied the values by 100. I could not find a 40 count, but that is rare. I would say the average number of items is close to 23.
1) 35 items
1 84
2 93
3 94
4 102
5 110
6 116
7 120
8 120
9 121
10 132
11 132
12 134
13 142
14 154
15 156
16 164
17 164
18 167
19 178
20 180
21 192
22 210
23 226
24 226
25 248
26 268
27 278
28 293
29 306
30 313
31 346
32 368
33 386
34 406
35 474
2) 23 items
1 124
2 170
3 210
4 296
5 330
6 400
7 476
8 492
9 522
10 544
11 564
12 570
13 650
14 662
15 686
16 724
17 750
18 780
19 780
20 786
21 996
22 996
23 1074
Just to get a feel for the expected N, I ran this little code snippet against your 23 item set of data.
I know that this code isn't flexible to cover differently sized sets of data and I don't know the source of your data (file, manual entry, database, etc.). However, it is a proof-of-concept start on the code.
Sum = 13582
Sum/4 = 3395.5
SQRT(Sum/4) = 58.27091899
RangeLow = 3337.229081 = 3337
RangeHigh = 3453.770919 = 3454
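Those numbers can be reproduced in a few lines. A Python sketch (illustrative only; it applies the target +/- SQRT(target) window described above, rounded outward to whole numbers):

```python
import math

# The 23-item sample set from the question
items = [124, 170, 210, 296, 330, 400, 476, 492, 522, 544, 564, 570,
         650, 662, 686, 724, 750, 780, 780, 786, 996, 996, 1074]

total = sum(items)                       # 13582
target = total / 4                       # 3395.5 per bin
window = math.sqrt(target)               # ~58.27
range_low = math.floor(target - window)  # 3337
range_high = math.ceil(target + window)  # 3454
print(total, target, range_low, range_high)
```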
The enumeration pass populated the CandidateSums array with the following counts at each sum value position:
3337 0
3338 374
3339 0
3340 346
3341 0
3342 339
3343 0
3344 368
3345 0
3346 380
3347 0
3348 329
3349 0
3350 361
3351 0
3352 334
3353 0
3354 362
3355 0
3356 403
3357 0
3358 317
3359 0
3360 376
3361 0
3362 412
3363 0
3364 346
3365 0
3366 424
3367 0
3368 375
3369 0
3370 384
3371 0
3372 390
3373 0
3374 363
3375 0
3376 404
3377 0
3378 362
3379 0
3380 351
3381 0
3382 403
3383 0
3384 358
3385 0
3386 416
3387 0
3388 346
3389 0
3390 378
3391 0
3392 411
3393 0
3394 379
3395 0 <- Target sum closest values
3396 408 <- Target sum closest values
3397 0
3398 361
3399 0
3400 390
3401 0
3402 450
3403 0
3404 348
3405 0
3406 429
3407 0
3408 409
3409 0
3410 362
3411 0
3412 431
3413 0
3414 385
3415 0
3416 417
3417 0
3418 408
3419 0
3420 380
3421 0
3422 467
3423 0
3424 393
3425 0
3426 410
3427 0
3428 416
3429 0
3430 385
3431 0
3432 452
3433 0
3434 374
3435 0
3436 389
3437 0
3438 432
3439 0
3440 389
3441 0
3442 442
3443 0
3444 388
3445 0
3446 419
3447 0
3448 446
3449 0
3450 412
3451 0
3452 449
3453 0
3454 427
========
TOTAL: 23059
So my earlier estimate of 30000 candidate sums seems like a reasonable guess, although this will increase as we increase the set sizes.
Would you like to tweak this code snippet to better meet your needs before I continue?
* This code will work for sets of 31 or less because I'm using a Long Integer variable. Different variable data types will be needed for larger set sizes.
* I don't know if there is an equivalent AND operator in VB.Net that will work with larger integer variables. If not, we will have to create something.
* This code still needs to copy the items into another array, starting at the target position and alternating to the lower/upper boundaries.
Option Explicit

Sub SumIteration()
    Dim CandidateSums(3337 To 3454) As New Collection
    Dim lngItems(0 To 22) As Long
    Dim lngPower(0 To 22) As Long
    Dim lngLoop As Long
    Dim lngSum As Long

    For lngLoop = 0 To 22
        lngPower(lngLoop) = 2 ^ lngLoop
    Next

    lngItems(0) = 124
    lngItems(1) = 170
    lngItems(2) = 210
    lngItems(3) = 296
    lngItems(4) = 330
    lngItems(5) = 400
    lngItems(6) = 476
    lngItems(7) = 492
    lngItems(8) = 522
    lngItems(9) = 544
    lngItems(10) = 564
    lngItems(11) = 570
    lngItems(12) = 650
    lngItems(13) = 662
    lngItems(14) = 686
    lngItems(15) = 724
    lngItems(16) = 750
    lngItems(17) = 780
    lngItems(18) = 780
    lngItems(19) = 786
    lngItems(20) = 996
    lngItems(21) = 996
    lngItems(22) = 1074

    'Each mask value selects one subset of the 23 items; bucket the
    'masks whose subset sums fall inside the candidate range
    For lngLoop = 2 ^ 23 - 1 To 1 Step -1
        lngSum = SumItStatic(lngLoop, lngItems(), 3455)
        Select Case lngSum
            Case 3337 To 3454
                CandidateSums(lngSum).Add lngLoop
        End Select
    Next
    Stop
End Sub

Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long
    Static lngPower() As Long
    Static IsInitialized As Boolean
    Dim lngLoop As Long
    Dim lngSum As Long
    Dim lngUbound As Long

    lngUbound = UBound(parmItems)
    If Not IsInitialized Then
        ReDim lngPower(0 To lngUbound)
        For lngLoop = 0 To lngUbound
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        IsInitialized = True
    End If

    'Sum the items whose bits are set in the mask, bailing out early
    'once the running sum exceeds the limit
    For lngLoop = lngUbound To 0 Step -1
        If (parmMask And lngPower(lngLoop)) <> 0 Then
            lngSum = lngSum + parmItems(lngLoop)
            If lngSum > parmSumLimit Then
                SumItStatic = lngSum
                Exit Function
            End If
        End If
    Next
    SumItStatic = lngSum
End Function
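For readers who want to experiment outside VB6, here is an equivalent Python sketch of the enumeration, scaled down to a tiny demo set of my own invention so it runs instantly (the real 23-item run iterates 2^23 - 1 masks):

```python
from collections import defaultdict

def enumerate_candidates(items, low, high):
    """Bucket every nonempty subset (as a bitmask) by its sum,
    keeping only sums inside the [low, high] window."""
    buckets = defaultdict(list)   # sum -> list of bitmasks
    n = len(items)
    for mask in range(1, 1 << n):
        s = sum(items[i] for i in range(n) if mask & (1 << i))
        if low <= s <= high:
            buckets[s].append(mask)
    return buckets

# Assumed demo set: total 68, so the per-bin target is 17
demo = [5, 7, 9, 11, 13, 23]
cands = enumerate_candidates(demo, 16, 18)
for s in sorted(cands):
    print(s, len(cands[s]))   # sums 16 and 18 each have two subset masks
```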
ASKER
aikimark
The data will be manually entered from a sheet with values written down. I will create a user interface for that. Once the data is entered, I can sort it into an ascending array and pass the array into the SumItStatic function like you're doing in this code. Do you need that at this point?
I don't see anything needing tweaking at this point. I understand what this code snippet is doing, but just to clarify. At completion the CandidateSums collection contains the long integers whose binary equivalents represent selections which add up to sums of 3337 through 3454. In this case there are no odd numbered sums due to all data being evenly numbered. The results are indicating there are 408 different combinations that add up to 3396. Am I correct so far? I don't see how there are 408 different combinations adding up to 3396. I guess that's why we have the computer doing the math.
Now the hard part right, as you mentioned above? Looking at results and coming up with unique selection sets starting with these 408 possibilities, right?
As far as VB.Net goes, I'm perfectly fine with using VB6 for this project. If I need to adapt it to VB.Net in the future, then I can. I believe VB6 may run it faster anyway.
You correctly interpreted the results.
I do not need your input methods as long as you can adjust the code to handle a variety of different set sizes.
I just verified that VB.Net AND operator supports the 64-bit LONG integer data type. I think that the solution would be easier in VB.Net than in VB6 since VB6 doesn't have a native 64-bit integer data type.
Since we are here to help you, can you take a crack at the next bit of code to add the CandidateSums items into a single array in the proper order?
ASKER
Yes I can do that. As far as this array building code goes, you're wanting to add all of the items in the 3396 index first, then alternate adding the values in the (3395 & 3397), (3394 & 3398) ... (3337 & 3454) indexes respectively, ending up with all 23,059 values added. If this is not correct, please let me know before I go too far with it. It is midnight here so I'm going to catch a few winks. I'll post the code sometime tomorrow depending on how my normal day goes.
The last hour or so I have been testing routines to mask items in the 3396 index of the collection. I didn't use recursive routines, so it is not optimized for speed. I was just wanting to evaluate the results stored in the CandidateSums collection. I found numerous groups comprised of 3 unique subsets of values totaling 3396, but not 4. If there are three unique subsets totaling 3396, then there is no reason to go any further since the remaining 4th subset will have to fall in line. Do you agree? The remaining unused items can easily be determined using a final mask. In this case they will add up to 3394.
you understand correctly.
If you want an optimal answer, you have to keep looking. If you find a set of four within the first (target value) set, then there is no better solution and you can finish early. The problem with your proposal is that you might find three mutually exclusive combinations within the target set of items and your only other set is at the periphery of the range of sets -- or beyond. Using the variance metric, your solution might not be better than two combinations from the target set and two from the +/-1 sets. This is the 'N' part of the NP problem complexity description...Nondeterministic. To me, the best optimization trick is to find a sure fire method of stopping early.
In looking at your set of 23 items, there might be some valid conclusion to use the remainder of the three sums as a stopping criteria and constructing the number from the unmasked bits. However, I do not think this would be true with any general set of item weights. I will also give some thought to stoppage conditions -- if we look farther, we will be starting with a worse (combination set) variance than we already have found. As long as the list/array iteration is looking at the target value items, I can't think of such a condition.
Both of your larger set examples contain fairly even data values. I can think of sets of data where the data values might be clumped with few combinations close to the target value.
You can check the mutual exclusiveness of the candidates using the AND operator.
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0
As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator. We pass this OR'ed bit mask down to the next level along with the next position (index) in the list.
I realize that this might seem to contradict what I'd written earlier in this comment, but if we find a set of four, we don't need to look further at the lowest level, but at the next to lowest level. I'm going to call it a night and think about the structure of the code and parameters when performing a FindNext (what I'm calling this kind of function call/use similar to what you'd use with a recordset object) starting at the next-to-last depth.
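One caution on the `Bin1 AND Bin2 AND Bin3 AND Bin4 = 0` shorthand: a single four-way AND can be 0 even when just two of the bins overlap, so the exclusiveness test really has to be pairwise -- which is exactly what the running OR'ed mask accomplishes. A Python sketch of that bookkeeping (illustrative only; the function name is my own):

```python
# Each bin is a bitmask of item positions. Bins are mutually
# exclusive when no two share a set bit; the accumulated OR plays
# the role of the mask passed down to the next recursion level.
def disjoint(bins):
    used = 0
    for b in bins:
        if used & b:      # this bin claims an item already taken
            return False
        used |= b         # OR this bin's items into the running mask
    return True

print(disjoint([0b0011, 0b0100, 0b1000]))  # True: no shared items
print(disjoint([0b0011, 0b0110]))          # False: both claim bit 1
```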
I think I might know an early stoppage criteria. Once we've done the initial enumeration iteration, we can calculate the possible 'variances' of the combinations. We can then compare our candidate combinations' variances. When we have iterated all the possible combinations with one distance, we might be able to state (programmatically) that we have found an optimal combination. While this calculation is more work, it would allow us to stop with a high confidence that we'd found an optimal solution.
It might also be possible to use this distance information when we are searching for combinations, so that we might limit the depth of the searches.
In the snippet, I have done a calculation of 4 bin 'variance' (distance from target) combinations. The first set uses the square of the distance. The second set uses the ABS() of the distance. In both of these sets, the right-most column is the sum of the first 4 columns. The first example is more representative of statistical variance, since larger values are magnified.
0.25 0.25 0.25 0.25 1
0.25 0.25 0.25 2.25 3
0.25 0.25 2.25 2.25 5
0.25 2.25 2.25 2.25 7
2.25 2.25 2.25 2.25 9
==============================
0.5 0.5 0.5 0.5 2
0.5 0.5 0.5 1.5 3
0.5 0.5 1.5 1.5 4
0.5 1.5 1.5 1.5 5
1.5 1.5 1.5 1.5 6
ASKER
aikimark:
Here is my stab at the code to build the sorted array building out from target sum to lower and upper limits alternating between the 2. This code alternates up until the point that it reaches the lower of 2 total counts in the upper and lower targets. Let's say we've already added all of the 3396 sums to the array. Offset +/- 1 (3395 & 3397 respectively) are both zero counts in this case, but the next set is offset +/- 2 (3394 & 3398) respectively. 3394 has 379 total items & 3398 has 361 total items. I wrote the code to alternate up to the lower of the 2 totals. In this case 361. At that point I add the remaining 18 values from 3394 before continuing on to offset +/- 3 (3393 & 3399). Is this the correct way to approach this? Also in my main DO Loop I continue until I hit either lower or upper limit. If there is an even number of total sum sets (as there are in this case) there will be an odd man out. So I add it last. I have tested the code and it seems to be working. Let me know otherwise.
'Count total candidate sums
lngSum = 0
For lngLoop = 3337 To 3454
    lngSum = lngSum + CandidateSums(lngLoop).Count
Next

Dim lngIndexedSums() As Long
ReDim lngIndexedSums(lngSum)
Dim lngIndexedPosition As Long 'Current position to store next item
Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
Dim lngTotalTargetItems, lngLowerTotal, lngUpperTotal As Long

lngIndexedPosition = 0
lngTarget = 3396

'1st add all items in target index to array
For lngLoop = 1 To CandidateSums(lngTarget).Count
    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
    lngIndexedPosition = lngIndexedPosition + 1
Next

lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget + 1
Do While lngLowerTarget >= 3337 And lngUpperTarget <= 3454
    lngLowerTotal = CandidateSums(lngLowerTarget).Count
    lngUpperTotal = CandidateSums(lngUpperTarget).Count
    'Determine which collection has larger number of items for this offset
    If lngLowerTotal = lngUpperTotal Then
        'Count is equal for this iteration so set loop count to Count
        lngTotalTargetItems = lngLowerTotal
    Else
        'Set loop count to higher of 2
        If lngLowerTotal > lngUpperTotal Then
            lngTotalTargetItems = lngLowerTotal
        Else
            lngTotalTargetItems = lngUpperTotal
        End If
    End If
    For lngLoop = 1 To lngTotalTargetItems
        'Add next sum from lower offset target if it exists
        If lngLoop <= lngLowerTotal Then
            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
            lngIndexedPosition = lngIndexedPosition + 1
        End If
        'Add next sum from upper offset target if it exists
        If lngLoop <= lngUpperTotal Then
            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
            lngIndexedPosition = lngIndexedPosition + 1
        End If
    Next
    lngLowerTarget = lngLowerTarget - 1
    lngUpperTarget = lngUpperTarget + 1
Loop

'Check to see if there is an odd man out and add it if necessary
If lngLowerTarget = 3337 Then
    'There is 1 more lower offset to iterate through
    For lngLoop = 1 To CandidateSums(lngLowerTarget).Count
        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
        lngIndexedPosition = lngIndexedPosition + 1
    Next
End If
If lngUpperTarget = 3454 Then
    'There is 1 more upper offset to iterate through
    For lngLoop = 1 To CandidateSums(lngUpperTarget).Count
        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
        lngIndexedPosition = lngIndexedPosition + 1
    Next
End If

'Validate all positions of indexed array contain sums
Dim bZeroFound As Boolean: bZeroFound = False
Dim lngIndexedValue As Long
For lngLoop = 0 To UBound(lngIndexedSums()) - 1
    lngIndexedValue = lngIndexedSums(lngLoop) 'set watch to see values
    If Not lngIndexedValue > 0 Then
        bZeroFound = True
    End If
Next
Debug.Print "LowerTargetOffset: " & Str(lngLowerTarget)
Debug.Print "UpperTargetOffset: " & Str(lngUpperTarget)
Debug.Print "Zero Found: " & Str(bZeroFound)
Since the target might not be evenly divisible by four, you should tweak your code, adding some preference to which of the lower/upper index gets used first. It might help to look at a variation on our 23 item example (in the snippet).
You could also use a MOD function, where
* target MOD 4 = 1 should prefer the lower adding before the upper
* target MOD 4 = 3 should prefer the upper adding before the lower
* for 0 and 2 MOD values, there is no preference
I've done something similar where I have a two item array that is initialized as either (-1,1) or (1,-1) as my preferences.
Sum Sum/4 Preference
------ -------- -----------
13580 3395 None
13581 3395.25 Lower
13582 3395.5 None
13583 3395.75 Upper
13584 3396 None
13585 3396.25 Lower
13586 3396.5 None
13587 3396.75 Upper
13588 3397 None
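That preference rule can be sketched in a few lines (Python for illustration; the helper name `alternation_preference` is my own):

```python
# Decide which neighbor of the fractional target to prefer, based on
# the remainder of the total weight divided by 4:
#   remainder 1 -> target ends in .25, so the lower neighbor is closer
#   remainder 3 -> target ends in .75, so the upper neighbor is closer
#   remainder 0 or 2 -> both neighbors are equidistant, no preference
def alternation_preference(total):
    r = total % 4
    if r == 1:
        return "lower"
    if r == 3:
        return "upper"
    return "none"

for s in (13580, 13581, 13582, 13583, 13584):
    print(s, s / 4, alternation_preference(s))
```

The printed rows match the Sum / Sum/4 / Preference table above.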
ASKER
So in this case where the sum/4 is 3395.5, I should NOT start out by adding all of the 3396 sums to the sorted array. I should start out alternating between 3395 & 3396, although in our case 3395 has no items.
Here is how I think it should be sorted for the different cases:
Case
MOD 0 - Start by adding all exact target sums and then alternate
MOD 1 - Start by adding all target sums from the next lowest and then alternate
MOD 2 - Start by alternating
MOD 3 - Start by adding all target sums from the next highest and then alternate
Examples with above sum variations you presented:
Sum Sum/4 Preference
------ -------- -----------
13580 3395 add all sums from 3395 and start alternating 3394 & 3396
13581 3395.25 add all sums from 3395 and start alternating 3394 & 3396
13582 3395.5 start alternating 3394 & 3396
13583 3395.75 add all sums from 3396 and start alternating 3397 & 3395
13584 3396 add all sums from 3396 and start alternating 3395 & 3397
.............. and so on
Is that what you had in mind?
Something else I'm a little confused on. Hypothetically speaking, let's say the sum/4 was 3395 with MOD 0. So I add all items from 3395 first and then start alternating between 3394 & 3396. I know it's not real data because our sum/4 is 3395.5, but for the sake of argument, would it be better to sort these numbers from high to low, rather than alternate back and forth, since Big-In-First wants the biggest numbers first?
Looking at the actual first 5 items of the 3394 & 3396 collections:
3394: 6554112
3394: 6423040
3394: 6324289
3394: 6299778
3394: 6294020
3396: 7340048
3396: 6815764
3396: 6299669
3396: 6295566
3396: 6291607
These could be stored in the sorted array as:
7340048 3396
6815764 3396
6554112 3394
6423040 3394
6324289 3394
6299778 3394
6299669 3396
6295566 3396
6294020 3394
6291607 3396
I did *not* mean to alternate adding items from the +/-1 arrays.
You want to add all the items from arrays closest to your target before adding any items from arrays farther away from your target.
In our example case items from 3396 and 3395 are equidistant from 3395.5, so there is no order preference. In this case, I don't think it matters whether you create an algorithm that gives preference to the higher value (in the case of ties).
I think you have made an interesting case for the introduction of a merge operation when adding items from equidistant (from target) arrays. It is consistent with my earlier Big-In-First comments. Are you familiar with the merge sort algorithm? It is quite simple -- add the largest/smallest item from the 'head' of two lists until you exhaust one of the lists and then add the remaining items from the non-empty list. You would use a merge when
target MOD 4 = 0 or 2
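That largest-first interleaving is exactly a descending merge. A Python sketch (illustrative; the list values are the first three entries of the 3394 and 3396 buckets quoted earlier in the thread):

```python
# Merge two descending lists of bitmasks, always taking the larger
# head value first (Big-In-First), then appending whatever is left
# of the non-empty list.
def merge_descending(a, b):
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] >= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])   # one list is exhausted here...
    out.extend(b[j:])   # ...so only one of these adds anything
    return out

lower = [6554112, 6423040, 6324289]   # head of the 3394 bucket
upper = [7340048, 6815764, 6299669]   # head of the 3396 bucket
print(merge_descending(lower, upper))
```

The result matches the high-to-low ordering proposed above for the combined array.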
I had envisioned an evaluation of candidates algorithm that would iterate through an entire 'distance set' (all items from an array with the same distance from the target). In that case, the merging wouldn't be necessary.
From your recent comments, it seems like you are starting to get a handle on this. How do you feel about your understanding of the problem and the path we are following toward a solution?
===================
I'm thinking about some of these shortcuts:
* create and populate a data structure array that has the lower and upper index values and the distance-from-target for all items within that range. That will allow us to calculate the 'variance' without having to recalculate the sums of the four items we are considering. I've included an example of part of such a data structure with 1-origin. These are only the first five ranges closest to the target value.
* use some sort of vector to communicate the starting positions for the FindSum/FindNext functions. This would be a two-way communication vector.
Note: the variance values get large quickly, so I would expect the optimal four item set to be found in the first few ranges. However, that is not guaranteed.
From To Variance
----- ------- ---------
1 408 0.25
409 788 2.25
789 1150 6.25
1151 1562 12.25
1563 1953 20.25
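A sketch of how such a table can be built from the per-distance bucket counts (Python for illustration; the counts below are read off the From/To columns above, and with a target of 3395.5 the k-th range sits at distance k + 0.5, so its 'variance' contribution is (k + 0.5) squared):

```python
# Build (from_index, to_index, variance) rows from the counts of
# items in each distance set, stacked into one 1-origin array.
def build_ranges(counts):
    ranges, start = [], 1
    for k, n in enumerate(counts):
        end = start + n - 1
        ranges.append((start, end, (k + 0.5) ** 2))
        start = end + 1
    return ranges

for row in build_ranges([408, 380, 362, 412, 391]):
    print(row)   # reproduces the five-row From/To/Variance table
```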
ASKER
I had no time to think about this today, but I would like to say that 'yes' I'm fairly comfortable with my understanding of the problem and the path being taken thus far. I have to admit that I'm confused by your comments on the data structure containing upper/lower index and distance-from-target values. I'm really not following your thought pattern here at all with the sample structure shown and what it is representing.
Are we still going to need the sorted array I was working on?
"...I'm confused..."
there are 408 items that are in the group closest to the target.
there are 380 items that are in the next closest group.
there are 362 items that are in the next closest group.
there are 412 items that are in the next closest group.
the To values in the data structure are cumulative counts -- the original group counts stacked on top of each other in the single array/list.
We don't have to have the From column in this list. I thought it made the data structure more understandable. I was wrong.
====================
"Are we still going to need the sorted array I was working on?"
Yes. The new data structure merely describes the sets of items added to this array.
ASKER
OK,
I see where you are coming up with the cumulative sums and calculating the squared variance, but I think my problem is that I don't fully understand how we are going to evaluate 4 candidate sets.
Let me see if I understand.
You stated
"You can check the mutual exclusiveness of the candidates using the AND operator.
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0
As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator. We pass this OR'ed bit mask down to the next level along with the next position (index) in the list."
With the sorted array you're wanting to start with index 0, AND'ing index 1,2,3...N until you find a mutually exclusive set. This gives you 2 sets of items. Hypothetically, let's say now you have items (0 and 240) which have mutually exclusive sets. So continue on AND'ing (0,240,241...N) until we find 3 mutually exclusive items. (0,240,319) Continue on AND'ing (0,240,319,320...N) until we find 4. If a 4th item is found, store into a structure with the 4 items and the sum total of squared variance. Let's say (0,240,319,689). In our case the sum total of squared variance will be .25 + .25 + .25 + 2.25 = 3.00.
At this point what comes next? Do we start with index 1 now and repeat the above process?
Above you state:
"As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator. We pass this OR'ed bit mask down to the next level along with the next position (index) in the list"
You lose me here. Is this what you're referring to with the proposed FindNext function?
I fully understand the CandidateSums collections and that it represents the Big-In-First combinations of all combinations which sum up to totals ranging from 3337 to 3454. I understand how to sort these items into an array starting with target sum(s) working out towards upper and lower range of 3454 and 3337 respectively. I understand how to AND the selections to come up with 4 that are mutually exclusive, but after this is where I don't follow your thought pattern. Can you possibly explain it in another way so that a 'DUMMY' might understand?
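For reference, the AND/OR bitmask search being discussed can be sketched as a short backtracking routine (illustrative Python, my own names): each candidate is a bit mask of item indices; four candidates are mutually exclusive iff no two share a bit; the OR'ed mask of choices so far, plus the next start index, is what gets passed down to the next level.

```python
# Depth-first search for 4 mutually exclusive candidate masks in a
# variance-sorted list.  'used' is the OR of all masks picked so far;
# 'masks[i] & used == 0' is the mutual-exclusiveness (AND) test.
def find_four(masks, start=0, used=0, picked=()):
    if len(picked) == 4:
        return picked                       # found a complete bin set
    for i in range(start, len(masks)):
        if masks[i] & used == 0:            # no shared items
            found = find_four(masks, i + 1, used | masks[i], picked + (i,))
            if found:
                return found
    return None                             # dead end -- backtrack

# Five toy masks over 8 items; the first workable quadruple is found.
print(find_four([0b11, 0b101, 0b1100, 0b110000, 0b11000000]))  # (0, 2, 3, 4)
```

This also answers "what comes next": when no 4th (or 3rd, or 2nd) partner exists, the search backs up one level and tries the next index there, rather than restarting from scratch at index 1.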
ASKER CERTIFIED SOLUTION
@choelt
Now you see why some of these NP problem solutions result in such long threads.
ASKER
OK, I'll look over the section of code posted to try and gain a better understanding of the process. For testing purposes, I did not scope the lngIndexedSums() array (I'll change the name of it to ItemArray() for compatibility) at the module level, but will do so once complete. It will probably be later in the evening before I have a chance to look at your previous post in detail.
Now I'm well aware of how these threads can reach such lengths. Going through this is great exercise for the old mind though.
correction...two old minds :-)
Since I haven't gotten around to any substantive coding, there isn't a reason to rename the array. Just post your code and I'll know what you call the merged array.
ASKER
I'm just about to start tweaking the code I have up to this point. I should be able to post it later tonight. I've had a little fire put under me on this. Apparently, someone else in another division is working on an Access database to try and do something along similar lines. I guess they are using VBA, but I don't think they have anything completed with an algorithm yet. I'll call tomorrow to see where they are at, and let them know what is being done on this end. So at least I shouldn't have to build a GUI for entering the data. I can just query a database and then post the results back to the database. However, that also means that I need to wrap this up quicker than previously expected. As always!
I was just slammed with about 4 inches of rain/small hail in about 15 minutes which ended up flooding my garage. I'll try and cleanup that first and then get on the coding express.
ASKER
aikimark
Here is the latest code to sort the array. I have taken into consideration MOD 4 as follows:
Result:
0 - Exact integer target value
Add all target values 1st and then perform merge-sort on +/- 1 target values choosing highest values
1 - .25
Add values from next lowest target, next highest, next lowest, next highest...........
2 - .5
Start by merge-sort on +/- .5 target values choosing highest values
3 - .75
Add values from next highest target, next lowest, next highest, next lowest...........
I have compiled the program and it runs in 4 seconds on my old AMD 1GHz tower. Pretty impressive in my opinion considering it's calculating over 8191 possible combinations and sorting the results.
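The MOD-4 bookkeeping in the list above can be condensed into a small sketch (Python, my own naming, mirroring the intent of the VB code that follows): the exact per-bin target is total/4, and total MOD 4 decides whether one integer sum is closest or two are equidistant (the merge case).

```python
# Return the nearest integer candidate-sum targets, closest first.
# r == 0: exact integer target; r == 2: two targets equally close
# (.5 away) -- the merge case; r == 1 / r == 3: one target is .25
# away and the other .75, so they alternate rather than merge.
def nearest_targets(total):
    q, r = divmod(total, 4)
    if r == 0:
        return [q]            # exact; then merge q-1 / q+1 groups
    if r == 1:
        return [q, q + 1]     # q is .25 away, q+1 is .75 away
    if r == 2:
        return [q, q + 1]     # both .5 away -- merge the two groups
    return [q + 1, q]         # r == 3: q+1 is .25 away

print(nearest_targets(70))    # 70/4 = 17.5 -> [17, 18]
```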
Option Explicit
Dim lngIndexedSums() As Long
Sub SumIteration()
Dim CandidateSums(3337 To 3454) As New Collection
Dim Groups() As String
Dim lngItems(0 To 22) As Long
Dim lngPower(0 To 22) As Long
Dim lngLoop As Long
Dim lngSum As Long
For lngLoop = 0 To 22
lngPower(lngLoop) = 2 ^ lngLoop
Next
lngItems(0) = 124
lngItems(1) = 170
lngItems(2) = 210
lngItems(3) = 296
lngItems(4) = 330
lngItems(5) = 400
lngItems(6) = 476
lngItems(7) = 492
lngItems(8) = 522
lngItems(9) = 544
lngItems(10) = 564
lngItems(11) = 570
lngItems(12) = 650
lngItems(13) = 662
lngItems(14) = 686
lngItems(15) = 724
lngItems(16) = 750
lngItems(17) = 780
lngItems(18) = 780
lngItems(19) = 786
lngItems(20) = 996
lngItems(21) = 996
lngItems(22) = 1074
For lngLoop = 2 ^ 23 - 1 To 1 Step -1
lngSum = SumItStatic(lngLoop, lngItems(), 3455)
Select Case lngSum
Case 3337 To 3454
CandidateSums(lngSum).Add lngLoop
End Select
Next
'Count total candidate sums
lngSum = 0
For lngLoop = 3337 To 3454
lngSum = lngSum + CandidateSums(lngLoop).Count
Next
ReDim lngIndexedSums(lngSum)
Dim lngIndexedPosition As Long 'Current position to store next item
Dim lngTarget As Long, lngLowerTarget As Long, lngUpperTarget As Long 'VB6: a variable without its own As clause defaults to Variant
Dim lngLowerTotal As Long, lngUpperTotal As Long
Dim lngLowerValue As Long, lngUpperValue As Long
Dim lngItemSumTotal As Long
For lngLoop = 0 To UBound(lngItems)
lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
Next
Select Case lngItemSumTotal Mod 4
Case 0 'exact integer
lngTarget = lngItemSumTotal / 4
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget + 1
Case 1 '.25
lngTarget = Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget
lngUpperTarget = lngTarget + 1
Case 2 '.5
lngTarget = 0
lngLowerTarget = Int(lngItemSumTotal / 4)
lngUpperTarget = lngLowerTarget + 1
Case 3 '.75
lngTarget = Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget
End Select
'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
'Debug.Print "Target: " & Str(lngTarget)
'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
Dim lngLowerMergePosition As Long, lngUpperMergePosition As Long 'VB6: each variable needs its own As clause
lngIndexedPosition = 0
Do While lngTarget <> -1
If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
'MOD 1 or 3 so add all values from single target
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
'Alternate to next target
If lngTarget = lngLowerTarget Then
lngLowerTarget = lngTarget - 1
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
lngTarget = -1
End If
End If
Else
lngUpperTarget = lngTarget + 1
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
lngTarget = -1
End If
End If
End If
Else
'MOD 0 or 2
If lngTarget > 0 Then 'MOD 0 so add exact target values first
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
lngTarget = 0 'Start merging
End If
'Merge largest items from lower & upper target groups
lngLowerMergePosition = 1
lngUpperMergePosition = 1
lngLowerTotal = CandidateSums(lngLowerTarget).Count
lngUpperTotal = CandidateSums(lngUpperTarget).Count
Do While lngLowerMergePosition <= lngLowerTotal And _
lngUpperMergePosition <= lngUpperTotal
lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
If lngLowerValue >= lngUpperValue Then
lngIndexedSums(lngIndexedPosition) = lngLowerValue
lngLowerMergePosition = lngLowerMergePosition + 1
Else
lngIndexedSums(lngIndexedPosition) = lngUpperValue
lngUpperMergePosition = lngUpperMergePosition + 1
End If
lngIndexedPosition = lngIndexedPosition + 1
Loop
If lngLowerMergePosition <= lngLowerTotal Then
'Add remaining lower target values
For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
If lngUpperMergePosition <= lngUpperTotal Then
'Add remaining upper target values
For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
'Determine next upper/lower targets
lngLowerTarget = lngLowerTarget - 1
lngUpperTarget = lngUpperTarget + 1
If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
If lngLowerTarget < 3337 Then
If lngUpperTarget <= 3454 Then
'Upper target(s) still exist so add values
lngTarget = lngUpperTarget
Else
'Both outside limits so exit loop
lngTarget = -1
End If
Else
lngTarget = lngLowerTarget
End If
End If
End If
Loop
End Sub
Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long
Static lngPower() As Long
Static IsInitialized As Boolean
Dim lngLoop As Long
Dim lngSum As Long
Dim lngUbound As Long
lngUbound = UBound(parmItems)
If IsInitialized Then
Else
ReDim lngPower(0 To lngUbound)
For lngLoop = 0 To lngUbound
lngPower(lngLoop) = 2 ^ lngLoop
Next
IsInitialized = True
End If
For lngLoop = lngUbound To 0 Step -1
If (parmMask And lngPower(lngLoop)) <> 0 Then
lngSum = lngSum + parmItems(lngLoop)
If lngSum > parmSumLimit Then
SumItStatic = lngSum
Exit Function
End If
End If
Next
SumItStatic = lngSum
End Function
A nice bit of coding under suboptimal conditions. Sorry to hear about the flooding. Hope the water didn't contain alligators. :-)
If the other person is using MS-Access, does that mean that the data originates in a database? If so, revisit my original NP discussion, where both the data and the computing took place in an Access VBA environment. It should give you a template for sizing and filling an array. Much more flexible than the literal numeric values in the current version of the code.
Speaking of flexibility...if you have the time, look at ways of removing the hard-coded limits, such as 3454. Reference limits that you calculate and store in variables or in the LBound() and UBound() values applied to the array. You will need to do this eventually. Might as well do it while I'm working on the next bit.
One other thing about your 'competition'...for reasonably few items, you can solve such NP problems in Access using either the database engine or VBA code. However, the interpretive nature of the code soon limits the solvable problem size because of the time required. In other words, your VB.Net runtime environment is going to scale much better than VBA with your 40-item max. In an earlier test, I ran a 35-item enumeration loop in about 5 minutes.
It looks like the ball is in my court now. Time to roll up my sleeves and get the next section fleshed out. I should have some time this weekend to work on it. Please be patient.
ASKER
Yes, I realize there are a lot of hard-coded numbers in this code. I will take care of that. I'm going to discuss integration with the latecomer to the party and come up with a plan. I'm going to be in and out for the next several days so my response time won't be ideal, but I'll try to check in at least once a day through about Wednesday. It just depends on internet access availability.
Did you have a chance to look at the code I posted late last night? I wasn't able to test it as thoroughly as I wanted to due to heavy eyelids. Let me know if you have any issues with the way it is sorting the sums?
Thanks for all of your help on this. I look forward to racking my brains trying to figure out your next routines.
Did you have a chance to look at the code I posted late last night? I wasn't able to test it as thoroughly as I wanted to due to heavy eyelids. Let me know if you have any issues with the way it is sorting the sums?
Thanks for all of your help on this. I look forward to racking my brains trying to figure out your next routines.
I got compile errors when I brought your code into VS2008.
I've tweaked the code down through the
ReDim lngIndexedSums
statement.
Although the VB.Net code isn't as simple as classic VB code, the availability of the 64-bit LONG integer data type allows enumerations beyond 31 items.
Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value. Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from target +/- SQRT(target) to something that looks at the smallest values in the list. For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85. I also played with the difference between the largest of the small-item subset and the smallest item, SQRT = 19.95. While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations, as well as memory footprint reductions for the largest sets.
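The current bound calculation can be checked with a quick sketch (Python, my own names; the item list is the thread's 23-item sample): keeping only subset sums inside target +/- SQRT(target) reproduces the 3337-3454 range used throughout.

```python
import math

# Compute the candidate-sum window around the per-bin target.
# Default width is sqrt(target); a tighter, caller-supplied width
# (as proposed above) can be passed in instead.
def candidate_bounds(items, width=None):
    target = sum(items) / 4
    w = math.sqrt(target) if width is None else width
    return math.floor(target - w), math.ceil(target + w)

lo, hi = candidate_bounds([124, 170, 210, 296, 330, 400, 476, 492, 522,
                           544, 564, 570, 650, 662, 686, 724, 750, 780,
                           780, 786, 996, 996, 1074])
print(lo, hi)  # 3337 3454
```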
Option Explicit On
Module Module1
Public lngIndexedSums() As Long
Sub SumIteration()
Dim Groups() As String
Dim lngItems(0 To 22) As Long
Dim lngPower(0 To 22) As Long
Dim lngEnumerate As Long
Dim lngLoop As Integer
Dim lngSum As Integer
Dim lngCandidateCount As Integer
For lngLoop = 0 To 22
lngPower(lngLoop) = 2 ^ lngLoop
Next
lngItems(0) = 124
lngItems(1) = 170
lngItems(2) = 210
lngItems(3) = 296
lngItems(4) = 330
lngItems(5) = 400
lngItems(6) = 476
lngItems(7) = 492
lngItems(8) = 522
lngItems(9) = 544
lngItems(10) = 564
lngItems(11) = 570
lngItems(12) = 650
lngItems(13) = 662
lngItems(14) = 686
lngItems(15) = 724
lngItems(16) = 750
lngItems(17) = 780
lngItems(18) = 780
lngItems(19) = 786
lngItems(20) = 996
lngItems(21) = 996
lngItems(22) = 1074
'calculate lower and upper bounds
' * calculate target = sum(lngItems)/4
' * if the largest item is > target, it should probably be the upper bound
' * otherwise, use the range as target +/- SQRT(target)
'
'initialize the sizes and lb arrays
Dim sizes() As Integer '= {118}
Dim lb() As Integer '= {3337}
Dim Target As Single, LowerBound As Integer, UpperBound As Integer
Target = lngItems.Sum / 4
LowerBound = Target - Math.Sqrt(Target)
UpperBound = Target + Math.Sqrt(Target)
ReDim lb(0)
ReDim sizes(0)
lb(0) = LowerBound
sizes(0) = UpperBound - LowerBound + 1
Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
CandidateSums(lngLoop) = New Collection
Next
For lngEnumerate = (2 ^ 23) - 1 To 1 Step -1
lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
Select Case lngSum
Case LowerBound To UpperBound
CandidateSums(lngSum).Add(lngEnumerate) 'the enumeration variable is lngEnumerate here, not lngLoop
lngCandidateCount = lngCandidateCount + 1
End Select
Next
'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
' Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
'Next
'Count total candidate sums
'lngSum = 0
'For lngLoop = LowerBound To UpperBound
' lngSum = lngSum + CandidateSums(lngLoop).Count
'Next
'NOTE: the following array is one larger than required
ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
Dim lngIndexedPosition As Long 'Current position to store next item
Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
Dim lngLowerTotal, lngUpperTotal As Long
Dim lngLowerValue, lngUpperValue As Long
Dim lngItemSumTotal As Long
For lngLoop = 0 To UBound(lngItems)
lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
Next
Select Case lngItemSumTotal Mod 4
Case 0 'exact integer
lngTarget = lngItemSumTotal / 4
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget + 1
Case 1 '.25
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget
lngUpperTarget = lngTarget + 1
Case 2 '.5
lngTarget = 0
lngLowerTarget = Int(lngItemSumTotal / 4)
lngUpperTarget = lngLowerTarget + 1
Case 3 '.75
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget
End Select
'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
'Debug.Print "Target: " & Str(lngTarget)
'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
Dim lngLowerMergePosition, lngUpperMergePosition As Long
lngIndexedPosition = 0
Do While lngTarget <> -1
If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
'MOD 1 or 3 so add all values from single target
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
'Alternate to next target
If lngTarget = lngLowerTarget Then
lngLowerTarget = lngTarget - 1
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
lngTarget = -1
End If
End If
Else
lngUpperTarget = lngTarget + 1
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
lngTarget = -1
End If
End If
End If
Else
'MOD 0 or 2
If lngTarget > 0 Then 'MOD 0 so add exact target values first
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
lngTarget = 0 'Start merging
End If
'Merge largest items from lower & upper target groups
lngLowerMergePosition = 1
lngUpperMergePosition = 1
lngLowerTotal = CandidateSums(lngLowerTarget).Count
lngUpperTotal = CandidateSums(lngUpperTarget).Count
Do While lngLowerMergePosition <= lngLowerTotal And _
lngUpperMergePosition <= lngUpperTotal
lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
If lngLowerValue >= lngUpperValue Then
lngIndexedSums(lngIndexedPosition) = lngLowerValue
lngLowerMergePosition = lngLowerMergePosition + 1
Else
lngIndexedSums(lngIndexedPosition) = lngUpperValue
lngUpperMergePosition = lngUpperMergePosition + 1
End If
lngIndexedPosition = lngIndexedPosition + 1
Loop
If lngLowerMergePosition <= lngLowerTotal Then
'Add remaining lower target values
For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
If lngUpperMergePosition <= lngUpperTotal Then
'Add remaining upper target values
For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
'Determine next upper/lower targets
lngLowerTarget = lngLowerTarget - 1
lngUpperTarget = lngUpperTarget + 1
If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
If lngLowerTarget < 3337 Then
If lngUpperTarget <= 3454 Then
'Upper target(s) still exist so add values
lngTarget = lngUpperTarget
Else
'Both outside limits so exit loop
lngTarget = -1
End If
Else
lngTarget = lngLowerTarget
End If
End If
End If
Loop
End Sub
Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
Static lngPower() As Long
Static IsInitialized As Boolean
Dim lngLoop As Long
Dim lngSum As Long
Dim lngUbound As Long
lngUbound = UBound(parmItems)
If IsInitialized Then
Else
ReDim lngPower(0 To lngUbound)
For lngLoop = 0 To lngUbound
lngPower(lngLoop) = 2 ^ lngLoop
Next
IsInitialized = True
End If
For lngLoop = lngUbound To 0 Step -1
If (parmMask And lngPower(lngLoop)) <> 0 Then
lngSum = lngSum + parmItems(lngLoop)
If lngSum > parmSumLimit Then
SumItStatic = lngSum
Exit Function
End If
End If
Next
SumItStatic = lngSum
End Function
End Module
ASKER
I coded that in VB6 because that was what you were using originally. I have loaded it into VB.NET Visual Studio 2005 and have one problem with your changes.
Line 51: Target = lngItems.Sum / 4
Is this supported in VS 2008? The compiler in VS 2005 shows no such method supported. Anyway, I went ahead and just looped through the items again to calculate the sum as before, and it seems to work now. I compiled it as a console app and renamed the SumIteration sub to Main also. It flies through the initial 23 items, but when I added 8 more items onto the end for a total of 31, it gives me an error (index out of bounds???) during the sorting phase. This is after it calculates around 2.5 million possible candidate sums.
I did not consider data sets where we have single values larger than the target value. Wouldn't they have to stand on their own? I guess it depends on the spread of the data.
I'm off for now, but later this evening I will look back into this error I'm receiving. If I switch it back to the original 23 items it runs fine.
Option Explicit On
Module Module1
Public lngIndexedSums() As Long
Sub Main()
Dim Groups() As String
Dim lngItems(0 To 30) As Long
Dim lngPower(0 To 30) As Long
Dim lngEnumerate As Long
Dim lngLoop As Integer
Dim lngSum As Integer
Dim lngCandidateCount As Integer
For lngLoop = 0 To 30
lngPower(lngLoop) = 2 ^ lngLoop
Next
lngItems(0) = 124
lngItems(1) = 170
lngItems(2) = 210
lngItems(3) = 296
lngItems(4) = 330
lngItems(5) = 400
lngItems(6) = 476
lngItems(7) = 492
lngItems(8) = 522
lngItems(9) = 544
lngItems(10) = 564
lngItems(11) = 570
lngItems(12) = 650
lngItems(13) = 662
lngItems(14) = 686
lngItems(15) = 724
lngItems(16) = 750
lngItems(17) = 780
lngItems(18) = 780
lngItems(19) = 786
lngItems(20) = 996
lngItems(21) = 996
lngItems(22) = 1074
lngItems(23) = 1079
lngItems(24) = 1083
lngItems(25) = 1239
lngItems(26) = 1287
lngItems(27) = 1391
lngItems(28) = 1396
lngItems(29) = 1412
lngItems(30) = 1457
For i As Integer = 0 To lngItems.GetUpperBound(0)
lngSum += lngItems(i)
Next
'calculate lower and upper bounds
' * calculate target = sum(lngItems)/4
' * if the largest item is > target, it should probably be the upper bound
' * otherwise, use the range as target +/- SQRT(target)
'
'initialize the sizes and lb arrays
Dim sizes() As Integer '= {118}
Dim lb() As Integer '= {3337}
Dim Target As Single, LowerBound As Integer, UpperBound As Integer
'Target = lngItems.Sum / 4 'the Sum extension needs .NET 3.5, so the total is accumulated in the loop above instead
Target = lngSum / 4
LowerBound = Target - Math.Sqrt(Target)
UpperBound = Target + Math.Sqrt(Target)
Console.WriteLine("Target: " + Target.ToString)
Console.WriteLine("LowerBound: " + LowerBound.ToString)
Console.WriteLine("UpperBound: " + UpperBound.ToString)
Console.WriteLine()
ReDim lb(0)
ReDim sizes(0)
lb(0) = LowerBound
sizes(0) = UpperBound - LowerBound + 1
Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
CandidateSums(lngLoop) = New Collection
Next
For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1
lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
Select Case lngSum
Case LowerBound To UpperBound
CandidateSums(lngSum).Add(lngEnumerate) 'the enumeration variable is lngEnumerate here, not lngLoop
lngCandidateCount = lngCandidateCount + 1
Console.CursorLeft = 0
Console.CursorTop = 4
Console.Write("Total Candidates: " + lngCandidateCount.ToString)
End Select
Next
'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
' Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
'Next
'Count total candidate sums
'lngSum = 0
'For lngLoop = LowerBound To UpperBound
' lngSum = lngSum + CandidateSums(lngLoop).Count
'Next
'NOTE: the following array is one larger than required
ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
Dim lngIndexedPosition As Long 'Current position to store next item
Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
Dim lngLowerTotal, lngUpperTotal As Long
Dim lngLowerValue, lngUpperValue As Long
Dim lngItemSumTotal As Long
For lngLoop = 0 To UBound(lngItems)
lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
Next
Select Case lngItemSumTotal Mod 4
Case 0 'exact integer
lngTarget = lngItemSumTotal / 4
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget + 1
Case 1 '.25
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget
lngUpperTarget = lngTarget + 1
Case 2 '.5
lngTarget = 0
lngLowerTarget = Int(lngItemSumTotal / 4)
lngUpperTarget = lngLowerTarget + 1
Case 3 '.75
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget
End Select
'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
'Debug.Print "Target: " & Str(lngTarget)
'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
Dim lngLowerMergePosition, lngUpperMergePosition As Long
lngIndexedPosition = 0
Do While lngTarget <> -1
If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
'MOD 1 or 3 so add all values from single target
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
'Alternate to next target
If lngTarget = lngLowerTarget Then
lngLowerTarget = lngTarget - 1
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
lngTarget = -1
End If
End If
Else
lngUpperTarget = lngTarget + 1
If lngLowerTarget >= 3337 Then
lngTarget = lngLowerTarget
Else
If lngUpperTarget <= 3454 Then
lngTarget = lngUpperTarget
Else
lngTarget = -1
End If
End If
End If
Else
'MOD 0 or 2
If lngTarget > 0 Then 'MOD 0 so add exact target values first
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
lngTarget = 0 'Start merging
End If
'Merge largest items from lower & upper target groups
lngLowerMergePosition = 1
lngUpperMergePosition = 1
lngLowerTotal = CandidateSums(lngLowerTarget).Count
lngUpperTotal = CandidateSums(lngUpperTarget).Count
Do While lngLowerMergePosition <= lngLowerTotal And _
lngUpperMergePosition <= lngUpperTotal
lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
If lngLowerValue >= lngUpperValue Then
lngIndexedSums(lngIndexedPosition) = lngLowerValue
lngLowerMergePosition = lngLowerMergePosition + 1
Else
lngIndexedSums(lngIndexedPosition) = lngUpperValue
lngUpperMergePosition = lngUpperMergePosition + 1
End If
lngIndexedPosition = lngIndexedPosition + 1
Loop
If lngLowerMergePosition <= lngLowerTotal Then
'Add remaining lower target values
For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
If lngUpperMergePosition <= lngUpperTotal Then
'Add remaining upper target values
For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
'Determine next upper/lower targets
lngLowerTarget = lngLowerTarget - 1
lngUpperTarget = lngUpperTarget + 1
If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
If lngLowerTarget < 3337 Then
If lngUpperTarget <= 3454 Then
'Upper target(s) still exist so add values
lngTarget = lngUpperTarget
Else
'Both outside limits so exit loop
lngTarget = -1
End If
Else
lngTarget = lngLowerTarget
End If
End If
End If
Console.CursorLeft = 0
Console.CursorTop = 5
Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)
Loop
Console.Read()
End Sub
Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
Static lngPower() As Long
Static IsInitialized As Boolean
Dim lngLoop As Long
Dim lngSum As Long
Dim lngUbound As Long
lngUbound = UBound(parmItems)
If IsInitialized Then
Else
ReDim lngPower(0 To lngUbound)
For lngLoop = 0 To lngUbound
lngPower(lngLoop) = 2 ^ lngLoop
Next
IsInitialized = True
End If
For lngLoop = lngUbound To 0 Step -1
If (parmMask And lngPower(lngLoop)) <> 0 Then
lngSum = lngSum + parmItems(lngLoop)
If lngSum > parmSumLimit Then
SumItStatic = lngSum
Exit Function
End If
End If
Next
SumItStatic = lngSum
End Function
End Module
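For readers following along: SumItStatic treats its parmMask argument as a bitmask over the items array — bit i set means item i is in the subset — and it bails out of the summing loop as soon as the running sum exceeds the limit. A rough Python paraphrase of the same enumeration (illustrative only, not the posted VB; the function name is mine):

```python
def candidate_subsets(items, lower, upper):
    """Enumerate every non-empty subset by bitmask and keep those whose
    sum lands in [lower, upper] -- the same idea as the SumItStatic loop,
    including the early bail-out once the running sum exceeds upper."""
    n = len(items)
    candidates = {}  # sum -> list of masks that produce it
    for mask in range((1 << n) - 1, 0, -1):
        s = 0
        # walk from the highest bit down, mirroring the VB loop
        for i in reversed(range(n)):
            if mask & (1 << i):
                s += items[i]
                if s > upper:
                    break  # early exit: this subset is already too heavy
        if lower <= s <= upper:
            candidates.setdefault(s, []).append(mask)
    return candidates
```

With n items this visits 2^n − 1 masks, which is why tightening [lower, upper] and bailing out early matter so much for the larger data sets discussed later in the thread.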
I think the following (snippet) is the cause of the problem. You need to set lngTarget to a non-zero value.
Since I'm not running this as a console application, is it possible for us to compromise on your reliance on the Console for your diagnostic messages?
====================
One of the things in my code is the use of a Single data type for the Target value. I'm currently looking at your code that fills the array and the variables used in that section of code.
Case 2 '.5
lngTarget = 0
ASKER
The way I'm using that variable, the lngTarget value should be set to 0. This is a flag to denote there is no actual integer target so start merging values from lower and upper range. Like I said the code works fine using the original 23 item set.
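In other words, the Select Case on lngItemSumTotal Mod 4 can be summarized as follows (a Python paraphrase of the posted logic, with the zero target acting as the "start merging" flag described above):

```python
def mod4_targets(total):
    """Reproduce the Select Case on (total Mod 4) from the posted code:
    returns (target, lower_target, upper_target), where target == 0 is
    the flag meaning 'no exact integer target, merge lower/upper'."""
    q, r = divmod(total, 4)
    if r == 0:
        return q, q - 1, q + 1      # exact integer target
    if r == 1:
        return q, q, q + 1          # x.25 rounds down to q
    if r == 2:
        return 0, q, q + 1          # x.5: no integer target, merge
    return q + 1, q, q + 1          # x.75 rounds up to q + 1
```

Note that VB's Math.Round uses banker's rounding, but that only matters at exact .5 midpoints, which the Mod 2 case intercepts before rounding is ever called, so the table above matches the VB behavior.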
And yes, running as a console app sacrifices the debug window. So I guess in that case we need to just run it as a standard app.
Then I'll look at the hard-coded values in your code as the most likely culprit.
With the 31 items you posted, the range should be: 5904 To 6059
ASKER
Yes, it was the hard-coded values. I don't know why I couldn't find that earlier. I have changed the rest of the hard-coded values to dynamic values based on the original number of items in the data. The only hard-coded value left is the lngItems array bounds. I posted the latest code along with a change to your code that I think is required in the main candidate-sum calculation iteration. Correct me if I'm wrong.
For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1
lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
Select Case lngSum
Case LowerBound To UpperBound
CandidateSums(lngSum).Add(lngLoop)
**************CHANGED TO****************
CandidateSums(lngsum).Add(lngEnumerate)
***************************************
lngCandidateCount = lngCandidateCount + 1
End Select
Next
***************************************************
Latest Modifications:
Option Explicit On
Module Module1
Public lngIndexedSums() As Long
Sub Main()
Dim Groups() As String
Dim lngItems(0 To 30) As Long
Dim lngPower(0 To lngItems.GetUpperBound(0)) As Long
Dim lngEnumerate As Long
Dim lngLoop As Integer
Dim lngSum As Integer
Dim lngCandidateCount As Integer
For lngLoop = 0 To lngItems.GetUpperBound(0)
lngPower(lngLoop) = 2 ^ lngLoop
Next
lngItems(0) = 124
lngItems(1) = 170
lngItems(2) = 210
lngItems(3) = 296
lngItems(4) = 330
lngItems(5) = 400
lngItems(6) = 476
lngItems(7) = 492
lngItems(8) = 522
lngItems(9) = 544
lngItems(10) = 564
lngItems(11) = 570
lngItems(12) = 650
lngItems(13) = 662
lngItems(14) = 686
lngItems(15) = 724
lngItems(16) = 750
lngItems(17) = 780
lngItems(18) = 780
lngItems(19) = 786
lngItems(20) = 996
lngItems(21) = 996
lngItems(22) = 1074
lngItems(23) = 1079
lngItems(24) = 1083
lngItems(25) = 1239
lngItems(26) = 1287
lngItems(27) = 1391
lngItems(28) = 1396
lngItems(29) = 1412
lngItems(30) = 2677
For i As Integer = 0 To lngItems.GetUpperBound(0)
lngSum += lngItems(i)
Next
'calculate lower and upper bounds
' * calculate target = sum(lngItems)/4
' * if the largest item is > target, it should probably be the upper bound
' * otherwise, use the range as target +/- SQRT(target)
'
'initialize the sizes and lb arrays
Dim sizes() As Integer '= {118}
Dim lb() As Integer '= {3337}
Dim Target As Single, LowerBound As Integer, UpperBound As Integer
'Target = lngItems.Sum / 4
Target = lngSum / 4
LowerBound = Target - Math.Sqrt(Target)
UpperBound = Target + Math.Sqrt(Target)
Console.WriteLine("Target: " + Target.ToString)
Console.WriteLine("LowerBound: " + LowerBound.ToString)
Console.WriteLine("UpperBound: " + UpperBound.ToString)
Console.WriteLine()
ReDim lb(0)
ReDim sizes(0)
lb(0) = LowerBound
sizes(0) = UpperBound - LowerBound + 1
Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
Console.WriteLine("CandidateSums LowerBound(0): " + CandidateSums.GetLowerBound(0).ToString)
Console.WriteLine("CandidateSums UpperBound(0): " + CandidateSums.GetUpperBound(0).ToString)
For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
CandidateSums(lngLoop) = New Collection
Next
For lngEnumerate = ((2 ^ (lngItems.GetUpperBound(0) + 1))) - 1 To 1 Step -1
lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
Select Case lngSum
Case LowerBound To UpperBound
CandidateSums(lngSum).Add(lngEnumerate)
lngCandidateCount = lngCandidateCount + 1
Console.CursorLeft = 0
Console.CursorTop = 7
Console.Write("Total Candidates: " + lngCandidateCount.ToString)
End Select
Next
'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
' Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
'Next
'Count total candidate sums
'lngSum = 0
'For lngLoop = LowerBound To UpperBound
' lngSum = lngSum + CandidateSums(lngLoop).Count
'Next
'NOTE: the following array is one larger than required
ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
Dim lngIndexedPosition As Long 'Current position to store next item
Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
Dim lngLowerTotal, lngUpperTotal As Long
Dim lngLowerValue, lngUpperValue As Long
Dim lngItemSumTotal As Long
For lngLoop = 0 To UBound(lngItems)
lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
Next
Select Case lngItemSumTotal Mod 4
Case 0 'exact integer
lngTarget = lngItemSumTotal / 4
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget + 1
Case 1 '.25
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget
lngUpperTarget = lngTarget + 1
Case 2 '.5
lngTarget = 0
lngLowerTarget = Int(lngItemSumTotal / 4)
lngUpperTarget = lngLowerTarget + 1
Case 3 '.75
lngTarget = Math.Round(lngItemSumTotal / 4, 0)
lngLowerTarget = lngTarget - 1
lngUpperTarget = lngTarget
End Select
'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
'Debug.Print "Target: " & Str(lngTarget)
'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
Dim lngLowerMergePosition, lngUpperMergePosition As Long
lngIndexedPosition = 0
Do While lngTarget <> -1
If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
'MOD 1 or 3 so add all values from single target
Console.CursorLeft = 0
Console.CursorTop = 9
'Console.Write("lngLoop: " + lngLoop.ToString)
Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)
For lngLoop = 1 To CandidateSums(lngTarget).Count
'Console.CursorLeft = 0
'Console.CursorTop = 6
'Console.Write("lngLoop: " + lngLoop.ToString)
'Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
'Alternate to next target
If lngTarget = lngLowerTarget Then
lngLowerTarget = lngTarget - 1
If lngUpperTarget <= UpperBound Then
lngTarget = lngUpperTarget
Else
If lngLowerTarget >= LowerBound Then
lngTarget = lngLowerTarget
Else
lngTarget = -1
End If
End If
Else
lngUpperTarget = lngTarget + 1
If lngLowerTarget >= LowerBound Then
lngTarget = lngLowerTarget
Else
If lngUpperTarget <= UpperBound Then
lngTarget = lngUpperTarget
Else
lngTarget = -1
End If
End If
End If
Else
'MOD 0 or 2
If lngTarget > 0 Then 'MOD 0 so add exact target values first
For lngLoop = 1 To CandidateSums(lngTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
lngTarget = 0 'Start merging
End If
'Merge largest items from lower & upper target groups
lngLowerMergePosition = 1
lngUpperMergePosition = 1
lngLowerTotal = CandidateSums(lngLowerTarget).Count
lngUpperTotal = CandidateSums(lngUpperTarget).Count
Do While lngLowerMergePosition <= lngLowerTotal And _
lngUpperMergePosition <= lngUpperTotal
lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
If lngLowerValue >= lngUpperValue Then
lngIndexedSums(lngIndexedPosition) = lngLowerValue
lngLowerMergePosition = lngLowerMergePosition + 1
Else
lngIndexedSums(lngIndexedPosition) = lngUpperValue
lngUpperMergePosition = lngUpperMergePosition + 1
End If
lngIndexedPosition = lngIndexedPosition + 1
Loop
If lngLowerMergePosition <= lngLowerTotal Then
'Add remaining lower target values
For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
If lngUpperMergePosition <= lngUpperTotal Then
'Add remaining upper target values
For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
lngIndexedPosition = lngIndexedPosition + 1
Next
End If
'Determine next upper/lower targets
lngLowerTarget = lngLowerTarget - 1
lngUpperTarget = lngUpperTarget + 1
If lngLowerTarget < LowerBound Or lngUpperTarget > UpperBound Then
If lngLowerTarget < LowerBound Then
If lngUpperTarget <= UpperBound Then
'Upper target(s) still exist so add values
lngTarget = lngUpperTarget
Else
'Both outside limits so exit loop
lngTarget = -1
End If
Else
lngTarget = lngLowerTarget
End If
End If
End If
Console.CursorLeft = 0
Console.CursorTop = 8
Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)
Loop
Console.Read()
End Sub
Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
Static lngPower() As Long
Static IsInitialized As Boolean
Dim lngLoop As Long
Dim lngSum As Long
Dim lngUbound As Long
lngUbound = UBound(parmItems)
If IsInitialized Then
Else
ReDim lngPower(0 To lngUbound)
For lngLoop = 0 To lngUbound
lngPower(lngLoop) = 2 ^ lngLoop
Next
IsInitialized = True
End If
For lngLoop = lngUbound To 0 Step -1
If (parmMask And lngPower(lngLoop)) <> 0 Then
lngSum = lngSum + parmItems(lngLoop)
If lngSum > parmSumLimit Then
SumItStatic = lngSum
Exit Function
End If
End If
Next
SumItStatic = lngSum
End Function
End Module
ASKER
aikimark
Don't mean to rush, but how is that next section coming?
About your comments:
**************************
Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value. Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from: target +/-SQRT(target) to something that looks at the smallest values in the list. For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85. I also played with the difference between the largest of the small item subset and the smallest item SQRT = 19.95. While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations as well as memory footprint reductions for the largest sets.
**************************
In my head, I had questioned how to handle the large items (items larger than the target value). If the item is much larger than the target value, then it will have to stand alone. Hypothetically, let's say we have the following data set:
4
5
8
10
11
11
12
12
13
42
SUM = 128, MEAN = 12.8, TARGET = 128/4 = 32 SQRT TARGET = 5.66
UPPERLIMIT = 38 LOWERLIMIT = 26
In this case 42 is not going to be included in the CandidateSums collection. If we make the UPPERLIMIT = 42, then this will increase the total number of iterations required to find the best grouping considerably - right? In this case, should we throw out 42 and make it a stand-alone group? So now we need to determine the closest-matching totals for 3 groups with the remaining 9 data items as follows:
SUM = 86, MEAN = 9.56, TARGET = 86/3 = 28.67, SQRT TARGET = 5.35
UPPERLIMIT = 34 LOWERLIMIT = 23
In the case of 2 (although highly unlikely) very large items, we would repeat the above process, making each a stand-alone group.
So after calculating the UPPERLIMIT, any individual value greater than that should be thrown out and stand alone. New limits will then be calculated until all individual values are within the UPPERLIMIT. Would this work?
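That splitting-off procedure might be sketched like this (Python, using the target +/- SQRT(target) limits from earlier in the thread; the function name is mine):

```python
import math

def split_off_large_items(items, bins=4):
    """Iteratively set aside any item larger than the computed upper
    limit as a stand-alone group, then recompute the target with one
    fewer bin -- the procedure proposed above."""
    items = sorted(items)
    standalone = []
    while bins > 1 and items:
        target = sum(items) / bins
        upper = target + math.sqrt(target)
        if items[-1] > upper:
            standalone.append([items.pop()])  # oversized item stands alone
            bins -= 1                         # one fewer bin to fill
        else:
            break  # all remaining items fit under the upper limit
    return standalone, items, bins
```

On the 10-item example above, this sets 42 aside on the first pass (42 > 32 + SQRT(32) ≈ 37.66), then stops because 13 is within the recomputed 3-bin limit of roughly 34.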
I'm cutting in your corrective code with what I'd been doing.
In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)
* the small items are summed until their sum is greater than the target. Then the immediately prior item is used as the largest of the small items.
The upperbound value of the range is compared to the largest item in the list. If the largest item is greater than the calculated upperbound, the upperbound is reset to the largest item.
In the example set you posted:
Sum(items) = 128
target = 32
sum(4,5,8,10,11) = 38
sum(4,5,8,10) = 27
SQRT(10) = 3.16
CalculatedRange= [28.83772234 To 35.16227766]
Since 42 > 35.16227766, the range is adjusted to [28.83772234 To 42]
I calculated a (potential) best fit of this sample set by hand as:
(4,12,13) => 29
(5,11,12) => 28
(8,10,11) => 29
(42) => 42
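The range calculation walked through above might be sketched as follows (Python; the numbers match the worked example, and the function name is mine):

```python
import math

def candidate_range(items):
    """Range rule described above: target = sum/4; half-width =
    SQRT(largest of the 'small items' whose running sum stays <= target);
    widen the upper bound to the largest item if necessary."""
    total = sum(items)
    target = total / 4
    running, largest_small = 0, 0
    for w in sorted(items):
        if running + w > target:
            break                # the previous item is the largest small item
        running += w
        largest_small = w
    half = math.sqrt(largest_small)
    lo, hi = target - half, target + half
    hi = max(hi, max(items))     # an oversized item resets the upper bound
    return lo, hi
```

For the example set, sum(4, 5, 8, 10) = 27 stays under the target of 32 while adding 11 would exceed it, so the half-width is SQRT(10) ≈ 3.162, and the upper bound then gets widened from 35.16 to 42.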
ASKER
aikimark:
***********************
In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)
**************************
That will tighten the range considerably now that I understand what you meant by (largest of the small items). Coincidentally, that value seems to be close to SQRT(standard deviation) for several different data sets I evaluated.
I've been having some system problems and am in the process of switching systems. Seems like one step forward and two steps back. :-)
I know what I want to do, but I have to code it and test it. I appreciate your patience.
What has happened with the other person looking at this problem in your organization?
ASKER
aikimark:
Sorry to hear about the problems. I know that can be very frustrating.
I'm a fairly patient guy, but unfortunately the persons who will be using this are becoming a little anxious for it. This is a relatively new process and manually dividing the data sets into 4 equal groups is proving to be quite tedious.
The other party involved here is strictly working on the pre-phase of the process which involves collecting of data and sample(s) eliminations, etc. This is being done using Access. They would like for me to be able to interface with this database and retrieve the final data sets as they come available. I'm very comfortable working with Access databases in VB so this should not be a problem. I just need your continued expertise with the algorithm code. Let me know if there is anything I can do at this point to help.
ASKER
aikimark:
Again - I don't mean to push, but how is it coming? Are you going to be able to code the next section?
If it doesn't look like you're going to be able to, could you give me all of the details of your plan for iterating through the sorted array to find the best grouping? I'm going to have some downtime towards the end of this week to spend on this project. I could at least attempt to code it, which would help me better understand the process.
Thanks
ASKER
aikimark:
I'm receiving emails informing me that I have open questions and should be actively posting. Do you have things with your system back to normal yet? I'm spending my time in a hospital waiting room, so it could be worse. :) I've come up with a plan for determining the best grouping and will be working on some code over the next couple of days myself, but I'm going to have to relearn some things from VB6 to VB.NET as I go. I'm just going to try to get something to work like I want and worry about speed optimization later.
I have the metacode, but haven't typed it into VS.Net
ASKER
aikimark:
Thanks for your help getting me on the right track with this algorithm. I haven't finished it yet, but I do have something that works about half of the time. I know what needs to change. I'm currently working on using a more object oriented approach with generic collections of candidates. This is starting to look promising, but I have many more hours ahead for testing and tweaking.
Anyways, I wanted to award you the 500 points for your help thus far before the question was closed by a moderator. I realize you're busy and I really appreciate you taking the time to steer me in the right direction. It looks like I'll be able to finish it from here. Thanks again!
thank you for the points.
would it be helpful, hurtful, or a waste of time if I posted my metacode?
ASKER
I have changed the code so much at this point that I probably wouldn't use any of it, but it still may be helpful to see what you had in mind. All-in-all, I have a program that works very well with data sets smaller than 35 items. I have also made huge strides in speed optimization. I'm receiving out-of-memory exceptions with data sets upwards of 35-40 items. I'm working on ways to reduce the memory footprint.
But you can solve it in pseudo-polynomial time with a Dynamic Programming algorithm.
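As a closing note on that dynamic-programming remark: one pseudo-polynomial approach tracks the set of reachable sorted 4-tuples of bin loads as items are added one at a time, then picks the tuple with the smallest spread. A small Python sketch (the state count is bounded by roughly the total weight cubed, so this is only practical for modest totals; the function name is mine):

```python
def partition_into_4(items):
    """DP over reachable bin-load profiles: after processing each item,
    keep the set of sorted 4-tuples of bin loads, then return the
    reachable profile with the smallest (max - min) spread."""
    states = {(0, 0, 0, 0)}
    for w in items:
        new_states = set()
        for loads in states:
            for i in range(4):           # try placing w in each bin
                nl = list(loads)
                nl[i] += w
                new_states.add(tuple(sorted(nl)))  # sort to merge symmetric states
        states = new_states
    return min(states, key=lambda s: s[-1] - s[0])
```

On the 10-item example from this thread it finds the (28, 29, 29, 42) profile. Note this returns achievable loads, not the item assignment; recovering which items go where needs back-pointers, which are left out for brevity.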