Link to home
Start Free TrialLog in
Avatar of choelt
choelt

asked on

Algorithm to sort numbers into equal groups

I have a process where we are manually trying to sort containers of varying weights into 4 bins of nearest equal total weights.  Here is a sample of typical data with 13 items:

1.68
1.76
2.34
3.88
4.04
5.12
6.10
7.00
7.02
7.24
7.68
8.20
8.62

Now I need to determine which items to group into bins.  I need 4 bins with the closest matching total weights possible.  So in this case the total weight is 70.68.  So I need 4 bins containing as close to 17.67/ea. possible.  There will always be 4 bins no matter how many items.  The total items varies usually between as few as 6 to upwards of 40.  Also let me point out that it doesn't matter how many items are in each bin.  One bin could have 5 items while another has only 2.  We're just trying to match total weights as close as possible.

I'm a VB developer, but can understand most all basic logic as long as it is in somewhat readable form. (why/how explanations help me considerably)  I have been working on some basic sorting using arrays, but I'm not able to come up with anything that consistently calculates a best case scenario for different data sets.  I need help with an algorithm that will work much more consistently than the manual system we are using now which is for the most part trial and error.
Avatar of ozo
ozo
Flag of United States of America image

In general, the problem is NP-Complete
But you can solve it in pseudo-polynomial time with a Dynamic Programing algorithm
@choelt,

I think you can solve this with a knapsack algorithm, optimized with a greedy (Big-In-First) start.

You might use the variance of these four-bin combinations to pick your winner. =sum((avg-binsum)^2) or a simpler =sum(abs(avg-binsum))

ozo is correct in that this problem is NP difficult.
choelt

Given the 40 item limit, there is another approach that I've used in a different NP problem...simple iteration with masking.

Since you must look at all combinations of items, you must do a kind of binary representation of the inclusion/exclusion of an item based on the binary representation of a loop iteration variable.

There is some trimming of this iteration approach.  Since your combination must approach 1/4 of the sum of all the items, you can reasonably exclude the smallest items from your main iteration loop.  In the example you posted, the bottom 5 (of 13) items are never going to be used as one of 'big in first' combinations.  These are always going to be filler items.  You will iterate over these small items, but for a different purpose.  This will usually reduce our problem size to something that we can represent with a Long integer for the 'big item' loop.

A second performance boost will be the representation of these values as integers, rather than floating point values.  Multiply your values by 100.

You might do a space/time trade-off and represent your best-fit (4 bin) combinations with 1-4 arrays.

==============================
EE reference material:

I solved a similar problem in the interpreted VBA (MS-Access) environment:
https://www.experts-exchange.com/questions/20908880/Find-any-entries-debits-and-credits-within-an-account-that-cancel-each-other-out-add-up-to-zero.html
There were different constraints than this problem, but it is illustrative of a complete enumeration approach optimization.  In your case, we aren't looking for exact sums but rather the item-sums surrounding the 1/4 sum.  I might suggest that the equivalent to my byte array needs to be
From (AllItemSum/4 - SmallestItemsSum/2) To (AllItemSum/4 + SmallestItemsSum/2)
The SmallestItemsSum is the sum of the smallest items less than will not be used in the big items iteration.  I had originally thought that I might be able to use some form of standard deviation for this array dimension, but couldn't come up with a simple way.

I hope that you are working in a compiled environment and won't have to externalize your iteration/enumeration looping.
_______________________________________________________________
Some time later, I participated a similar enumeration question in the Delphi forum:
https://www.experts-exchange.com/questions/21354398/Algorithm-Combination.html

I was curious enough to ask for an implementation of this enumeration in Delphi
https://www.experts-exchange.com/questions/21355411/Seeking-fastest-bit-counting-and-combinatorial-enumeration.html

ozo and I both participated in another discussion from a pharmacy environment:
https://www.experts-exchange.com/questions/22752392/Procedure-to-optimize-mixes.html
Avatar of choelt
choelt

ASKER

aikimark

Thanks for the post.  WOW!  The VBA Access environment problem is alot of information to digest.  I'll need time to study it to get my bearings.  It never fails to amaze me that the problems that seem simple at first glance really are quite complex.

I have both vb.net and vb6 at my disposal.  So yes, it will be a compiled program.  I'm digging out some of my dusty algorithm books purchased over the years to study up on NP's and dynamic algorithms.  I have looked into the knapsack solution, but I'm not sure what you are referring to with the greedy(big-in-first) start.  Could you explain in more detail using as much "plain English" as possible.  I don't have a degree in computer science, but I don't mind learning.  Keep in mind that I'm trying to tackle this in my "spare time".  LOL

I'm on the same page in terms of using the sum of variance of all 4 groups to pick the winner.  Maybe standard deviation also.  The light bulb works - If I could only find the switch
@choelt

I gave a presentation of the Access/VBA problem to one of the local user groups and one of the CompSci Departments.  One of the CS grad students mentioned that my solution was an example of a class of NP solutions, something I didn't know before my presentation.  Sorry to lead with that solution, but it was complicated by the interpreted nature of the VBA runtime environment.

Upon some reflection, I might use the square root of the 1/4 sum as a limit of the lower and upper ranges of my array.
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
In the posted example and made integers, this would be an array dimensioned
From(1767 - SQRT(1767)) To (1767 + SQRT(1767))
From (1725) To (1809)

========================
An explanation of the knapsack problem.
* Simple restatement -- what is the best way to pack an knapsack to get as much stuff into it as possible.
* Greedy packing -- start packing the largest items first and then fit the smaller items around them.
* Masking -- Using the binary representation of an integer value as a use/don't use indicator.  
Example:
13 items = (2^13) - 1 combinations = 8191 combinations
Let's say that the bottom 5 items can not possibly add to our target, so
For lngItemLoop = 8191 To 2^5 Step -1    '2^5 = 32
   CurrentIterationSum = SumItems(lngItemLoop)
   If CurrentIterationSum between lowDim and highDim Then
     'look for next bin's sum, excluding the items used to make the prior bins' sums
     'there will be a secondary 2^5 loop to try and get close to the exact (AllSum/4) value
   End If
Next
The SumItems() function adds the items where there is a 1 at that position in the value of lngItemLoop.  The subsequent bin enumerations can use a mask of the current loop value against the remaining (not already used) mask, which is a very fast operation.

In the Access/VBA example, I had sorted by the oldest items to the most recent items, trying to retire the oldest balances first.  In your case, you would use the weight of the item as your sort key. (which you presented in your list)

You can improve the performance of this by precalculating the SumItem values in an array.  Then the lngItemLoop is merely indexing into an array.

You only need to evaluate combinations variance when you have all four bins sums calculated.  You can

Don't feel badly about the size and complexity of this.  Very few non CompSci folks understand the nature of NP.  I have been working in IT for decades and this Access/VBA question was the FIRST one that presented this level of complexity.

Good luck.
Avatar of choelt

ASKER

aikimark

Thanks for the details in your last post, but I still need a little more information.  I've looked at the VBA Access sample, but I'm not able to really make heads or tails out of the big picture of what your trying to do with the numbers.  I'm firmly believe in trying to figure out a problem on your own.  If your handed the answer you don't really learn how to problem solve.  I think I can figure out how to code it, but I just need to see the big picture here.  I have some questions that may help me clarify exactly what I'm trying to accomplish here.

Are you calling lowDim and highDim the 1725 & 1809 values respectively?  So add big items first and if it falls between 1725 & 1809 then iterate through small 5 items to determine the combination which is closest to 1767 (AllItemSum/4).  At the point that I determine the combination that is closest to 1767, what is the next step?  Do I store this in a 2 dimenstion array with a decimal representing the binary equivalency of selections and the subset sumtotal representing my group 1.  From your last post it looks to just return to main loop and do it all over again with a new big in first start.  Are you saying to mask against the previous selections so that none of those values are included in this next iteration?  If this is the case, how do you know there isn't a better combination down the line which include some of the bigs with a mix and match of mids/smalls?


In the main loop it would start off by passing all 13 values to the SumItems subroutine, "1111111111111 binary" correct?  If so, why even calculate for values where you are including more than (N-3) items since I need 4 different subsets at a minimum.

Where did you come up with the following for lower/upper limits?
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
I would also think that standard deviation could be used here.  I would think it would depend on how dispersed the data is.


I seem to have a basic feel for what you are trying to do, but I'm not seeing how I'm supposed to be able to evaluate all different group combinations that are in the ballpark and then compare variance among the ones that are closet to sumtotals of 1767 with unique selections.  The only way I can see it is if I come up with 4 groups in this manner and then calculate variance and store the result.  Continue on until I have 4 more groups and again store the calculated variance.  At the end choose the set of 4 with the smallest variance.  If this is the case, would another method be more accurate as follows:
Calculate sumtotal for every combination and store sumtotal and binary mask representing selections for each total in a 2 dimenstion array.  At the end of this loop we would have a distribution of sumtotals along with their respective selections.  Then work outward from the midpoint of selection sets totaling 1767 until finding 4 that have unique selection sets using mask comparisons.

Maybe I'm not seeing it at all.  If at all possible could you revisit this and try to provide me with a more literal explanation of the entire process.  I'm a big picture kind of guy.  One other thing.  You stated in your previous post that by precalculating SumItems the process would be speeded up.  Is this due to avoiding the overhead of calling the subroutine and returning?

I would appreciate as much attention as you could give me here as I feel I'm on the verge of understanding.  

Thanks, Cory
@Cory

I need to play with this some more, both in my head and with some code.  At this point, I was thinking about the following:
1. Use a big integer variable as my iteration (looping) variable to count down from 2^itemcount to lowerlimit.  
2. Use the bitmask of the iteration variable (1 bits) to index into the array of weights.
3. If the sum of the indexed items is within some range (surrounding our target), add the iteration variable to a data structure (collection or array of collections/lists).
4. Sort the list/array by proximity to the actual 1/4 sum value.

Here comes the tough part...
5. Starting at the top of the sorted list/array, find the next three combinations that have a non-intersecting bit mask pattern.
6. Stop looking when you find the first set of four items.

===============
Notes:
* I'm not sure if the Big Integer variables in VB.Net have native boolean operators available like the Long Integer variables to in VB6.
* The bit mask is the 0/1 pattern of the binary value of the (big) integer variable
* If we use an array of collection/list items, we don't have to do the sorting step. We fill our 'sorted' array starting with the items at our exact target value (1/4 sum) and then the items immediately above/below the target.  Then we add items target+/-2.
* Since we added the items in numerically descending order, this should give us the big-in-first values at the top of each list set.
* Although I suggested the +/- sqrt(targetvalue), I might change my mind about the lowerlimit value.

=======================
"Are you calling lowDim and highDim the 1725 & 1809 ..."
Yes.
I don't want to have a double iteration for the calculations.  The summation function can have an upper limit parameter, allowing it to stop summing as soon as the limit has been met/exceeded.

"Are you saying to mask against the previous selections so that none of those values are included in this next iteration?"
Mostly yes.  We have to be clear what we mean by iteration.  I expect there to be a FindSum() function that will have two parameters
  parmMask - bitmask pattern of current iteration variable and previous FindSum() results for the current iteration.
  parmStartPosn - The index returned by the previous FindSum() invocation

If any of the FindSum() functions returns a -1 within some iteration, then our code will 'back up' one level, increment the parmStartPosn and to another round of FindSum() until the top level returns a -1 or we 'find' all 4 sum items.  This is a lot like searching a tree or a shortest-pathfinding algorithm.
I might 'package' the FindSum() code into a recursive routine, but the packaging isn't important right now.  Just in case, do you know what recursive routines are?

While I'm thinking and playing, try to find an instance of the 40 item set that is representative of what you are likely to encounter and post it in this thread.

Avatar of choelt

ASKER

aikimark

Yes, I've coded some recursive routines over the years in some special situations.  I'm not an expert programmer by any means, but I do understand the basics of recursive programming.  Calling a function from within itself up to n levels deep.

I don't have a sample of data that large with me, but I'm working on that.  I'll post the largest sample set I can get my hands on ASAP.

Thanks for continuing with this.
No problem.  I like problem solving.

Although I stated that the code might be able to stop with the first set of four sum combinations, I can think of cases where the first set of four might be one exact match and three combinations near the end of our 'sorted' list.  The variance of this set-of-four might be greater than a set with no exact matches.

The eventual 'winner' set-of-four will be one where all the bits (items) are used and the variance of the set is the lowest.

========================
To get an idea of the magnitude of this problem consider:
* Simply iterating from 1 to (2^40 -1) is 512 times slower than just iterating to (2^31 -1)
* For each iteration, we need to do some item value (weight) summation
* For each iteration, we need to do a range check and, potentially, add the item to a collection/list
* Once the initial iteration is completed, we have roughly 4*(N^2) more iterations to find our four-bin optimal set, where N is the number of candidate sums found in the first iteration.
If N=30000 Then we need about 3,600,000,000 more find-and-compare iterations.

=======================
In our case, we pass some level/depth data to the recursive routine, letting it calculate whether there is something found (and return it) or keep looking.  With each depth increase, we pass a decremented value of the depth parameter passed to us.

We only need to calculate the variance, once we have a set of four sums.
Avatar of choelt

ASKER

aikimark

Here are 2 samples that may help.  I ranked them and multiplied the values by 100.  I could not find a 40 count, but that is rare.  I would say the average number of items is close to 23.

1)  35 items
1      84
2      93
3      94
4      102
5      110
6      116
7      120
8      120
9      121
10      132
11      132
12      134
13      142
14      154
15      156
16      164
17      164
18      167
19      178
20      180
21      192
22      210
23      226
24      226
25      248
26      268
27      278
28      293
29      306
30      313
31      346
32      368
33      386
34      406
35      474

2) 23 items
1      124
2      170
3      210
4      296
5      330
6      400
7      476
8      492
9      522
10      544
11      564
12      570
13      650
14      662
15      686
16      724
17      750
18      780
19      780
20      786
21      996
22      996
23      1074
Just to get a feel for the expected N, I ran this little code snippet against your 23 item set of data.

I know that this code isn't flexible to cover differently sized sets of data and I don't know the source of your data (file, manual entry, database, etc.).  However, it is a proof-of-concept start on the code.

Sum = 13582
Sum/4 = 3395.5
SQRT(Sum/4) = 58.27091899
RangeLow = 3337.229081 = 3337
RangeHigh = 3453.770919 = 3454

The enumaration iteration pass populated the CandidateSums array with the following counts at each sum value position:
3337      0
3338      374
3339      0
3340      346
3341      0
3342      339
3343      0
3344      368
3345      0
3346      380
3347      0
3348      329
3349      0
3350      361
3351      0
3352      334
3353      0
3354      362
3355      0
3356      403
3357      0
3358      317
3359      0
3360      376
3361      0
3362      412
3363      0
3364      346
3365      0
3366      424
3367      0
3368      375
3369      0
3370      384
3371      0
3372      390
3373      0
3374      363
3375      0
3376      404
3377      0
3378      362
3379      0
3380      351
3381      0
3382      403
3383      0
3384      358
3385      0
3386      416
3387      0
3388      346
3389      0
3390      378
3391      0
3392      411
3393      0
3394      379
3395      0       <- Target sum closest values
3396      408   <- Target sum closest values
3397      0
3398      361
3399      0
3400      390
3401      0
3402      450
3403      0
3404      348
3405      0
3406      429
3407      0
3408      409
3409      0
3410      362
3411      0
3412      431
3413      0
3414      385
3415      0
3416      417
3417      0
3418      408
3419      0
3420      380
3421      0
3422      467
3423      0
3424      393
3425      0
3426      410
3427      0
3428      416
3429      0
3430      385
3431      0
3432      452
3433      0
3434      374
3435      0
3436      389
3437      0
3438      432
3439      0
3440      389
3441      0
3442      442
3443      0
3444      388
3445      0
3446      419
3447      0
3448      446
3449      0
3450      412
3451      0
3452      449
3453      0
3454      427
            ========
TOTAL:      23059

So my earlier estimate of 30000 candidate sums seems like a reasonable guess, although this will increase as we increase the set sizes.

Would you like to tweak this code snippet to better meet your needs before I continue?
* This code will work for sets of 31 or less because I'm using a Long Integer variable.  Different variable data types will be needed for larger set sizes.  
* I don't know if there is an equivalent AND operator in VB.Net that will work with larger integer variables.  If not, we will have to create something.
* This code still needs to copy the items into another array, starting at the target position and alternating to the lower/upper boundaries.

Option Explicit
 
 
Sub SumIteration()
    Dim CandidateSums(3337 To 3454) As New Collection
    Dim lngItems(0 To 22) As Long
    Dim lngPower(0 To 22) As Long
    Dim lngLoop As Long
    Dim lngSum As Long
    For lngLoop = 0 To 22
        lngPower(lngLoop) = 2 ^ lngLoop
    Next
    lngItems(0) = 124
    lngItems(1) = 170
    lngItems(2) = 210
    lngItems(3) = 296
    lngItems(4) = 330
    lngItems(5) = 400
    lngItems(6) = 476
    lngItems(7) = 492
    lngItems(8) = 522
    lngItems(9) = 544
    lngItems(10) = 564
    lngItems(11) = 570
    lngItems(12) = 650
    lngItems(13) = 662
    lngItems(14) = 686
    lngItems(15) = 724
    lngItems(16) = 750
    lngItems(17) = 780
    lngItems(18) = 780
    lngItems(19) = 786
    lngItems(20) = 996
    lngItems(21) = 996
    lngItems(22) = 1074
    For lngLoop = 2 ^ 23 - 1 To 1 Step -1
        lngSum = SumItStatic(lngLoop, lngItems(), 3455)
        Select Case lngSum
            Case 3337 To 3454
                CandidateSums(lngSum).Add lngLoop
        End Select
    Next
    Stop
End Sub
 
Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long
    Static lngPower() As Long
    Static IsInitialized As Boolean
    Dim lngLoop As Long
    Dim lngSum As Long
    Dim lngUbound As Long
    
    lngUbound = UBound(parmItems)
    If IsInitialized Then
    Else
        ReDim lngPower(0 To lngUbound)
        For lngLoop = 0 To lngUbound
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        IsInitialized = True
    End If
    
    For lngLoop = lngUbound To 0 Step -1
        If (parmMask And lngPower(lngLoop)) <> 0 Then
            lngSum = lngSum + parmItems(lngLoop)
            If lngSum > parmSumLimit Then
                SumItStatic = lngSum
                Exit Function
            End If
        End If
    Next
    SumItStatic = lngSum
End Function

Open in new window

Avatar of choelt

ASKER

aikimark

The data will be manually entered from a sheet with values written down.  I will create a user interface for that.  Once the data is entered, I can sort into ascending array and pass the array into the SumItStatic function like your doing in this code.  Do you need that at this point?

I don't see anything needing tweaking at this point.  I understand what this code snippet is doing, but just to clarify.  At completion the CandidateSums collection contains the long integers whose binary equivalents represent selections which add up to sums of 3337 through 3454.  In this case there are no odd numbered sums due to all data being evenly numbered.  The results are indicating there are 408 different combinations that add up to 3396.  Am I correct so far?   I don't see how there are 408 different combinations adding up to 3396.  I guess that's why we have the computer doing the math.

Now the hard part right, as you mentioned above?  Looking at results and coming up with unique selection sets starting with these 408 possibilities, right?

As far as vb.net goes, I'm perfectly fine with using VB6 for this project.  If I need to adapt it to VB.Net in the future, then I can.  I believe VB6 may run it faster anyways.
You correctly interpreted the results.

I do not need your input methods as long as you can adjust the code to handle a variety of different set sizes.

I just verified that VB.Net AND operator supports the 64-bit LONG integer data type.  I think that the solution would be easier in VB.Net than in VB6 since VB6 doesn't have a native 64-bit integer data type.

Since we are here to help you, can you take a crack at the next bit of code to add the CandidateSums items into a single array in the proper order?
Avatar of choelt

ASKER

Yes I can do that.  As far as this array building code goes, your wanting to add all of the items in the 3396 index first.  Then alternate adding the values in (3395 & 3397), (3394 & 3398) ....... (3337 & 3454) indexes respectively ending up with all 23,059 values added.  If this is not correct, please let me know before I go to far with it.  It is midnight here so I'm going to catch a few winks.  I'll post the code sometime tomorrow depending on how my normal day goes.

The last hour or so I have been testing routines to mask items in the 3396 index of the collection.  I didn't use recursive routines.  So it is not optimized for speed.  I was just wanting to evaluate the results stored in the CandidateSums collection.  I found numerous groups comprised of 3 unique subsets of values totaling 3396, but not 4.  If there consists of three unique subsets totaling 3396, then there is no reason to go any further since the remaining 4th subset will have to fall in line.  Do you agree?  The remaining unused items can easily be determined using a final mask.  In this case they will add up to 3394.
you understand correctly.

If you want an optimal answer, you have to keep looking.  If you find a set of four within the first (target value) set, then there is no better solution and you can finish early.  The problem with your proposal is that you might find three mutually exclusive combinations within the target set of items and your only other set is at the periphery of the range of sets -- or beyond.  Using the variance metric, your solution might not be better than two combinations from the target set and two from the +/-1 sets.  This is the 'N' part of the NP problem complexity description...Nondeterministic.  To me, the best optimization trick is to find a sure fire method of stopping early.

In looking at your set of 23 items, there might be some valid conclusion to use the remainder of the three sums as a stopping criteria and constructing the number from the unmasked bits.  However, I do not think this would be true with any general set of item weights.  I will also give some thought to stoppage conditions -- if we look farther, we will be starting with a worse (combination set) variance than we already have found.  As long as the list/array iteration is looking at the target value items, I can't think of such a condition.

Both of your larger set examples contain fairly even data values.  I can think of sets of data where the data values might be clumpped with few combinations close to the target value.

You can check the mutual exclusiveness of the candidates using the AND operator.  
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0

As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list.

I realize that this might seem to contradict what I'd written earlier in this comment, but if we find a set of four, we don't need to look further at the lowest level, but at the next to lowest level.  I'm going to call it a night and think about the structure of the code and parameters when performing a FindNext (what I'm calling this kind of function call/use similar to what you'd use with a recordset object) starting at the next-to-last depth.
I think I might know an early stoppage criteria.  Once we've done the initial enumeration iteration, we can calculate the possible 'variances' of the combinations.  We can then compare our candidate combinations' variances.  When we have iterated all the possible combinations with one distance, we might be able to state (programmatically) that we have found an optimal combination.  While this calculation is more work, it would allow us to stop with a high confidence that we'd found an optimal solution.

It might also be possible to use this distance information when we are searching for combinations, so that we might limit the depth of the searches.

In the snippet, I have done a calculation of 4 bin 'variance' (distance from target) combinations.  The first set uses the sqare of the distance.  The second set uses the ABS() of the distance.  In both of these sets, the right-most column is the sum of the first 4 columns.  The first example is more representative of statistical variance, since larger values are magnified.

0.25	0.25	0.25	0.25	1
0.25	0.25	0.25	2.25	3
0.25	0.25	2.25	2.25	5
0.25	2.25	2.25	2.25	7
2.25	2.25	2.25	2.25	9 
==============================			
0.5	0.5	0.5	0.5	2
0.5	0.5	0.5	1.5	3
0.5	0.5	1.5	1.5	4
0.5	1.5	1.5	1.5	5
1.5	1.5	1.5	1.5	6

Open in new window

Avatar of choelt

ASKER

aikimark:

Here is my stab at the code to build the sorted array building out from target sum to lower and upper limits alternating between the 2.  This code alternates up until the point that it reaches the lower of 2 total counts in the upper and lower targets.  Let's say we've already added all of the 3396 sums to the array.  Offset +/- 1 (3395 & 3397 respectively) are both zero counts in this case, but the next set is offset +/- 2 (3394 & 3398) respectively.  3394 has 379 total items &  3398 has 361 total items.  I wrote the code to alternate up to the lower of the 2 totals.  In this case 361.  At that point I add the remaining 18 values from 3394 before continuing on to offset +/- 3 (3393 & 3399).  Is this the correct way to approach this?  Also in my main DO Loop I continue until I hit either lower or upper limit.  If there is an even number of total sum sets (as there are in this case) there will be an odd man out.  So I add it last.  I have tested the code and it seems to be working.  Let me know otherwise.
'Count total candidate sums
    lngSum = 0
    For lngLoop = 3337 To 3454
        lngSum = lngSum + CandidateSums(lngLoop).Count
    Next
    Dim lngIndexedSums() As Long
    ReDim lngIndexedSums(lngSum)
    Dim lngIndexedPosition As Long 'Current position to store next item
    Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
    Dim lngTotalTargetItems, lngLowerTotal, lngUpperTotal As Long
    lngIndexedPosition = 0
    lngTarget = 3396
    '1st add all items in target index to array
    For lngLoop = 1 To CandidateSums(lngTarget).Count
        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
        lngIndexedPosition = lngIndexedPosition + 1
    Next
    lngLowerTarget = lngTarget - 1
    lngUpperTarget = lngTarget + 1
    Do While lngLowerTarget >= 3337 And lngUpperTarget <= 3454
        lngLowerTotal = CandidateSums(lngLowerTarget).Count
        lngUpperTotal = CandidateSums(lngUpperTarget).Count
        'Determine which collection has larger number of items for this offset
        If lngLowerTotal = lngUpperTotal Then
            'Count is equal for this iteration so set loop count to Count
            lngTotalTargetItems = lngLowerTotal
        Else
            'Set loop count to higher of 2
            If lngLowerTotal > lngUpperTotal Then
                lngTotalTargetItems = lngLowerTotal
            Else
                lngTotalTargetItems = lngUpperTotal
            End If
        End If
        For lngLoop = 1 To lngTotalTargetItems
            'Add next sum from lower offset target if it exists
            If lngLoop <= lngLowerTotal Then
                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
                lngIndexedPosition = lngIndexedPosition + 1
            End If
            'Add next sum from upper offset target if it exists
            If lngLoop <= lngUpperTotal Then
                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
                lngIndexedPosition = lngIndexedPosition + 1
            End If
        Next
        lngLowerTarget = lngLowerTarget - 1
        lngUpperTarget = lngUpperTarget + 1
    Loop
    
    'Check to see if there is an odd man out and add it if necessary
    If lngLowerTarget = 3337 Then
        'There is 1 more lower offset to iterate through
        For lngLoop = 1 To CandidateSums(lngLowerTarget).Count
            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
            lngIndexedPosition = lngIndexedPosition + 1
        Next
    End If
    If lngUpperTarget = 3454 Then
        'There is 1 more upper offset to iterate through
        For lngLoop = 1 To CandidateSums(lngUpperTarget).Count
            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
            lngIndexedPosition = lngIndexedPosition + 1
        Next
    End If
    
    'Validate all positions of indexed array contain sums
    Dim bZeroFound As Boolean: bZeroFound = False
    Dim lngIndexedValue As Long
    For lngLoop = 0 To UBound(lngIndexedSums()) - 1
        lngIndexedValue = lngIndexedSums(lngLoop) 'set watch to see values
        If Not lngIndexedValue > 0 Then
            bZeroFound = True
        End If
    Next
    
    Debug.Print "LowerTargetOffset: " & Str(lngLowerTarget)
    Debug.Print "UpperTargetOffset: " & Str(lngUpperTarget)
    Debug.Print "Zero Found: " & Str(bZeroFound)

Open in new window

Since the target might not be evenly divisible by four, you should tweak your code, adding some preference to which of the lower/upper index gets used first.  It might help to look at a variation on our 23 item example (in the snippet).

You could also use a MOD function, where
* target MOD 4 = 1 should prefer the lower adding before the upper
* target MOD 4 = 3 should prefer the upper adding before the lower
* for 0 and 2 MOD values, there is no preference
I've done something similar where I have a two item array that is initialized as either (-1,1) or (1,-1) as my preferences.

Sum      Sum/4    Preference
------   -------- -----------
13580	3395     None
13581	3395.25  Lower
13582	3395.5   None
13583	3395.75  Upper
13584	3396     None
13585	3396.25  Lower
13586	3396.5   None
13587	3396.75  Upper
13588	3397     None

Open in new window

Avatar of choelt

ASKER

So in this case where the sum/4 is 3395.5, I should NOT start out by adding all of the 3396 sums to the sorted array.  I should start out alternating between 3395 & 3396, although in our case 3395 has no items.

Here is how I think it should be sorted for the different cases:
Case
MOD 0 - Start by adding all exact target sums and then alternate
MOD 1 - Start by adding all target sums from the next lowest and then alternate
MOD 2 - Start by alternating
MOD 3 - Start by adding all target sums from the next highest and then alternate

Examples with above sum variations you presented:
 Sum      Sum/4    Preference
------   -------- -----------
13580      3395    add all sums from 3395 and start alternating 3394 & 3396
13581      3395.25   add all sums from 3395 and start alternating 3394 & 3396
13582      3395.5   start alternating 3394 & 3396
13583      3395.75  add all sums from 3396 and start alternating 3397 & 3395
13584      3396     add all sums from 3396 and start alternating 3395 & 3397
 .............. and so on

Is that what you had in mind?

Something else I'm a little confused on.  Hypothetically speaking lets say the sum/4 was 3395 mod 0.  So I add all items from 3395 first and then start alternating between 3394 & 3396.  I know it's not real data because our sum/4 is 3395.5, but for the sake of argument would it be better to sort these numbers from high to low, rather than alternate back and forth since Big In First is wanting biggest numbers first.

Looking the the actual first 5 items of 3394 & 3396 collections:
3394:  6554112
3394:  6423040
3394:  6324289
3394:  6299778
3394:  6294020
3396:  7340048
3396:  6815764
3396:  6299669
3396:  6295566
3396:  6291607

These could be stored in the sorted array as:
7340048  3396
6815764  3396
6554112  3394
6423040  3394
6324289  3394
6299778  3394
6299669  3396
6295566  3396
6294020  3394
6291607  3396
I did *not* mean to alternate adding items from the +/-1 arrays.

You want to add all the items from arrays closest to your target before adding any items from arrays farther away from your target.

In our example case items from 3396 and 3395 are equidistant from 3395.5, so there is no order preference.  In this case, I don't think it matters whether you create an algorithm that gives preference to the higher value (in the case of ties).

I think you have made an interesting case for the introduction of a merge operation when adding items from equidistant (from target) arrays.  It is consistent with my earlier Big-In-First comments.  Are you familiar with merge sort algorithm?  It is quite simple -- add the largest/smallest item from the 'head' of two lists until you exhaust one of the lists and then add the remaining items from the non-empty list.  You would use a merge when
target MOD 4 = 0 or 2

I had envisioned an evaluation of candidates algorithm that would iterate through an entire 'distance set' (all items from an array with the same distance from the target).  In that case, the merging wouldn't be necessary.

From your recent comments, it seems like you are starting to get a handle on this.  How do you feel about your understanding of the problem and the path we are following toward a solution?

===================
I'm thinking about some of these shortcuts:
* create and populate a data structure array that has the lower and upper index values and the distance-from-target for all items within that range.  That will allow us to calculate the 'variance' without having to recalculate the sums of the four items we are considering.  I've included an example of part of such an data structure with 1-origin.  This is only the first five ranges closest to the target value.
* use some sort of vector to communicate the starting positions for the FindSum/FindNext functions.  This would be a two-way communication vector.

Note: the variance values get large quickly, so I would expect the optimal four item set to be found in the first few ranges.  However, that is not guaranteed.
From     To       Variance
-----    -------  ---------
1	408	0.25
409	788	2.25
789	1150	6.25
1151	1562	12.25
1563	1953	20.25

Open in new window

Avatar of choelt

ASKER

I had no time to think about this today, but I would like to say that 'yes' I'm fairly comfortable with the understanding of the problem and the path being taken thus far.  I have to admit that I'm confused about your approach your comments on data structure containing upper/lower index and distance from target values.  I'm really not following your thought pattern here at all with the sample structure shown and what is representing.

Are we still going to need the sorted array I was working on?
"...I'm confused..."
there are 408 items that are in the group closest to the target.
there are 379 items that are in the next closest group.
there are 361 items that are in the next closest group.
there are 411 items that are in the next closest group.

the values in the data structure are cumulative sums, representing the original counts stacked on top of each other in the single array/list.

We don't have to have the From column in this list.  I thought it made the data structure more understandable.  I was wrong.

====================
"Are we still going to need the sorted array I was working on?"
Yes.  The new data structure merely describes the sets of items added to this array.
Avatar of choelt

ASKER

OK,

I see where you are coming up with the cumulative sums and calculating the squared variance, but I think my problem is that I don't fully understand how we are going to evaluate 4 candidate sets.

Let me see if I understand.

You stated
"You can check the mutual exclusiveness of the candidates using the AND operator.  
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0

As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list."

With the sorted array you're wanting to start with index 0 AND'ing index 1,2,3...N until you find a mutually exclusive set.  This gives you 2 sets of items.  Hypothetically, lets' say now you have  items (0 and 240) which have mutually exclusive sets.  So continue on AND'ing (0,240, 241....N) until we find 3 mutually exclusive items.  (0,240,319)  Continue on AND'ing (0,240,319,320.....N) until we find 4.  If a 4th item is found, store into structure with the 4 items and sum total of squared variance.  Let's say (0,240,319,689).  In our case the sum total of squared variance will be .25 + .25 + .25 + 2.25 = 3.00.
At this point what comes next?  Do we start with index 1 now and repeat the above process?

Above you state:
"As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list"

You lose me here.  Is this what you're referring too with the proposed FindNext function?

I fully understand the CandidateSums collections and that it represents the Big-In-First combinations of all combinations which sum up to totals ranging from 3337 to 3454.  I understand how to sort these items into an array starting with target sum(s) working out towards upper and lower range of 3454 and 3337 respectively.  I understand how to AND the selections to come up with 4 that are mutually exclusive, but after this is where I don't follow your thought pattern.  Can you possibly explain it in another way so that a 'DUMMY' might understand?
ASKER CERTIFIED SOLUTION
Avatar of aikimark
aikimark
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
@choelt

Now you see why some of these NP problem solutions result in such long threads.
Avatar of choelt

ASKER

OK, I'll look over the section of code posted to try and gain a better understanding of the process.  For testing purposes, I did not scope the lngIndexedSums() array (I'll change the name of it to ItemArray() for compatibility) at the module level, but will do so once complete.  It will probably be later in the evening before I have a chance to look at your previous post in detail.

Now I'm well aware of how these threads can reach such lengths.  Going through this is great exercise for the old mind though.
correction...two old minds :-)

Since I haven't gotten around to any substantive coding, there isn't a reason to rename the array.  Just post your code and I'll know what you call the merged array.
Avatar of choelt

ASKER

I'm just about to start tweaking the code I have up to this point.  I should be able to post it later tonight.  I've had a little fire put under me on this.  Apparently, someone else in another division is working on an Access database to try and do something along similar lines.  I guess they are using VBA, but I don't think they have anything completed with an algorithm yet.  I'll call tomorrow to see where they are at, and let them know what is being done on this end.  So at least I shouldn't have to build a GUI for entering the data.  I can just query a database and then post the results back to the database.  However, that also means that I need to wrap this up quicker than previously expected.  As always!

I was just slammed with about 4 inches of rain/small hail in about 15 minutes which ended up flooding my garage.  I'll try and cleanup that first and then get on the coding express.
Avatar of choelt

ASKER

aikimark
Here is the latest code to sort the array.  I have taken into consideration MOD 4 as follows:

Result:
0 - Exact integer target value
Add all target values 1st and then perform merge-sort on +/- 1 target values choosing highest values
1 - .25
Add values from next lowest target, next highest, next lowest, next highest...........
2 - .5
Start by merge-sort on +/- .5 target values choosing highest values
3 - .75
Add values from next highest target, next lowest, next highest, next lowest...........

I have compiled the program and it runs in 4 seconds on my old AMD 1Ghz tower.  Pretty impressive in my opinion considering its calculating over 8191 possible combinations and sorting the results.
Option Explicit
Dim lngIndexedSums() As Long
 
Sub SumIteration()
    Dim CandidateSums(3337 To 3454) As New Collection
    Dim Groups() As String
    
    Dim lngItems(0 To 22) As Long
    Dim lngPower(0 To 22) As Long
    Dim lngLoop As Long
    Dim lngSum As Long
    For lngLoop = 0 To 22
        lngPower(lngLoop) = 2 ^ lngLoop
    Next
    lngItems(0) = 124
    lngItems(1) = 170
    lngItems(2) = 210
    lngItems(3) = 296
    lngItems(4) = 330
    lngItems(5) = 400
    lngItems(6) = 476
    lngItems(7) = 492
    lngItems(8) = 522
    lngItems(9) = 544
    lngItems(10) = 564
    lngItems(11) = 570
    lngItems(12) = 650
    lngItems(13) = 662
    lngItems(14) = 686
    lngItems(15) = 724
    lngItems(16) = 750
    lngItems(17) = 780
    lngItems(18) = 780
    lngItems(19) = 786
    lngItems(20) = 996
    lngItems(21) = 996
    lngItems(22) = 1074
    For lngLoop = 2 ^ 23 - 1 To 1 Step -1
        lngSum = SumItStatic(lngLoop, lngItems(), 3455)
        Select Case lngSum
            Case 3337 To 3454
                CandidateSums(lngSum).Add lngLoop
        End Select
    Next
    
    'Count total candidate sums
    lngSum = 0
    For lngLoop = 3337 To 3454
        lngSum = lngSum + CandidateSums(lngLoop).Count
    Next
    ReDim lngIndexedSums(lngSum)
    
    Dim lngIndexedPosition As Long 'Current position to store next item
    Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
    Dim lngLowerTotal, lngUpperTotal As Long
    Dim lngLowerValue, lngUpperValue As Long
    Dim lngItemSumTotal As Long
    For lngLoop = 0 To UBound(lngItems)
        lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
    Next
    
    Select Case lngItemSumTotal Mod 4
        Case 0 'exact integer
            lngTarget = lngItemSumTotal / 4
            lngLowerTarget = lngTarget - 1
            lngUpperTarget = lngTarget + 1
        Case 1 '.25
            lngTarget = Round(lngItemSumTotal / 4, 0)
            lngLowerTarget = lngTarget
            lngUpperTarget = lngTarget + 1
        Case 2 '.5
            lngTarget = 0
            lngLowerTarget = Int(lngItemSumTotal / 4)
            lngUpperTarget = lngLowerTarget + 1
        Case 3 '.75
            lngTarget = Round(lngItemSumTotal / 4, 0)
            lngLowerTarget = lngTarget - 1
            lngUpperTarget = lngTarget
        
    End Select
    
    'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
    'Debug.Print "Target: " & Str(lngTarget)
    'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
    'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
    
    Dim lngLowerMergePosition, lngUpperMergePosition As Long
    lngIndexedPosition = 0
    Do While lngTarget <> -1
        If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
            'MOD 1 or 3 so add all values from single target
            For lngLoop = 1 To CandidateSums(lngTarget).Count
                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                lngIndexedPosition = lngIndexedPosition + 1
            Next
            'Alternate to next target
            If lngTarget = lngLowerTarget Then
                lngLowerTarget = lngTarget - 1
                If lngUpperTarget <= 3454 Then
                    lngTarget = lngUpperTarget
                Else
                    If lngLowerTarget >= 3337 Then
                        lngTarget = lngLowerTarget
                    Else
                        lngTarget = -1
                    End If
                End If
            Else
                lngUpperTarget = lngTarget + 1
                If lngLowerTarget >= 3337 Then
                    lngTarget = lngLowerTarget
                Else
                    If lngUpperTarget <= 3454 Then
                        lngTarget = lngUpperTarget
                    Else
                        lngTarget = -1
                    End If
                End If
            End If
        Else
            'MOD 0 or 2
            If lngTarget > 0 Then 'MOD 0 so add exact target values first
                For lngLoop = 1 To CandidateSums(lngTarget).Count
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
                lngTarget = 0 'Start merging
            End If
            'Merge largest items from lower & upper target groups
            lngLowerMergePosition = 1
            lngUpperMergePosition = 1
            lngLowerTotal = CandidateSums(lngLowerTarget).Count
            lngUpperTotal = CandidateSums(lngUpperTarget).Count
            Do While lngLowerMergePosition <= lngLowerTotal And _
                     lngUpperMergePosition <= lngUpperTotal
                lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
                lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
                If lngLowerValue >= lngUpperValue Then
                    lngIndexedSums(lngIndexedPosition) = lngLowerValue
                    lngLowerMergePosition = lngLowerMergePosition + 1
                Else
                    lngIndexedSums(lngIndexedPosition) = lngUpperValue
                    lngUpperMergePosition = lngUpperMergePosition + 1
                End If
                lngIndexedPosition = lngIndexedPosition + 1
            Loop
            If lngLowerMergePosition <= lngLowerTotal Then
                'Add remaining lower target values
                For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
            End If
            If lngUpperMergePosition <= lngUpperTotal Then
                'Add remaining upper target values
                For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
            End If
            'Determine next upper/lower targets
            lngLowerTarget = lngLowerTarget - 1
            lngUpperTarget = lngUpperTarget + 1
            If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
                If lngLowerTarget < 3337 Then
                    If lngUpperTarget <= 3454 Then
                        'Upper target(s) still exist so add values
                        lngTarget = lngUpperTarget
                    Else
                        'Both outside limits so exit loop
                        lngTarget = -1
                    End If
                Else
                    lngTarget = lngLowerTarget
                End If
            End If
        End If
    Loop 
        
End Sub
 
Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long
    Static lngPower() As Long
    Static IsInitialized As Boolean
    Dim lngLoop As Long
    Dim lngSum As Long
    Dim lngUbound As Long
    
    lngUbound = UBound(parmItems)
    If IsInitialized Then
    Else
        ReDim lngPower(0 To lngUbound)
        For lngLoop = 0 To lngUbound
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        IsInitialized = True
    End If
    
    For lngLoop = lngUbound To 0 Step -1
        If (parmMask And lngPower(lngLoop)) <> 0 Then
            lngSum = lngSum + parmItems(lngLoop)
            If lngSum > parmSumLimit Then
                SumItStatic = lngSum
                Exit Function
            End If
        End If
    Next
    SumItStatic = lngSum
End Function

Open in new window

A nice bit of coding under suboptimal conditions.  Sorry to hear about the flooding.  Hope the water didn't contain alligators. :-)

If the other person is using MS-Access, does that mean that the data originates in a database?  If so, revisit my original NP discussion, where both the data and the computing took place in an Access VBA environment.  It should give you a template for sizing and filling an array.  Much more flexible than the literal numeric values in the current version of the code.

Speaking of flexibility...if you have the time, look at ways of removing the hard-coded limits, such as 3454.  Reference limits that you calculate and store in variables or in the LBound() and UBound() values applied to the array.  You will need to do this eventually.  Might as well do it while I'm working on the next bit.

One other thing about your 'competition'...for reasonably few items, you can solve such NP problems in Access using either the database engine or VBA code.  However, the interperative nature of the code soon limits the solvability size because of the time required.  In other words, your VB.Net runtime environment is going to scale much better than VBA with your 40 item max.  In an earlier test, I ran a 35 item enumeration loop in about 5 minutes.

It looks like the ball is in my court now.  Time to roll up my sleeves and get the next section fleshed out.  I should have some time this weekend to work on it.  Please be patient.
Avatar of choelt

ASKER

Yes I realize there are alot of hard numbers in this code.  I will take care of that.  I'm going to discuss integration with the late comer to the party and come up with a plan.  I'm going to be in and out for the next several days so my response time won't be ideal, but I'll try to check-in at least once a day through about Wednesday.  It just depends on internet access availability.

Did you have a chance to look at the code I posted late last night?  I wasn't able to test it as thoroughly as I wanted to due to heavy eyelids.  Let me know if you have any issues with the way it is sorting the sums?

Thanks for all of your help on this.  I look forward to racking my brains trying to figure out your next routines.
I got compile errors when I brought your code into VS2008.

I've tweaked the code down through the
   ReDim lngIndexedSums
statement.

Although the VB.Net code isn't as simple as classic VB code, the availability of the 64-bit LONG integer data type allows enumerations beyond 31 items.

Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value.  Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from: target +/-SQRT(target) to something that looks at the smallest values in the list.  For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85.  I also played with the difference between the largest of the small item subset and the smallest item SQRT = 19.95.  While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations as well as memory footprint reductions for the largest sets.


Option Explicit On
 
Module Module1
    Public lngIndexedSums() As Long
 
    Sub SumIteration()
        Dim Groups() As String
 
        Dim lngItems(0 To 22) As Long
        Dim lngPower(0 To 22) As Long
        Dim lngEnumerate As Long
        Dim lngLoop As Integer
        Dim lngSum As Integer
        Dim lngCandidateCount As Integer
 
        For lngLoop = 0 To 22
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        lngItems(0) = 124
        lngItems(1) = 170
        lngItems(2) = 210
        lngItems(3) = 296
        lngItems(4) = 330
        lngItems(5) = 400
        lngItems(6) = 476
        lngItems(7) = 492
        lngItems(8) = 522
        lngItems(9) = 544
        lngItems(10) = 564
        lngItems(11) = 570
        lngItems(12) = 650
        lngItems(13) = 662
        lngItems(14) = 686
        lngItems(15) = 724
        lngItems(16) = 750
        lngItems(17) = 780
        lngItems(18) = 780
        lngItems(19) = 786
        lngItems(20) = 996
        lngItems(21) = 996
        lngItems(22) = 1074
        'calculate lower and upper bounds
        '  * calculate target = sum(lngItems)/4
        '  * if the largest item is > target, it should probably be the upper bound
        '  * otherwise, use the range as target +/- SQRT(target)
        '
        'initialize the sizes and lb arrays
        Dim sizes() As Integer '= {118}
        Dim lb() As Integer '= {3337}
        Dim Target As Single, LowerBound As Integer, UpperBound As Integer
        Target = lngItems.Sum / 4
        LowerBound = Target - Math.Sqrt(Target)
        UpperBound = Target + Math.Sqrt(Target)
        ReDim lb(0)
        ReDim sizes(0)
        lb(0) = LowerBound
        sizes(0) = UpperBound - LowerBound + 1
        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
            CandidateSums(lngLoop) = New Collection
        Next
 
        For lngEnumerate = (2 ^ 23) - 1 To 1 Step -1
            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
            Select Case lngSum
                Case LowerBound To UpperBound
                    CandidateSums(lngSum).Add(lngLoop)
                    lngCandidateCount = lngCandidateCount + 1
            End Select
        Next
        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
        'Next
 
        'Count total candidate sums
        'lngSum = 0
        'For lngLoop = LowerBound To UpperBound
        '    lngSum = lngSum + CandidateSums(lngLoop).Count
        'Next
 
        'NOTE: the following array is one larger than requred
        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 
        Dim lngIndexedPosition As Long 'Current position to store next item
        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
        Dim lngLowerTotal, lngUpperTotal As Long
        Dim lngLowerValue, lngUpperValue As Long
        Dim lngItemSumTotal As Long
        For lngLoop = 0 To UBound(lngItems)
            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
        Next
 
        Select Case lngItemSumTotal Mod 4
            Case 0 'exact integer
                lngTarget = lngItemSumTotal / 4
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget + 1
            Case 1 '.25
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget
                lngUpperTarget = lngTarget + 1
            Case 2 '.5
                lngTarget = 0
                lngLowerTarget = Int(lngItemSumTotal / 4)
                lngUpperTarget = lngLowerTarget + 1
            Case 3 '.75
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget
 
        End Select
 
        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
        'Debug.Print "Target: " & Str(lngTarget)
        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 
        Dim lngLowerMergePosition, lngUpperMergePosition As Long
        lngIndexedPosition = 0
        Do While lngTarget <> -1
            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
                'MOD 1 or 3 so add all values from single target
                For lngLoop = 1 To CandidateSums(lngTarget).Count
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
                'Alternate to next target
                If lngTarget = lngLowerTarget Then
                    lngLowerTarget = lngTarget - 1
                    If lngUpperTarget <= 3454 Then
                        lngTarget = lngUpperTarget
                    Else
                        If lngLowerTarget >= 3337 Then
                            lngTarget = lngLowerTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                Else
                    lngUpperTarget = lngTarget + 1
                    If lngLowerTarget >= 3337 Then
                        lngTarget = lngLowerTarget
                    Else
                        If lngUpperTarget <= 3454 Then
                            lngTarget = lngUpperTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                End If
            Else
                'MOD 0 or 2
                If lngTarget > 0 Then 'MOD 0 so add exact target values first
                    For lngLoop = 1 To CandidateSums(lngTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                    lngTarget = 0 'Start merging
                End If
                'Merge largest items from lower & upper target groups
                lngLowerMergePosition = 1
                lngUpperMergePosition = 1
                lngLowerTotal = CandidateSums(lngLowerTarget).Count
                lngUpperTotal = CandidateSums(lngUpperTarget).Count
                Do While lngLowerMergePosition <= lngLowerTotal And _
                         lngUpperMergePosition <= lngUpperTotal
                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
                    If lngLowerValue >= lngUpperValue Then
                        lngIndexedSums(lngIndexedPosition) = lngLowerValue
                        lngLowerMergePosition = lngLowerMergePosition + 1
                    Else
                        lngIndexedSums(lngIndexedPosition) = lngUpperValue
                        lngUpperMergePosition = lngUpperMergePosition + 1
                    End If
                    lngIndexedPosition = lngIndexedPosition + 1
                Loop
                If lngLowerMergePosition <= lngLowerTotal Then
                    'Add remaining lower target values
                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                If lngUpperMergePosition <= lngUpperTotal Then
                    'Add remaining upper target values
                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                'Determine next upper/lower targets
                lngLowerTarget = lngLowerTarget - 1
                lngUpperTarget = lngUpperTarget + 1
                If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
                    If lngLowerTarget < 3337 Then
                        If lngUpperTarget <= 3454 Then
                            'Upper target(s) still exist so add values
                            lngTarget = lngUpperTarget
                        Else
                            'Both outside limits so exit loop
                            lngTarget = -1
                        End If
                    Else
                        lngTarget = lngLowerTarget
                    End If
                End If
            End If
        Loop
 
    End Sub
 
    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
        Static lngPower() As Long
        Static IsInitialized As Boolean
        Dim lngLoop As Long
        Dim lngSum As Long
        Dim lngUbound As Long
 
        lngUbound = UBound(parmItems)
        If IsInitialized Then
        Else
            ReDim lngPower(0 To lngUbound)
            For lngLoop = 0 To lngUbound
                lngPower(lngLoop) = 2 ^ lngLoop
            Next
            IsInitialized = True
        End If
 
        For lngLoop = lngUbound To 0 Step -1
            If (parmMask And lngPower(lngLoop)) <> 0 Then
                lngSum = lngSum + parmItems(lngLoop)
                If lngSum > parmSumLimit Then
                    SumItStatic = lngSum
                    Exit Function
                End If
            End If
        Next
        SumItStatic = lngSum
    End Function
End Module

Open in new window

Avatar of choelt

ASKER

I coded that in VB6 because that was what you were using originally.  I have loaded it into VB.NET Visual Studio 2005 and have one problem with your changes.

Line 51:  Target = lngItems.Sum / 4

Is this supported in VS 2008?  The compiler in VS 2005 shows no such method supported.  Anyways, I went ahead and just looped through the items again to calculate the sum as before and it seems to work now.  I compiled it as a console app and renamed the sumiteration sub to Main also.  It flies through the initial 23 items, but when I added 8 more items onto the end for a total of 31 it gives me an error (index out of bounds???) during the sorting phase.  This is after it calculates around 2.5 Million possible candidate sums.

I did not consider data sets where we single values larger than the target value.  Would they not have to stand on their own?  I guess it depends on the spread of data.

I'm off for now, but later this evening I will look back into this error I'm receiving.  If I switch it back to the original 23 items it runs fine.
Option Explicit On
 
Module Module1
    Public lngIndexedSums() As Long
 
    Sub Main()
        Dim Groups() As String
 
        Dim lngItems(0 To 30) As Long
        Dim lngPower(0 To 30) As Long
        Dim lngEnumerate As Long
        Dim lngLoop As Integer
        Dim lngSum As Integer
        Dim lngCandidateCount As Integer
 
        For lngLoop = 0 To 30
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        lngItems(0) = 124
        lngItems(1) = 170
        lngItems(2) = 210
        lngItems(3) = 296
        lngItems(4) = 330
        lngItems(5) = 400
        lngItems(6) = 476
        lngItems(7) = 492
        lngItems(8) = 522
        lngItems(9) = 544
        lngItems(10) = 564
        lngItems(11) = 570
        lngItems(12) = 650
        lngItems(13) = 662
        lngItems(14) = 686
        lngItems(15) = 724
        lngItems(16) = 750
        lngItems(17) = 780
        lngItems(18) = 780
        lngItems(19) = 786
        lngItems(20) = 996
        lngItems(21) = 996
        lngItems(22) = 1074
        lngItems(23) = 1079
        lngItems(24) = 1083
        lngItems(25) = 1239
        lngItems(26) = 1287
        lngItems(27) = 1391
        lngItems(28) = 1396
        lngItems(29) = 1412
        lngItems(30) = 1457
 
        For i As Integer = 0 To lngItems.GetUpperBound(0)
            lngSum += lngItems(i)
        Next
        'calculate lower and upper bounds
        '  * calculate target = sum(lngItems)/4
        '  * if the largest item is > target, it should probably be the upper bound
        '  * otherwise, use the range as target +/- SQRT(target)
        '
        'initialize the sizes and lb arrays
        Dim sizes() As Integer '= {118}
        Dim lb() As Integer '= {3337}
        Dim Target As Single, LowerBound As Integer, UpperBound As Integer
        'Target = lngItems.lngItems.lngItems.Sum / 4
        Target = lngSum / 4
        LowerBound = Target - Math.Sqrt(Target)
        UpperBound = Target + Math.Sqrt(Target)
        Console.WriteLine("Target: " + Target.ToString)
        Console.WriteLine("LowerBound: " + LowerBound.ToString)
        Console.WriteLine("UpperBound: " + UpperBound.ToString)
        Console.WriteLine()
 
        ReDim lb(0)
        ReDim sizes(0)
        lb(0) = LowerBound
        sizes(0) = UpperBound - LowerBound + 1
        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
            CandidateSums(lngLoop) = New Collection
        Next
 
        For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1
            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
            Select Case lngSum
                Case LowerBound To UpperBound
                    CandidateSums(lngSum).Add(lngLoop)
                    lngCandidateCount = lngCandidateCount + 1
                    Console.CursorLeft = 0
                    Console.CursorTop = 4
                    Console.Write("Total Candidates: " + lngCandidateCount.ToString)
            End Select
        Next
        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
        'Next
 
        'Count total candidate sums
        'lngSum = 0
        'For lngLoop = LowerBound To UpperBound
        '    lngSum = lngSum + CandidateSums(lngLoop).Count
        'Next
 
        'NOTE: the following array is one larger than requred
        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 
        Dim lngIndexedPosition As Long 'Current position to store next item
        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
        Dim lngLowerTotal, lngUpperTotal As Long
        Dim lngLowerValue, lngUpperValue As Long
        Dim lngItemSumTotal As Long
        For lngLoop = 0 To UBound(lngItems)
            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
        Next
 
        Select Case lngItemSumTotal Mod 4
            Case 0 'exact integer
                lngTarget = lngItemSumTotal / 4
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget + 1
            Case 1 '.25
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget
                lngUpperTarget = lngTarget + 1
            Case 2 '.5
                lngTarget = 0
                lngLowerTarget = Int(lngItemSumTotal / 4)
                lngUpperTarget = lngLowerTarget + 1
            Case 3 '.75
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget
 
        End Select
 
        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
        'Debug.Print "Target: " & Str(lngTarget)
        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 
        Dim lngLowerMergePosition, lngUpperMergePosition As Long
        lngIndexedPosition = 0
        Do While lngTarget <> -1
            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
                'MOD 1 or 3 so add all values from single target
                For lngLoop = 1 To CandidateSums(lngTarget).Count
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
                'Alternate to next target
                If lngTarget = lngLowerTarget Then
                    lngLowerTarget = lngTarget - 1
                    If lngUpperTarget <= 3454 Then
                        lngTarget = lngUpperTarget
                    Else
                        If lngLowerTarget >= 3337 Then
                            lngTarget = lngLowerTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                Else
                    lngUpperTarget = lngTarget + 1
                    If lngLowerTarget >= 3337 Then
                        lngTarget = lngLowerTarget
                    Else
                        If lngUpperTarget <= 3454 Then
                            lngTarget = lngUpperTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                End If
            Else
                'MOD 0 or 2
                If lngTarget > 0 Then 'MOD 0 so add exact target values first
                    For lngLoop = 1 To CandidateSums(lngTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                    lngTarget = 0 'Start merging
                End If
                'Merge largest items from lower & upper target groups
                lngLowerMergePosition = 1
                lngUpperMergePosition = 1
                lngLowerTotal = CandidateSums(lngLowerTarget).Count
                lngUpperTotal = CandidateSums(lngUpperTarget).Count
                Do While lngLowerMergePosition <= lngLowerTotal And _
                         lngUpperMergePosition <= lngUpperTotal
                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
                    If lngLowerValue >= lngUpperValue Then
                        lngIndexedSums(lngIndexedPosition) = lngLowerValue
                        lngLowerMergePosition = lngLowerMergePosition + 1
                    Else
                        lngIndexedSums(lngIndexedPosition) = lngUpperValue
                        lngUpperMergePosition = lngUpperMergePosition + 1
                    End If
                    lngIndexedPosition = lngIndexedPosition + 1
                Loop
                If lngLowerMergePosition <= lngLowerTotal Then
                    'Add remaining lower target values
                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                If lngUpperMergePosition <= lngUpperTotal Then
                    'Add remaining upper target values
                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                'Determine next upper/lower targets
                lngLowerTarget = lngLowerTarget - 1
                lngUpperTarget = lngUpperTarget + 1
                If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then
                    If lngLowerTarget < 3337 Then
                        If lngUpperTarget <= 3454 Then
                            'Upper target(s) still exist so add values
                            lngTarget = lngUpperTarget
                        Else
                            'Both outside limits so exit loop
                            lngTarget = -1
                        End If
                    Else
                        lngTarget = lngLowerTarget
                    End If
                End If
            End If
            Console.CursorLeft = 0
            Console.CursorTop = 5
            Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)
        Loop
        Console.Read()
    End Sub
 
    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
        Static lngPower() As Long
        Static IsInitialized As Boolean
        Dim lngLoop As Long
        Dim lngSum As Long
        Dim lngUbound As Long
 
        lngUbound = UBound(parmItems)
        If IsInitialized Then
        Else
            ReDim lngPower(0 To lngUbound)
            For lngLoop = 0 To lngUbound
                lngPower(lngLoop) = 2 ^ lngLoop
            Next
            IsInitialized = True
        End If
 
        For lngLoop = lngUbound To 0 Step -1
            If (parmMask And lngPower(lngLoop)) <> 0 Then
                lngSum = lngSum + parmItems(lngLoop)
                If lngSum > parmSumLimit Then
                    SumItStatic = lngSum
                    Exit Function
                End If
            End If
        Next
        SumItStatic = lngSum
    End Function
End Module

Open in new window

I think the following (snippet) is the cause of the problem.  You need to set lngTarget to a non-zero value.

Since I'm not running this as a console application, is it possible for us to compromise on your reliance on the Console for your diagnostic messages?

====================
One of the things in my code is the use of a Single data type for the Target value.  I'm currently looking at your code that fills the array and the variables used in that section of code.


            Case 2 '.5
                lngTarget = 0

Open in new window

Avatar of choelt

ASKER

The way I'm using that variable, the lngTarget value should be set to 0.  This is a flag to denote there is no actual integer target so start merging values from lower and upper range.  Like I said the code works fine using the original 23 item set.

And yes, running as a console app sacrifices the debug window.  So I guess in that case we need to just run it as a standard app.
Then I'll look at the hard-coded values in your code as the most likely culprit.
With the 31 items you posted, the range should be: 5904 To 6059



Avatar of choelt

ASKER

Yes it was the hard coded values.  I don't know why I couldn't find that earlier.  I have changed the rest of the hard coded values to dynamic values based on original number of items in data.  The only hard coded value is the lngItems Array bounds.  I posted the latest code along with a change to your code I think is required in the main candidate sum calculation iteration.  Correct me if I'm wrong.
For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1
       lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
       Select Case lngSum
           Case LowerBound To UpperBound
               CandidateSums(lngSum).Add(lngLoop)
               **************CHANGED TO****************
               CandidateSums(lngsum).Add(lngEnumerate)
               ***************************************
               lngCandidateCount = lngCandidateCount + 1
     End Select
Next
 
***************************************************
 
 
Latest Modifications:
 
Option Explicit On
 
Module Module1
    Public lngIndexedSums() As Long
 
    Sub Main()
        Dim Groups() As String
 
        Dim lngItems(0 To 30) As Long
        Dim lngPower(0 To lngItems.GetUpperBound(0)) As Long
        Dim lngEnumerate As Long
        Dim lngLoop As Integer
        Dim lngSum As Integer
        Dim lngCandidateCount As Integer
 
        For lngLoop = 0 To lngItems.GetUpperBound(0)
            lngPower(lngLoop) = 2 ^ lngLoop
        Next
        lngItems(0) = 124
        lngItems(1) = 170
        lngItems(2) = 210
        lngItems(3) = 296
        lngItems(4) = 330
        lngItems(5) = 400
        lngItems(6) = 476
        lngItems(7) = 492
        lngItems(8) = 522
        lngItems(9) = 544
        lngItems(10) = 564
        lngItems(11) = 570
        lngItems(12) = 650
        lngItems(13) = 662
        lngItems(14) = 686
        lngItems(15) = 724
        lngItems(16) = 750
        lngItems(17) = 780
        lngItems(18) = 780
        lngItems(19) = 786
        lngItems(20) = 996
        lngItems(21) = 996
        lngItems(22) = 1074
        lngItems(23) = 1079
        lngItems(24) = 1083
        lngItems(25) = 1239
        lngItems(26) = 1287
        lngItems(27) = 1391
        lngItems(28) = 1396
        lngItems(29) = 1412
        lngItems(30) = 2677
 
        For i As Integer = 0 To lngItems.GetUpperBound(0)
            lngSum += lngItems(i)
        Next
        'calculate lower and upper bounds
        '  * calculate target = sum(lngItems)/4
        '  * if the largest item is > target, it should probably be the upper bound
        '  * otherwise, use the range as target +/- SQRT(target)
        '
        'initialize the sizes and lb arrays
        Dim sizes() As Integer '= {118}
        Dim lb() As Integer '= {3337}
        Dim Target As Single, LowerBound As Integer, UpperBound As Integer
        'Target = lngItems.lngItems.lngItems.Sum / 4
        Target = lngSum / 4
        LowerBound = Target - Math.Sqrt(Target)
        UpperBound = Target + Math.Sqrt(Target)
        Console.WriteLine("Target: " + Target.ToString)
        Console.WriteLine("LowerBound: " + LowerBound.ToString)
        Console.WriteLine("UpperBound: " + UpperBound.ToString)
        Console.WriteLine()
 
        ReDim lb(0)
        ReDim sizes(0)
        lb(0) = LowerBound
        sizes(0) = UpperBound - LowerBound + 1
        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)
        Console.WriteLine("CandidateSums LowerBound(0): " + CandidateSums.GetLowerBound(0).ToString)
        Console.WriteLine("CandidateSums UpperBound(0): " + CandidateSums.GetUpperBound(0).ToString)
        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
            CandidateSums(lngLoop) = New Collection
        Next
 
        For lngEnumerate = ((2 ^ (lngItems.GetUpperBound(0) + 1))) - 1 To 1 Step -1
            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)
            Select Case lngSum
                Case LowerBound To UpperBound
                    CandidateSums(lngSum).Add(lngEnumerate)
                    lngCandidateCount = lngCandidateCount + 1
                    Console.CursorLeft = 0
                    Console.CursorTop = 7
                    Console.Write("Total Candidates: " + lngCandidateCount.ToString)
            End Select
        Next
        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)
        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)
        'Next
 
        'Count total candidate sums
        'lngSum = 0
        'For lngLoop = LowerBound To UpperBound
        '    lngSum = lngSum + CandidateSums(lngLoop).Count
        'Next
 
        'NOTE: the following array is one larger than requred
        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 
        Dim lngIndexedPosition As Long 'Current position to store next item
        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long
        Dim lngLowerTotal, lngUpperTotal As Long
        Dim lngLowerValue, lngUpperValue As Long
        Dim lngItemSumTotal As Long
        For lngLoop = 0 To UBound(lngItems)
            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)
        Next
 
        Select Case lngItemSumTotal Mod 4
            Case 0 'exact integer
                lngTarget = lngItemSumTotal / 4
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget + 1
            Case 1 '.25
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget
                lngUpperTarget = lngTarget + 1
            Case 2 '.5
                lngTarget = 0
                lngLowerTarget = Int(lngItemSumTotal / 4)
                lngUpperTarget = lngLowerTarget + 1
            Case 3 '.75
                lngTarget = Math.Round(lngItemSumTotal / 4, 0)
                lngLowerTarget = lngTarget - 1
                lngUpperTarget = lngTarget
 
        End Select
 
        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)
        'Debug.Print "Target: " & Str(lngTarget)
        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)
        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 
        Dim lngLowerMergePosition, lngUpperMergePosition As Long
        lngIndexedPosition = 0
        Do While lngTarget <> -1
            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then
                'MOD 1 or 3 so add all values from single target
                Console.CursorLeft = 0
                Console.CursorTop = 9
                'Console.Write("lngLoop: " + lngLoop.ToString)
                Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)
                For lngLoop = 1 To CandidateSums(lngTarget).Count
                    'Console.CursorLeft = 0
                    'Console.CursorTop = 6
                    'Console.Write("lngLoop: " + lngLoop.ToString)
                    'Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)
                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                    lngIndexedPosition = lngIndexedPosition + 1
                Next
                'Alternate to next target
                If lngTarget = lngLowerTarget Then
                    lngLowerTarget = lngTarget - 1
                    If lngUpperTarget <= UpperBound Then
                        lngTarget = lngUpperTarget
                    Else
                        If lngLowerTarget >= LowerBound Then
                            lngTarget = lngLowerTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                Else
                    lngUpperTarget = lngTarget + 1
                    If lngLowerTarget >= LowerBound Then
                        lngTarget = lngLowerTarget
                    Else
                        If lngUpperTarget <= UpperBound Then
                            lngTarget = lngUpperTarget
                        Else
                            lngTarget = -1
                        End If
                    End If
                End If
            Else
                'MOD 0 or 2
                If lngTarget > 0 Then 'MOD 0 so add exact target values first
                    For lngLoop = 1 To CandidateSums(lngTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                    lngTarget = 0 'Start merging
                End If
                'Merge largest items from lower & upper target groups
                lngLowerMergePosition = 1
                lngUpperMergePosition = 1
                lngLowerTotal = CandidateSums(lngLowerTarget).Count
                lngUpperTotal = CandidateSums(lngUpperTarget).Count
                Do While lngLowerMergePosition <= lngLowerTotal And _
                         lngUpperMergePosition <= lngUpperTotal
                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)
                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)
                    If lngLowerValue >= lngUpperValue Then
                        lngIndexedSums(lngIndexedPosition) = lngLowerValue
                        lngLowerMergePosition = lngLowerMergePosition + 1
                    Else
                        lngIndexedSums(lngIndexedPosition) = lngUpperValue
                        lngUpperMergePosition = lngUpperMergePosition + 1
                    End If
                    lngIndexedPosition = lngIndexedPosition + 1
                Loop
                If lngLowerMergePosition <= lngLowerTotal Then
                    'Add remaining lower target values
                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                If lngUpperMergePosition <= lngUpperTotal Then
                    'Add remaining upper target values
                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count
                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)
                        lngIndexedPosition = lngIndexedPosition + 1
                    Next
                End If
                'Determine next upper/lower targets
                lngLowerTarget = lngLowerTarget - 1
                lngUpperTarget = lngUpperTarget + 1
                If lngLowerTarget < LowerBound Or lngUpperTarget > UpperBound Then
                    If lngLowerTarget < LowerBound Then
                        If lngUpperTarget <= UpperBound Then
                            'Upper target(s) still exist so add values
                            lngTarget = lngUpperTarget
                        Else
                            'Both outside limits so exit loop
                            lngTarget = -1
                        End If
                    Else
                        lngTarget = lngLowerTarget
                    End If
                End If
            End If
            Console.CursorLeft = 0
            Console.CursorTop = 8
            Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)
        Loop
        Console.Read()
    End Sub
 
    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long
        Static lngPower() As Long
        Static IsInitialized As Boolean
        Dim lngLoop As Long
        Dim lngSum As Long
        Dim lngUbound As Long
 
        lngUbound = UBound(parmItems)
        If IsInitialized Then
        Else
            ReDim lngPower(0 To lngUbound)
            For lngLoop = 0 To lngUbound
                lngPower(lngLoop) = 2 ^ lngLoop
            Next
            IsInitialized = True
        End If
 
        For lngLoop = lngUbound To 0 Step -1
            If (parmMask And lngPower(lngLoop)) <> 0 Then
                lngSum = lngSum + parmItems(lngLoop)
                If lngSum > parmSumLimit Then
                    SumItStatic = lngSum
                    Exit Function
                End If
            End If
        Next
        SumItStatic = lngSum
    End Function
End Module

Open in new window

Avatar of choelt

ASKER

aikimark

Don't mean to rush, but how is that next section coming?

About your comments:
***************************************************
Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value.  Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from: target +/-SQRT(target) to something that looks at the smallest values in the list.  For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85.  I also played with the difference between the largest of the small item subset and the smallest item SQRT = 19.95.  While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations as well as memory footprint reductions for the largest sets.
**************************************************

In my head, I had questioned how to handle the large items (items larger than the target value).  If the item is much larger than the target value, then it will have to stand alone.  Hypothetically, let's say we have the following data set:

4
5
8
10
11
11
12
12
13
42

SUM = 128, MEAN = 12.8, TARGET = 128/4 = 32  SQRT TARGET = 5.66
UPPERLIMIT = 38  LOWERLIMIT = 26

In this case 42 is not going to be included in the CandidateSums collection.  If we make the UPPERLIMIT = 42, then this will increase the total number of iterations required to find the best grouping considerably - right?  In this case should we throw out 42 and make it a stand alone group.  So now we need to determine the closest 3 group matching totals with the remaining 9 data items as follows:

SUM = 86, MEAN = 9.56, TARGET = 86/3 = 28.67, SQRT TARGET = 5.35
UPPERLIMIT = 34   LOWERLIMIT = 23

In the case of 2 (although highly unlikely) very large items, then we would repeat the above process of making it a stand alone group.

So after calculating the UPPERLIMIT, any individual value greater than that should be thrown out and stand alone.  New limits will then be calculated until all individual values are within the UPPERLIMIT.  Would this work?
I'm cutting in your corrective code with what I'd been doing.

In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)


* the small items are summed until their sum is greater than the target.  Then the immediately prior item is used as the largest of the small items.

The upperbound value of the range is compared to the largest item in the list.  If the largest item is greater than the calculated upperbound, the upperbound is reset to the largest item.

In the example set you posted:
Sum(items) = 128
target = 32
sum(4,5,8,10,11) = 38
sum(4,5,8,10) = 27
SQRT(10) = 3.16
CalculatedRange= [28.83772234 To 35.16227766]

Since 42 > 35.16227766, the range is adjusted to [28.83772234 To 42]

I calculated a (potential) best fit of this sample set by hand as:


(4,12,13) => 29
(5,11,12) => 28
(8,10,11) => 29
(42)      => 42

Open in new window

Avatar of choelt

ASKER

aikimark:

***********************
In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)
********************************

That will tighten the range considerably now that I understand what  you meant by (largest of the small items).  Coincidentally, that value seems to be close to SQRT(standard deviation) for several different data sets I evaluated.

I've been having some system problems and am in the process of switching systems.  Seems like one step forward and two steps back.  :-)

I know what I want to do, but I have to code it and test it.  I appreciate your patience.

What has happened with the other person looking at this problem in your organization?
Avatar of choelt

ASKER

aikimark:

Sorry to hear about the problems.  I know that can be very frustrating.

I'm a fairly patient guy, but unfortunately the persons who will be using this are becoming a little anxious for it.  This is a relatively new process and manually dividing the data sets into 4 equal groups is proving to be quite tedious.

The other party involved here is strictly working on the pre-phase of the process which involves collecting of data and sample(s) eliminations, etc.  This is being done using Access.  They would like for me to be able to interface with this database and retrieve the final data sets as they come available.  I'm very comfortable working with Access databases in VB so this should not be a problem.  I just need your continued expertise with the algorithm code.  Let me know if there is anything I can do at this point to help.
Avatar of choelt

ASKER

aikimark:

Again - I don't mean to push, but how is it coming?  Are you going to be able to code the next section?

If it doesn't look like your going to be able to, then could you give me all of the details on your plan for iterating through the sorted array to find the best grouping.  I'm going to have some downtime towards the end of this week to spend on this project.  I could at least be attempting to code it which would help me to better understand the process.

Thanks
Avatar of choelt

ASKER

aikimark:

I'm receiving emails informing me that I have open questions and should be actively posting.  Do you have things with your system back to normal yet?  I'm spending my time in a hospital waiting room so it can be worse. :)  I've come up with a plan for determining the best grouping and will be working on some code over the next couple of days myself, but I'm going to have to relearn some things from VB6 to VB.NET as I go.  I'm just going to try to get something to work like I want it and worry about speed optimization later.
I have the metacode, but haven't typed it into VS.Net
Avatar of choelt

ASKER

aikimark:
Thanks for your help getting me on the right track with this algorithm.  I haven't finished it yet, but I do have something that works about half of the time.  I know what needs to change.  I'm currently working on using a more object oriented approach with generic collections of candidates.  This is starting to look promising, but I have many more hours ahead for testing and tweaking.

Anyways, I wanted to award you the 500 points for your help thus far before the question was closed by a moderator.  I realize your busy and I really appreciate you taking the time to steer me in the right direction.  It looks like I'll be able to finish it from here.  Thanks again!
thank you for the points.

would it be helpful, hurtful, or a waste of time if I posted my metacode?
Avatar of choelt

ASKER

I have changed up the code so much at this point that I probably wouldn't use any of it, but it still may be helpful to see what you had in mind.  All-in-all I have a program that works very well with data sets smaller than 35.  I have also made huge strides in speed optimization.  I'm receiving out of memory exceptions with data sets upwards of 35-40 items.  I'm working on ways to reduce the memory footprint.