Solved

Algorithm to sort numbers into equal groups

Posted on 2009-04-08
49
987 Views
Last Modified: 2012-05-07
I have a process where we are manually trying to sort containers of varying weights into 4 bins of nearest equal total weights.  Here is a sample of typical data with 13 items:

1.68
1.76
2.34
3.88
4.04
5.12
6.10
7.00
7.02
7.24
7.68
8.20
8.62

Now I need to determine which items to group into bins.  I need 4 bins with the closest matching total weights possible.  So in this case the total weight is 70.68.  So I need 4 bins containing as close to 17.67/ea. possible.  There will always be 4 bins no matter how many items.  The total items varies usually between as few as 6 to upwards of 40.  Also let me point out that it doesn't matter how many items are in each bin.  One bin could have 5 items while another has only 2.  We're just trying to match total weights as close as possible.

I'm a VB developer, but can understand most all basic logic as long as it is in somewhat readable form. (why/how explanations help me considerably)  I have been working on some basic sorting using arrays, but I'm not able to come up with anything that consistently calculates a best case scenario for different data sets.  I need help with an algorithm that will work much more consistently than the manual system we are using now which is for the most part trial and error.
0
Comment
Question by:choelt
  • 24
  • 24
49 Comments
 
LVL 84

Expert Comment

by:ozo
Comment Utility
In general, the problem is NP-Complete
But you can solve it in pseudo-polynomial time with a Dynamic Programing algorithm
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@choelt,

I think you can solve this with a knapsack algorithm, optimized with a greedy (Big-In-First) start.

You might use the variance of these four-bin combinations to pick your winner. =sum((avg-binsum)^2) or a simpler =sum(abs(avg-binsum))

ozo is correct in that this problem is NP difficult.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
choelt

Given the 40 item limit, there is another approach that I've used in a different NP problem...simple iteration with masking.

Since you must look at all combinations of items, you must do a kind of binary representation of the inclusion/exclusion of an item based on the binary representation of a loop iteration variable.

There is some trimming of this iteration approach.  Since your combination must approach 1/4 of the sum of all the items, you can reasonably exclude the smallest items from your main iteration loop.  In the example you posted, the bottom 5 (of 13) items are never going to be used as one of 'big in first' combinations.  These are always going to be filler items.  You will iterate over these small items, but for a different purpose.  This will usually reduce our problem size to something that we can represent with a Long integer for the 'big item' loop.

A second performance boost will be the representation of these values as integers, rather than floating point values.  Multiply your values by 100.

You might do a space/time trade-off and represent your best-fit (4 bin) combinations with 1-4 arrays.

==============================
EE reference material:

I solved a similar problem in the interpreted VBA (MS-Access) environment:
http://www.experts-exchange.com/Q_20908880.html
There were different constraints than this problem, but it is illustrative of a complete enumeration approach optimization.  In your case, we aren't looking for exact sums but rather the item-sums surrounding the 1/4 sum.  I might suggest that the equivalent to my byte array needs to be
From (AllItemSum/4 - SmallestItemsSum/2) To (AllItemSum/4 + SmallestItemsSum/2)
The SmallestItemsSum is the sum of the smallest items less than will not be used in the big items iteration.  I had originally thought that I might be able to use some form of standard deviation for this array dimension, but couldn't come up with a simple way.

I hope that you are working in a compiled environment and won't have to externalize your iteration/enumeration looping.
_______________________________________________________________
Some time later, I participated a similar enumeration question in the Delphi forum:
http://www.experts-exchange.com/Q_21354398.html

I was curious enough to ask for an implementation of this enumeration in Delphi
http://www.experts-exchange.com/Q_21355411.html

ozo and I both participated in another discussion from a pharmacy environment:
http://www.experts-exchange.com/Q_22752392.html
0
 

Author Comment

by:choelt
Comment Utility
aikimark

Thanks for the post.  WOW!  The VBA Access environment problem is alot of information to digest.  I'll need time to study it to get my bearings.  It never fails to amaze me that the problems that seem simple at first glance really are quite complex.

I have both vb.net and vb6 at my disposal.  So yes, it will be a compiled program.  I'm digging out some of my dusty algorithm books purchased over the years to study up on NP's and dynamic algorithms.  I have looked into the knapsack solution, but I'm not sure what you are referring to with the greedy(big-in-first) start.  Could you explain in more detail using as much "plain English" as possible.  I don't have a degree in computer science, but I don't mind learning.  Keep in mind that I'm trying to tackle this in my "spare time".  LOL

I'm on the same page in terms of using the sum of variance of all 4 groups to pick the winner.  Maybe standard deviation also.  The light bulb works - If I could only find the switch
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@choelt

I gave a presentation of the Access/VBA problem to one of the local user groups and one of the CompSci Departments.  One of the CS grad students mentioned that my solution was an example of a class of NP solutions, something I didn't know before my presentation.  Sorry to lead with that solution, but it was complicated by the interpreted nature of the VBA runtime environment.

Upon some reflection, I might use the square root of the 1/4 sum as a limit of the lower and upper ranges of my array.
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
In the posted example and made integers, this would be an array dimensioned
From(1767 - SQRT(1767)) To (1767 + SQRT(1767))
From (1725) To (1809)

========================
An explanation of the knapsack problem.
* Simple restatement -- what is the best way to pack an knapsack to get as much stuff into it as possible.
* Greedy packing -- start packing the largest items first and then fit the smaller items around them.
* Masking -- Using the binary representation of an integer value as a use/don't use indicator.  
Example:
13 items = (2^13) - 1 combinations = 8191 combinations
Let's say that the bottom 5 items can not possibly add to our target, so
For lngItemLoop = 8191 To 2^5 Step -1    '2^5 = 32
   CurrentIterationSum = SumItems(lngItemLoop)
   If CurrentIterationSum between lowDim and highDim Then
     'look for next bin's sum, excluding the items used to make the prior bins' sums
     'there will be a secondary 2^5 loop to try and get close to the exact (AllSum/4) value
   End If
Next
The SumItems() function adds the items where there is a 1 at that position in the value of lngItemLoop.  The subsequent bin enumerations can use a mask of the current loop value against the remaining (not already used) mask, which is a very fast operation.

In the Access/VBA example, I had sorted by the oldest items to the most recent items, trying to retire the oldest balances first.  In your case, you would use the weight of the item as your sort key. (which you presented in your list)

You can improve the performance of this by precalculating the SumItem values in an array.  Then the lngItemLoop is merely indexing into an array.

You only need to evaluate combinations variance when you have all four bins sums calculated.  You can

Don't feel badly about the size and complexity of this.  Very few non CompSci folks understand the nature of NP.  I have been working in IT for decades and this Access/VBA question was the FIRST one that presented this level of complexity.

Good luck.
0
 

Author Comment

by:choelt
Comment Utility
aikimark

Thanks for the details in your last post, but I still need a little more information.  I've looked at the VBA Access sample, but I'm not able to really make heads or tails out of the big picture of what your trying to do with the numbers.  I'm firmly believe in trying to figure out a problem on your own.  If your handed the answer you don't really learn how to problem solve.  I think I can figure out how to code it, but I just need to see the big picture here.  I have some questions that may help me clarify exactly what I'm trying to accomplish here.

Are you calling lowDim and highDim the 1725 & 1809 values respectively?  So add big items first and if it falls between 1725 & 1809 then iterate through small 5 items to determine the combination which is closest to 1767 (AllItemSum/4).  At the point that I determine the combination that is closest to 1767, what is the next step?  Do I store this in a 2 dimenstion array with a decimal representing the binary equivalency of selections and the subset sumtotal representing my group 1.  From your last post it looks to just return to main loop and do it all over again with a new big in first start.  Are you saying to mask against the previous selections so that none of those values are included in this next iteration?  If this is the case, how do you know there isn't a better combination down the line which include some of the bigs with a mix and match of mids/smalls?


In the main loop it would start off by passing all 13 values to the SumItems subroutine, "1111111111111 binary" correct?  If so, why even calculate for values where you are including more than (N-3) items since I need 4 different subsets at a minimum.

Where did you come up with the following for lower/upper limits?
From (AllItemSum/4 - SQRT(AllItemSum/4)) To (AllItemSum/4 + SQRT(AllItemSum/4))
I would also think that standard deviation could be used here.  I would think it would depend on how dispersed the data is.


I seem to have a basic feel for what you are trying to do, but I'm not seeing how I'm supposed to be able to evaluate all different group combinations that are in the ballpark and then compare variance among the ones that are closet to sumtotals of 1767 with unique selections.  The only way I can see it is if I come up with 4 groups in this manner and then calculate variance and store the result.  Continue on until I have 4 more groups and again store the calculated variance.  At the end choose the set of 4 with the smallest variance.  If this is the case, would another method be more accurate as follows:
Calculate sumtotal for every combination and store sumtotal and binary mask representing selections for each total in a 2 dimenstion array.  At the end of this loop we would have a distribution of sumtotals along with their respective selections.  Then work outward from the midpoint of selection sets totaling 1767 until finding 4 that have unique selection sets using mask comparisons.

Maybe I'm not seeing it at all.  If at all possible could you revisit this and try to provide me with a more literal explanation of the entire process.  I'm a big picture kind of guy.  One other thing.  You stated in your previous post that by precalculating SumItems the process would be speeded up.  Is this due to avoiding the overhead of calling the subroutine and returning?

I would appreciate as much attention as you could give me here as I feel I'm on the verge of understanding.  

Thanks, Cory
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@Cory

I need to play with this some more, both in my head and with some code.  At this point, I was thinking about the following:
1. Use a big integer variable as my iteration (looping) variable to count down from 2^itemcount to lowerlimit.  
2. Use the bitmask of the iteration variable (1 bits) to index into the array of weights.
3. If the sum of the indexed items is within some range (surrounding our target), add the iteration variable to a data structure (collection or array of collections/lists).
4. Sort the list/array by proximity to the actual 1/4 sum value.

Here comes the tough part...
5. Starting at the top of the sorted list/array, find the next three combinations that have a non-intersecting bit mask pattern.
6. Stop looking when you find the first set of four items.

===============
Notes:
* I'm not sure if the Big Integer variables in VB.Net have native boolean operators available like the Long Integer variables to in VB6.
* The bit mask is the 0/1 pattern of the binary value of the (big) integer variable
* If we use an array of collection/list items, we don't have to do the sorting step. We fill our 'sorted' array starting with the items at our exact target value (1/4 sum) and then the items immediately above/below the target.  Then we add items target+/-2.
* Since we added the items in numerically descending order, this should give us the big-in-first values at the top of each list set.
* Although I suggested the +/- sqrt(targetvalue), I might change my mind about the lowerlimit value.

=======================
"Are you calling lowDim and highDim the 1725 & 1809 ..."
Yes.
I don't want to have a double iteration for the calculations.  The summation function can have an upper limit parameter, allowing it to stop summing as soon as the limit has been met/exceeded.

"Are you saying to mask against the previous selections so that none of those values are included in this next iteration?"
Mostly yes.  We have to be clear what we mean by iteration.  I expect there to be a FindSum() function that will have two parameters
  parmMask - bitmask pattern of current iteration variable and previous FindSum() results for the current iteration.
  parmStartPosn - The index returned by the previous FindSum() invocation

If any of the FindSum() functions returns a -1 within some iteration, then our code will 'back up' one level, increment the parmStartPosn and to another round of FindSum() until the top level returns a -1 or we 'find' all 4 sum items.  This is a lot like searching a tree or a shortest-pathfinding algorithm.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I might 'package' the FindSum() code into a recursive routine, but the packaging isn't important right now.  Just in case, do you know what recursive routines are?

While I'm thinking and playing, try to find an instance of the 40 item set that is representative of what you are likely to encounter and post it in this thread.

0
 

Author Comment

by:choelt
Comment Utility
aikimark

Yes, I've coded some recursive routines over the years in some special situations.  I'm not an expert programmer by any means, but I do understand the basics of recursive programming.  Calling a function from within itself up to n levels deep.

I don't have a sample of data that large with me, but I'm working on that.  I'll post the largest sample set I can get my hands on ASAP.

Thanks for continuing with this.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
No problem.  I like problem solving.

Although I stated that the code might be able to stop with the first set of four sum combinations, I can think of cases where the first set of four might be one exact match and three combinations near the end of our 'sorted' list.  The variance of this set-of-four might be greater than a set with no exact matches.

The eventual 'winner' set-of-four will be one where all the bits (items) are used and the variance of the set is the lowest.

========================
To get an idea of the magnitude of this problem consider:
* Simply iterating from 1 to (2^40 -1) is 512 times slower than just iterating to (2^31 -1)
* For each iteration, we need to do some item value (weight) summation
* For each iteration, we need to do a range check and, potentially, add the item to a collection/list
* Once the initial iteration is completed, we have roughly 4*(N^2) more iterations to find our four-bin optimal set, where N is the number of candidate sums found in the first iteration.
If N=30000 Then we need about 3,600,000,000 more find-and-compare iterations.

=======================
In our case, we pass some level/depth data to the recursive routine, letting it calculate whether there is something found (and return it) or keep looking.  With each depth increase, we pass a decremented value of the depth parameter passed to us.

We only need to calculate the variance, once we have a set of four sums.
0
 

Author Comment

by:choelt
Comment Utility
aikimark

Here are 2 samples that may help.  I ranked them and multiplied the values by 100.  I could not find a 40 count, but that is rare.  I would say the average number of items is close to 23.

1)  35 items
1      84
2      93
3      94
4      102
5      110
6      116
7      120
8      120
9      121
10      132
11      132
12      134
13      142
14      154
15      156
16      164
17      164
18      167
19      178
20      180
21      192
22      210
23      226
24      226
25      248
26      268
27      278
28      293
29      306
30      313
31      346
32      368
33      386
34      406
35      474

2) 23 items
1      124
2      170
3      210
4      296
5      330
6      400
7      476
8      492
9      522
10      544
11      564
12      570
13      650
14      662
15      686
16      724
17      750
18      780
19      780
20      786
21      996
22      996
23      1074
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Just to get a feel for the expected N, I ran this little code snippet against your 23 item set of data.

I know that this code isn't flexible to cover differently sized sets of data and I don't know the source of your data (file, manual entry, database, etc.).  However, it is a proof-of-concept start on the code.

Sum = 13582
Sum/4 = 3395.5
SQRT(Sum/4) = 58.27091899
RangeLow = 3337.229081 = 3337
RangeHigh = 3453.770919 = 3454

The enumaration iteration pass populated the CandidateSums array with the following counts at each sum value position:
3337      0
3338      374
3339      0
3340      346
3341      0
3342      339
3343      0
3344      368
3345      0
3346      380
3347      0
3348      329
3349      0
3350      361
3351      0
3352      334
3353      0
3354      362
3355      0
3356      403
3357      0
3358      317
3359      0
3360      376
3361      0
3362      412
3363      0
3364      346
3365      0
3366      424
3367      0
3368      375
3369      0
3370      384
3371      0
3372      390
3373      0
3374      363
3375      0
3376      404
3377      0
3378      362
3379      0
3380      351
3381      0
3382      403
3383      0
3384      358
3385      0
3386      416
3387      0
3388      346
3389      0
3390      378
3391      0
3392      411
3393      0
3394      379
3395      0       <- Target sum closest values
3396      408   <- Target sum closest values
3397      0
3398      361
3399      0
3400      390
3401      0
3402      450
3403      0
3404      348
3405      0
3406      429
3407      0
3408      409
3409      0
3410      362
3411      0
3412      431
3413      0
3414      385
3415      0
3416      417
3417      0
3418      408
3419      0
3420      380
3421      0
3422      467
3423      0
3424      393
3425      0
3426      410
3427      0
3428      416
3429      0
3430      385
3431      0
3432      452
3433      0
3434      374
3435      0
3436      389
3437      0
3438      432
3439      0
3440      389
3441      0
3442      442
3443      0
3444      388
3445      0
3446      419
3447      0
3448      446
3449      0
3450      412
3451      0
3452      449
3453      0
3454      427
            ========
TOTAL:      23059

So my earlier estimate of 30000 candidate sums seems like a reasonable guess, although this will increase as we increase the set sizes.

Would you like to tweak this code snippet to better meet your needs before I continue?
* This code will work for sets of 31 or less because I'm using a Long Integer variable.  Different variable data types will be needed for larger set sizes.  
* I don't know if there is an equivalent AND operator in VB.Net that will work with larger integer variables.  If not, we will have to create something.
* This code still needs to copy the items into another array, starting at the target position and alternating to the lower/upper boundaries.

Option Explicit
 
 

Sub SumIteration()

    Dim CandidateSums(3337 To 3454) As New Collection

    Dim lngItems(0 To 22) As Long

    Dim lngPower(0 To 22) As Long

    Dim lngLoop As Long

    Dim lngSum As Long

    For lngLoop = 0 To 22

        lngPower(lngLoop) = 2 ^ lngLoop

    Next

    lngItems(0) = 124

    lngItems(1) = 170

    lngItems(2) = 210

    lngItems(3) = 296

    lngItems(4) = 330

    lngItems(5) = 400

    lngItems(6) = 476

    lngItems(7) = 492

    lngItems(8) = 522

    lngItems(9) = 544

    lngItems(10) = 564

    lngItems(11) = 570

    lngItems(12) = 650

    lngItems(13) = 662

    lngItems(14) = 686

    lngItems(15) = 724

    lngItems(16) = 750

    lngItems(17) = 780

    lngItems(18) = 780

    lngItems(19) = 786

    lngItems(20) = 996

    lngItems(21) = 996

    lngItems(22) = 1074

    For lngLoop = 2 ^ 23 - 1 To 1 Step -1

        lngSum = SumItStatic(lngLoop, lngItems(), 3455)

        Select Case lngSum

            Case 3337 To 3454

                CandidateSums(lngSum).Add lngLoop

        End Select

    Next

    Stop

End Sub
 

Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long

    Static lngPower() As Long

    Static IsInitialized As Boolean

    Dim lngLoop As Long

    Dim lngSum As Long

    Dim lngUbound As Long

    

    lngUbound = UBound(parmItems)

    If IsInitialized Then

    Else

        ReDim lngPower(0 To lngUbound)

        For lngLoop = 0 To lngUbound

            lngPower(lngLoop) = 2 ^ lngLoop

        Next

        IsInitialized = True

    End If

    

    For lngLoop = lngUbound To 0 Step -1

        If (parmMask And lngPower(lngLoop)) <> 0 Then

            lngSum = lngSum + parmItems(lngLoop)

            If lngSum > parmSumLimit Then

                SumItStatic = lngSum

                Exit Function

            End If

        End If

    Next

    SumItStatic = lngSum

End Function

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
aikimark

The data will be manually entered from a sheet with values written down.  I will create a user interface for that.  Once the data is entered, I can sort into ascending array and pass the array into the SumItStatic function like your doing in this code.  Do you need that at this point?

I don't see anything needing tweaking at this point.  I understand what this code snippet is doing, but just to clarify.  At completion the CandidateSums collection contains the long integers whose binary equivalents represent selections which add up to sums of 3337 through 3454.  In this case there are no odd numbered sums due to all data being evenly numbered.  The results are indicating there are 408 different combinations that add up to 3396.  Am I correct so far?   I don't see how there are 408 different combinations adding up to 3396.  I guess that's why we have the computer doing the math.

Now the hard part right, as you mentioned above?  Looking at results and coming up with unique selection sets starting with these 408 possibilities, right?

As far as vb.net goes, I'm perfectly fine with using VB6 for this project.  If I need to adapt it to VB.Net in the future, then I can.  I believe VB6 may run it faster anyways.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
You correctly interpreted the results.

I do not need your input methods as long as you can adjust the code to handle a variety of different set sizes.

I just verified that VB.Net AND operator supports the 64-bit LONG integer data type.  I think that the solution would be easier in VB.Net than in VB6 since VB6 doesn't have a native 64-bit integer data type.

Since we are here to help you, can you take a crack at the next bit of code to add the CandidateSums items into a single array in the proper order?
0
 

Author Comment

by:choelt
Comment Utility
Yes I can do that.  As far as this array building code goes, your wanting to add all of the items in the 3396 index first.  Then alternate adding the values in (3395 & 3397), (3394 & 3398) ....... (3337 & 3454) indexes respectively ending up with all 23,059 values added.  If this is not correct, please let me know before I go to far with it.  It is midnight here so I'm going to catch a few winks.  I'll post the code sometime tomorrow depending on how my normal day goes.

The last hour or so I have been testing routines to mask items in the 3396 index of the collection.  I didn't use recursive routines.  So it is not optimized for speed.  I was just wanting to evaluate the results stored in the CandidateSums collection.  I found numerous groups comprised of 3 unique subsets of values totaling 3396, but not 4.  If there consists of three unique subsets totaling 3396, then there is no reason to go any further since the remaining 4th subset will have to fall in line.  Do you agree?  The remaining unused items can easily be determined using a final mask.  In this case they will add up to 3394.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
you understand correctly.

If you want an optimal answer, you have to keep looking.  If you find a set of four within the first (target value) set, then there is no better solution and you can finish early.  The problem with your proposal is that you might find three mutually exclusive combinations within the target set of items and your only other set is at the periphery of the range of sets -- or beyond.  Using the variance metric, your solution might not be better than two combinations from the target set and two from the +/-1 sets.  This is the 'N' part of the NP problem complexity description...Nondeterministic.  To me, the best optimization trick is to find a sure fire method of stopping early.

In looking at your set of 23 items, there might be some valid conclusion to use the remainder of the three sums as a stopping criteria and constructing the number from the unmasked bits.  However, I do not think this would be true with any general set of item weights.  I will also give some thought to stoppage conditions -- if we look farther, we will be starting with a worse (combination set) variance than we already have found.  As long as the list/array iteration is looking at the target value items, I can't think of such a condition.

Both of your larger set examples contain fairly even data values.  I can think of sets of data where the data values might be clumpped with few combinations close to the target value.

You can check the mutual exclusiveness of the candidates using the AND operator.  
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0

As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list.

I realize that this might seem to contradict what I'd written earlier in this comment, but if we find a set of four, we don't need to look further at the lowest level, but at the next to lowest level.  I'm going to call it a night and think about the structure of the code and parameters when performing a FindNext (what I'm calling this kind of function call/use similar to what you'd use with a recordset object) starting at the next-to-last depth.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I think I might know an early stoppage criteria.  Once we've done the initial enumeration iteration, we can calculate the possible 'variances' of the combinations.  We can then compare our candidate combinations' variances.  When we have iterated all the possible combinations with one distance, we might be able to state (programmatically) that we have found an optimal combination.  While this calculation is more work, it would allow us to stop with a high confidence that we'd found an optimal solution.

It might also be possible to use this distance information when we are searching for combinations, so that we might limit the depth of the searches.

In the snippet, I have done a calculation of 4 bin 'variance' (distance from target) combinations.  The first set uses the sqare of the distance.  The second set uses the ABS() of the distance.  In both of these sets, the right-most column is the sum of the first 4 columns.  The first example is more representative of statistical variance, since larger values are magnified.

0.25	0.25	0.25	0.25	1
0.25	0.25	0.25	2.25	3
0.25	0.25	2.25	2.25	5
0.25	2.25	2.25	2.25	7
2.25	2.25	2.25	2.25	9 
==============================			
0.5	0.5	0.5	0.5	2
0.5	0.5	0.5	1.5	3
0.5	0.5	1.5	1.5	4
0.5	1.5	1.5	1.5	5
1.5	1.5	1.5	1.5	6

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
aikimark:

Here is my stab at the code to build the sorted array building out from target sum to lower and upper limits alternating between the 2.  This code alternates up until the point that it reaches the lower of 2 total counts in the upper and lower targets.  Let's say we've already added all of the 3396 sums to the array.  Offset +/- 1 (3395 & 3397 respectively) are both zero counts in this case, but the next set is offset +/- 2 (3394 & 3398) respectively.  3394 has 379 total items &  3398 has 361 total items.  I wrote the code to alternate up to the lower of the 2 totals.  In this case 361.  At that point I add the remaining 18 values from 3394 before continuing on to offset +/- 3 (3393 & 3399).  Is this the correct way to approach this?  Also in my main DO Loop I continue until I hit either lower or upper limit.  If there is an even number of total sum sets (as there are in this case) there will be an odd man out.  So I add it last.  I have tested the code and it seems to be working.  Let me know otherwise.
'Count total candidate sums

    lngSum = 0

    For lngLoop = 3337 To 3454

        lngSum = lngSum + CandidateSums(lngLoop).Count

    Next

    Dim lngIndexedSums() As Long

    ReDim lngIndexedSums(lngSum)

    Dim lngIndexedPosition As Long 'Current position to store next item

    Dim lngTarget, lngLowerTarget, lngUpperTarget As Long

    Dim lngTotalTargetItems, lngLowerTotal, lngUpperTotal As Long

    lngIndexedPosition = 0

    lngTarget = 3396

    '1st add all items in target index to array

    For lngLoop = 1 To CandidateSums(lngTarget).Count

        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

        lngIndexedPosition = lngIndexedPosition + 1

    Next

    lngLowerTarget = lngTarget - 1

    lngUpperTarget = lngTarget + 1

    Do While lngLowerTarget >= 3337 And lngUpperTarget <= 3454

        lngLowerTotal = CandidateSums(lngLowerTarget).Count

        lngUpperTotal = CandidateSums(lngUpperTarget).Count

        'Determine which collection has larger number of items for this offset

        If lngLowerTotal = lngUpperTotal Then

            'Count is equal for this iteration so set loop count to Count

            lngTotalTargetItems = lngLowerTotal

        Else

            'Set loop count to higher of 2

            If lngLowerTotal > lngUpperTotal Then

                lngTotalTargetItems = lngLowerTotal

            Else

                lngTotalTargetItems = lngUpperTotal

            End If

        End If

        For lngLoop = 1 To lngTotalTargetItems

            'Add next sum from lower offset target if it exists

            If lngLoop <= lngLowerTotal Then

                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

                lngIndexedPosition = lngIndexedPosition + 1

            End If

            'Add next sum from upper offset target if it exists

            If lngLoop <= lngUpperTotal Then

                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

                lngIndexedPosition = lngIndexedPosition + 1

            End If

        Next

        lngLowerTarget = lngLowerTarget - 1

        lngUpperTarget = lngUpperTarget + 1

    Loop

    

    'Check to see if there is an odd man out and add it if necessary

    If lngLowerTarget = 3337 Then

        'There is 1 more lower offset to iterate through

        For lngLoop = 1 To CandidateSums(lngLowerTarget).Count

            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

            lngIndexedPosition = lngIndexedPosition + 1

        Next

    End If

    If lngUpperTarget = 3454 Then

        'There is 1 more upper offset to iterate through

        For lngLoop = 1 To CandidateSums(lngUpperTarget).Count

            lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

            lngIndexedPosition = lngIndexedPosition + 1

        Next

    End If

    

    'Validate all positions of indexed array contain sums

    Dim bZeroFound As Boolean: bZeroFound = False

    Dim lngIndexedValue As Long

    For lngLoop = 0 To UBound(lngIndexedSums()) - 1

        lngIndexedValue = lngIndexedSums(lngLoop) 'set watch to see values

        If Not lngIndexedValue > 0 Then

            bZeroFound = True

        End If

    Next

    

    Debug.Print "LowerTargetOffset: " & Str(lngLowerTarget)

    Debug.Print "UpperTargetOffset: " & Str(lngUpperTarget)

    Debug.Print "Zero Found: " & Str(bZeroFound)

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Since the target might not be evenly divisible by four, you should tweak your code, adding some preference to which of the lower/upper index gets used first.  It might help to look at a variation on our 23 item example (in the snippet).

You could also use a MOD function, where
* target MOD 4 = 1 should prefer the lower adding before the upper
* target MOD 4 = 3 should prefer the upper adding before the lower
* for 0 and 2 MOD values, there is no preference
I've done something similar where I have a two item array that is initialized as either (-1,1) or (1,-1) as my preferences.

Sum      Sum/4    Preference

------   -------- -----------

13580	3395     None

13581	3395.25  Lower

13582	3395.5   None

13583	3395.75  Upper

13584	3396     None

13585	3396.25  Lower

13586	3396.5   None

13587	3396.75  Upper

13588	3397     None

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
So in this case where the sum/4 is 3395.5, I should NOT start out by adding all of the 3396 sums to the sorted array.  I should start out alternating between 3395 & 3396, although in our case 3395 has no items.

Here is how I think it should be sorted for the different cases:
Case
MOD 0 - Start by adding all exact target sums and then alternate
MOD 1 - Start by adding all target sums from the next lowest and then alternate
MOD 2 - Start by alternating
MOD 3 - Start by adding all target sums from the next highest and then alternate

Examples with above sum variations you presented:
 Sum      Sum/4    Preference
------   -------- -----------
13580      3395    add all sums from 3395 and start alternating 3394 & 3396
13581      3395.25   add all sums from 3395 and start alternating 3394 & 3396
13582      3395.5   start alternating 3394 & 3396
13583      3395.75  add all sums from 3396 and start alternating 3397 & 3395
13584      3396     add all sums from 3396 and start alternating 3395 & 3397
 .............. and so on

Is that what you had in mind?

Something else I'm a little confused on.  Hypothetically speaking lets say the sum/4 was 3395 mod 0.  So I add all items from 3395 first and then start alternating between 3394 & 3396.  I know it's not real data because our sum/4 is 3395.5, but for the sake of argument would it be better to sort these numbers from high to low, rather than alternate back and forth since Big In First is wanting biggest numbers first.

Looking the the actual first 5 items of 3394 & 3396 collections:
3394:  6554112
3394:  6423040
3394:  6324289
3394:  6299778
3394:  6294020
3396:  7340048
3396:  6815764
3396:  6299669
3396:  6295566
3396:  6291607

These could be stored in the sorted array as:
7340048  3396
6815764  3396
6554112  3394
6423040  3394
6324289  3394
6299778  3394
6299669  3396
6295566  3396
6294020  3394
6291607  3396
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I did *not* mean to alternate adding items from the +/-1 arrays.

You want to add all the items from arrays closest to your target before adding any items from arrays farther away from your target.

In our example case items from 3396 and 3395 are equidistant from 3395.5, so there is no order preference.  In this case, I don't think it matters whether you create an algorithm that gives preference to the higher value (in the case of ties).

I think you have made an interesting case for the introduction of a merge operation when adding items from equidistant (from target) arrays.  It is consistent with my earlier Big-In-First comments.  Are you familiar with merge sort algorithm?  It is quite simple -- add the largest/smallest item from the 'head' of two lists until you exhaust one of the lists and then add the remaining items from the non-empty list.  You would use a merge when
target MOD 4 = 0 or 2

I had envisioned an evaluation of candidates algorithm that would iterate through an entire 'distance set' (all items from an array with the same distance from the target).  In that case, the merging wouldn't be necessary.

From your recent comments, it seems like you are starting to get a handle on this.  How do you feel about your understanding of the problem and the path we are following toward a solution?

===================
I'm thinking about some of these shortcuts:
* create and populate a data structure array that has the lower and upper index values and the distance-from-target for all items within that range.  That will allow us to calculate the 'variance' without having to recalculate the sums of the four items we are considering.  I've included an example of part of such an data structure with 1-origin.  This is only the first five ranges closest to the target value.
* use some sort of vector to communicate the starting positions for the FindSum/FindNext functions.  This would be a two-way communication vector.

Note: the variance values get large quickly, so I would expect the optimal four item set to be found in the first few ranges.  However, that is not guaranteed.
From     To       Variance

-----    -------  ---------

1	408	0.25

409	788	2.25

789	1150	6.25

1151	1562	12.25

1563	1953	20.25

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
I had no time to think about this today, but I would like to say that 'yes' I'm fairly comfortable with the understanding of the problem and the path being taken thus far.  I have to admit that I'm confused about your approach your comments on data structure containing upper/lower index and distance from target values.  I'm really not following your thought pattern here at all with the sample structure shown and what is representing.

Are we still going to need the sorted array I was working on?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
"...I'm confused..."
there are 408 items that are in the group closest to the target.
there are 379 items that are in the next closest group.
there are 361 items that are in the next closest group.
there are 411 items that are in the next closest group.

the values in the data structure are cumulative sums, representing the original counts stacked on top of each other in the single array/list.

We don't have to have the From column in this list.  I thought it made the data structure more understandable.  I was wrong.

====================
"Are we still going to need the sorted array I was working on?"
Yes.  The new data structure merely describes the sets of items added to this array.
0
 

Author Comment

by:choelt
Comment Utility
OK,

I see where you are coming up with the cumulative sums and calculating the squared variance, but I think my problem is that I don't fully understand how we are going to evaluate 4 candidate sets.

Let me see if I understand.

You stated
"You can check the mutual exclusiveness of the candidates using the AND operator.  
Example:
Bin1 AND Bin2 AND Bin3 AND Bin4 = 0

As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list."

With the sorted array you're wanting to start with index 0 AND'ing index 1,2,3...N until you find a mutually exclusive set.  This gives you 2 sets of items.  Hypothetically, lets' say now you have  items (0 and 240) which have mutually exclusive sets.  So continue on AND'ing (0,240, 241....N) until we find 3 mutually exclusive items.  (0,240,319)  Continue on AND'ing (0,240,319,320.....N) until we find 4.  If a 4th item is found, store into structure with the 4 items and sum total of squared variance.  Let's say (0,240,319,689).  In our case the sum total of squared variance will be .25 + .25 + .25 + 2.25 = 3.00.
At this point what comes next?  Do we start with index 1 now and repeat the above process?

Above you state:
"As we are searching down the list, we combine the bits from the upper levels with what we have found using the OR operator.  We pass this OR'ed bit mask down to the next level along with the next position (index) in the list"

You lose me here.  Is this what you're referring too with the proposed FindNext function?

I fully understand the CandidateSums collections and that it represents the Big-In-First combinations of all combinations which sum up to totals ranging from 3337 to 3454.  I understand how to sort these items into an array starting with target sum(s) working out towards upper and lower range of 3454 and 3337 respectively.  I understand how to AND the selections to come up with 4 that are mutually exclusive, but after this is where I don't follow your thought pattern.  Can you possibly explain it in another way so that a 'DUMMY' might understand?
0
What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
Comment Utility
In the following snippet, I'm using an array of candidate items and a depth value to communicate with the searching routine invocations both from the outer (Big Item) loop and within recursively invoked FindItem routines.

NOTE: I have not tested the snippet and know that it is incomplete.  It is posted for purposes of communication.

I think the construction of the mask from the higher levels can be constructed at the level and not passed down from the higher level.

I assume that the ItemArray you are filling is scoped at the module or public level, so it is visible by all the routines.

======================
"At this point what comes next?  Do we start with index 1 now and repeat the above process?"
If the last two items are in the same 'zone' (as found in the CandidateSums ranges data structure), then we compare the 'zones' for the second and third Vector items. -- this is best implemented as a conditional loop, starting at the end of the Vector and moving up until we find what level we might increment that could possibly give use a better four-bin combination.

In the example you posted, 319 and 689 are in two different zones, so we would increment Vector(2) = 320, set Vector(3) = 0 and reinvoke FindItems.

===================
Looking at your example, I think it is possible to have this vector (or another vector) contain some depth upper limits for searching, based on our 'zone' knowledge.  These depth upper limit values may be changed based on the fitness testing we do in the BigItem loop.  I haven't thought about the logic for these values.

In the example, we might state that once we found that 319,689 combination, we could reasonably set the upper limits for depth 3 and depth 4 to 408 before invoking the FindItems routine.
'iterating the array

For lngBestItems = 0 To Ubound(itemarray) - 4

  Vector(0) = lngBestItems

  Vector(1) = lngBestItems + 1

  Vector(2) = 0

  Vector(3) = 0

  Depth = 1

  Do

    FindItems Vector(), Depth

    If Vector(3) = 0 Then  'could also use -1

      'increment the lowest non-zero Vector item

    Else  'evaluate fitness

      If PriorFitness > Fitness(Vector()) Then

        PriorFitness = Fitness(Vector()) 

        BestVectorSoFar = Vector

      End If

      'increment the next-to-lowest Vector item

    End If

  Loop Until SomeCriteriaIsMet

Next
 

Sub FindItems(parmVector(), parmDepth)

  Dim lngMask As Long

  Dim lngLoop As Integer
 

  'Do I need to search starting at this level or somewhere lower?

  If parmVector(parmDepth+1)<>0 Then

    FindItems(parmVector(), parmDepth+1)

  End If

  If parmVector(parmDepth+1) = 0 Then

    'Look forward at this level

  End If

  'construct the bit mask of all higher combination items

  For lngLoop = 0 To ParmDepth-1

    lngMask = lngMask OR ItemArray(parmVector(lngLoop))

  Next

  For lngLoop = parmVector(parmDepth) To Ubound(ItemArray)-(4-Depth)

    If (ItemArray(lngLoop) And lngMask) = 0 Then

      parmVector(Depth) = lngLoop

      Exit For

    End If

  Next

  If Depth < 3 Then  'look lower

    FindItems(parmVector(), parmDepth+1)

  Else

    Exit Sub

  End If

Next

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
@choelt

Now you see why some of these NP problem solutions result in such long threads.
0
 

Author Comment

by:choelt
Comment Utility
OK, I'll look over the section of code posted to try and gain a better understanding of the process.  For testing purposes, I did not scope the lngIndexedSums() array (I'll change the name of it to ItemArray() for compatibility) at the module level, but will do so once complete.  It will probably be later in the evening before I have a chance to look at your previous post in detail.

Now I'm well aware of how these threads can reach such lengths.  Going through this is great exercise for the old mind though.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
correction...two old minds :-)

Since I haven't gotten around to any substantive coding, there isn't a reason to rename the array.  Just post your code and I'll know what you call the merged array.
0
 

Author Comment

by:choelt
Comment Utility
I'm just about to start tweaking the code I have up to this point.  I should be able to post it later tonight.  I've had a little fire put under me on this.  Apparently, someone else in another division is working on an Access database to try and do something along similar lines.  I guess they are using VBA, but I don't think they have anything completed with an algorithm yet.  I'll call tomorrow to see where they are at, and let them know what is being done on this end.  So at least I shouldn't have to build a GUI for entering the data.  I can just query a database and then post the results back to the database.  However, that also means that I need to wrap this up quicker than previously expected.  As always!

I was just slammed with about 4 inches of rain/small hail in about 15 minutes which ended up flooding my garage.  I'll try and cleanup that first and then get on the coding express.
0
 

Author Comment

by:choelt
Comment Utility
aikimark
Here is the latest code to sort the array.  I have taken into consideration MOD 4 as follows:

Result:
0 - Exact integer target value
Add all target values 1st and then perform merge-sort on +/- 1 target values choosing highest values
1 - .25
Add values from next lowest target, next highest, next lowest, next highest...........
2 - .5
Start by merge-sort on +/- .5 target values choosing highest values
3 - .75
Add values from next highest target, next lowest, next highest, next lowest...........

I have compiled the program and it runs in 4 seconds on my old AMD 1Ghz tower.  Pretty impressive in my opinion considering its calculating over 8191 possible combinations and sorting the results.
Option Explicit

Dim lngIndexedSums() As Long

 

Sub SumIteration()

    Dim CandidateSums(3337 To 3454) As New Collection

    Dim Groups() As String

    

    Dim lngItems(0 To 22) As Long

    Dim lngPower(0 To 22) As Long

    Dim lngLoop As Long

    Dim lngSum As Long

    For lngLoop = 0 To 22

        lngPower(lngLoop) = 2 ^ lngLoop

    Next

    lngItems(0) = 124

    lngItems(1) = 170

    lngItems(2) = 210

    lngItems(3) = 296

    lngItems(4) = 330

    lngItems(5) = 400

    lngItems(6) = 476

    lngItems(7) = 492

    lngItems(8) = 522

    lngItems(9) = 544

    lngItems(10) = 564

    lngItems(11) = 570

    lngItems(12) = 650

    lngItems(13) = 662

    lngItems(14) = 686

    lngItems(15) = 724

    lngItems(16) = 750

    lngItems(17) = 780

    lngItems(18) = 780

    lngItems(19) = 786

    lngItems(20) = 996

    lngItems(21) = 996

    lngItems(22) = 1074

    For lngLoop = 2 ^ 23 - 1 To 1 Step -1

        lngSum = SumItStatic(lngLoop, lngItems(), 3455)

        Select Case lngSum

            Case 3337 To 3454

                CandidateSums(lngSum).Add lngLoop

        End Select

    Next

    

    'Count total candidate sums

    lngSum = 0

    For lngLoop = 3337 To 3454

        lngSum = lngSum + CandidateSums(lngLoop).Count

    Next

    ReDim lngIndexedSums(lngSum)

    

    Dim lngIndexedPosition As Long 'Current position to store next item

    Dim lngTarget, lngLowerTarget, lngUpperTarget As Long

    Dim lngLowerTotal, lngUpperTotal As Long

    Dim lngLowerValue, lngUpperValue As Long

    Dim lngItemSumTotal As Long

    For lngLoop = 0 To UBound(lngItems)

        lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)

    Next

    

    Select Case lngItemSumTotal Mod 4

        Case 0 'exact integer

            lngTarget = lngItemSumTotal / 4

            lngLowerTarget = lngTarget - 1

            lngUpperTarget = lngTarget + 1

        Case 1 '.25

            lngTarget = Round(lngItemSumTotal / 4, 0)

            lngLowerTarget = lngTarget

            lngUpperTarget = lngTarget + 1

        Case 2 '.5

            lngTarget = 0

            lngLowerTarget = Int(lngItemSumTotal / 4)

            lngUpperTarget = lngLowerTarget + 1

        Case 3 '.75

            lngTarget = Round(lngItemSumTotal / 4, 0)

            lngLowerTarget = lngTarget - 1

            lngUpperTarget = lngTarget

        

    End Select

    

    'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)

    'Debug.Print "Target: " & Str(lngTarget)

    'Debug.Print "LowerTarget: " & Str(lngLowerTarget)

    'Debug.Print "UpperTarget: " & Str(lngUpperTarget)

    

    Dim lngLowerMergePosition, lngUpperMergePosition As Long

    lngIndexedPosition = 0

    Do While lngTarget <> -1

        If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then

            'MOD 1 or 3 so add all values from single target

            For lngLoop = 1 To CandidateSums(lngTarget).Count

                lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                lngIndexedPosition = lngIndexedPosition + 1

            Next

            'Alternate to next target

            If lngTarget = lngLowerTarget Then

                lngLowerTarget = lngTarget - 1

                If lngUpperTarget <= 3454 Then

                    lngTarget = lngUpperTarget

                Else

                    If lngLowerTarget >= 3337 Then

                        lngTarget = lngLowerTarget

                    Else

                        lngTarget = -1

                    End If

                End If

            Else

                lngUpperTarget = lngTarget + 1

                If lngLowerTarget >= 3337 Then

                    lngTarget = lngLowerTarget

                Else

                    If lngUpperTarget <= 3454 Then

                        lngTarget = lngUpperTarget

                    Else

                        lngTarget = -1

                    End If

                End If

            End If

        Else

            'MOD 0 or 2

            If lngTarget > 0 Then 'MOD 0 so add exact target values first

                For lngLoop = 1 To CandidateSums(lngTarget).Count

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

                lngTarget = 0 'Start merging

            End If

            'Merge largest items from lower & upper target groups

            lngLowerMergePosition = 1

            lngUpperMergePosition = 1

            lngLowerTotal = CandidateSums(lngLowerTarget).Count

            lngUpperTotal = CandidateSums(lngUpperTarget).Count

            Do While lngLowerMergePosition <= lngLowerTotal And _

                     lngUpperMergePosition <= lngUpperTotal

                lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)

                lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)

                If lngLowerValue >= lngUpperValue Then

                    lngIndexedSums(lngIndexedPosition) = lngLowerValue

                    lngLowerMergePosition = lngLowerMergePosition + 1

                Else

                    lngIndexedSums(lngIndexedPosition) = lngUpperValue

                    lngUpperMergePosition = lngUpperMergePosition + 1

                End If

                lngIndexedPosition = lngIndexedPosition + 1

            Loop

            If lngLowerMergePosition <= lngLowerTotal Then

                'Add remaining lower target values

                For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

            End If

            If lngUpperMergePosition <= lngUpperTotal Then

                'Add remaining upper target values

                For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

            End If

            'Determine next upper/lower targets

            lngLowerTarget = lngLowerTarget - 1

            lngUpperTarget = lngUpperTarget + 1

            If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then

                If lngLowerTarget < 3337 Then

                    If lngUpperTarget <= 3454 Then

                        'Upper target(s) still exist so add values

                        lngTarget = lngUpperTarget

                    Else

                        'Both outside limits so exit loop

                        lngTarget = -1

                    End If

                Else

                    lngTarget = lngLowerTarget

                End If

            End If

        End If

    Loop 

        

End Sub
 

Function SumItStatic(parmMask As Long, parmItems() As Long, parmSumLimit As Long) As Long

    Static lngPower() As Long

    Static IsInitialized As Boolean

    Dim lngLoop As Long

    Dim lngSum As Long

    Dim lngUbound As Long

    

    lngUbound = UBound(parmItems)

    If IsInitialized Then

    Else

        ReDim lngPower(0 To lngUbound)

        For lngLoop = 0 To lngUbound

            lngPower(lngLoop) = 2 ^ lngLoop

        Next

        IsInitialized = True

    End If

    

    For lngLoop = lngUbound To 0 Step -1

        If (parmMask And lngPower(lngLoop)) <> 0 Then

            lngSum = lngSum + parmItems(lngLoop)

            If lngSum > parmSumLimit Then

                SumItStatic = lngSum

                Exit Function

            End If

        End If

    Next

    SumItStatic = lngSum

End Function

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
A nice bit of coding under suboptimal conditions.  Sorry to hear about the flooding.  Hope the water didn't contain alligators. :-)

If the other person is using MS-Access, does that mean that the data originates in a database?  If so, revisit my original NP discussion, where both the data and the computing took place in an Access VBA environment.  It should give you a template for sizing and filling an array.  Much more flexible than the literal numeric values in the current version of the code.

Speaking of flexibility...if you have the time, look at ways of removing the hard-coded limits, such as 3454.  Reference limits that you calculate and store in variables or in the LBound() and UBound() values applied to the array.  You will need to do this eventually.  Might as well do it while I'm working on the next bit.

One other thing about your 'competition'...for reasonably few items, you can solve such NP problems in Access using either the database engine or VBA code.  However, the interperative nature of the code soon limits the solvability size because of the time required.  In other words, your VB.Net runtime environment is going to scale much better than VBA with your 40 item max.  In an earlier test, I ran a 35 item enumeration loop in about 5 minutes.

It looks like the ball is in my court now.  Time to roll up my sleeves and get the next section fleshed out.  I should have some time this weekend to work on it.  Please be patient.
0
 

Author Comment

by:choelt
Comment Utility
Yes I realize there are alot of hard numbers in this code.  I will take care of that.  I'm going to discuss integration with the late comer to the party and come up with a plan.  I'm going to be in and out for the next several days so my response time won't be ideal, but I'll try to check-in at least once a day through about Wednesday.  It just depends on internet access availability.

Did you have a chance to look at the code I posted late last night?  I wasn't able to test it as thoroughly as I wanted to due to heavy eyelids.  Let me know if you have any issues with the way it is sorting the sums?

Thanks for all of your help on this.  I look forward to racking my brains trying to figure out your next routines.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I got compile errors when I brought your code into VS2008.

I've tweaked the code down through the
   ReDim lngIndexedSums
statement.

Although the VB.Net code isn't as simple as classic VB code, the availability of the 64-bit LONG integer data type allows enumerations beyond 31 items.

Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value.  Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from: target +/-SQRT(target) to something that looks at the smallest values in the list.  For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85.  I also played with the difference between the largest of the small item subset and the smallest item SQRT = 19.95.  While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations as well as memory footprint reductions for the largest sets.


Option Explicit On
 

Module Module1

    Public lngIndexedSums() As Long
 

    Sub SumIteration()

        Dim Groups() As String
 

        Dim lngItems(0 To 22) As Long

        Dim lngPower(0 To 22) As Long

        Dim lngEnumerate As Long

        Dim lngLoop As Integer

        Dim lngSum As Integer

        Dim lngCandidateCount As Integer
 

        For lngLoop = 0 To 22

            lngPower(lngLoop) = 2 ^ lngLoop

        Next

        lngItems(0) = 124

        lngItems(1) = 170

        lngItems(2) = 210

        lngItems(3) = 296

        lngItems(4) = 330

        lngItems(5) = 400

        lngItems(6) = 476

        lngItems(7) = 492

        lngItems(8) = 522

        lngItems(9) = 544

        lngItems(10) = 564

        lngItems(11) = 570

        lngItems(12) = 650

        lngItems(13) = 662

        lngItems(14) = 686

        lngItems(15) = 724

        lngItems(16) = 750

        lngItems(17) = 780

        lngItems(18) = 780

        lngItems(19) = 786

        lngItems(20) = 996

        lngItems(21) = 996

        lngItems(22) = 1074

        'calculate lower and upper bounds

        '  * calculate target = sum(lngItems)/4

        '  * if the largest item is > target, it should probably be the upper bound

        '  * otherwise, use the range as target +/- SQRT(target)

        '

        'initialize the sizes and lb arrays

        Dim sizes() As Integer '= {118}

        Dim lb() As Integer '= {3337}

        Dim Target As Single, LowerBound As Integer, UpperBound As Integer

        Target = lngItems.Sum / 4

        LowerBound = Target - Math.Sqrt(Target)

        UpperBound = Target + Math.Sqrt(Target)

        ReDim lb(0)

        ReDim sizes(0)

        lb(0) = LowerBound

        sizes(0) = UpperBound - LowerBound + 1

        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)

        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

            CandidateSums(lngLoop) = New Collection

        Next
 

        For lngEnumerate = (2 ^ 23) - 1 To 1 Step -1

            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)

            Select Case lngSum

                Case LowerBound To UpperBound

                    CandidateSums(lngSum).Add(lngLoop)

                    lngCandidateCount = lngCandidateCount + 1

            End Select

        Next

        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)

        'Next
 

        'Count total candidate sums

        'lngSum = 0

        'For lngLoop = LowerBound To UpperBound

        '    lngSum = lngSum + CandidateSums(lngLoop).Count

        'Next
 

        'NOTE: the following array is one larger than requred

        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 

        Dim lngIndexedPosition As Long 'Current position to store next item

        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long

        Dim lngLowerTotal, lngUpperTotal As Long

        Dim lngLowerValue, lngUpperValue As Long

        Dim lngItemSumTotal As Long

        For lngLoop = 0 To UBound(lngItems)

            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)

        Next
 

        Select Case lngItemSumTotal Mod 4

            Case 0 'exact integer

                lngTarget = lngItemSumTotal / 4

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget + 1

            Case 1 '.25

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget

                lngUpperTarget = lngTarget + 1

            Case 2 '.5

                lngTarget = 0

                lngLowerTarget = Int(lngItemSumTotal / 4)

                lngUpperTarget = lngLowerTarget + 1

            Case 3 '.75

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget
 

        End Select
 

        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)

        'Debug.Print "Target: " & Str(lngTarget)

        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)

        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 

        Dim lngLowerMergePosition, lngUpperMergePosition As Long

        lngIndexedPosition = 0

        Do While lngTarget <> -1

            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then

                'MOD 1 or 3 so add all values from single target

                For lngLoop = 1 To CandidateSums(lngTarget).Count

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

                'Alternate to next target

                If lngTarget = lngLowerTarget Then

                    lngLowerTarget = lngTarget - 1

                    If lngUpperTarget <= 3454 Then

                        lngTarget = lngUpperTarget

                    Else

                        If lngLowerTarget >= 3337 Then

                            lngTarget = lngLowerTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                Else

                    lngUpperTarget = lngTarget + 1

                    If lngLowerTarget >= 3337 Then

                        lngTarget = lngLowerTarget

                    Else

                        If lngUpperTarget <= 3454 Then

                            lngTarget = lngUpperTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                End If

            Else

                'MOD 0 or 2

                If lngTarget > 0 Then 'MOD 0 so add exact target values first

                    For lngLoop = 1 To CandidateSums(lngTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                    lngTarget = 0 'Start merging

                End If

                'Merge largest items from lower & upper target groups

                lngLowerMergePosition = 1

                lngUpperMergePosition = 1

                lngLowerTotal = CandidateSums(lngLowerTarget).Count

                lngUpperTotal = CandidateSums(lngUpperTarget).Count

                Do While lngLowerMergePosition <= lngLowerTotal And _

                         lngUpperMergePosition <= lngUpperTotal

                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)

                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)

                    If lngLowerValue >= lngUpperValue Then

                        lngIndexedSums(lngIndexedPosition) = lngLowerValue

                        lngLowerMergePosition = lngLowerMergePosition + 1

                    Else

                        lngIndexedSums(lngIndexedPosition) = lngUpperValue

                        lngUpperMergePosition = lngUpperMergePosition + 1

                    End If

                    lngIndexedPosition = lngIndexedPosition + 1

                Loop

                If lngLowerMergePosition <= lngLowerTotal Then

                    'Add remaining lower target values

                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                If lngUpperMergePosition <= lngUpperTotal Then

                    'Add remaining upper target values

                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                'Determine next upper/lower targets

                lngLowerTarget = lngLowerTarget - 1

                lngUpperTarget = lngUpperTarget + 1

                If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then

                    If lngLowerTarget < 3337 Then

                        If lngUpperTarget <= 3454 Then

                            'Upper target(s) still exist so add values

                            lngTarget = lngUpperTarget

                        Else

                            'Both outside limits so exit loop

                            lngTarget = -1

                        End If

                    Else

                        lngTarget = lngLowerTarget

                    End If

                End If

            End If

        Loop
 

    End Sub
 

    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long

        Static lngPower() As Long

        Static IsInitialized As Boolean

        Dim lngLoop As Long

        Dim lngSum As Long

        Dim lngUbound As Long
 

        lngUbound = UBound(parmItems)

        If IsInitialized Then

        Else

            ReDim lngPower(0 To lngUbound)

            For lngLoop = 0 To lngUbound

                lngPower(lngLoop) = 2 ^ lngLoop

            Next

            IsInitialized = True

        End If
 

        For lngLoop = lngUbound To 0 Step -1

            If (parmMask And lngPower(lngLoop)) <> 0 Then

                lngSum = lngSum + parmItems(lngLoop)

                If lngSum > parmSumLimit Then

                    SumItStatic = lngSum

                    Exit Function

                End If

            End If

        Next

        SumItStatic = lngSum

    End Function

End Module

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
I coded that in VB6 because that was what you were using originally.  I have loaded it into VB.NET Visual Studio 2005 and have one problem with your changes.

Line 51:  Target = lngItems.Sum / 4

Is this supported in VS 2008?  The compiler in VS 2005 shows no such method supported.  Anyways, I went ahead and just looped through the items again to calculate the sum as before and it seems to work now.  I compiled it as a console app and renamed the sumiteration sub to Main also.  It flies through the initial 23 items, but when I added 8 more items onto the end for a total of 31 it gives me an error (index out of bounds???) during the sorting phase.  This is after it calculates around 2.5 Million possible candidate sums.

I did not consider data sets where we single values larger than the target value.  Would they not have to stand on their own?  I guess it depends on the spread of data.

I'm off for now, but later this evening I will look back into this error I'm receiving.  If I switch it back to the original 23 items it runs fine.
Option Explicit On
 

Module Module1

    Public lngIndexedSums() As Long
 

    Sub Main()

        Dim Groups() As String
 

        Dim lngItems(0 To 30) As Long

        Dim lngPower(0 To 30) As Long

        Dim lngEnumerate As Long

        Dim lngLoop As Integer

        Dim lngSum As Integer

        Dim lngCandidateCount As Integer
 

        For lngLoop = 0 To 30

            lngPower(lngLoop) = 2 ^ lngLoop

        Next

        lngItems(0) = 124

        lngItems(1) = 170

        lngItems(2) = 210

        lngItems(3) = 296

        lngItems(4) = 330

        lngItems(5) = 400

        lngItems(6) = 476

        lngItems(7) = 492

        lngItems(8) = 522

        lngItems(9) = 544

        lngItems(10) = 564

        lngItems(11) = 570

        lngItems(12) = 650

        lngItems(13) = 662

        lngItems(14) = 686

        lngItems(15) = 724

        lngItems(16) = 750

        lngItems(17) = 780

        lngItems(18) = 780

        lngItems(19) = 786

        lngItems(20) = 996

        lngItems(21) = 996

        lngItems(22) = 1074

        lngItems(23) = 1079

        lngItems(24) = 1083

        lngItems(25) = 1239

        lngItems(26) = 1287

        lngItems(27) = 1391

        lngItems(28) = 1396

        lngItems(29) = 1412

        lngItems(30) = 1457
 

        For i As Integer = 0 To lngItems.GetUpperBound(0)

            lngSum += lngItems(i)

        Next

        'calculate lower and upper bounds

        '  * calculate target = sum(lngItems)/4

        '  * if the largest item is > target, it should probably be the upper bound

        '  * otherwise, use the range as target +/- SQRT(target)

        '

        'initialize the sizes and lb arrays

        Dim sizes() As Integer '= {118}

        Dim lb() As Integer '= {3337}

        Dim Target As Single, LowerBound As Integer, UpperBound As Integer

        'Target = lngItems.lngItems.lngItems.Sum / 4

        Target = lngSum / 4

        LowerBound = Target - Math.Sqrt(Target)

        UpperBound = Target + Math.Sqrt(Target)

        Console.WriteLine("Target: " + Target.ToString)

        Console.WriteLine("LowerBound: " + LowerBound.ToString)

        Console.WriteLine("UpperBound: " + UpperBound.ToString)

        Console.WriteLine()
 

        ReDim lb(0)

        ReDim sizes(0)

        lb(0) = LowerBound

        sizes(0) = UpperBound - LowerBound + 1

        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)

        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

            CandidateSums(lngLoop) = New Collection

        Next
 

        For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1

            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)

            Select Case lngSum

                Case LowerBound To UpperBound

                    CandidateSums(lngSum).Add(lngLoop)

                    lngCandidateCount = lngCandidateCount + 1

                    Console.CursorLeft = 0

                    Console.CursorTop = 4

                    Console.Write("Total Candidates: " + lngCandidateCount.ToString)

            End Select

        Next

        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)

        'Next
 

        'Count total candidate sums

        'lngSum = 0

        'For lngLoop = LowerBound To UpperBound

        '    lngSum = lngSum + CandidateSums(lngLoop).Count

        'Next
 

        'NOTE: the following array is one larger than requred

        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 

        Dim lngIndexedPosition As Long 'Current position to store next item

        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long

        Dim lngLowerTotal, lngUpperTotal As Long

        Dim lngLowerValue, lngUpperValue As Long

        Dim lngItemSumTotal As Long

        For lngLoop = 0 To UBound(lngItems)

            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)

        Next
 

        Select Case lngItemSumTotal Mod 4

            Case 0 'exact integer

                lngTarget = lngItemSumTotal / 4

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget + 1

            Case 1 '.25

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget

                lngUpperTarget = lngTarget + 1

            Case 2 '.5

                lngTarget = 0

                lngLowerTarget = Int(lngItemSumTotal / 4)

                lngUpperTarget = lngLowerTarget + 1

            Case 3 '.75

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget
 

        End Select
 

        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)

        'Debug.Print "Target: " & Str(lngTarget)

        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)

        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 

        Dim lngLowerMergePosition, lngUpperMergePosition As Long

        lngIndexedPosition = 0

        Do While lngTarget <> -1

            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then

                'MOD 1 or 3 so add all values from single target

                For lngLoop = 1 To CandidateSums(lngTarget).Count

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

                'Alternate to next target

                If lngTarget = lngLowerTarget Then

                    lngLowerTarget = lngTarget - 1

                    If lngUpperTarget <= 3454 Then

                        lngTarget = lngUpperTarget

                    Else

                        If lngLowerTarget >= 3337 Then

                            lngTarget = lngLowerTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                Else

                    lngUpperTarget = lngTarget + 1

                    If lngLowerTarget >= 3337 Then

                        lngTarget = lngLowerTarget

                    Else

                        If lngUpperTarget <= 3454 Then

                            lngTarget = lngUpperTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                End If

            Else

                'MOD 0 or 2

                If lngTarget > 0 Then 'MOD 0 so add exact target values first

                    For lngLoop = 1 To CandidateSums(lngTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                    lngTarget = 0 'Start merging

                End If

                'Merge largest items from lower & upper target groups

                lngLowerMergePosition = 1

                lngUpperMergePosition = 1

                lngLowerTotal = CandidateSums(lngLowerTarget).Count

                lngUpperTotal = CandidateSums(lngUpperTarget).Count

                Do While lngLowerMergePosition <= lngLowerTotal And _

                         lngUpperMergePosition <= lngUpperTotal

                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)

                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)

                    If lngLowerValue >= lngUpperValue Then

                        lngIndexedSums(lngIndexedPosition) = lngLowerValue

                        lngLowerMergePosition = lngLowerMergePosition + 1

                    Else

                        lngIndexedSums(lngIndexedPosition) = lngUpperValue

                        lngUpperMergePosition = lngUpperMergePosition + 1

                    End If

                    lngIndexedPosition = lngIndexedPosition + 1

                Loop

                If lngLowerMergePosition <= lngLowerTotal Then

                    'Add remaining lower target values

                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                If lngUpperMergePosition <= lngUpperTotal Then

                    'Add remaining upper target values

                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                'Determine next upper/lower targets

                lngLowerTarget = lngLowerTarget - 1

                lngUpperTarget = lngUpperTarget + 1

                If lngLowerTarget < 3337 Or lngUpperTarget > 3454 Then

                    If lngLowerTarget < 3337 Then

                        If lngUpperTarget <= 3454 Then

                            'Upper target(s) still exist so add values

                            lngTarget = lngUpperTarget

                        Else

                            'Both outside limits so exit loop

                            lngTarget = -1

                        End If

                    Else

                        lngTarget = lngLowerTarget

                    End If

                End If

            End If

            Console.CursorLeft = 0

            Console.CursorTop = 5

            Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)

        Loop

        Console.Read()

    End Sub
 

    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long

        Static lngPower() As Long

        Static IsInitialized As Boolean

        Dim lngLoop As Long

        Dim lngSum As Long

        Dim lngUbound As Long
 

        lngUbound = UBound(parmItems)

        If IsInitialized Then

        Else

            ReDim lngPower(0 To lngUbound)

            For lngLoop = 0 To lngUbound

                lngPower(lngLoop) = 2 ^ lngLoop

            Next

            IsInitialized = True

        End If
 

        For lngLoop = lngUbound To 0 Step -1

            If (parmMask And lngPower(lngLoop)) <> 0 Then

                lngSum = lngSum + parmItems(lngLoop)

                If lngSum > parmSumLimit Then

                    SumItStatic = lngSum

                    Exit Function

                End If

            End If

        Next

        SumItStatic = lngSum

    End Function

End Module

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I think the following (snippet) is the cause of the problem.  You need to set lngTarget to a non-zero value.

Since I'm not running this as a console application, is it possible for us to compromise on your reliance on the Console for your diagnostic messages?

====================
One of the things in my code is the use of a Single data type for the Target value.  I'm currently looking at your code that fills the array and the variables used in that section of code.


            Case 2 '.5

                lngTarget = 0

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
The way I'm using that variable, the lngTarget value should be set to 0.  This is a flag to denote there is no actual integer target so start merging values from lower and upper range.  Like I said the code works fine using the original 23 item set.

And yes, running as a console app sacrifices the debug window.  So I guess in that case we need to just run it as a standard app.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
Then I'll look at the hard-coded values in your code as the most likely culprit.
With the 31 items you posted, the range should be: 5904 To 6059



0
 

Author Comment

by:choelt
Comment Utility
Yes it was the hard coded values.  I don't know why I couldn't find that earlier.  I have changed the rest of the hard coded values to dynamic values based on original number of items in data.  The only hard coded value is the lngItems Array bounds.  I posted the latest code along with a change to your code I think is required in the main candidate sum calculation iteration.  Correct me if I'm wrong.
For lngEnumerate = (2 ^ 31) - 1 To 1 Step -1

       lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)

       Select Case lngSum

           Case LowerBound To UpperBound

               CandidateSums(lngSum).Add(lngLoop)

               **************CHANGED TO****************

               CandidateSums(lngsum).Add(lngEnumerate)

               ***************************************

               lngCandidateCount = lngCandidateCount + 1

     End Select

Next
 

***************************************************
 
 

Latest Modifications:
 

Option Explicit On
 

Module Module1

    Public lngIndexedSums() As Long
 

    Sub Main()

        Dim Groups() As String
 

        Dim lngItems(0 To 30) As Long

        Dim lngPower(0 To lngItems.GetUpperBound(0)) As Long

        Dim lngEnumerate As Long

        Dim lngLoop As Integer

        Dim lngSum As Integer

        Dim lngCandidateCount As Integer
 

        For lngLoop = 0 To lngItems.GetUpperBound(0)

            lngPower(lngLoop) = 2 ^ lngLoop

        Next

        lngItems(0) = 124

        lngItems(1) = 170

        lngItems(2) = 210

        lngItems(3) = 296

        lngItems(4) = 330

        lngItems(5) = 400

        lngItems(6) = 476

        lngItems(7) = 492

        lngItems(8) = 522

        lngItems(9) = 544

        lngItems(10) = 564

        lngItems(11) = 570

        lngItems(12) = 650

        lngItems(13) = 662

        lngItems(14) = 686

        lngItems(15) = 724

        lngItems(16) = 750

        lngItems(17) = 780

        lngItems(18) = 780

        lngItems(19) = 786

        lngItems(20) = 996

        lngItems(21) = 996

        lngItems(22) = 1074

        lngItems(23) = 1079

        lngItems(24) = 1083

        lngItems(25) = 1239

        lngItems(26) = 1287

        lngItems(27) = 1391

        lngItems(28) = 1396

        lngItems(29) = 1412

        lngItems(30) = 2677
 

        For i As Integer = 0 To lngItems.GetUpperBound(0)

            lngSum += lngItems(i)

        Next

        'calculate lower and upper bounds

        '  * calculate target = sum(lngItems)/4

        '  * if the largest item is > target, it should probably be the upper bound

        '  * otherwise, use the range as target +/- SQRT(target)

        '

        'initialize the sizes and lb arrays

        Dim sizes() As Integer '= {118}

        Dim lb() As Integer '= {3337}

        Dim Target As Single, LowerBound As Integer, UpperBound As Integer

        'Target = lngItems.lngItems.lngItems.Sum / 4

        Target = lngSum / 4

        LowerBound = Target - Math.Sqrt(Target)

        UpperBound = Target + Math.Sqrt(Target)

        Console.WriteLine("Target: " + Target.ToString)

        Console.WriteLine("LowerBound: " + LowerBound.ToString)

        Console.WriteLine("UpperBound: " + UpperBound.ToString)

        Console.WriteLine()
 

        ReDim lb(0)

        ReDim sizes(0)

        lb(0) = LowerBound

        sizes(0) = UpperBound - LowerBound + 1

        Dim CandidateSums As Array = Array.CreateInstance(GetType(Collection), sizes, lb)

        Console.WriteLine("CandidateSums LowerBound(0): " + CandidateSums.GetLowerBound(0).ToString)

        Console.WriteLine("CandidateSums UpperBound(0): " + CandidateSums.GetUpperBound(0).ToString)

        For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

            CandidateSums(lngLoop) = New Collection

        Next
 

        For lngEnumerate = ((2 ^ (lngItems.GetUpperBound(0) + 1))) - 1 To 1 Step -1

            lngSum = SumItStatic(lngEnumerate, lngItems, UpperBound + 1)

            Select Case lngSum

                Case LowerBound To UpperBound

                    CandidateSums(lngSum).Add(lngEnumerate)

                    lngCandidateCount = lngCandidateCount + 1

                    Console.CursorLeft = 0

                    Console.CursorTop = 7

                    Console.Write("Total Candidates: " + lngCandidateCount.ToString)

            End Select

        Next

        'For lngLoop = CandidateSums.GetLowerBound(0) To CandidateSums.GetUpperBound(0)

        '    Debug.Print(lngLoop & vbTab & CandidateSums(lngLoop).count.ToString)

        'Next
 

        'Count total candidate sums

        'lngSum = 0

        'For lngLoop = LowerBound To UpperBound

        '    lngSum = lngSum + CandidateSums(lngLoop).Count

        'Next
 

        'NOTE: the following array is one larger than requred

        ReDim lngIndexedSums(lngCandidateCount) 'was lngSum
 

        Dim lngIndexedPosition As Long 'Current position to store next item

        Dim lngTarget, lngLowerTarget, lngUpperTarget As Long

        Dim lngLowerTotal, lngUpperTotal As Long

        Dim lngLowerValue, lngUpperValue As Long

        Dim lngItemSumTotal As Long

        For lngLoop = 0 To UBound(lngItems)

            lngItemSumTotal = lngItemSumTotal + lngItems(lngLoop)

        Next
 

        Select Case lngItemSumTotal Mod 4

            Case 0 'exact integer

                lngTarget = lngItemSumTotal / 4

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget + 1

            Case 1 '.25

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget

                lngUpperTarget = lngTarget + 1

            Case 2 '.5

                lngTarget = 0

                lngLowerTarget = Int(lngItemSumTotal / 4)

                lngUpperTarget = lngLowerTarget + 1

            Case 3 '.75

                lngTarget = Math.Round(lngItemSumTotal / 4, 0)

                lngLowerTarget = lngTarget - 1

                lngUpperTarget = lngTarget
 

        End Select
 

        'Debug.Print "MOD:" & Str(lngItemSumTotal Mod 4)

        'Debug.Print "Target: " & Str(lngTarget)

        'Debug.Print "LowerTarget: " & Str(lngLowerTarget)

        'Debug.Print "UpperTarget: " & Str(lngUpperTarget)
 

        Dim lngLowerMergePosition, lngUpperMergePosition As Long

        lngIndexedPosition = 0

        Do While lngTarget <> -1

            If lngTarget = lngLowerTarget Or lngTarget = lngUpperTarget Then

                'MOD 1 or 3 so add all values from single target

                Console.CursorLeft = 0

                Console.CursorTop = 9

                'Console.Write("lngLoop: " + lngLoop.ToString)

                Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)

                For lngLoop = 1 To CandidateSums(lngTarget).Count

                    'Console.CursorLeft = 0

                    'Console.CursorTop = 6

                    'Console.Write("lngLoop: " + lngLoop.ToString)

                    'Console.Write("CandidateSums(" + lngTarget.ToString + ") Count: " + CandidateSums(lngTarget).count.ToString)

                    lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                    lngIndexedPosition = lngIndexedPosition + 1

                Next

                'Alternate to next target

                If lngTarget = lngLowerTarget Then

                    lngLowerTarget = lngTarget - 1

                    If lngUpperTarget <= UpperBound Then

                        lngTarget = lngUpperTarget

                    Else

                        If lngLowerTarget >= LowerBound Then

                            lngTarget = lngLowerTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                Else

                    lngUpperTarget = lngTarget + 1

                    If lngLowerTarget >= LowerBound Then

                        lngTarget = lngLowerTarget

                    Else

                        If lngUpperTarget <= UpperBound Then

                            lngTarget = lngUpperTarget

                        Else

                            lngTarget = -1

                        End If

                    End If

                End If

            Else

                'MOD 0 or 2

                If lngTarget > 0 Then 'MOD 0 so add exact target values first

                    For lngLoop = 1 To CandidateSums(lngTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                    lngTarget = 0 'Start merging

                End If

                'Merge largest items from lower & upper target groups

                lngLowerMergePosition = 1

                lngUpperMergePosition = 1

                lngLowerTotal = CandidateSums(lngLowerTarget).Count

                lngUpperTotal = CandidateSums(lngUpperTarget).Count

                Do While lngLowerMergePosition <= lngLowerTotal And _

                         lngUpperMergePosition <= lngUpperTotal

                    lngLowerValue = CandidateSums(lngLowerTarget).Item(lngLowerMergePosition)

                    lngUpperValue = CandidateSums(lngUpperTarget).Item(lngUpperMergePosition)

                    If lngLowerValue >= lngUpperValue Then

                        lngIndexedSums(lngIndexedPosition) = lngLowerValue

                        lngLowerMergePosition = lngLowerMergePosition + 1

                    Else

                        lngIndexedSums(lngIndexedPosition) = lngUpperValue

                        lngUpperMergePosition = lngUpperMergePosition + 1

                    End If

                    lngIndexedPosition = lngIndexedPosition + 1

                Loop

                If lngLowerMergePosition <= lngLowerTotal Then

                    'Add remaining lower target values

                    For lngLoop = lngLowerMergePosition To CandidateSums(lngLowerTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngLowerTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                If lngUpperMergePosition <= lngUpperTotal Then

                    'Add remaining upper target values

                    For lngLoop = lngUpperMergePosition To CandidateSums(lngUpperTarget).Count

                        lngIndexedSums(lngIndexedPosition) = CandidateSums(lngUpperTarget).Item(lngLoop)

                        lngIndexedPosition = lngIndexedPosition + 1

                    Next

                End If

                'Determine next upper/lower targets

                lngLowerTarget = lngLowerTarget - 1

                lngUpperTarget = lngUpperTarget + 1

                If lngLowerTarget < LowerBound Or lngUpperTarget > UpperBound Then

                    If lngLowerTarget < LowerBound Then

                        If lngUpperTarget <= UpperBound Then

                            'Upper target(s) still exist so add values

                            lngTarget = lngUpperTarget

                        Else

                            'Both outside limits so exit loop

                            lngTarget = -1

                        End If

                    Else

                        lngTarget = lngLowerTarget

                    End If

                End If

            End If

            Console.CursorLeft = 0

            Console.CursorTop = 8

            Console.Write("Total Indexed Sums: " + lngIndexedPosition.ToString)

        Loop

        Console.Read()

    End Sub
 

    Function SumItStatic(ByVal parmMask As Long, ByRef parmItems() As Long, ByVal parmSumLimit As Long) As Long

        Static lngPower() As Long

        Static IsInitialized As Boolean

        Dim lngLoop As Long

        Dim lngSum As Long

        Dim lngUbound As Long
 

        lngUbound = UBound(parmItems)

        If IsInitialized Then

        Else

            ReDim lngPower(0 To lngUbound)

            For lngLoop = 0 To lngUbound

                lngPower(lngLoop) = 2 ^ lngLoop

            Next

            IsInitialized = True

        End If
 

        For lngLoop = lngUbound To 0 Step -1

            If (parmMask And lngPower(lngLoop)) <> 0 Then

                lngSum = lngSum + parmItems(lngLoop)

                If lngSum > parmSumLimit Then

                    SumItStatic = lngSum

                    Exit Function

                End If

            End If

        Next

        SumItStatic = lngSum

    End Function

End Module

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
aikimark

Don't mean to rush, but how is that next section coming?

About your comments:
***************************************************
Two thoughts I had this weekend:
* we might be able to use the largest item as the upper limit if it exceeds our target value.  Not sure of this assumption, so I didn't include it in the snippet.
* we might tighten the range from: target +/-SQRT(target) to something that looks at the smallest values in the list.  For instance, if I take the largest item of the subset that will not exceed the target and use its SQRT(), I get a much smaller range = 22.85.  I also played with the difference between the largest of the small item subset and the smallest item SQRT = 19.95.  While we do not have a lot of candidate sums with our 23-item sample set, trimming the range will result in faster performance of the SumItStatic() function -- fewer iterations as well as memory footprint reductions for the largest sets.
**************************************************

In my head, I had questioned how to handle the large items (items larger than the target value).  If the item is much larger than the target value, then it will have to stand alone.  Hypothetically, let's say we have the following data set:

4
5
8
10
11
11
12
12
13
42

SUM = 128, MEAN = 12.8, TARGET = 128/4 = 32  SQRT TARGET = 5.66
UPPERLIMIT = 38  LOWERLIMIT = 26

In this case 42 is not going to be included in the CandidateSums collection.  If we make the UPPERLIMIT = 42, then this will increase the total number of iterations required to find the best grouping considerably - right?  In this case should we throw out 42 and make it a stand alone group.  So now we need to determine the closest 3 group matching totals with the remaining 9 data items as follows:

SUM = 86, MEAN = 9.56, TARGET = 86/3 = 28.67, SQRT TARGET = 5.35
UPPERLIMIT = 34   LOWERLIMIT = 23

In the case of 2 (although highly unlikely) very large items, then we would repeat the above process of making it a stand alone group.

So after calculating the UPPERLIMIT, any individual value greater than that should be thrown out and stand alone.  New limits will then be calculated until all individual values are within the UPPERLIMIT.  Would this work?
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I'm cutting in your corrective code with what I'd been doing.

In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)


* the small items are summed until their sum is greater than the target.  Then the immediately prior item is used as the largest of the small items.

The upperbound value of the range is compared to the largest item in the list.  If the largest item is greater than the calculated upperbound, the upperbound is reset to the largest item.

In the example set you posted:
Sum(items) = 128
target = 32
sum(4,5,8,10,11) = 38
sum(4,5,8,10) = 27
SQRT(10) = 3.16
CalculatedRange= [28.83772234 To 35.16227766]

Since 42 > 35.16227766, the range is adjusted to [28.83772234 To 42]

I calculated a (potential) best fit of this sample set by hand as:


(4,12,13) => 29

(5,11,12) => 28

(8,10,11) => 29

(42)      => 42

Open in new window

0
 

Author Comment

by:choelt
Comment Utility
aikimark:

***********************
In answer to your question, the code I'll post calculates the Sum/4 target value and then calculates the range as +/- SQRT(largest of the small items)
********************************

That will tighten the range considerably now that I understand what  you meant by (largest of the small items).  Coincidentally, that value seems to be close to SQRT(standard deviation) for several different data sets I evaluated.

0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I've been having some system problems and am in the process of switching systems.  Seems like one step forward and two steps back.  :-)

I know what I want to do, but I have to code it and test it.  I appreciate your patience.

What has happened with the other person looking at this problem in your organization?
0
 

Author Comment

by:choelt
Comment Utility
aikimark:

Sorry to hear about the problems.  I know that can be very frustrating.

I'm a fairly patient guy, but unfortunately the persons who will be using this are becoming a little anxious for it.  This is a relatively new process and manually dividing the data sets into 4 equal groups is proving to be quite tedious.

The other party involved here is strictly working on the pre-phase of the process which involves collecting of data and sample(s) eliminations, etc.  This is being done using Access.  They would like for me to be able to interface with this database and retrieve the final data sets as they come available.  I'm very comfortable working with Access databases in VB so this should not be a problem.  I just need your continued expertise with the algorithm code.  Let me know if there is anything I can do at this point to help.
0
 

Author Comment

by:choelt
Comment Utility
aikimark:

Again - I don't mean to push, but how is it coming?  Are you going to be able to code the next section?

If it doesn't look like your going to be able to, then could you give me all of the details on your plan for iterating through the sorted array to find the best grouping.  I'm going to have some downtime towards the end of this week to spend on this project.  I could at least be attempting to code it which would help me to better understand the process.

Thanks
0
 

Author Comment

by:choelt
Comment Utility
aikimark:

I'm receiving emails informing me that I have open questions and should be actively posting.  Do you have things with your system back to normal yet?  I'm spending my time in a hospital waiting room so it can be worse. :)  I've come up with a plan for determining the best grouping and will be working on some code over the next couple of days myself, but I'm going to have to relearn some things from VB6 to VB.NET as I go.  I'm just going to try to get something to work like I want it and worry about speed optimization later.
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
I have the metacode, but haven't typed it into VS.Net
0
 

Author Closing Comment

by:choelt
Comment Utility
aikimark:
Thanks for your help getting me on the right track with this algorithm.  I haven't finished it yet, but I do have something that works about half of the time.  I know what needs to change.  I'm currently working on using a more object oriented approach with generic collections of candidates.  This is starting to look promising, but I have many more hours ahead for testing and tweaking.

Anyways, I wanted to award you the 500 points for your help thus far before the question was closed by a moderator.  I realize your busy and I really appreciate you taking the time to steer me in the right direction.  It looks like I'll be able to finish it from here.  Thanks again!
0
 
LVL 45

Expert Comment

by:aikimark
Comment Utility
thank you for the points.

would it be helpful, hurtful, or a waste of time if I posted my metacode?
0
 

Author Comment

by:choelt
Comment Utility
I have changed up the code so much at this point that I probably wouldn't use any of it, but it still may be helpful to see what you had in mind.  All-in-all I have a program that works very well with data sets smaller than 35.  I have also made huge strides in speed optimization.  I'm receiving out of memory exceptions with data sets upwards of 35-40 items.  I'm working on ways to reduce the memory footprint.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Okay. So what exactly is the problem here? How often have we come across situations where we need to know if two strings are 'similar' but not necessarily the same? I have, plenty of times. Until recently, I thought any functionality like that wo…
The greatest common divisor (gcd) of two positive integers is their largest common divisor. Let's consider two numbers 12 and 20. The divisors of 12 are 1, 2, 3, 4, 6, 12 The divisors of 20 are 1, 2, 4, 5, 10 20 The highest number among the c…
This video discusses moving either the default database or any database to a new volume.
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now