Suppose we have N bins where N is maybe as large as two or three thousand. We have M items and M is as large as 5*N.

Every item must belong to one and only one bin (to keep things simple <g>), no bin is empty, a bin can hold any number of items but typically would hold 1,2,3...about 10 at maximum.

We ONLY know the characteristics of the bins and the items, and we wish to solve for the allocation of items to bins ie for every item, which bin does it belong to? We can accept some errors .. that is, a small number of misallocation of items to bins.

We are lucky in that we have TWO characteristics for each item, cube and weight (both integers), and the corresponding totals for each bin. Cube and weight vary very widely and at random, so you won't find many duplicates on these parameters at the item or the bin level. Cube and weight are not highly correlated.

OK, how do we do this.?

I hope it is not integer programming or linear programming because

a) I have no idea how to implement such things (in Delphi)

b) I am under the impression that such approaches are slow for large numbers of variables

c) I would like, eventually, to extend the solution to a situation where the variables (cube and weight) are measured with error

d) I have a suspicion that because we have two "dependent" variables, a solution should be easier to find .. and maybe I could have more in the future (ie change how things are measured, again maybe with error)

e) and in the future I would like some sort of estimate as to how likley it is that a particular item belongs to a bin (maybe that is too big an ask)

I suppose I should add that there are no items with very small values of cube and weight .. that is, you don't have to consider solutions where a "top up" can come from lots of small items (I guess this is also implied by the restriction that bins do not have more than about 10 items)

There are a couple of important pieces of information missing from your question:

How do you define or recognize a good solution?
And how good a solution do you need?

What criteria do you use to determine which bin an item "belongs to"?

Does the best solution try to make the weight, volume, and number of items in each bin the same?

If so, you know what you need to pack and you have the data for the bins, so you can calculate the weight, volume, and number of items that each bin needs.

5K bins/25K inems are not particularly large numbers. How fast a solution do you need?

If you could look at the bins as a kind of hash table but don't worry about collisions. Generate the hash key from one or both the numbers and allocate the item to the bin of the hash. The trick will be to pick a suitable hash function.

Good question. We can never know if we have the correct solution, since what we are trying to figure out is which items are in which bins. We can define a good solution as being one where there are ZERO differences between the known aggregate weight and volume and the corresponding imputed quantities for each bin .. in limited tests (brute force on a toy problem) this seems to find the correct answer. Unfortunately, as soon as we relax the criterion (eg to be a minimum sums of squares of differences) it seems we are unlilkey to get the correct answer in any reasonable number of iterations .. coplete brute force to get to zero differences seems like the only way, and that will be impractical.

And how good a solution do you need?

see above. It is important that the items be correctly imputed to fall in the bin in which they do in fact fall.

What criteria do you use to determine which bin an item "belongs to"?

see above.

Does the best solution try to make the weight, volume, and number of items in each bin the same?

no. We are not packing, we are trying to work backwards from bins (where we know the total weight and volume) to
impute which items are in that bin

5K bins/25K inems are not particularly large numbers. How fast a solution do you need?

not fast.

I am becoming convinced that this is "mission impossible". If it could be divided into subproblems, then complete enumeration might be possible, but approximate solutions are unlikely to correctly identify the elements in the bins (maybe a consequence of the central limit theorem).

It's equivalent to the travelling salesman problem. The only way to get a solution in every possible case is to evaluate every possible solution. Particularly if there are a lot of items close to the size of the bin divided by the average number of objects in each bin. If you start trying to fill one bin, then go onto the next, you'll probably end up with a few items left over. To then find a spot to put them, you may need to rearrange literally every other bin.

You may stumble on a "good" solution in some cases, but not in all. Although 25K items is not a big number these days, if you need to try every possible set of distributions, it's an enormous number.

d-glitch and JR2003 .. I don't see the relevance of hashing (although that article on Hash Sort was very interesting).
Sure, I could hash to put items into bins, but when the (proposed) bin totals don't match what I have (ie the known bin totals), what do I do then ? Hash differently?

Perhaps I am not being clear. We KNOW the bin totals, we KNOW the item weights, we want to DISCOVER which items are in which bin.

I think gbentley has it right.. that full enumeration is required.

Sorry, I thought you saying you had to put the items in any bin as long as you knew where to retrieve them from.

Are you saying that each bin is marked with the weight of its contents and the volume of its contents and you have to put the items in the bins so they exactly match the weight and volume marked on the bin?

0

Mutley2003Author Commented:

JR2000 ..
a) yes, each bin is already packed and is marked with the weight and volume of its contents
b) you know the weight and volume of each item (say you have a manifest of these)
c) you don't know which items are in which bin
d) because the bins vary markedly in weight and volume, as do the items, you figure that you might be able to deduce which items are in each bin

here is some output from a test program I just developed

these are the bins we constructed (4 bins, 5 items in each)

Now, from the bin total weights and total cube,
and the known item weights and cubes, we are trying to put together
which items go to make up each bin.

The algorithm is brute force, the criterion is the sums
of squared errors between the test totals and the known totals.

You can see that in the first iteration, when we are trying to
find the makeup of bin 1 (which has members 1 2 3 4 and totals of (777,958))
that we impute the members are 4,9,12,19 with totals of (962,759).
Eventually we get the right answer (about 10000 random trials) but all
suboptimal solutions are wildly wrong ie contain few accurate guesses
about the membership.

The algorithm can be improved, and it is possible that more constraints
(eg matches on some dimension other than weight and cube
- cannot think of anything sensible, though) would help, but it seems
unlikely that it would ever scale properly.

trying to match .. 777 958 unknown members are: 1 2 3 4
best so far .. 962 759 imputed members= 9 19 4 12
best so far .. 639 785 imputed members= 16 10 5 13
best so far .. 769 1080 imputed members= 11 20 12 5
best so far .. 719 887 imputed members= 9 10 15 3
best so far .. 776 1020 imputed members= 1 3 16 15
best so far .. 769 942 imputed members= 12 5 11 4
best so far .. 780 952 imputed members= 15 16 2 14
best so far .. 777 958 imputed members= 4 1 3 2
errorCriterion== 0
trying to match .. 854 878 unknown members are: 5 6 7 8
best so far .. 615 660 imputed members= 5 20 15 9
best so far .. 836 917 imputed members= 8 2 7 20
best so far .. 860 861 imputed members= 9 6 14 20
best so far .. 861 894 imputed members= 10 7 6 8
best so far .. 857 871 imputed members= 1 8 19 4
best so far .. 854 878 imputed members= 5 8 7 6
errorCriterion== 0
trying to match .. 601 997 unknown members are: 9 10 11 12
best so far .. 1078 853 imputed members= 9 12 19 6
best so far .. 795 723 imputed members= 9 20 12 5
best so far .. 894 986 imputed members= 3 15 20 10
best so far .. 522 752 imputed members= 10 8 9 1
best so far .. 512 1181 imputed members= 15 1 14 11
best so far .. 561 1180 imputed members= 2 19 7 5
best so far .. 703 946 imputed members= 10 4 11 8
best so far .. 663 925 imputed members= 4 18 7 3
best so far .. 604 959 imputed members= 7 12 5 1
best so far .. 600 1011 imputed members= 14 7 10 4
best so far .. 590 991 imputed members= 1 19 15 10
best so far .. 607 996 imputed members= 2 11 10 13
best so far .. 602 997 imputed members= 3 9 2 1
best so far .. 601 997 imputed members= 10 12 9 11
errorCriterion== 0
trying to match .. 930 785 unknown members are: 13 14 15 16
best so far .. 1451 741 imputed members= 17 13 3 12
best so far .. 670 726 imputed members= 12 5 2 18
best so far .. 1041 923 imputed members= 12 5 14 8
best so far .. 926 895 imputed members= 12 10 8 7
best so far .. 944 778 imputed members= 5 6 2 8
best so far .. 931 790 imputed members= 18 8 3 15
best so far .. 930 785 imputed members= 15 16 14 13
errorCriterion== 0

If you need an exact fit, and the weights and volumes are bounded, you can reduce it to a standard 1-D bin packing problem by fitting a*weight+b*volume
for a,b chosen so that a*weight+b*volume is always unique
You can do this in pseudo-polynomial time with a branch and bound algorithm

For an approximation algorithm, we'd need to know what counts as close,
transforming the problem in the above manner may not transform the closeness criterion
in a way that lets standard bin-packing approximation algorithms find close solutions to your problem when they find close solutions to the transformed problem.

ozo, that transform idea is interesting (although I don't at present have an idea as to how we could find a and b .. maybe brute force, maybe something in one of the construction of perfect hashes). Anyway, assume we can find a and b such that
J = a*weight +b*volume
and J takes on unique integer values.

The already filled bins (that is all of them) are treated as if they have CAPACITIES of J1, J2..Jn . So then we proceed to try out 1D bin packing solutions until we get a perfect fit.

Is that what you are saying?

But I don't see where "pseudo-polynomial time" comes in .. even with early exit (ie accept the first best fit) there is still the possibility of complete enumeration?

If you know weight < m then a=1, b>m will give unique values.
or if LCM(a,b) > max(weight*volume) them you should have unique values

The best known NP-Complete problems are exponential in the size of the problem,
the size of a problem containing a number n can be O(log n)
An algorithm that is polynomial in (n) is called pseudo-polynomial. it can still be exponential in (log n)
This may not help you if log n is still large.

I was actually thinking of a dynamic programming algorithm for knapsack,
but Sum of Squares is a pseudo-polynomial time algorithm for bin packing.

0

Mutley2003Author Commented:

thanks ozo, I can get started with these ideas.

and thanks to everyone else.. I will split points, OK?

0

Featured Post

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.