# Reverse bin packing

Hi all

I am hoping this is a known problem.

Suppose we have N bins, where N might be as large as two or three thousand. We have M items, where M can be as large as 5*N.

Every item must belong to one and only one bin (to keep things simple <g>), no bin is empty, and a bin can hold any number of items, though typically it holds somewhere between 1 and about 10.

We ONLY know the characteristics of the bins and the items, and we wish to solve for the allocation of items to bins, i.e. for every item, which bin does it belong to? We can accept some errors .. that is, a small number of misallocations of items to bins.

We are lucky in that we have TWO characteristics for each item, cube and weight (both integers), and the corresponding totals for each bin. Cube and weight vary very widely and at random, so you won't find many duplicates on these parameters at the item or the bin level. Cube and weight are not highly correlated.

OK, how do we do this?

I hope it is not integer programming or linear programming because

a)  I have no idea how to implement such things (in Delphi)

b)  I am under the impression that such approaches are slow for large numbers of variables

c) I would like, eventually, to extend the solution to a situation where the variables (cube and weight) are measured with error

d) I have a suspicion that, because we have two "dependent" variables, a solution should be easier to find .. and maybe I could have more in the future (i.e. change how things are measured, again maybe with error)

e) and in the future I would like some sort of estimate of how likely it is that a particular item belongs to a bin (maybe that is too big an ask)

Is a GA going to work?

ok, over to the experts

Commented:
If you need an exact fit, and the weights and volumes are bounded, you can reduce it to a standard 1-D bin packing problem by fitting a*weight + b*volume,
with a, b chosen so that a*weight + b*volume is always unique.
You can do this in pseudo-polynomial time with a branch-and-bound algorithm.

For an approximation algorithm, we'd need to know what counts as close;
transforming the problem in the above manner may not transform the closeness criterion
in a way that lets standard bin-packing approximation algorithms find close solutions to your problem when they find close solutions to the transformed problem.
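To make the reduction concrete, here is a small Python sketch (illustrative only; `MAX_WEIGHT`, the constants, and the sample item values are assumptions, not from the thread). Because the code is linear in weight and cube, a bin's coded total equals the sum of its items' codes, and with b larger than any possible weight each (weight, cube) pair gets a distinct code.

```python
# Encode each (weight, cube) pair as one integer so the 2-D matching
# problem becomes 1-D. Assumes every weight is bounded by MAX_WEIGHT;
# choosing a = 1, b = MAX_WEIGHT + 1 makes the encoding injective.
MAX_WEIGHT = 1000          # assumed upper bound on any item weight
a, b = 1, MAX_WEIGHT + 1

def encode(weight, cube):
    """Map a (weight, cube) pair to a single integer, injectively."""
    return a * weight + b * cube

# sample items (weight, cube) -- made up for illustration
items = [(32, 237), (170, 224), (336, 423), (239, 74)]
codes = [encode(w, c) for w, c in items]

# distinct pairs get distinct codes, and a bin's coded total is the
# sum of its items' codes (linearity)
assert len(set(codes)) == len(codes)
bin_total = encode(32 + 170 + 336 + 239, 237 + 224 + 423 + 74)
assert bin_total == sum(codes)
```

The same encoding works in Delphi with 64-bit integers, provided a*weight + b*cube cannot overflow.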

Author Commented:
I suppose I should add that there are no items with very small values of cube and weight .. that is, you don't have to consider solutions where a "top up" can come from lots of small items (I guess this is also implied by the restriction that bins hold no more than about 10 items)

Commented:
There are a couple of important pieces of information missing from your question:

How do you define or recognize a good solution?
And how good a solution do you need?

What criteria do you use to determine which bin an item "belongs to"?

Does the best solution try to make the weight, volume, and number of items in each bin the same?

If so, you know what you need to pack and you have the data for the bins, so you can calculate the weight, volume, and number of items that each bin needs.

5K bins/25K items are not particularly large numbers. How fast a solution do you need?


Commented:
You could look at the bins as a kind of hash table and not worry about collisions. Generate the hash key from one or both of the numbers and allocate the item to the bin given by the hash. The trick will be to pick a suitable hash function.

http://en.wikipedia.org/wiki/Hash_table

http://en.wikipedia.org/wiki/Associative_array


Author Commented:

How do you define or recognize a good solution?

Good question. We can never know if we have the correct solution, since what we are trying to figure out is which items are in which bins. We can define a good solution as one where there are ZERO differences between the known aggregate weight and volume and the corresponding imputed quantities for each bin .. in limited tests (brute force on a toy problem) this seems to find the correct answer. Unfortunately, as soon as we relax the criterion (e.g. to a minimum sum of squares of differences) it seems we are unlikely to get the correct answer in any reasonable number of iterations .. complete brute force to get to zero differences seems like the only way, and that will be impractical.

And how good a solution do you need?

see above. It is important that the items be correctly imputed to fall in the bin in which they do in fact fall.

What criteria do you use to determine which bin an item "belongs to"?

see above.

Does the best solution try to make the weight, volume, and number of items in each bin the same?

no. We are not packing, we are trying to work backwards from bins (where we know the total weight and volume) to
impute which items are in that bin

5K bins/25K items are not particularly large numbers. How fast a solution do you need?

not fast.

I am becoming convinced that this is "mission impossible". If it could be divided into subproblems, then complete enumeration might be possible, but approximate solutions are unlikely to correctly identify the elements in the bins (maybe a consequence of the central limit theorem).

Thanks for the insights


Commented:
It's akin to the travelling salesman problem: the only way to get a solution in every possible case is to evaluate every possible solution, particularly if there are a lot of items close in size to the bin capacity divided by the average number of objects per bin. If you start trying to fill one bin, then go on to the next, you'll probably end up with a few items left over. To then find a spot for them, you may need to rearrange literally every other bin.

You may stumble on a "good" solution in some cases, but not in all. Although 25K items is not a big number these days, if you need to try every possible set of distributions, it's an enormous number.

Regards

Commented:
Hash sorting methods are widely used and pretty well understood.
You should be able to find lots of info and probably even code.

http://www.citebase.org/cgi-bin/citations?id=oai:arXiv.org:cs/0408040

Commented:
Sorry JR2003.
You were way ahead of me.

Author Commented:

d-glitch and JR2003 .. I don't see the relevance of hashing (although that article on Hash Sort was very interesting).
Sure, I could hash to put items into bins, but when the (proposed) bin totals don't match what I have (i.e. the known bin totals), what do I do then? Hash differently?

Perhaps I am not being clear. We KNOW the bin totals, we KNOW the item weights, we want to DISCOVER which items are in which bin.

I think gbentley has it right.. that full enumeration is required.

thanks, people

Commented:
Sorry, I thought you were saying you had to put the items in any bin as long as you knew where to retrieve them from.

Are you saying that each bin is marked with the weight of its contents and the volume of its contents and you have to put the items in the bins so they exactly match the weight and volume marked on the bin?


Author Commented:
JR2003 ..
a) yes, each bin is already packed and is marked with the weight and volume of its contents
b) you know the weight and volume of each item (say you have a manifest of these)
c) you don't know which items are in which bin
d) because the bins vary markedly in weight and volume, as do the items, you figure that you might be able to deduce which items are in each bin

here is some output from a test program I just developed

these are the bins we constructed (5 bins, 4 items in each)

** BIN ***** total weight, total cube, itemcount  777  958  4
itemID
1   weight=  32   cube=  237
2   weight=  170   cube=  224
3   weight=  336   cube=  423
4   weight=  239   cube=  74

** BIN ***** total weight, total cube, itemcount  854  878  4
itemID
5   weight=  72   cube=  229
6   weight=  355   cube=  168
7   weight=  80   cube=  324
8   weight=  347   cube=  157

** BIN ***** total weight, total cube, itemcount  601  997  4
itemID
9   weight=  64   cube=  113
10   weight=  79   cube=  245
11   weight=  38   cube=  470
12   weight=  420   cube=  169

** BIN ***** total weight, total cube, itemcount  930  785  4
itemID
13   weight=  320   cube=  57
14   weight=  202   cube=  368
15   weight=  240   cube=  106
16   weight=  168   cube=  254

** BIN ***** total weight, total cube, itemcount  861  811  4
itemID
17   weight=  375   cube=  92
18   weight=  8   cube=  104
19   weight=  239   cube=  403
20   weight=  239   cube=  212

Now, from the bin total weights and total cube,
and the known item weights and cubes, we are trying to put together
which items go to make up each bin.

The algorithm is brute force; the criterion is the sum
of squared errors between the test totals and the known totals.

You can see that in the first iteration, when we are trying to
find the makeup of bin 1 (which has members 1 2 3 4 and totals of (777,958)),
we impute the members to be 4,9,12,19 with totals of (962,759).
Eventually we get the right answer (after about 10000 random trials) but all
suboptimal solutions are wildly wrong, i.e. they contain few accurate guesses.

The algorithm can be improved, and it is possible that more constraints
(e.g. matches on some dimension other than weight and cube
- cannot think of anything sensible, though) would help, but it seems
unlikely that it would ever scale properly.

trying to match ..  777  958  unknown members are:  1  2  3  4
best so far ..  962  759   imputed members=  9  19  4  12
best so far ..  639  785   imputed members=  16  10  5  13
best so far ..  769  1080   imputed members=  11  20  12  5
best so far ..  719  887   imputed members=  9  10  15  3
best so far ..  776  1020   imputed members=  1  3  16  15
best so far ..  769  942   imputed members=  12  5  11  4
best so far ..  780  952   imputed members=  15  16  2  14
best so far ..  777  958   imputed members=  4  1  3  2
errorCriterion==  0
trying to match ..  854  878  unknown members are:  5  6  7  8
best so far ..  615  660   imputed members=  5  20  15  9
best so far ..  836  917   imputed members=  8  2  7  20
best so far ..  860  861   imputed members=  9  6  14  20
best so far ..  861  894   imputed members=  10  7  6  8
best so far ..  857  871   imputed members=  1  8  19  4
best so far ..  854  878   imputed members=  5  8  7  6
errorCriterion==  0
trying to match ..  601  997  unknown members are:  9  10  11  12
best so far ..  1078  853   imputed members=  9  12  19  6
best so far ..  795  723   imputed members=  9  20  12  5
best so far ..  894  986   imputed members=  3  15  20  10
best so far ..  522  752   imputed members=  10  8  9  1
best so far ..  512  1181   imputed members=  15  1  14  11
best so far ..  561  1180   imputed members=  2  19  7  5
best so far ..  703  946   imputed members=  10  4  11  8
best so far ..  663  925   imputed members=  4  18  7  3
best so far ..  604  959   imputed members=  7  12  5  1
best so far ..  600  1011   imputed members=  14  7  10  4
best so far ..  590  991   imputed members=  1  19  15  10
best so far ..  607  996   imputed members=  2  11  10  13
best so far ..  602  997   imputed members=  3  9  2  1
best so far ..  601  997   imputed members=  10  12  9  11
errorCriterion==  0
trying to match ..  930  785  unknown members are:  13  14  15  16
best so far ..  1451  741   imputed members=  17  13  3  12
best so far ..  670  726   imputed members=  12  5  2  18
best so far ..  1041  923   imputed members=  12  5  14  8
best so far ..  926  895   imputed members=  12  10  8  7
best so far ..  944  778   imputed members=  5  6  2  8
best so far ..  931  790   imputed members=  18  8  3  15
best so far ..  930  785   imputed members=  15  16  14  13
errorCriterion==  0
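For concreteness, here is a rough Python sketch of that kind of random-trial search (the function name and details are my own, not the original Delphi program): draw random candidate sets for a bin and keep the one with the smallest sum of squared errors against the known totals, stopping on an exact match.

```python
# Random-trial brute force: sample k-item subsets at random and keep
# the one minimizing squared error against the bin's known totals.
import random

def random_search(items, bin_w, bin_c, k, trials=10000):
    """items: list of (weight, cube); k: number of items in the bin."""
    best_err, best_set = float("inf"), None
    for _ in range(trials):
        cand = random.sample(range(len(items)), k)
        w = sum(items[i][0] for i in cand)
        c = sum(items[i][1] for i in cand)
        err = (w - bin_w) ** 2 + (c - bin_c) ** 2
        if err < best_err:
            best_err, best_set = err, sorted(cand)
        if best_err == 0:          # exact match: stop early
            break
    return best_err, best_set

random.seed(1)                     # reproducible run
# first 8 items from the test output above (weight, cube)
items = [(32, 237), (170, 224), (336, 423), (239, 74),
         (72, 229), (355, 168), (80, 324), (347, 157)]
err, members = random_search(items, 777, 958, 4)
# on a toy case this size the exact set (err == 0) is found quickly
```

As the thread notes, this finds zero-error solutions on toy problems but the number of k-subsets explodes combinatorially, so it cannot scale to thousands of bins.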


Commented:
You might also look for multiple capacity bin packing algorithms
http://doi.ieeecomputersociety.org/10.1109/ICPP.1999.797428

Author Commented:
ozo, that transform idea is interesting (although I don't at present have an idea of how we could find a and b .. maybe brute force, maybe something from the construction of perfect hashes). Anyway, assume we can find a and b such that
J = a*weight + b*volume
and J takes on unique integer values.

The already-filled bins (that is, all of them) are treated as if they have CAPACITIES of J1, J2..Jn. So then we proceed to try out 1-D bin packing solutions until we get a perfect fit.

Is  that what you are saying?

But I don't see where "pseudo-polynomial time" comes in .. even with early exit (i.e. accept the first best fit) there is still the possibility of complete enumeration?


Commented:
If you know weight < m, then a=1, b>m will give unique values.
Or if LCM(a,b) > max(weight*volume) then you should have unique values.

The best known algorithms for NP-complete problems are exponential in the size of the problem,
and the size of a problem containing a number n can be O(log n).
An algorithm that is polynomial in n is called pseudo-polynomial; it can still be exponential in log n.

I was actually thinking of a dynamic programming algorithm for knapsack,
but Sum of Squares is a pseudo-polynomial time algorithm for bin packing.
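To make the dynamic-programming idea concrete, here is a Python sketch for a single bin (the function name, constants, and item values are assumptions for illustration): after encoding each item as J = a*weight + b*cube, recovering the bin's contents is an exact subset-sum over the J values, solvable in time proportional to n times the target value, i.e. pseudo-polynomial.

```python
# Subset-sum DP over encoded item values: find a subset of codes that
# sums exactly to a bin's encoded total. Each reachable sum remembers
# the last item used, so the subset can be reconstructed by backtracking.

def subset_matching_total(values, target):
    """Return sorted indices of a subset of values summing to target, or None."""
    reach = {0: None}                      # sum -> (parent sum, item index)
    for i, v in enumerate(values):
        # iterate over a snapshot so each item is used at most once per path
        for s in list(reach):
            t = s + v
            if t <= target and t not in reach:
                reach[t] = (s, i)
        if target in reach:
            break
    if target not in reach:
        return None
    subset, s = [], target
    while reach[s] is not None:            # walk back to sum 0
        s, i = reach[s]
        subset.append(i)
    return sorted(subset)

a, b = 1, 501                              # b > every weight, so codes are unique
items = [(32, 237), (170, 224), (336, 423), (239, 74), (72, 229)]
codes = [a * w + b * c for w, c in items]
bin_code = a * 777 + b * 958               # known bin totals: weight 777, cube 958
print(subset_matching_total(codes, bin_code))   # -> [0, 1, 2, 3]
```

The catch is that the target value (and hence the DP table) grows with b times the total cube, which is why the running time is polynomial in the magnitude of the numbers rather than in their bit length.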

Author Commented:
thanks ozo, I can get started with these ideas.

and thanks to everyone else.. I will split points, OK?