# Set theory algorithm question on
I'm trying to find a relatively efficient algorithm to solve the following problem:
I have a finite set S = {s_1, s_2, s_3,...}, where each s_i is itself a non-empty finite set. I need to find a subset of S -- call it T -- where the number of elements in the union over all elements in T is equal to the number of elements in T itself.

For example, if I have the set S = {{1,2,8}, {3,5}, {1,3,8}, {2,5,7}, {1,2,5,8}, {3,8}, {2,7}, {7}}, then the algorithm should return the set T = {{3,5}, {2,5,7}, {2,7}, {7}} because T has 4 elements and union({3,5}, {2,5,7}, {2,7}, {7}) also has 4 elements.

The only way I can think of solving this problem is examining each element of the power set of S to see if it matches the criteria, but that algorithm runs in exponential time. Can anybody think of a more efficient way to solve this problem, or point me in the right direction? Thanks.
Comment
Watch Question

Do more with EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Commented:
Something in the back of my mind tells me Dynamic programming might work here but ultimately I think this is exponential and the power set idea is probably as good as you are going to get.

This is not a trivial problem
Commented:
I don't see a general solution either (as GwynforWeb said, this isn't trivial), but if you only need a solution and not all, then you might apply some heuristics:

• Reject potential sets where any any card(s_i) > card(t)
• start with the lowest card(t), yielding {{7}} as solution (and then {{2,7},{7}} for example)
• arrange sets in decreasing cardinal order, prune as soon as the cumulative union has card(u) > card(t), and eliminate potential sets starting the same with equal or lower cardinality.

If you need all possible answers, I still feel there should be ways to optimise the search. If you can arrange all potential sets in a binary tree, there should be nodes where you can prune the search, thus eliminating entire branches. For example:

S = {{1,2,3,4},{1,2},{2,3}}

All sets containing s_1 can be eliminated without exploring them in more detail. I don't see the tree, however...

(°v°)
Commented:
Although I haven't reduced the complexity, here are some suggestions to improve performance.
Store for each s_i:
card(s_i)
Range(s_i)  ==  (Min_i, Max_i)
When considering  (s_i U s_j) , if the ranges do not overlap, then immediately you know that
card( s_i  U  s_j )  =  card( s_i )  +  card( s_j ).
If the ranges do overlap, then to determine the intersection, only the overlapped range need be considered.

Some partial matrix can be pre-computed to also improve performance. Some candidates are:
card( s_i U s_j )  and possibly    card( s_i Intersect s_j )
===
Other solutions where T has 5 elements are:
{3,5}, {3,8}, {1,2,8}, {1,3,8}, {1,2,5,8}
{7},    {2,7}, {3,5},    {3,8},    {2,5,7}

A solution where T has 6 elements is:
{7},    {2,7}, {3,5},    {3,8},    {2,5,7}, {1,2,8},

===
Sometimes, there is only one solution. For example:
S = { {2,6} {1,7} {1,2,6} {2,6,7} }
Only solution is: T = S

Commented:
... thinking about it.  Just curious, what is the motivation for solving the problem?  Where the S has got the content?  Is it just a theoretical work or does it solve a real problem?  Possibly the chance is to approach the solution from a different direction.

Commented:
Can you explain what are the limits of the task? What are the bigest elements in the subsets? (It is the 8 in the question text, but generally...).  What is the usual cardinality of S in the solved problems?  Are there any constraints that would make the task less general and possibly easier to solve?

Do more with 