Set theory algorithm question

tjgquicken used Ask the Experts™
I'm trying to find a relatively efficient algorithm to solve the following problem:
I have a finite set S = {s_1, s_2, s_3,...}, where each s_i is itself a non-empty finite set. I need to find a subset of S -- call it T -- where the number of elements in the union over all elements in T is equal to the number of elements in T itself.

For example, if I have the set S = {{1,2,8}, {3,5}, {1,3,8}, {2,5,7}, {1,2,5,8}, {3,8}, {2,7}, {7}}, then the algorithm should return the set T = {{3,5}, {2,5,7}, {2,7}, {7}} because T has 4 elements and union({3,5}, {2,5,7}, {2,7}, {7}) also has 4 elements.

The only way I can think of solving this problem is examining each element of the power set of S to see if it matches the criteria, but that algorithm runs in exponential time. Can anybody think of a more efficient way to solve this problem, or point me in the right direction? Thanks.
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Something in the back of my mind tells me Dynamic programming might work here but ultimately I think this is exponential and the power set idea is probably as good as you are going to get.

This is not a trivial problem
I don't see a general solution either (as GwynforWeb said, this isn't trivial), but if you only need a solution and not all, then you might apply some heuristics:

• Reject potential sets where any any card(s_i) > card(t)
• start with the lowest card(t), yielding {{7}} as solution (and then {{2,7},{7}} for example)
• arrange sets in decreasing cardinal order, prune as soon as the cumulative union has card(u) > card(t), and eliminate potential sets starting the same with equal or lower cardinality.

If you need all possible answers, I still feel there should be ways to optimise the search. If you can arrange all potential sets in a binary tree, there should be nodes where you can prune the search, thus eliminating entire branches. For example:

    S = {{1,2,3,4},{1,2},{2,3}}

All sets containing s_1 can be eliminated without exploring them in more detail. I don't see the tree, however...

Although I haven't reduced the complexity, here are some suggestions to improve performance.
Store for each s_i:
      Range(s_i)  ==  (Min_i, Max_i)
When considering  (s_i U s_j) , if the ranges do not overlap, then immediately you know that
      card( s_i  U  s_j )  =  card( s_i )  +  card( s_j ).
If the ranges do overlap, then to determine the intersection, only the overlapped range need be considered.

Some partial matrix can be pre-computed to also improve performance. Some candidates are:
      card( s_i U s_j )  and possibly    card( s_i Intersect s_j )
For your example...
Other solutions where T has 5 elements are:
{3,5}, {3,8}, {1,2,8}, {1,3,8}, {1,2,5,8}
{7},    {2,7}, {3,5},    {3,8},    {2,5,7}

A solution where T has 6 elements is:
{7},    {2,7}, {3,5},    {3,8},    {2,5,7}, {1,2,8},

Sometimes, there is only one solution. For example:
S = { {2,6} {1,7} {1,2,6} {2,6,7} }
Only solution is: T = S
... thinking about it.  Just curious, what is the motivation for solving the problem?  Where the S has got the content?  Is it just a theoretical work or does it solve a real problem?  Possibly the chance is to approach the solution from a different direction.
Can you explain what are the limits of the task? What are the bigest elements in the subsets? (It is the 8 in the question text, but generally...).  What is the usual cardinality of S in the solved problems?  Are there any constraints that would make the task less general and possibly easier to solve?

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial