Solved

SQL Superset subset

Posted on 2011-09-20
12
846 Views
Last Modified: 2012-05-12
I have following table table with 3 millions records where I need help to find Superset Subset relationship.

SetPackage
--------------
SetID int
PackageName varchar(35)

Say I have inserted few sample rows like following:

INSERT into SetPackage  VALUES
(1,'A')
INSERT into SetPackage VALUES
(2,'A')
INSERT into SetPackage VALUES
(2,'B')
INSERT into SetPackage VALUES
(3,'A')
INSERT into SetPackage VALUES
(3,'B')
INSERT into SetPackage VALUES
(3,'C')
INSERT into SetPackage VALUES
(3,'E')
INSERT into SetPackage VALUES
(4,'B')
INSERT into SetPackage VALUES
(4,'C')

I want to know efficient way to know SuperSetID,SubSetID combination from this large amount of data.

I wrote following query but it takes long time and fills tempdb space:

SELECT A.SetID SupersetID, B.SetID SubsetID
FROM SetPackage A, SetPackage B
WHERE A.SetID <> B.SetID
  AND A.Package = B.Package
GROUP BY A.SetID, B.SetID
HAVING NOT EXISTS
(
    SELECT * FROM SetPackage C
    WHERE C.SetID = B.SetID
       AND NOT EXISTS (
        SELECT * FROM SetPackage D
        WHERE D.SetID = A.SetID
          AND D.Package = C.Package
        )
)
0
Comment
Question by:mayankvindhya
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
  • 2
  • +2
12 Comments
 
LVL 25

Expert Comment

by:jogos
ID: 36569582
The filtering part in the HAVING could be in the WHERE. Why because they will be earlier filtered out so they won't be grouped anymore.  
Wonder also if its an explicit choise not to join C and D but writing them in a separate not exists.


0
 

Author Comment

by:mayankvindhya
ID: 36569616
Do you have better query?Please share.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 36569733
>>Do you have better query?<<
Have you checked the suggestion made?  Does it return the correct results?  Does it perform adequately?
0
Webinar: Aligning, Automating, Winning

Join Dan Russo, Senior Manager of Operations Intelligence, for an in-depth discussion on how Dealertrack, leading provider of integrated digital solutions for the automotive industry, transformed their DevOps processes to increase collaboration and move with greater velocity.

 
LVL 74

Expert Comment

by:sdstuber
ID: 36569749
what would you want returned if the following rows are added?


INSERT INTO setpackage
VALUES (5, 'D');

INSERT INTO setpackage
VALUES (6, 'D');


each setid could be the superset of the other
0
 

Author Comment

by:mayankvindhya
ID: 36570383
I don't want set id 5,6 to be superset or subset as the values are same
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 36570454
ok,  your query above returns both, that's why I sought confirmation thanks
0
 
LVL 25

Accepted Solution

by:
jogos earned 500 total points
ID: 36571733
Seemed to me my sugestions did meet your sql scils to type out.

In example I did implemented my 2 sugestions at same time, but the join C/D could also be implemented in the HAVING-sollution
But in performance tuning it's always beter to measure (profiler) the situation before changement, make a first change and compare new measurement and result, make next change ...
So you won't end up making 10 changes don't know where your gain was or which of the 10 changes was the cause of an altered result.
SELECT A.SetID SupersetID, B.SetID SubsetID
FROM SetPackage A, SetPackage B
WHERE A.SetID <> B.SetID
  AND A.Package = B.Package
AND NOT EXISTS 
(
    SELECT * FROM SetPackage C
         inner join SetPackage D on  AND D.Package = C.Package
    WHERE C.SetID = B.SetID
       AND D.SetID = A.SetID
) 
GROUP BY A.SetID, B.SetID

Open in new window

0
 
LVL 25

Expert Comment

by:jogos
ID: 37196812
No feedback on suggestions from author on both theorecical as copy/past sollutions that can make a difference in time and tempdb-usage.
0
 
LVL 69

Expert Comment

by:Qlemo
ID: 37242424
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Server Results to Excel File 18 76
SQL Recursion schedule 13 35
shrink table after huge delete 2 28
Substring works but need to tweak it 14 31
How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…
Are you ready to implement Active Directory best practices without reading 300+ pages? You're in luck. In this webinar hosted by Skyport Systems, you gain insight into Microsoft's latest comprehensive guide, with tips on the best and easiest way…

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question