RichNH
asked on
Questioning SQL efficiency, JOINs vs IN
Can anyone tell me which of these would be more efficient in processing? I'm running MS SQL Server 2008.
Case 1
SELECT EF.ExtractableEmailVendorF eedName vendor
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveA ndExecutiv eFunction AS EF
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorF eedID IS NOT NULL AND
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.IndustryGroupID IN (220501, 220502, 220503, 220504, 220505, 220506, 220507, 220508, 220509, 220510, 220511, 220512,
220513, 220514, 220515, 220516, 220517, 220518, 220579, 220519, 220520, 220521, 220522, 220523, 220524, 220525,
220529, 220528, 220527, 220530, 220531, 220532, 220533, 220534, 220535, 220605, 220536, 220537, 220538, 220539,
220540, 220541, 220542, 220543, 220544, 220545, 220546, 220547, 220548, 220549, 220550, 220551, 220553, 220554,
220555, 220556, 220557, 220558, 220559, 220561, 220562, 220563, 220603, 220564, 220565, 220567, 220568, 220569,
220570, 220571, 220572, 220573, 220574, 220575, 220576, 220577, 220578, 220580, 220581, 220582, 220583, 220584,
220585, 220586, 220587, 220588, 220589, 220590, 220592, 220593, 220594, 220595, 220596, 220597, 220598, 220599,
220600, 220601, 220602) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
GROUP BY EF.ExtractableEmailVendorF eedName
ORDER BY EF.ExtractableEmailVendorF eedName ASC
Case 2
SELECT EF.ExtractableEmailVendorF eedName vendor
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveA ndExecutiv eFunction AS EF
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorF eedID IS NOT NULL AND
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
INNER JOIN PSExtract.temp.RJZ_Essenti alNet_Indu stryGroupI Ds AS IG
ON A.IndustryGroupID = IG.IndustryGroupID
GROUP BY EF.ExtractableEmailVendorF eedName
ORDER BY EF.ExtractableEmailVendorF eedName ASC
PSExtract.temp.RJZ_Essenti alNet_Indu stryGroupI Ds is loaded with the IndustryGroupIDs you see in case 1, they are loaded as ints, they are not indexed in any way.
Rich
Case 1
SELECT EF.ExtractableEmailVendorF
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveA
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorF
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.IndustryGroupID IN (220501, 220502, 220503, 220504, 220505, 220506, 220507, 220508, 220509, 220510, 220511, 220512,
220513, 220514, 220515, 220516, 220517, 220518, 220579, 220519, 220520, 220521, 220522, 220523, 220524, 220525,
220529, 220528, 220527, 220530, 220531, 220532, 220533, 220534, 220535, 220605, 220536, 220537, 220538, 220539,
220540, 220541, 220542, 220543, 220544, 220545, 220546, 220547, 220548, 220549, 220550, 220551, 220553, 220554,
220555, 220556, 220557, 220558, 220559, 220561, 220562, 220563, 220603, 220564, 220565, 220567, 220568, 220569,
220570, 220571, 220572, 220573, 220574, 220575, 220576, 220577, 220578, 220580, 220581, 220582, 220583, 220584,
220585, 220586, 220587, 220588, 220589, 220590, 220592, 220593, 220594, 220595, 220596, 220597, 220598, 220599,
220600, 220601, 220602) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
GROUP BY EF.ExtractableEmailVendorF
ORDER BY EF.ExtractableEmailVendorF
Case 2
SELECT EF.ExtractableEmailVendorF
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveA
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorF
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
INNER JOIN PSExtract.temp.RJZ_Essenti
ON A.IndustryGroupID = IG.IndustryGroupID
GROUP BY EF.ExtractableEmailVendorF
ORDER BY EF.ExtractableEmailVendorF
PSExtract.temp.RJZ_Essenti
Rich
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Absolutely!
ASKER
Thanks for the replies folks, to answer questions in turn:
I don't have the privs to run an execution plan right now.
This was a one shot solution as all the solutions I generate pretty much are, the question was more to figure out if there was a more efficient way. I did try running both samples and they came in neck and neck. For the data I pulled you couldn't really tell who won. My understanding is that sometimes the amount of data we pull and the operations we do will run hours. This query lasted between 4 & 5 minutes.
My understanding from one of the books I was reading was that if you can put a constraint into the JOIN, it's more efficient in execution. I read this in Beginning Microsoft SQL Server 2008 Programming by Robert Vieira. Although I do agree that from a maintenance point of view the suggested query in ACPerkins note is much better.
I don't have the privs to run an execution plan right now.
This was a one shot solution as all the solutions I generate pretty much are, the question was more to figure out if there was a more efficient way. I did try running both samples and they came in neck and neck. For the data I pulled you couldn't really tell who won. My understanding is that sometimes the amount of data we pull and the operations we do will run hours. This query lasted between 4 & 5 minutes.
My understanding from one of the books I was reading was that if you can put a constraint into the JOIN, it's more efficient in execution. I read this in Beginning Microsoft SQL Server 2008 Programming by Robert Vieira. Although I do agree that from a maintenance point of view the suggested query in ACPerkins note is much better.