asked on

Questioning SQL efficiency, JOINs vs IN

Can anyone tell me which of these would be more efficient in processing? I'm running MS SQL Server 2008.

Case 1
SELECT EF.ExtractableEmailVendorFeedName vendor
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveAndExecutiveFunction AS EF
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorFeedID IS NOT NULL AND
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.IndustryGroupID IN (220501, 220502, 220503, 220504, 220505, 220506, 220507, 220508, 220509, 220510, 220511, 220512,
220513, 220514, 220515, 220516, 220517, 220518, 220579, 220519, 220520, 220521, 220522, 220523, 220524, 220525,
220529, 220528, 220527, 220530, 220531, 220532, 220533, 220534, 220535, 220605, 220536, 220537, 220538, 220539,
220540, 220541, 220542, 220543, 220544, 220545, 220546, 220547, 220548, 220549, 220550, 220551, 220553, 220554,
220555, 220556, 220557, 220558, 220559, 220561, 220562, 220563, 220603, 220564, 220565, 220567, 220568, 220569,
220570, 220571, 220572, 220573, 220574, 220575, 220576, 220577, 220578, 220580, 220581, 220582, 220583, 220584,
220585, 220586, 220587, 220588, 220589, 220590, 220592, 220593, 220594, 220595, 220596, 220597, 220598, 220599,
220600, 220601, 220602) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
GROUP BY EF.ExtractableEmailVendorFeedName
ORDER BY EF.ExtractableEmailVendorFeedName ASC

Case 2
SELECT EF.ExtractableEmailVendorFeedName vendor
,COUNT(DISTINCT A.KeyID) companyCount
,COUNT(DISTINCT EF.DistinctID) emailCount
FROM PSExtract.dbo.vwCompany AS A
INNER JOIN PSExtract.dbo.vwExecutiveAndExecutiveFunction AS EF
ON A.KeyID = EF.KeyID AND EF.ExtractableEmailVendorFeedID IS NOT NULL AND
EF.OSFunctionID IN (30, 330, 333) AND
EF.OSLevelID IN (20, 30, 40, 50, 60, 85) AND
A.NationalRegionID IN (10110, 10130, 10140) AND
A.Employees BETWEEN 50 AND 250
INNER JOIN PSExtract.temp.RJZ_EssentialNet_IndustryGroupIDs AS IG
ON A.IndustryGroupID = IG.IndustryGroupID
GROUP BY EF.ExtractableEmailVendorFeedName
ORDER BY EF.ExtractableEmailVendorFeedName ASC

PSExtract.temp.RJZ_EssentialNet_IndustryGroupIDs is loaded with the IndustryGroupIDs you see in case 1, they are loaded as ints, they are not indexed in any way.

Rich

SOLUTION

wdosanjos

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER CERTIFIED SOLUTION

Anthony Perkins

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

HainKurt

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Anthony Perkins

Absolutely!

RichNH

ASKER

Thanks for the replies folks, to answer questions in turn:

I don't have the privs to run an execution plan right now.

This was a one shot solution as all the solutions I generate pretty much are, the question was more to figure out if there was a more efficient way. I did try running both samples and they came in neck and neck. For the data I pulled you couldn't really tell who won. My understanding is that sometimes the amount of data we pull and the operations we do will run hours. This query lasted between 4 & 5 minutes.

My understanding from one of the books I was reading was that if you can put a constraint into the JOIN, it's more efficient in execution. I read this in Beginning Microsoft SQL Server 2008 Programming by Robert Vieira. Although I do agree that from a maintenance point of view the suggested query in ACPerkins note is much better.