Link to home
Start Free TrialLog in
Avatar of chemass
chemassFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Need to improve query with nested SQL where...in... statements

I have a query that selects all profiles with the specified sub records.
The current query is built up in c# code using a parametrised command object and has multiple nested select statements, that will grow to a current maximum of 60 sub-queries!

The table design can be seen in the attached jpeg. There is one 'main' table (ProfilesNew), 5 'sub record' tables, each of which which are related to the main table through a 2-column many to many table, so each profile can have several DFEs for example.

Each of the numbers in the code snippet are actually parameters - just replaced by numbers for testing purposes.

I've tested the current code, and although it works (and pretty well with only a few sub-queries!) it is not the most elegant solution, and as the selected options increase, so does the execution time.

The query must select ALL profiles that have ALL sub records selected.
I also need another query that does the same thing, but returns all profiles that have at least one of the sub records.

Can anybody help?
SELECT * FROM ProfilesNew 
WHERE (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_DFEs WHERE (DFEId = 1))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_DFEs AS Profiles_DFEs_1 WHERE (DFEId = 3))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_Industries WHERE (IndustryId = 10))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_Industries WHERE (IndustryId = 4))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 4))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 10))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_VICompTools WHERE(VICompId = 7))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_VICompTools WHERE (VICompId = 8))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 2))) 
AND (ProfilesNew.ProfileId IN (SELECT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 1)))

Open in new window

Tables.JPG
Avatar of derekkromm
derekkromm
Flag of United States of America image

not much you can do on this one.

you can try this instead and see if it improves performance:

select * from ProfilesNew pn where
exists(select 1 from Profiles_DFEs where DFEid = 1 and ProfileId = pn.ProfileID)
and exists(...)
...
Avatar of Jim Horn

SELECT *
FROM ProfilesNew
WHERE ProfileId IN (SELECT ProfileId FROM Profiles_DFEs WHERE DFEId IN (1, 3) )
  AND ProfileId IN (SELECT ProfileId FROM Profiles_Industries WHERE IndustryId IN (10, 4))
  AND ProfileId IN (SELECT ProfileId FROM Profiles_PrimaryApps WHERE PrimaryAppId IN (10, 4))
  AND ProfileId IN (SELECT ProfileId FROM Profiles_VICompTools WHERE VICompId IN (7, 8))
  AND ProfileId IN (SELECT ProfileId FROM Profiles_Finishers WHERE FinisherId IN (2, 1))

 
I think it can be as following
SELECT DISTINCT a.*
FROM ProfilesNew As a
    INNER JOIN Profiles_DFEs As b1 ON a.ProfileId = b1.ProfileId
    INNER JOIN Profiles_DFEs As b3 ON a.ProfileId = b3.ProfileId
    INNER JOIN Profiles_Industries As c4 ON a.ProfileId = c4.ProfileId
    INNER JOIN Profiles_Industries As c10 ON a.ProfileId = c10.ProfileId
    INNER JOIN Profiles_PrimaryApps As d4 ON a.ProfileId = d4.ProfileId
    INNER JOIN Profiles_PrimaryApps As d10 ON a.ProfileId = d10.ProfileId
    INNER JOIN Profiles_VICompTools As e7 ON a.ProfileId = e7.ProfileId
    INNER JOIN Profiles_VICompTools As e8 ON a.ProfileId = e8.ProfileId
    INNER JOIN Profiles_Finishers As f1 ON a.ProfileId = f1.ProfileId
    INNER JOIN Profiles_Finishers As f2 ON a.ProfileId = f2.ProfileId
WHERE b1.DFEId = 1 AND b3.DFEId = 3
    AND c4.IndustryId = 4 AND c10.IndustryId = 10
    AND d4.PrimaryAppId = 4 AND d10.PrimaryAppId = 10
    AND e7.VICompId = 7 AND e8.VICompId = 8
    AND f1.FinisherId = 1 AND f2.FinisherId = 2

Open in new window

jimhorn's solution won't work since it will treat the "in (1, 3)" as an OR instead of an AND

gnoon's solution should work as well
>jimhorn's solution won't work since it will treat the "in (1, 3)" as an OR instead of an AND
Correct; I mis-interpreted the question.  
Avatar of chemass

ASKER

I tried soething along these lines:

SELECT *
FROM ProfilesNew
WHERE ProfileId IN (SELECT ProfileId FROM Profiles_DFEs WHERE DFEId IN (1, 3) )
GROUP BY ProfileId HAVING COUNT(*)=2

but it was even slower!!
hi, try this
SELECT P.* FROM ProfilesNew P INNER JOIN
(SELECT ProfileId FROM Profiles_DFEs WHERE (DFEId = 1)) T1 ON (T1.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_DFEs AS Profiles_DFEs_1 WHERE (DFEId = 3)) T2 ON (T1.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_Industries WHERE (IndustryId = 10)) T3 ON (T3.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_Industries WHERE (IndustryId = 4)) T4 ON (T4.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 4)) T5 ON (T5.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 10)) T6 ON (T6.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_VICompTools WHERE(VICompId = 7)) T7 ON (T7.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_VICompTools WHERE (VICompId = 8)) T8 ON (T8.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 2)) T9 ON (T9.ProfileId=P.ProfileId) INNER JOIN
(SELECT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 1)) T10 ON (T10.ProfileId=P.ProfileId)

Open in new window

Avatar of chemass

ASKER

ee_rlee -
Thanks for the suggestion.
I ran it through SMSEE as a query - 100000 iterations - and the time it takes is exactly the same as my original.
As your suggestion also uses nested selects, would it also suffer from the same massive slowdown that mine does with 30-40 subqueries?

over 100000 iterations, on this useless pc, both queries run in 1min 6secs.

When the tables become more populated though, (at the moment there's only 2 profiles and up to 5 entries in the other tables), i feat that the queries will take an inordinate amount of time to complete.

actually this code doesn't use nested select. it just joins subqueries.

in your code, the subquery is executed once for each record in table ProfilesNew. in this code, the subqueries are only executed once. you should be able to see great improvement in speed if you have large tables.

btw, i added distict in the subqueries. ;)
SELECT P.* FROM ProfilesNew P INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_DFEs WHERE (DFEId = 1)) T1 ON (T1.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_DFEs AS Profiles_DFEs_1 WHERE (DFEId = 3)) T2 ON (T1.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_Industries WHERE (IndustryId = 10)) T3 ON (T3.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_Industries WHERE (IndustryId = 4)) T4 ON (T4.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 4)) T5 ON (T5.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_PrimaryApps WHERE (PrimaryAppId = 10)) T6 ON (T6.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_VICompTools WHERE(VICompId = 7)) T7 ON (T7.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_VICompTools WHERE (VICompId = 8)) T8 ON (T8.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 2)) T9 ON (T9.ProfileId=P.ProfileId) INNER JOIN
(SELECT DISTINCT ProfileId FROM Profiles_Finishers WHERE (FinisherId = 1)) T10 ON (T10.ProfileId=P.ProfileId)

Open in new window

Avatar of chemass

ASKER

OK,

I tried the same thing with gnoon's answer, and again, the performance is almost exactly the same.
Which of the solutions would be the most scalable?

There's likely to be around 5000 records in the main table, with between 1-5 records per main record in each of the sub-tables.

ee_rlee's one sounds good, although all of them will be a bit of a nightmare to keep track of in code!

A big THANK YOU to all who have helped here!
is combination of DFEId and profileid in table Profiles_DFEs unique? what about the other tables?
Avatar of chemass

ASKER

Yes, in all of the linking tables, there is a unique index of both columns.
I'm not very well versed in Sql indexes, so they've all got a 2-column unique non-clustered index, and a 2-column unique clustered index on them.
any suggestions on the indexes will also be gratefully accepted!
ASKER CERTIFIED SOLUTION
Avatar of ee_rlee
ee_rlee
Flag of Philippines image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Just a notice. On diagram above, I don't see any key on Profiles_Industries table and other many-to-many tables.
I think it would be better performance if you define both columns as a primary key physically on each table.
Avatar of yuching
yuching

I summarize the suggestion for
derekkromm: --> use EXISTS (guest it will be more faster as compare to IN)
gnoon: --> Create index for
Table                             KEY
Profiles_DFEs               ProfileID, DFEId
Profiles_Industries        ProfileID, IndustryId
Profiles_PrimaryApps   ProfileID, PrimaryAppId
Profiles_VICompTools   ProfileID, VICompId

SELECT * FROM ProfilesNew a
WHERE
  EXISTS (SELECT 1 FROM Profiles_DFEs WHERE DFEId IN (1,3)
          AND Profiles_DFEs.ProfileID = a.ProfileID)
  AND EXISTS (SELECT 1 FROM Profiles_Industries WHERE IndustryId IN (10,4)
          AND Profiles_Industries.ProfileID = a.ProfileID)
  AND EXISTS (SELECT 1 FROM Profiles_PrimaryApps WHERE PrimaryAppId IN (10,4)
          AND Profiles_PrimaryApps.ProfileID = a.ProfileID)
  AND EXISTS (SELECT 1 FROM Profiles_VICompTools WHERE VICompId IN (7,8)
          AND Profiles_VICompTools.ProfileID = a.ProfileID)
 AND EXISTS (SELECT 1 FROM Profiles_Finishers WHERE FinisherId IN (1,2)
          AND Profiles_Finishers.ProfileID = a.ProfileID)
yuching,

I think your query does not produce the same result as the asker's query. derekkromm already explained the reason (https://www.experts-exchange.com/questions/23209222/Need-to-improve-query-with-nested-SQL-where-in-statements.html#21031774).
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of chemass

ASKER

yuching, when i replace DFEId IN (1,3) with DFEId =1 AND DFEId = 3 (and all the other rows) i get no results whatsoever!

gnoon - I've now applied a dual primary key to each of the many-to-many tables. The performance of each of the queries is still almost exactly the same. I'm going to populate the database with around 5000 records, and i'll test them all again.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial