Starr Duskk
asked on
Converting IN to EXISTS and NOT IN to NOT EXISTS
Can someone help me convert this to an EXISTS, NOT EXISTS clause? Someone said an EXISTS performs better than an IN. (I also read this is no longer an issue with SQLServer 2008 optimization, but I want to try it both ways and see if it performs better or not):
Please help me word it correctly.
(The TrainingCourseUnit table is NOT used in my parent query because it has a one to many relationship and I don't want to reproduce all those rows. I just need to include the Course if it is in the TrainingCourseUnit table or if units are assigned to that course in the TrainingCourseUnit table.)
Thanks!
AND
(
TrainingCourse.TrainingCourseId IN (Select TrainingCourseId from TrainingCourseUnit WHERE UnitId IN (Select UnitId from EmployeeAccessUnitView where EmployeeId = Employee.EmployeeId))
OR
TrainingCourse.TrainingCourseId NOT IN (Select TrainingCourseId from TrainingCourseUnit)
)
Please help me word it correctly.
(The TrainingCourseUnit table is NOT used in my parent query because it has a one to many relationship and I don't want to reproduce all those rows. I just need to include the Course if it is in the TrainingCourseUnit table or if units are assigned to that course in the TrainingCourseUnit table.)
Thanks!
ASKER CERTIFIED SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
ASKER
I tried that change and my query using IN/NOT IN ran consistently one minute, but the new one using EXIST took 1.15 seconds. So it would 15 seconds longer to run. Weird huh?
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
SOLUTION
membership
Create a free account to see this answer
Signing up is free and takes 30 seconds. No credit card required.
PortletPaul,
select * versus select 1 has been an issue in ancient releases of DBMS, but isn't anymore (since several years). MySQL and such might be an exception, though. MSSQL does not care anymore. A "*" as selector is better than using any field, as that might force less advanced optimizers to choose the wrong execution plan.
Oracle, MSSQL, and several other DBMS remove any column or expression in a subselect following (NOT) EXISTS, and the official use is EXISTS (SELECT * ...) indeed.
select * versus select 1 has been an issue in ancient releases of DBMS, but isn't anymore (since several years). MySQL and such might be an exception, though. MSSQL does not care anymore. A "*" as selector is better than using any field, as that might force less advanced optimizers to choose the wrong execution plan.
Oracle, MSSQL, and several other DBMS remove any column or expression in a subselect following (NOT) EXISTS, and the official use is EXISTS (SELECT * ...) indeed.
@Qlemo
I am an ancient of course. I don't doubt you (so re-running isn't likely to produce a difference) and it's good to know optimizers have caught up with this.
select 1 isn't a field, but a constant, and as such should avoid any IO etc.
be interesting to know if * is demonstrably better than a constant here - do you know if such evidence exists?
I am an ancient of course. I don't doubt you (so re-running isn't likely to produce a difference) and it's good to know optimizers have caught up with this.
select 1 isn't a field, but a constant, and as such should avoid any IO etc.
be interesting to know if * is demonstrably better than a constant here - do you know if such evidence exists?
No, there should be no difference whether you choose a literal or *. I wasn't after pointing out your change would make it worse - it shouldn't have any effect, that's it.
great - thanks.
ASKER
Sorry, yes 1 minute 15 seconds and that was changing only ONE Of the IN/NOT INs as a test. I will do it with my huge query on everything and then let you know the results.
Also, I had read to use "Select 1" instead of "select *" and didn't notice any difference in results. The query I'm testing typically runs 1.5 minutes, but I have another that runs around 4-9 minutes depending.
I did add more memory to the machine and CPU, which with all my posts yesterday and today trying to get better performance, I noticed no one suggested that. It was sitting at 3.84 Memory usage constantly, with 4 max. So now I have 8 and will likely go to 15 since that helped. That gave it some advantage but not a whole lot.
Anyway, I will change all my IN/NOT IN and put them in a method so I can change them back and forth and test both ways and see which is best and I'll do the same with the select 1 versus select *.
I also run the test about 5 times each because I know it can vary.
okay, thanks all!
Also, I had read to use "Select 1" instead of "select *" and didn't notice any difference in results. The query I'm testing typically runs 1.5 minutes, but I have another that runs around 4-9 minutes depending.
I did add more memory to the machine and CPU, which with all my posts yesterday and today trying to get better performance, I noticed no one suggested that. It was sitting at 3.84 Memory usage constantly, with 4 max. So now I have 8 and will likely go to 15 since that helped. That gave it some advantage but not a whole lot.
Anyway, I will change all my IN/NOT IN and put them in a method so I can change them back and forth and test both ways and see which is best and I'll do the same with the select 1 versus select *.
I also run the test about 5 times each because I know it can vary.
okay, thanks all!
Remember: You need to make the EXISTS/IN distinction for each subselect individually, and with a typical data collection.
concentrate on an current use of IN() where the contents may be large (or might grow to large)
and it is unlikely that "select 1" will make substantial (if any) difference so it seems.
and it is unlikely that "select 1" will make substantial (if any) difference so it seems.
ASKER
I have added exists for everything. Here is one example?
Here are my results in seconds or minutes an seconds for the same two queries using IN and NOT IN versus EXISTS and NOT EXISTS. The IN always beats the EXISTS.
IN EXISTS
51 51
1:20 1:36
48 51
48 49
1:12 1:29
47 49
AND
(
EXISTS (
select 1
from TrainingCourseJobType
where TrainingCourse.TrainingCourseId = TrainingCourseJobType.TrainingCourseId
and TrainingCourseJobType.JobTypeId = Employee.JobTypeId
) OR NOT EXISTS (
select 1 from TrainingCourseJobType
where TrainingCourse.TrainingCourseId = TrainingCourseJobType.TrainingCourseId
)
)
Here are my results in seconds or minutes an seconds for the same two queries using IN and NOT IN versus EXISTS and NOT EXISTS. The IN always beats the EXISTS.
IN EXISTS
51 51
1:20 1:36
48 51
48 49
1:12 1:29
47 49
I shall go and find a crow to eat...
v.interesting results, thanks for your dedication to testing
any chance of a .sqlplan file for each approach?
v.interesting results, thanks for your dedication to testing
any chance of a .sqlplan file for each approach?
>>It took 4:29 to run this query on a client with a lot of employees and fruit. Too long. It doesn't take this long on a client with less employees and fruit.
this may be a silly question, but have these comparison tests been conducted "with a lot of employees and fruit"? The differential between use of IN() -v- EXISTS() will be seen most clearly when there is significant volume in those subqueries. For low volume IN() is OK.
this may be a silly question, but have these comparison tests been conducted "with a lot of employees and fruit"? The differential between use of IN() -v- EXISTS() will be seen most clearly when there is significant volume in those subqueries. For low volume IN() is OK.
ASKER
yes, these tests are using a lot of employees and fruit because that's where the problem lies. Not millions, but definitely thousands as opposed to hundreds. I don't see us ever hitting the millions on one client.
>>any chance of a .sqlplan file for each approach?
I will do that.
>>any chance of a .sqlplan file for each approach?
I will do that.
ASKER
Here is the first plan. Change the name to:
Plan_AnonymizedExists.quer yanalysis
This was created with SQLSentry.
Plan-AnonymizedExists.txt
Plan_AnonymizedExists.quer
This was created with SQLSentry.
Plan-AnonymizedExists.txt
ASKER
Here is the second plan, using IN.
Change the extension to "queryanalysis"
It is a SQL Sentry file.
Plan-Anonymized-IN.txt
Change the extension to "queryanalysis"
It is a SQL Sentry file.
Plan-Anonymized-IN.txt
ASKER
BTW, for these queries, the results return 121,221 records for total tests taken.
I ran the two queries at the same time right now and they both completed in 48 seconds each.
(But I have more detailed queries that are taking 4-6 minutes each. They are similar but return additional data from other tables that this one does not, so I don't test with them because it takes too long to test.)
thanks.
I ran the two queries at the same time right now and they both completed in 48 seconds each.
(But I have more detailed queries that are taking 4-6 minutes each. They are similar but return additional data from other tables that this one does not, so I don't test with them because it takes too long to test.)
thanks.
:( unable to open those files using SQL Sentry (root element missing)
ASKER
What other file does it need? Or how else should I build it?
thanks.
thanks.
process for saving a .sqlplan file:
http://msdn.microsoft.com/en-us/library/ms190646(v=sql.105).aspx
http://msdn.microsoft.com/en-us/library/ms190646(v=sql.105).aspx
ASKER
Thanks all. closing this out. I may open a new one later with the sql plans in there. thanks!
Thanks, hope you are making progress on this. Cheers, Paul.
ASKER
Naww.... I have to come back to it later. Got a web service to do now. thanks!
ASKER
Open in new window