Why is this simple query so slow?

SELECT * FROM Customer WHERE Customer_ID NOT IN(SELECT DISTINCT Customer_ID FROM Booking WHERE Booking_Flag_1)

I thought this would be a really simple query, but it's running painfully slow (over 2 minutes!) and I can't fathom out why. All I'm trying to do is get a list of customer's that don't have any related booking which are flagged. The tables are related one-to-many on Booking.Customer_ID = Customer.Customer_ID. Neither of them are especially large (<6000 records in each).

I would like to know: a) why does this query run so slowly and b) what can I do to make the speed acceptable?
LVL 11
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Guy Hengel [angelIII / a3]Billing EngineerCommented:
the DISTINCT is "bad", and actually not needed.

please try this alternative
SELECT * FROM Customer c WHERE NOT EXISTS ( SELECT NULL FROM Booking b WHERE b.Customer_ID = c.Customer_ID AND b.Booking_Flag_1 )

Open in new window


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
SharathData EngineerCommented:
Just delete DISTINCT from your query and check the query run time.
SELECT * FROM Customer WHERE Customer_ID NOT IN(SELECT Customer_ID FROM Booking WHERE Booking_Flag_1)

Open in new window

Antagony1960Author Commented:
That took about 20 seconds off but it still took over 100 seconds!

Well your query works perfectly, thank you. But I'd just like a little explanation for why my original query is bad, even if I drop the DISTINCT keyword. If I run the inner query on its own it is instantaneous, so I don't understand what is going wrong. I think it may have something to do with the NOT operator, because when I drop it--so that I am getting customers with flagged bookings--it is also pretty instantaneous.
Introduction to R

R is considered the predominant language for data scientist and statisticians. Learn how to use R for your own data science projects.

Exists is faster than In... why?
Returns true if a specified value matches any value in a subquery or a list.
Returns true if a subquery contains any rows.

also see link...

one thing to note....
When using NOT IN, the query performs nested full table scans, whereas for NOT EXISTS, query can use an index within the sub-query.
Antagony1960Author Commented:
Thanks carlsiy. It's becoming clearer now.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Query Syntax

From novice to tech pro — start learning today.