Link to home
Start Free TrialLog in
Avatar of Craig Yellick
Craig YellickFlag for United States of America

asked on

Can this EXISTS query be re-written using OUTER JOIN?

I teach a SQL Server Query class and students often notice that my examples of queries that use the EXISTS operator can also be implemented using OUTER JOIN. I explain that this is due to my decision to use simple examples that illustrate the syntax.

I am looking for a query based on the Northwind sample database that can *only* be accomplished using EXISTS. That way I can point to it and say, here's a simple example that cannot be accomplished any other way.

Below, a query that lists all of the products ordered during the first week of July, 1996. Easily done with INNER JOIN.

  select distinct ProductName
  from Products as P
  inner join [Order Details] as OD on OD.ProductID = P.ProductID
  inner join Orders as O on O.OrderID = OD.OrderID
  where OrderDate between '1-July-1996' and '7-July-1996'

The challenge is to list the products that were NOT ordered during that time period. I am pretty sure this cannot be done with any variation on JOIN. But I want to be certain.  Here is my solution that uses EXISTS.

  select ProductName
  from Products as P
  where not exists (
    select * from [Order Details] as OD
    join Orders as O on O.OrderID = OD.OrderID
    where OD.ProductID = P.ProductID
      and OrderDate between '1-July-1996' and '7-July-1996'
  )

If the above can be implemented without using EXISTS I would like to know.  (By the way, doing a NOT IN( ) doesn't count... that's essentially the same as EXISTS just less efficient.)

-- Craig
ASKER CERTIFIED SOLUTION
Avatar of wstuph
wstuph

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of DireOrbAnt
DireOrbAnt

select ProductName
from Products as P
left join [Order Details] as OD ON O.OrderID = OD.OrderID AND OD.ProductID = P.ProductID AND OrderDate between '1-July-1996' and '7-July-1996'
where OD.OrderID SI NULL
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
DireOrbAnt - you need to join the orders table as well.

Truthfully, I will always end up writing it my way unless there's a huge performance difference - it was the way that I was taught and it makes a lot more sense to me then EXISTS ever has.
That's odd. Exists makes much more sense to me in writing queries such as "give me all the foos that have this type of baz, but never had this type of bar".
Avatar of Craig Yellick

ASKER

DireOrbAnt: the query is missing a reference to the Orders table, where OrderDate is found.

Ok, sure doing a sub-query with GROUP BY does do the trick. I was looking for a pure JOIN solution, along the lines that DireOrbAnt was doing, where you stick additional conditions in the ON clause.

I will award points if nobody comes up with anything else, but I should have said that fancy sub-queries are not desired if possible.

Allow me to explain:  Students love using Query-by-Example (QBE) tools because they are so easy to learn, especially when multiple tables are involved. The problem with QBE tools is that they immediately JOIN everything and make it difficult to use sub-queries or EXISTS.  If it's possible to accomplish the "products not ordered" query using JOINS I want to know.

-- Craig

Orderdate is coming from Orders and not OrderDetails?
Craig: to explain why Exists is a better way of doing it, quiz your students on the reverse (which is a very good skill to learn anyway): "explain what this query does". Exists is always much better in that department.
And besides, QBE tools are evil. They allow someone to come up with a query that "kind-of" works without ever having thought of the actual problem.
select ProductName
from Products as P
left outer join
[Order Details] as OD
on p.productid = od.productid
left outer join orders as O
on od.orderid = o.orderid
and orderdate between '1-July-1996' and '7-July-1996'
where o.orderid is null
group by productname
Plus the fact that they're pretty much unmaintainable in a production setting.  Who really wants to read through 40+ lines of full tablenames referenced constantly, joined in a manner that makes no sense to people.  
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of James0628
I assume you want to include any products that were never ordered, but if I'm wrong and you're only looking for the products that were ordered at some other time, you could just use "NOT BETWEEN" in your original WHERE.

 James
THANKS to everyone who contributed. I really appreciate it.

DireOrbAnt: your last version yields products that have *never* been ordered, ever. It does not list those that have been ordered but not during that week.

wstuph: your last version lists all products.

James0628: yes, the ideal solution should list all products not ordered during the time period, even if the product was never ordered at anytime.

James0628: NOT BETWEEN, yeah you might think it would do that, but it doesn't.

Accomplishing this query is more complicated than it initially appears to be, and gets to the heart of the idiosyncrasies of JOIN..ON.

Einstine98: I plan on running all 3-4 variations on the solution with an execution plan and noting that all the solutions are close to the same in SQL Server 2005. I did some tests a while back with SQL Server 2000 and found that it did not always produce the same plan, and sometimes it was dramatically different.

SjoerdVerweij: Totally agree on all points. The EXISTS operator makes intent much more obvious. I hammer on the fact that QBE tools cannot do everything and force you to JOIN everything even if it results in an illegible query.

-- Craig
> NOT BETWEEN, yeah you might think it would do that, but it doesn't.

 I'm not sure what you're saying there, but I tried NOT BETWEEN in an example similar to yours and it did find the sales on other dates.  But it won't find items with no sales at all.

 I _think_ DireOrbAnt's last post will work.  Try changing the WHERE to check O.OrderID instead of OD.OrderID.

 James
If you have an unmodified copy of Northwind handy you can test this pretty easily. Since starting this question I have refined the original goal of the query to be easier to test and validate.  Instead of a single week in July, 1996 I've extended the date range to all of 1996.

/* Establish the baseline */
insert Products(ProductName) values ('zz Never Ordered')
select count(*) from products  -- 78 with the above extra product

 /* 74 products were ordered in 1996 */
 select distinct ProductName
  from Products as P
  inner join [Order Details] as OD on OD.ProductID = P.ProductID
  inner join Orders as O on O.OrderID = OD.OrderID
  where year(OrderDate)=1996

/* Technique using NOT EXISTS to find products NOT ordered
    Returns 4 products including the 'never ordered' product
*/
select ProductName
  from Products as P
  where not exists (
    select * from [Order Details] as OD
    join Orders as O on O.OrderID = OD.OrderID
    where OD.ProductID = P.ProductID
      and year(OrderDate)=1996
  )

If you edit the WHERE clause of the original query to "year(OrderDate) <> 1996", which is essentially the same as "not between" you get a long list of products, not just the four.

-- Craig
Ah.  By "NOT ordered during that time period" I thought you just meant any products ordered at some other time or never ordered, but I guess you also wanted to _exclude_ the products that _were_ ordered during that time.  I didn't get the exclusion part.

 James