Link to home
Start Free TrialLog in
Avatar of Darinwc
Darinwc

asked on

SQL: "and" criteria across multiple rows

forgive me because this is very hard to explain.

I am trying to write a query that will find records based on multiple criteria. However the criteria can be spread across multiple rows, rather than multiple columns.

Here is an example:

people table: (personID, personName)
1,joe
2,sally
3,jack

purchase table: (purchaseID, ItemName, personID)
1,apples,3
2,pears,3
3,peanut butter,2
4,pears,2
5,apples,1
6,peanut butter,1
7,jelly,1


OK, so I want to know who bought BOTH peanut butter and jelly.
But if I use
WHERE itemname = "peanut butter" and itemName = "jelly"
then obviously I get nothing.

Please help thank you.
Avatar of Ryan McCauley
Ryan McCauley
Flag of United States of America image

I think you're looking for this:

SELECT p.personID, p.personName
  FROM people p
  JOIN purchase r1
      on p.personID = r1.personID
  JOIN purchase r2
      on p.personID = r2.personID
WHERE r1.itemName = 'peanut butter'
   AND r2.itemName = 'jelly'

Open in new window


This will return people who have purchased both peanut butter and jelly. I'm joining to the "Purchase" table twice - once to look for peanut butter, and again to look for jelly.
SOLUTION
Avatar of cyberkiwi
cyberkiwi
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
exists is pretty efficient and if you had 3 years of purchases, it will shortcut out and return true for the first match that it finds.
Using the first comment, you need a distinct or group by anyway
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
> The query below will give you what you need much more efficiently.

Depends, doesn't it?

If you have 100 people making 1000 purchases each, and an index on purchase(personid,itemname), then exists should be faster.
Your query requires a full table scan on purchase.
If there's an index on personid and itemname, why couldn't SQL use it to resolve my query as well??

That's why I put the WHERE in there:

WHERE itemname = 'peanut butter' OR itemName = 'jelly'
It truly may depend.

But I would think your query will require at least one lookup for *every* person, whether they bought either of the items or not.
If there's *not* an index on *both* those columns, I'm certain my query would win hands down :-) .
That's why I put the WHERE in there:
WHERE itemname = 'peanut butter' OR itemName = 'jelly'
I'm sorry but I didn't see that, can you please point it out on my screen?

If the index is on (personid,itemname) in that order, it will still scan, and process all the peanut butter and jelly rows.  With exists, all it does it check that the index key exists.

If the index is on personid or itemname separately or none, then it is likely to scan the table in any case, but still I reiterate that it depends on your data distribution.
>> I'm sorry but I didn't see that, can you please point it out on my screen? <<

I just did.  If it's indexed, the WHERE should be there, if not, it doesn't matter, so I commented it out.


I would think if you're going to seach by itemname a lot, you would cluster the table on personid and create a non-clus index on (itemname) [which will also automatically then include the personid].


>> but still I reiterate that it depends on your data distribution. <<

Yes, but to make yours better you have to make **very specific** assumptions about the data distribution.

Moreover, as you seach for more items, your code will get progressively slower (can we agree on that?).

Mine will produce roughly consistent results for any reasonable number of additional items.  

Mine may be slower than optimal with very few people, AND a lot of purchases, AND a very small number of items seached for.

Internally SQL may be able to turn the EXISTS() into semi-JOINs, avoiding some of the overhead, but I don't think the optimizer is sophisticated enough to combine the conditions into a single SELECT.

I think as you get to 100s or 1000s or 10000s or 100000s of separate index seeks, one index scan may do much better.

At any rate, the poster has two different methods to try out.  Either should produce accurate results.
very interesting conversation... I am learning from it.

if we do below you think its a heavy query?

Select purchaseID From purchaseTable Where itemname In('peanut butter', 'Jelly')
Group by purchaseID
Having Count(*) = 2   ' or 3 if looking for 3 values

Also lets say if we are dealing with million rows and we add a third table with something like this..

purchaseid, flavors
-----------------------------
2, peanutbutter | jelly | etc

and use use flavors as  a "FreeText" field, we can return results much quicker (Using CONTAINS/FreeText)?
 just curious

Thanks
>> I'm sorry but I didn't see that, can you please point it out on my screen? <<

I just did. If it's indexed, the WHERE should be there, if not, it doesn't matter, so I commented it out.
I am getting old, but I don't see any WHERE clause in http:#34262950

The first I hear of it is http:#a34262990

Anyway, I would also like to point out that optimally indexed, I truly believe my query will finish faster in general cases.  Now, If it's not indexed we don't even need to discuss this, since performance must be a moot point.
Avatar of Darinwc
Darinwc

ASKER

Adikhan:
I dont think the
HAVING Count()=..  
will not work because if the person buys two peanut butters it will return true or
if they purchase 1 peanut butter and 2 jellies it will return false.

..unless the group or freetext parts affect limit the count to exactly what we want.
Avatar of Darinwc

ASKER

incidentally, the
HAVING MAX(itemname = 'peanut butter')... AND MAX(itemname = 'jelly')
did not work at all,
but i used
HAVING MAX(itemname = 'peanut butter')... AND MIN(itemname = 'jelly')
worked fine but was so slooow.

That may be due to the software I am using to test (msACCESS).
Avatar of Darinwc

ASKER

..and I didnt need the case statement in the example above = worked just fine.
Avatar of Darinwc

ASKER

[b]PEANUT BUTTAH JELLY TIME!!!
PEANUT BUTTAH JELLY TIME!!!
PEANUT BUTTAH JELLY TIME!!!

[/b]

OK both seemed to work.
the where exists..   had better performance and the having... was convenient becasue you can have 2 columns showing the example data.
Access?  I thought this was in the SQL Server zone.
Anyway, here's another form for Access

select p.*
from (
      select personid
      from
      (
      select distinct personid from purchase where itemname = 'peanut butter'
      union all
      select distinct personid from purchase where itemname = 'jelly'
      ) x
      group by personid having COUNT(personid) > 1) y
inner join person p on p.personid = y.personid