SQL: &amp;quot;and&amp;quot; criteria across multiple rows

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

exists is pretty efficient and if you had 3 years of purchases, it will shortcut out and return true for the first match that it finds.
Using the first comment, you need a distinct or group by anyway

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

> The query below will give you what you need much more efficiently.

Depends, doesn't it?

If you have 100 people making 1000 purchases each, and an index on purchase(personid,itemname), then exists should be faster.
Your query requires a full table scan on purchase.

If there's an index on personid and itemname, why couldn't SQL use it to resolve my query as well??

That's why I put the WHERE in there:

WHERE itemname = 'peanut butter' OR itemName = 'jelly'

It truly may depend.

But I would think your query will require at least one lookup for *every* person, whether they bought either of the items or not.

If there's *not* an index on *both* those columns, I'm certain my query would win hands down :-) .

That's why I put the WHERE in there:
WHERE itemname = 'peanut butter' OR itemName = 'jelly'

I'm sorry but I didn't see that, can you please point it out on my screen?

If the index is on (personid,itemname) in that order, it will still scan, and process all the peanut butter and jelly rows. With exists, all it does it check that the index key exists.

If the index is on personid or itemname separately or none, then it is likely to scan the table in any case, but still I reiterate that it depends on your data distribution.

>> I'm sorry but I didn't see that, can you please point it out on my screen? <<

I just did. If it's indexed, the WHERE should be there, if not, it doesn't matter, so I commented it out.

I would think if you're going to seach by itemname a lot, you would cluster the table on personid and create a non-clus index on (itemname) [which will also automatically then include the personid].

>> but still I reiterate that it depends on your data distribution. <<

Yes, but to make yours better you have to make **very specific** assumptions about the data distribution.

Moreover, as you seach for more items, your code will get progressively slower (can we agree on that?).

Mine will produce roughly consistent results for any reasonable number of additional items.

Mine may be slower than optimal with very few people, AND a lot of purchases, AND a very small number of items seached for.

Internally SQL may be able to turn the EXISTS() into semi-JOINs, avoiding some of the overhead, but I don't think the optimizer is sophisticated enough to combine the conditions into a single SELECT.

I think as you get to 100s or 1000s or 10000s or 100000s of separate index seeks, one index scan may do much better.

At any rate, the poster has two different methods to try out. Either should produce accurate results.

Saqib Khan

very interesting conversation... I am learning from it.

if we do below you think its a heavy query?

Select purchaseID From purchaseTable Where itemname In('peanut butter', 'Jelly')
Group by purchaseID
Having Count(*) = 2 ' or 3 if looking for 3 values

Also lets say if we are dealing with million rows and we add a third table with something like this..

purchaseid, flavors
-----------------------------
2, peanutbutter | jelly | etc

and use use flavors as a "FreeText" field, we can return results much quicker (Using CONTAINS/FreeText)?
just curious

Thanks

>> I'm sorry but I didn't see that, can you please point it out on my screen? <<

I just did. If it's indexed, the WHERE should be there, if not, it doesn't matter, so I commented it out.

I am getting old, but I don't see any WHERE clause in http:#34262950

The first I hear of it is http:#a34262990

Anyway, I would also like to point out that optimally indexed, I truly believe my query will finish faster in general cases. Now, If it's not indexed we don't even need to discuss this, since performance must be a moot point.

ASKER

Adikhan:
I dont think the
HAVING Count()=..
will not work because if the person buys two peanut butters it will return true or
if they purchase 1 peanut butter and 2 jellies it will return false.

..unless the group or freetext parts affect limit the count to exactly what we want.

ASKER

incidentally, the
HAVING MAX(itemname = 'peanut butter')... AND MAX(itemname = 'jelly')
did not work at all,
but i used
HAVING MAX(itemname = 'peanut butter')... AND MIN(itemname = 'jelly')
worked fine but was so slooow.

That may be due to the software I am using to test (msACCESS).

ASKER

..and I didnt need the case statement in the example above = worked just fine.

ASKER

[b]PEANUT BUTTAH JELLY TIME!!!
PEANUT BUTTAH JELLY TIME!!!
PEANUT BUTTAH JELLY TIME!!!

[/b]

OK both seemed to work.
the where exists.. had better performance and the having... was convenient becasue you can have 2 columns showing the example data.