Link to home
Start Free TrialLog in
Avatar of Tsoukias
Tsoukias

asked on

IF EXISTS versus IF NOT EXISTS problem.

I would like to know if IF EXISTS is better than IF NOT EXISTS.  I have a 'settings' table which I am building for my application.  There are 10 settings a member can choose from  on my site.  By default all settings are ticked, so in my 'settings' table I am trying to decide whether to insert all rows into that table when a member joins (10 rows for 10 settings).  And then in my procs I would have to do IF EXISTS if I am to allow a certain action pertaining to a setting.  So if I have 1 million members that's a potential 10 million rows in this table (it is a basic table though, just 3 fields).

On the other hand I could have no settings in this table, and when a member unticks a setting I would store it in there, so the table in effect stores the settings the member DOESN'T want to allow.  And so in my procs I would then do IF NOT EXISTS when testing for it.  

My question is, I'm assuming IF NOT EXISTS is slower than IF EXISTS, because IF NOT EXISTS has to theoritically check the whole table? Whereas IF EXISTS stops when it finds a match?  But the only way I can use IF EXISTS is if I am storing all settings by default (and removing a row when members untick them).  So the table then would be much much bigger than if I was doing the opposite and not storing the settings.  Which open is going to be quicker and better for the app?
ASKER CERTIFIED SOLUTION
Avatar of Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3]
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of JR2003
JR2003

There will be no difference in performance between the following 2 equivalent statements

IF EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
    -- Code here
END


IF NOT EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
END
ELSE
BEGIN
    -- Code here
END

This is because the hard work is done by the 'select' query not the 'if' statement. And the select query is the same in both cases.
just to complete JR2003's: There will be no difference in performance ...
... IF there is an index on field Col1

if there is NO index on the field, both queries will potentially have to scan through the entire table.
when there is no match, both have to do the full table scan.
if there is a match, both can stop at the first match
There is one way you could speed this up.  Put a clustered index on the member_id of the field (not a primary key though, as it can be in there 10 times).  That way, all 10 settings can be found in 1 lookup.  Will it be noticably faster than just regular indexes?  Not really, unless you're running the query thousands of times.
I think the answer to your question depends on the data. i.e. Is a member going to have on average more than or less than half the options selected. If the answer less than half then the table would have less rows if you store the setting and so be slightly quicker to query. If the answer is that on average more than half (5) options will be selected then storing options that aren't selected will be slightly quicker as the table will be smaller.

Myself, I would choose whatever makes it easier to handle logically (storing settings that a member has), rather than designing it for performance as you will find that it is very fast to do lookups on the table as long as you have it correctly indexed.

The choice of a good index will be the main deciding factor in performance. I think you should have a unique compound index on MemberId, SettingId. It should then be fast to lookup to see if a member has a particular setting
I just thought of something else regarding a clustered index.  Without a clustered index, you might have each setting stored in 10 different places physically on the disk.  That means the hard drive may have to read a total of up to 10 pages before having all the potential data required.  A clustered index will guarantee that you read only one page, at the most two if the data for a particular member is right near the page boundary.  By cluster indexing the member id field, you should theoretically have up to a 10x increase in speed over anything else you could do.
In this BASIC example table created below:
We are looking to find the employees who DOES NOT manage anyone

Basically the difference is that NOT IN will do a cross-join and then do a row by row comparison (SLOW...SLOW...SLOW) of the fields in subquery's WHERE stipulations...

SQL> select count(*) from emp_master where emp_nbr not in ( select mgr_nbr from emp_master );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not in Subquery_MGR ?
==========|===================|==========================
1|3|TRUE (1)
2|3|TRUE (2)
3|{3}|FALSE (3)

1|NULL|NULL==FALSE(1)
2|NULL|NULL==FALSE(2)
3|NULL|NULL==FALSE(3)

1|{1}|FALSE(1)
2|1|TRUE(2)
3|1|TRUE(3)

PROCESS:
ALL (1)=FALSE
ALL (2)=FALSE
ALL (3)=FALSE
(3/3) FALSE
COUNT(*) = 0

NULLs values are treated differently in NOT IN
NULL = NULL == NULL (Boolean = FALSE)
NULL != NULL == NULL (Boolean = FALSE)


With NOT EXIST the search is done by index instead of cross join so:

SQL> select count(*) from emp_master T1 where not exists ( select 1 from emp_master T2 where t2.mgr_nbr = t1.emp_nbr );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not exist in Subquery_MGR ?
==========|===================|==========================
1|3,NULL,{1}|FALSE (1)
2|3,NULL,1|TRUE (2) {Since 2!=NULL is TRUE with not EXIST}
3|{3},NULL,1|FALSE (3)

(2/3) FALSE
COUNT(*) = 1

Thus the correct result in the case would be by using NOT EXIST opposed to NOT IN







***** ULTIMATELY *****

The differences are:
IN and NOT IN use cross joins (SLOW) and treat NULL like:
value=NULL==FALSE
value!=NULL==FALSE

EXIST and NOT EXIST use indexes and treat NULL like:
value=NULL==FALSE
value!=NULL==TRUE


So when finding if in fact a record exists or not use EXIST or NOT EXIST, respectively.







***** NOTE ON DELETING RECORDS *****

When DELETING RECORDS that don't exist in another table, use EXISTS and NOT EXIST to avoid causing a SYNTAX ERROR (in mySQL) that occurs when using NOT IN in a DELETE QUERY

Classic Example:

DELETE FROM suppliers
WHERE supplier_id NOT IN (SELECT orders.supplier_id FROM orders);
{ERROR}


DELETE FROM suppliers
WHERE NOT EXISTS (SELECT * FROM orders WHERE suppliers.supplier_id = orders.supplier_id);

{Successful deletion of all suppliers who have NO orders from database}
CREATE TABLE EMP_MASTER
(
EMP_NBR NUMBER(10) NOT NULL PRIMARY KEY,
EMP_NAME VARCHAR2(20 CHAR),
MGR_NBR NUMBER(10) NULL
)
 
INSERT INTO EMP_MASTER VALUES (1, JOHN, 3);
INSERT INTO EMP_MASTER VALUES (2, JOE, NULL);
INSERT INTO EMP_MASTER VALUES (3, JANE, 1);

Open in new window