Solved

IF EXISTS versus IF NOT EXISTS problem.

Posted on 2007-04-06
10
508 Views
Last Modified: 2011-10-03
I would like to know if IF EXISTS is better than IF NOT EXISTS.  I have a 'settings' table which I am building for my application.  There are 10 settings a member can choose from  on my site.  By default all settings are ticked, so in my 'settings' table I am trying to decide whether to insert all rows into that table when a member joins (10 rows for 10 settings).  And then in my procs I would have to do IF EXISTS if I am to allow a certain action pertaining to a setting.  So if I have 1 million members that's a potential 10 million rows in this table (it is a basic table though, just 3 fields).

On the other hand I could have no settings in this table, and when a member unticks a setting I would store it in there, so the table in effect stores the settings the member DOESN'T want to allow.  And so in my procs I would then do IF NOT EXISTS when testing for it.  

My question is, I'm assuming IF NOT EXISTS is slower than IF EXISTS, because IF NOT EXISTS has to theoritically check the whole table? Whereas IF EXISTS stops when it finds a match?  But the only way I can use IF EXISTS is if I am storing all settings by default (and removing a row when members untick them).  So the table then would be much much bigger than if I was doing the opposite and not storing the settings.  Which open is going to be quicker and better for the app?
0
Comment
Question by:Tsoukias
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
  • 2
  • +2
10 Comments
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 25 total points
ID: 18868796
> if IF EXISTS is better than IF NOT EXISTS.  
they do not the same, so how do you want to compare them?
note that exists is usally faster, as it can stop on a first match. not exists has to validate the entire range.
depending however on what condition the subquery has to fulfil vs what index is there, there is no measurable difference.

> because IF NOT EXISTS has to theoritically check the whole table?
I repeat: only the range (means that if you have a "unique" index range that supports the where clause to check, not exists will be as efficient as exists
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 25 total points
ID: 18868853
i agree with angeliii , there shouldn't really be any difference  between an exists/not exists test provided you have an index that matches the basic  test criteria. its usually down to a matter of style, in that a positive check is often easier to understand.

you can achieve better performance if the use/non use of NOT for the EXISTS means that a simpler condition test is then allowed within the subquery  (e.g. in general avoid the use of the NOT keyword in condition testting (or <>))

10 million rows properly indexed should be of no concern to your application and be a negligble space concern as well.

 
0
 
LVL 18

Expert Comment

by:JR2003
ID: 18870643
There will be no difference in performance between the following 2 equivalent statements

IF EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
    -- Code here
END


IF NOT EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
END
ELSE
BEGIN
    -- Code here
END

This is because the hard work is done by the 'select' query not the 'if' statement. And the select query is the same in both cases.
0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 18871850
just to complete JR2003's: There will be no difference in performance ...
... IF there is an index on field Col1

if there is NO index on the field, both queries will potentially have to scan through the entire table.
when there is no match, both have to do the full table scan.
if there is a match, both can stop at the first match
0
 
LVL 8

Expert Comment

by:Brain2000
ID: 18873283
There is one way you could speed this up.  Put a clustered index on the member_id of the field (not a primary key though, as it can be in there 10 times).  That way, all 10 settings can be found in 1 lookup.  Will it be noticably faster than just regular indexes?  Not really, unless you're running the query thousands of times.
0
 
LVL 18

Expert Comment

by:JR2003
ID: 18873682
I think the answer to your question depends on the data. i.e. Is a member going to have on average more than or less than half the options selected. If the answer less than half then the table would have less rows if you store the setting and so be slightly quicker to query. If the answer is that on average more than half (5) options will be selected then storing options that aren't selected will be slightly quicker as the table will be smaller.

Myself, I would choose whatever makes it easier to handle logically (storing settings that a member has), rather than designing it for performance as you will find that it is very fast to do lookups on the table as long as you have it correctly indexed.

The choice of a good index will be the main deciding factor in performance. I think you should have a unique compound index on MemberId, SettingId. It should then be fast to lookup to see if a member has a particular setting
0
 
LVL 8

Expert Comment

by:Brain2000
ID: 18911677
I just thought of something else regarding a clustered index.  Without a clustered index, you might have each setting stored in 10 different places physically on the disk.  That means the hard drive may have to read a total of up to 10 pages before having all the potential data required.  A clustered index will guarantee that you read only one page, at the most two if the data for a particular member is right near the page boundary.  By cluster indexing the member id field, you should theoretically have up to a 10x increase in speed over anything else you could do.
0
 
LVL 19

Expert Comment

by:NerdsOfTech
ID: 24832830
In this BASIC example table created below:
We are looking to find the employees who DOES NOT manage anyone

Basically the difference is that NOT IN will do a cross-join and then do a row by row comparison (SLOW...SLOW...SLOW) of the fields in subquery's WHERE stipulations...

SQL> select count(*) from emp_master where emp_nbr not in ( select mgr_nbr from emp_master );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not in Subquery_MGR ?
==========|===================|==========================
1|3|TRUE (1)
2|3|TRUE (2)
3|{3}|FALSE (3)

1|NULL|NULL==FALSE(1)
2|NULL|NULL==FALSE(2)
3|NULL|NULL==FALSE(3)

1|{1}|FALSE(1)
2|1|TRUE(2)
3|1|TRUE(3)

PROCESS:
ALL (1)=FALSE
ALL (2)=FALSE
ALL (3)=FALSE
(3/3) FALSE
COUNT(*) = 0

NULLs values are treated differently in NOT IN
NULL = NULL == NULL (Boolean = FALSE)
NULL != NULL == NULL (Boolean = FALSE)


With NOT EXIST the search is done by index instead of cross join so:

SQL> select count(*) from emp_master T1 where not exists ( select 1 from emp_master T2 where t2.mgr_nbr = t1.emp_nbr );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not exist in Subquery_MGR ?
==========|===================|==========================
1|3,NULL,{1}|FALSE (1)
2|3,NULL,1|TRUE (2) {Since 2!=NULL is TRUE with not EXIST}
3|{3},NULL,1|FALSE (3)

(2/3) FALSE
COUNT(*) = 1

Thus the correct result in the case would be by using NOT EXIST opposed to NOT IN







***** ULTIMATELY *****

The differences are:
IN and NOT IN use cross joins (SLOW) and treat NULL like:
value=NULL==FALSE
value!=NULL==FALSE

EXIST and NOT EXIST use indexes and treat NULL like:
value=NULL==FALSE
value!=NULL==TRUE


So when finding if in fact a record exists or not use EXIST or NOT EXIST, respectively.







***** NOTE ON DELETING RECORDS *****

When DELETING RECORDS that don't exist in another table, use EXISTS and NOT EXIST to avoid causing a SYNTAX ERROR (in mySQL) that occurs when using NOT IN in a DELETE QUERY

Classic Example:

DELETE FROM suppliers
WHERE supplier_id NOT IN (SELECT orders.supplier_id FROM orders);
{ERROR}


DELETE FROM suppliers
WHERE NOT EXISTS (SELECT * FROM orders WHERE suppliers.supplier_id = orders.supplier_id);

{Successful deletion of all suppliers who have NO orders from database}
CREATE TABLE EMP_MASTER
(
EMP_NBR NUMBER(10) NOT NULL PRIMARY KEY,
EMP_NAME VARCHAR2(20 CHAR),
MGR_NBR NUMBER(10) NULL
)
 
INSERT INTO EMP_MASTER VALUES (1, JOHN, 3);
INSERT INTO EMP_MASTER VALUES (2, JOE, NULL);
INSERT INTO EMP_MASTER VALUES (3, JANE, 1);

Open in new window

0

Featured Post

Ransomware: The New Cyber Threat & How to Stop It

This infographic explains ransomware, type of malware that blocks access to your files or your systems and holds them hostage until a ransom is paid. It also examines the different types of ransomware and explains what you can do to thwart this sinister online threat.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When it comes to protecting Oracle Database servers and systems, there are a ton of myths out there. Here are the most common.
Recently I was talking with Tim Sharp, one of my colleagues from our Technical Account Manager team about MongoDB’s scalability. While doing some quick training with some of the Percona team, Tim brought something to my attention...
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
This is Part 3 in a 3-part series on Experts Exchange to discuss error handling in VBA code written for Excel. Part 1 of this series discussed basic error handling code using VBA. http://www.experts-exchange.com/videos/1478/Excel-Error-Handlin…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question