Solved

IF EXISTS versus IF NOT EXISTS problem.

Posted on 2007-04-06
10
502 Views
Last Modified: 2011-10-03
I would like to know if IF EXISTS is better than IF NOT EXISTS.  I have a 'settings' table which I am building for my application.  There are 10 settings a member can choose from  on my site.  By default all settings are ticked, so in my 'settings' table I am trying to decide whether to insert all rows into that table when a member joins (10 rows for 10 settings).  And then in my procs I would have to do IF EXISTS if I am to allow a certain action pertaining to a setting.  So if I have 1 million members that's a potential 10 million rows in this table (it is a basic table though, just 3 fields).

On the other hand I could have no settings in this table, and when a member unticks a setting I would store it in there, so the table in effect stores the settings the member DOESN'T want to allow.  And so in my procs I would then do IF NOT EXISTS when testing for it.  

My question is, I'm assuming IF NOT EXISTS is slower than IF EXISTS, because IF NOT EXISTS has to theoritically check the whole table? Whereas IF EXISTS stops when it finds a match?  But the only way I can use IF EXISTS is if I am storing all settings by default (and removing a row when members untick them).  So the table then would be much much bigger than if I was doing the opposite and not storing the settings.  Which open is going to be quicker and better for the app?
0
Comment
Question by:Tsoukias
  • 2
  • 2
  • 2
  • +2
10 Comments
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 25 total points
Comment Utility
> if IF EXISTS is better than IF NOT EXISTS.  
they do not the same, so how do you want to compare them?
note that exists is usally faster, as it can stop on a first match. not exists has to validate the entire range.
depending however on what condition the subquery has to fulfil vs what index is there, there is no measurable difference.

> because IF NOT EXISTS has to theoritically check the whole table?
I repeat: only the range (means that if you have a "unique" index range that supports the where clause to check, not exists will be as efficient as exists
0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 25 total points
Comment Utility
i agree with angeliii , there shouldn't really be any difference  between an exists/not exists test provided you have an index that matches the basic  test criteria. its usually down to a matter of style, in that a positive check is often easier to understand.

you can achieve better performance if the use/non use of NOT for the EXISTS means that a simpler condition test is then allowed within the subquery  (e.g. in general avoid the use of the NOT keyword in condition testting (or <>))

10 million rows properly indexed should be of no concern to your application and be a negligble space concern as well.

 
0
 
LVL 18

Expert Comment

by:JR2003
Comment Utility
There will be no difference in performance between the following 2 equivalent statements

IF EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
    -- Code here
END


IF NOT EXISTS(SELECT 1 FROM Table1 WHERE Col1 = 'X')
BEGIN
END
ELSE
BEGIN
    -- Code here
END

This is because the hard work is done by the 'select' query not the 'if' statement. And the select query is the same in both cases.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
just to complete JR2003's: There will be no difference in performance ...
... IF there is an index on field Col1

if there is NO index on the field, both queries will potentially have to scan through the entire table.
when there is no match, both have to do the full table scan.
if there is a match, both can stop at the first match
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 8

Expert Comment

by:Brain2000
Comment Utility
There is one way you could speed this up.  Put a clustered index on the member_id of the field (not a primary key though, as it can be in there 10 times).  That way, all 10 settings can be found in 1 lookup.  Will it be noticably faster than just regular indexes?  Not really, unless you're running the query thousands of times.
0
 
LVL 18

Expert Comment

by:JR2003
Comment Utility
I think the answer to your question depends on the data. i.e. Is a member going to have on average more than or less than half the options selected. If the answer less than half then the table would have less rows if you store the setting and so be slightly quicker to query. If the answer is that on average more than half (5) options will be selected then storing options that aren't selected will be slightly quicker as the table will be smaller.

Myself, I would choose whatever makes it easier to handle logically (storing settings that a member has), rather than designing it for performance as you will find that it is very fast to do lookups on the table as long as you have it correctly indexed.

The choice of a good index will be the main deciding factor in performance. I think you should have a unique compound index on MemberId, SettingId. It should then be fast to lookup to see if a member has a particular setting
0
 
LVL 8

Expert Comment

by:Brain2000
Comment Utility
I just thought of something else regarding a clustered index.  Without a clustered index, you might have each setting stored in 10 different places physically on the disk.  That means the hard drive may have to read a total of up to 10 pages before having all the potential data required.  A clustered index will guarantee that you read only one page, at the most two if the data for a particular member is right near the page boundary.  By cluster indexing the member id field, you should theoretically have up to a 10x increase in speed over anything else you could do.
0
 
LVL 19

Expert Comment

by:NerdsOfTech
Comment Utility
In this BASIC example table created below:
We are looking to find the employees who DOES NOT manage anyone

Basically the difference is that NOT IN will do a cross-join and then do a row by row comparison (SLOW...SLOW...SLOW) of the fields in subquery's WHERE stipulations...

SQL> select count(*) from emp_master where emp_nbr not in ( select mgr_nbr from emp_master );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not in Subquery_MGR ?
==========|===================|==========================
1|3|TRUE (1)
2|3|TRUE (2)
3|{3}|FALSE (3)

1|NULL|NULL==FALSE(1)
2|NULL|NULL==FALSE(2)
3|NULL|NULL==FALSE(3)

1|{1}|FALSE(1)
2|1|TRUE(2)
3|1|TRUE(3)

PROCESS:
ALL (1)=FALSE
ALL (2)=FALSE
ALL (3)=FALSE
(3/3) FALSE
COUNT(*) = 0

NULLs values are treated differently in NOT IN
NULL = NULL == NULL (Boolean = FALSE)
NULL != NULL == NULL (Boolean = FALSE)


With NOT EXIST the search is done by index instead of cross join so:

SQL> select count(*) from emp_master T1 where not exists ( select 1 from emp_master T2 where t2.mgr_nbr = t1.emp_nbr );

EMP_NBR   |   Subquery.MGR_NBR    | MGR_NBR not exist in Subquery_MGR ?
==========|===================|==========================
1|3,NULL,{1}|FALSE (1)
2|3,NULL,1|TRUE (2) {Since 2!=NULL is TRUE with not EXIST}
3|{3},NULL,1|FALSE (3)

(2/3) FALSE
COUNT(*) = 1

Thus the correct result in the case would be by using NOT EXIST opposed to NOT IN







***** ULTIMATELY *****

The differences are:
IN and NOT IN use cross joins (SLOW) and treat NULL like:
value=NULL==FALSE
value!=NULL==FALSE

EXIST and NOT EXIST use indexes and treat NULL like:
value=NULL==FALSE
value!=NULL==TRUE


So when finding if in fact a record exists or not use EXIST or NOT EXIST, respectively.







***** NOTE ON DELETING RECORDS *****

When DELETING RECORDS that don't exist in another table, use EXISTS and NOT EXIST to avoid causing a SYNTAX ERROR (in mySQL) that occurs when using NOT IN in a DELETE QUERY

Classic Example:

DELETE FROM suppliers
WHERE supplier_id NOT IN (SELECT orders.supplier_id FROM orders);
{ERROR}


DELETE FROM suppliers
WHERE NOT EXISTS (SELECT * FROM orders WHERE suppliers.supplier_id = orders.supplier_id);

{Successful deletion of all suppliers who have NO orders from database}
CREATE TABLE EMP_MASTER

(

EMP_NBR NUMBER(10) NOT NULL PRIMARY KEY,

EMP_NAME VARCHAR2(20 CHAR),

MGR_NBR NUMBER(10) NULL

)
 

INSERT INTO EMP_MASTER VALUES (1, JOHN, 3);

INSERT INTO EMP_MASTER VALUES (2, JOE, NULL);

INSERT INTO EMP_MASTER VALUES (3, JANE, 1);

Open in new window

0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

Suggested Solutions

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
JSON is being used more and more, besides XML, and you surely wanted to parse the data out into SQL instead of doing it in some Javascript. The below function in SQL Server can do the job for you, returning a quick table with the parsed data.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
This is Part 3 in a 3-part series on Experts Exchange to discuss error handling in VBA code written for Excel. Part 1 of this series discussed basic error handling code using VBA. http://www.experts-exchange.com/videos/1478/Excel-Error-Handlin…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now