Becky Edwards (United States of America) asked:

Dealing with NULL values returns no data

I want to filter out patients who have a 1 or a 2 in the following field, leaving patients who have either a NULL value or any of the other values.

When I put in this filter
pt.PATIENT_TYPE_C not in (1,2) 


it doesn't return anyone because the patients currently have all NULL values in that field.

PATIENT_TYPE_C      NAME
1      Anonymous
2      Confidential
3      Employee
4      Dismissed
5      Verified with no Medicare
6      Prisoner
7      VIP

How can I write my filter to account for the NULL value?
ASKER CERTIFIED SOLUTION
Éric Moreau (Canada)

[accepted solution text available to Experts Exchange members only]
or you could do:

pt.Patient_Type_C is not null and pt.Patient_Type_C not in (1,2)


>How can I write my filter to account for the NULL value?
The above expert answers are correct, but just to add some context: NULL cannot be compared to anything, so NULL NOT IN (1,2) does not return TRUE, which is why your query returns no rows.
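You can see the three-valued logic in action with a quick self-contained check (my own sketch, not from the thread):

SELECT CASE WHEN NULL NOT IN (1, 2)       THEN 'true'
            WHEN NOT (NULL NOT IN (1, 2)) THEN 'false'
            ELSE 'unknown' END AS result;
-- returns 'unknown': both tests evaluate to UNKNOWN, so the ELSE branch wins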

So to pull this off you'll have to convert the NULL, every single time, to a non-NULL value that can participate in comparisons.
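For example (a sketch; the table name PATIENT is my assumption, it isn't given in the thread), ISNULL can substitute a value that is never a real patient type:

SELECT pt.*
FROM PATIENT pt                                     -- table name assumed for illustration
WHERE ISNULL(pt.PATIENT_TYPE_C, 0) NOT IN (1, 2)    -- 0 is not a valid PATIENT_TYPE_C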
NULL and NOT IN do not get along well. When you see NOT IN against a nullable field, expect trouble. The workarounds above are reasonable -- I use them both. Ideally, one would design the tables to not use NULLs where they are not required and avoid all the issues that NULL can introduce.

Another workaround is to create a non-nullable persisted computed column that mirrors the original column but replaces NULL with an otherwise invalid value such as zero. Queries that were written for the original column still work, and you can query against the new one when you need to. If the table is huge, however, you might want to think twice before doing that -- you do need to know what you are doing, and be aware of the potential for performance impact and fragmentation.
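A minimal sketch of that workaround, assuming the table is named PATIENT (the real name isn't given in the thread):

-- ISNULL() makes the computed column non-nullable; PERSISTED stores it in the row
ALTER TABLE PATIENT
    ADD PATIENT_TYPE_C_NN AS ISNULL(PATIENT_TYPE_C, 0) PERSISTED;

-- existing queries keep working; new ones can filter with:
-- WHERE PATIENT_TYPE_C_NN NOT IN (1, 2)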
Don't use function(s) against a column in the WHERE clause (unless absolutely required), because it can prevent index seeks and hurt performance.

You must check for NULL directly, using IS NULL (or IS NOT NULL):

(pt.PATIENT_TYPE_C is null OR pt.PATIENT_TYPE_C not in (1,2))
The OR tends to impact performance about as much as wrapping the column in a function, if there is an otherwise suitable index for the predicate. If there is no such index, it probably isn't going to matter anyway. If there is a suitable index, SQL Server might do a partial scan (whether using OR or the function, either way), which can work out fine, sometimes. As always, "it depends."

An index against a non-persisted computed column that copies the original column but eliminates the NULL, along with any other columns needed, is another possibility, if all the restrictions on doing that can be met. It has the advantage of not messing with the clustered index rows. (Disclaimer: I haven't actually done this - I always ran into restrictions.) Not available before SQL 2012, I don't believe.
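If the restrictions can be met, it might look something like this (hypothetical; same assumed table name as above):

-- non-persisted computed column: computed on read, not stored in the base table rows
ALTER TABLE PATIENT
    ADD PATIENT_TYPE_C_NN2 AS ISNULL(PATIENT_TYPE_C, 0);

-- the index materializes the computed values without touching the clustered index rows
CREATE INDEX IX_PATIENT_TYPE_C_NN2 ON PATIENT (PATIENT_TYPE_C_NN2);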
>The OR tends to impact performance about as much as wrapping the column in a function

Actually it doesn't.  That's easy to prove for yourself by building a reasonably large sample table, say 1M rows, and testing column IS NULL vs ISNULL(column, ...)
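For instance, a test harness along these lines (all object names made up) lets you compare the plans side by side:

-- build a 1M-row sample table with roughly 10% NULLs
CREATE TABLE dbo.NullTest (id int IDENTITY PRIMARY KEY, val int NULL);

INSERT INTO dbo.NullTest (val)
SELECT CASE WHEN n % 10 = 0 THEN NULL ELSE n % 100 END
FROM (SELECT TOP (1000000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects a CROSS JOIN sys.all_objects b) x;

CREATE INDEX IX_NullTest_val ON dbo.NullTest (val);

-- compare the execution plans of the two forms:
SELECT COUNT(*) FROM dbo.NullTest WHERE val IS NULL OR val NOT IN (1, 2);
SELECT COUNT(*) FROM dbo.NullTest WHERE ISNULL(val, 0) NOT IN (1, 2);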
It's not so easy, because it can depend on so many factors. I tested on a 30,000,000 row table with a 2-predicate ANDed search chosen to return just 4-6 rows, after adding a matching index to the table, and it actually read 4-6 index rows, sometimes. Switching to OR (the predicate was chosen to return few enough rows in this case to not run out of memory) caused it to read 30,000,000 rows. Changing back to AND and wrapping column 1 (in the index) with a do-nothing function - COALESCE(xxx, NULL) - had the same effect (30 million).

Wrapping the column matching column 2 of the index had less effect due to a partial index scan, although the size of the scan (up to 19,000 rows) varied depending on the exact value of the 2nd test due, apparently, to the fact that the second column was nullable. I thought that was a little weird, but not unusual.

The 30,000,000 row reads correspond to non-SARGable predicates which prevent use of the index. A non-SARGable predicate that still allows some of the index columns to be used may have less effect. If table sizes are small, none of this may matter at all.
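To make the SARGability point concrete (reusing the hypothetical dbo.NullTest table from the sketch above):

-- SARGable: the optimizer can seek on IX_NullTest_val
SELECT id FROM dbo.NullTest WHERE val = 5;

-- non-SARGable: even a do-nothing wrapper around the column forces a scan
SELECT id FROM dbo.NullTest WHERE COALESCE(val, NULL) = 5;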

On my machine, the 30 million reads took about the same real time as the 4 reads. Apparently, the index fit entirely in the cache, and there was no noticeable performance degradation.

I didn't try the exact scenarios above because I didn't have a database sitting around that contained large amounts of the right data. It is a different case and it could behave quite differently, because IN is at least partially SARGable, which is interesting given that IN is a form of OR, and there may be other situations where OR can use an index.

Performance is also, obviously, highly dependent upon which indexes have been provided. If you think it might work, run it and look at the execution plan; if it works today then, at least for a little while, it works. If it doesn't, add or change indexes and repeat, trying not to break anything else.
Best approach: use the below

SELECT [all_columns_required]
FROM [yourtable] x
OUTER APPLY
(
	-- produces a match row only when the patient type is 1 or 2
	SELECT * FROM
	(
		SELECT 1 AS a
		UNION ALL
		SELECT 2
	) t
	WHERE x.PATIENT_TYPE_C = t.a
) r
WHERE r.a IS NULL  -- keep rows with no match: NULLs and every value other than 1 or 2


In a nutshell ...

NOT IN is pathetic in terms of performance, ...
Becky Edwards (Asker):

Thanks everyone.  I will try these when I get to a computer.
@Megan Brooks

You need to provide the specific code and test(s) you ran.  An "IS NULL OR" condition does not prevent index seeks.
Jim, you are correct. I am not worried so much about performance issues for this one because the database has thousands of tiny tables rather than large ones. I typically have to join 15 to 20 tables to get any decent data. Pretty sure this table is not indexed either.
This worked perfectly.  I am now getting data.  Thank you Eric, and everyone else for your great comments and assistance!   I know a WHOLE lot more about NULL values than ever....  :)
If performance isn't an issue, just use IS NULL OR, or ISNULL(<column>, <something not used>), to keep NOT IN from seeing the NULL.

If you do a lot of reporting on the database then you might want to consider extracting to a separate reporting database that is customized for your reporting requirements. Writing reports against a highly normalized or heavily hacked-on database can be a lot of trouble, especially if the schema changes periodically, as might be the case if you are querying against a vendor-supplied system, and in particular any EHR system I have seen.