Solved

Dealing with NuLL Values returns no data

Posted on 2016-09-14
16
51 Views
Last Modified: 2016-09-16
I want to filter out patients who have a 1, or a 2 in the following field, leaving patients who either have a NULL value or any of the other values.

When I put in this filter
pt.PATIENT_TYPE_C not in (1,2) 

Open in new window

it doesn't return anyone because the patients currently have all NULL values in that field.

PATIENT_TYPE_C      NAME
1      Anonymous
2      Confidential
3      Employee
4      Dismissed
5      Verified with no Medicare
6      Prisoner
7      VIP

How can I write my filter to account for the NULL value?
0
Comment
Question by:Becky Edwards
  • 4
  • 3
  • 3
  • +4
16 Comments
 
LVL 69

Accepted Solution

by:
Éric Moreau earned 500 total points
ID: 41798578
try this:
isnull(pt.PATIENT_TYPE_C, 0) not in (1,2) 

Open in new window

0
 
LVL 40

Expert Comment

by:Kyle Abrahams
ID: 41798591
or you  could do

pt.Patient_Type_C is not null and pt.Patient_Type_C not in (1,2)

Open in new window

0
 
LVL 65

Expert Comment

by:Jim Horn
ID: 41798611
>How can I write my filter to account for the NULL value?
The above expert answers are correct, but just to add some context NULL cannot be compared to anything, so NULL NOT IN (1,2) does not return True, hence why your query returns no rows.

So to pull this off you'll have to convert every single time to a non-NULL value that can participate in expressions.
0
 
LVL 13

Expert Comment

by:Megan Brooks
ID: 41798672
NULL and NOT IN do not get along well. When you see NOT IN against a nullable field, expect trouble. The workarounds above are reasonable -- I use them both. Ideally, one would design the tables to not use NULLs where they are not required and avoid all the issues that NULL can introduce.

Another workaround is to create a non-nullable persisted calculated column that mirrors the original column but replaces NULL with an otherwise invalid value such as zero. Queries that were written for the original column still work, and you can query against the new one when you need to. If the table is huge, however, you might want to think twice before doing that -- you do need to know what you are doing, and be aware of the potential for performance impact and fragmentation.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41798819
Don't use function(s) against a column in the WHERE clause (unless absolutely required) because it can hurt performance.

You must check for NULL directly, using IS NULL (or IS NOT NULL):

(pt.PATIENT_TYPE is null OR pt.PATIENT_TYPE_C not in (1,2) )
0
 
LVL 13

Expert Comment

by:Megan Brooks
ID: 41798853
The OR tends to impact performance about as much as wrapping the column in a function, if there is an otherwise a suitable index for the predicate. If there is no such index, it probably isn't going to matter anyway. If there is a suitable index, SQL Server might do a partial scan (whether using OR or the function, either way), which can work out fine, sometimes. As always, "it depends."

An index against a non-persisted computed column that copies the original column but eliminates the NULL, along with any other columns needed, is another possibility, if all the restrictions on doing that can be met. It has the advantage of not messing with the clustered index rows. (Disclaimer: I haven't actually done this - I always ran into restrictions.) Not available before SQL 2012, I don't believe.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41798865
The OR tends to impact performance about as much as wrapping the column in a function

Actually it doesn't.  That's easy to prove for yourself by building a reasonably large sample table, say 1M rows, and testing column IS NULL vs NULL(column, ...)
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 13

Expert Comment

by:Megan Brooks
ID: 41799022
It's not so easy, because it can depend on so many factors. I tested on a 30,000,000 row table with a 2-predicate ANDed search chosen to return just 4-6 rows, after adding a matching index to the table, and it actually read 4-6 index rows, sometimes. Switching to OR (the predicate was chosen to return few enough rows in this case to not run out of memory) caused it to read 30,000,000 rows. Changing back to AND and wrapping column 1 (in the index) with a do-nothing function - COALESCE(xxx, NULL) - had the same effect (30 million).

Wrapping the column matching column 2 of the index had less effect due to a partial index scan, although the size of the scan (up to 19,000 rows) varied depending on the exact value of the 2nd test due, apparently, to the fact that the second column was nullable. I thought that was a little weird, but not unusual.

The 30,000,000 row reads correspond to non-SARGable predicates which prevent use of the index. A non-SARGable predicate that still allows some of the index columns to be used may have less effect. If table sizes are small, none of this may matter at all.

On my machine, the 30 million reads took about the same real time as the 4 reads. Apparently, the index fit entirely in the cache, and there was no noticeable performance degradation.

I didn't try the exact scenarios above because I didn't have a database sitting around that contained large amounts of the right data to do that. It is a different case and it could behave quite differently because IN is at least partially SARGable. Which is interesting, because it is a form of OR, and there may be other situations where OR can use an index.

Performance is also, obviously, highly dependent upon which indexes have been provided. If you think it might work, run it, look at the execution plan, and if it does work then for today, at least for a little while, it works. If it doesn't, add or change indexes and repeat, trying not to break anything else.
0
 
LVL 18

Expert Comment

by:Pawan Kumar Khowal
ID: 41799050
Best approach use below

SELECT [all_column_rquired]
From [yourtable] x
OUTER APPLY
(
	SELECT * FROM 
	(
		SELECT 1 a
		UNION ALL
		SELECT 2
	)t
	WHERE x.[pt.PATIENT_TYPE_C] = t.a
)r
WHERE x.[pt.PATIENT_TYPE_C] IS NULL

Open in new window

0
 
LVL 18

Expert Comment

by:Pawan Kumar Khowal
ID: 41799051
In a nutshell ...

NOT In pathetic in terms of performance,...
0
 

Author Comment

by:Becky Edwards
ID: 41801531
Thanks everyone.  I will try these when I get to a computer.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41801582
@Megan Brooks

You need to provide the specific code and test(s) you ran.  An "IS NULL OR" condition does not prevent index seeks.
0
 

Author Comment

by:Becky Edwards
ID: 41802059
Jim You are correct.  I am not worried so much about performance issues for this one because the database has thousands of tiny tables rather than large ones.  I typically have to join 15 to 20 tables to get any decent data.  Pretty sure this table is not indexed either.
0
 

Author Closing Comment

by:Becky Edwards
ID: 41802077
This worked perfectly.  I am now getting data.  Thank you Eric, and everyone else for your great comments and assistance!   I know a WHOLE lot more about NULL values than ever....  :)
0
 
LVL 13

Expert Comment

by:Megan Brooks
ID: 41802087
If performance isn't an issue, just use IS NULL OR or ISNULL(<column>, <something not used>) keep NOT IN from seeing the NULL.

If you do a lot of reporting on the database then you might want to consider extracting to separate reporting database that is customized for your reporting requirements. Writing reports against a highly normalized or highly hacked-on database can be a lot of trouble, especially if the schema changes periodically, as might be the case if you are querying against a vendor-supplied system, and in particular any EHR system I have seen.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Suggested Solutions

This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Load balancing is the method of dividing the total amount of work performed by one computer between two or more computers. Its aim is to get more work done in the same amount of time, ensuring that all the users get served faster.
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Via a live example, show how to shrink a transaction log file down to a reasonable size.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now