Solved

Dealing with NULL values returns no data

Posted on 2016-09-14
53 Views
Last Modified: 2016-09-16
I want to filter out patients who have a 1 or a 2 in the following field, leaving patients who either have a NULL value or any of the other values.

When I put in this filter
pt.PATIENT_TYPE_C not in (1,2) 


it doesn't return anyone because the patients currently have all NULL values in that field.

PATIENT_TYPE_C      NAME
1      Anonymous
2      Confidential
3      Employee
4      Dismissed
5      Verified with no Medicare
6      Prisoner
7      VIP

How can I write my filter to account for the NULL value?
Question by:Becky Edwards
16 Comments
 
LVL 69

Accepted Solution

by:
Éric Moreau earned 500 total points
ID: 41798578
try this:
isnull(pt.PATIENT_TYPE_C, 0) not in (1,2) 

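A quick way to see the accepted fix in action is a small sketch with Python's sqlite3 against made-up sample rows (SQLite spells the function COALESCE/IFNULL where T-SQL has ISNULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt (name TEXT, PATIENT_TYPE_C INTEGER)")
conn.executemany("INSERT INTO pt VALUES (?, ?)",
                 [("Anon", 1), ("Conf", 2), ("Emp", 3), ("NullPt", None)])

# Plain NOT IN silently drops the NULL row.
plain = [r[0] for r in conn.execute(
    "SELECT name FROM pt WHERE PATIENT_TYPE_C NOT IN (1, 2) ORDER BY name")]
print(plain)   # ['Emp'] -- the NULL patient is missing

# Mapping NULL to 0 (a value the column never uses) keeps it.
fixed = [r[0] for r in conn.execute(
    "SELECT name FROM pt WHERE COALESCE(PATIENT_TYPE_C, 0) NOT IN (1, 2) "
    "ORDER BY name")]
print(fixed)   # ['Emp', 'NullPt']
```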
 
LVL 40

Expert Comment

by:Kyle Abrahams
ID: 41798591
or you could do

pt.Patient_Type_C is null or pt.Patient_Type_C not in (1,2)

 
LVL 65

Expert Comment

by:Jim Horn
ID: 41798611
>How can I write my filter to account for the NULL value?
The above expert answers are correct, but just to add some context: NULL cannot be compared to anything, so NULL NOT IN (1,2) does not evaluate to TRUE, which is why your query returns no rows.

So to pull this off you'll have to either test for NULL explicitly or convert the column to a non-NULL value that can participate in comparisons.
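The three-valued logic can be seen directly (a sketch via Python's sqlite3; the same rule applies in SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULL NOT IN (1, 2) evaluates to NULL (unknown), not TRUE or FALSE,
# and a WHERE clause keeps only rows whose predicate is TRUE.
result = conn.execute("SELECT NULL NOT IN (1, 2)").fetchone()[0]
print(result)  # None -- unknown, so such a row is filtered out
```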
 
LVL 14

Expert Comment

by:Megan Brooks
ID: 41798672
NULL and NOT IN do not get along well. When you see NOT IN against a nullable field, expect trouble. The workarounds above are reasonable -- I use them both. Ideally, one would design the tables to not use NULLs where they are not required and avoid all the issues that NULL can introduce.

Another workaround is to create a non-nullable persisted calculated column that mirrors the original column but replaces NULL with an otherwise invalid value such as zero. Queries that were written for the original column still work, and you can query against the new one when you need to. If the table is huge, however, you might want to think twice before doing that -- you do need to know what you are doing, and be aware of the potential for performance impact and fragmentation.
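A minimal sketch of that computed-column idea, using SQLite's STORED generated columns via Python's sqlite3 (requires SQLite 3.31+; in T-SQL the equivalent is a computed column marked PERSISTED, and the table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pt (
        name TEXT,
        PATIENT_TYPE_C INTEGER,
        -- non-nullable mirror: NULL becomes 0, an otherwise unused value
        PATIENT_TYPE_NN INTEGER
            GENERATED ALWAYS AS (COALESCE(PATIENT_TYPE_C, 0)) STORED
    )
""")
conn.executemany("INSERT INTO pt (name, PATIENT_TYPE_C) VALUES (?, ?)",
                 [("Anon", 1), ("NullPt", None)])

# Queries can now filter on the mirror column without NULL surprises.
rows = [r[0] for r in conn.execute(
    "SELECT name FROM pt WHERE PATIENT_TYPE_NN NOT IN (1, 2)")]
print(rows)  # ['NullPt']
```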
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41798819
Don't use function(s) against a column in the WHERE clause (unless absolutely required) because it can hurt performance.

You must check for NULL directly, using IS NULL (or IS NOT NULL):

(pt.PATIENT_TYPE_C is null OR pt.PATIENT_TYPE_C not in (1,2))
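Checked against a small made-up sample (a Python sqlite3 sketch), the OR form returns both the NULL patients and the other non-1/2 types:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt (name TEXT, PATIENT_TYPE_C INTEGER)")
conn.executemany("INSERT INTO pt VALUES (?, ?)",
                 [("Anon", 1), ("Conf", 2), ("Emp", 3), ("NullPt", None)])

# IS NULL handles the NULL rows directly; NOT IN handles the rest.
rows = [r[0] for r in conn.execute(
    "SELECT name FROM pt "
    "WHERE PATIENT_TYPE_C IS NULL OR PATIENT_TYPE_C NOT IN (1, 2) "
    "ORDER BY name")]
print(rows)  # ['Emp', 'NullPt']
```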
 
LVL 14

Expert Comment

by:Megan Brooks
ID: 41798853
The OR tends to impact performance about as much as wrapping the column in a function, if there is otherwise a suitable index for the predicate. If there is no such index, it probably isn't going to matter anyway. If there is a suitable index, SQL Server might do a partial scan (whether using OR or the function, either way), which can work out fine, sometimes. As always, "it depends."

An index against a non-persisted computed column that copies the original column but eliminates the NULL, along with any other columns needed, is another possibility, if all the restrictions on doing that can be met. It has the advantage of not messing with the clustered index rows. (Disclaimer: I haven't actually done this - I always ran into restrictions.) Not available before SQL 2012, I don't believe.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41798865
>The OR tends to impact performance about as much as wrapping the column in a function

Actually it doesn't.  That's easy to prove for yourself by building a reasonably large sample table, say 1M rows, and testing column IS NULL vs ISNULL(column, ...).
 
LVL 14

Expert Comment

by:Megan Brooks
ID: 41799022
It's not so easy, because it can depend on so many factors. I tested on a 30,000,000 row table with a 2-predicate ANDed search chosen to return just 4-6 rows, after adding a matching index to the table, and it actually read 4-6 index rows, sometimes. Switching to OR (the predicate was chosen to return few enough rows in this case to not run out of memory) caused it to read 30,000,000 rows. Changing back to AND and wrapping column 1 (in the index) with a do-nothing function - COALESCE(xxx, NULL) - had the same effect (30 million).

Wrapping the column matching column 2 of the index had less effect due to a partial index scan, although the size of the scan (up to 19,000 rows) varied with the exact value of the 2nd test, apparently because the second column was nullable. I thought that was a little weird, but not unusual.

The 30,000,000 row reads correspond to non-SARGable predicates which prevent use of the index. A non-SARGable predicate that still allows some of the index columns to be used may have less effect. If table sizes are small, none of this may matter at all.

On my machine, the 30 million reads took about the same real time as the 4 reads. Apparently, the index fit entirely in the cache, and there was no noticeable performance degradation.

I didn't try the exact scenarios above because I didn't have a database sitting around that contained large amounts of the right data to do that. It is a different case and it could behave quite differently because IN is at least partially SARGable. Which is interesting, because it is a form of OR, and there may be other situations where OR can use an index.

Performance is also, obviously, highly dependent upon which indexes have been provided. If you think it might work, run it, look at the execution plan, and if it does work then for today, at least for a little while, it works. If it doesn't, add or change indexes and repeat, trying not to break anything else.
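The SARGability point generalizes across engines. Here is a small sketch using Python's sqlite3 and EXPLAIN QUERY PLAN (the wording of the plan text is SQLite-specific, but the seek-vs-scan contrast is the same idea as in SQL Server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt (PATIENT_TYPE_C INTEGER)")
conn.execute("CREATE INDEX ix_pt_type ON pt (PATIENT_TYPE_C)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in the last column
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Bare column: the index can be used (a seek, reported as SEARCH)
p1 = plan("SELECT * FROM pt WHERE PATIENT_TYPE_C = 3")
print(p1)  # e.g. 'SEARCH pt USING COVERING INDEX ix_pt_type ...'

# Column wrapped in a function: non-SARGable, forces a SCAN
p2 = plan("SELECT * FROM pt WHERE COALESCE(PATIENT_TYPE_C, 0) = 3")
print(p2)  # e.g. 'SCAN pt'
```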
 
LVL 23

Expert Comment

by:Pawan Kumar
ID: 41799050
Best approach, use the below:

SELECT [all_columns_required]
FROM [yourtable] x
OUTER APPLY
(
	SELECT * FROM
	(
		SELECT 1 AS a
		UNION ALL
		SELECT 2
	) t
	WHERE x.PATIENT_TYPE_C = t.a
) r
WHERE r.a IS NULL

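SQLite has no OUTER APPLY, but the same anti-join idea can be sketched with a LEFT JOIN (Python's sqlite3, hypothetical sample rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pt (name TEXT, PATIENT_TYPE_C INTEGER)")
conn.executemany("INSERT INTO pt VALUES (?, ?)",
                 [("Anon", 1), ("Conf", 2), ("Emp", 3), ("NullPt", None)])

# Left-join against the excluded values; unmatched rows (including the
# NULL patient) come back with t.a IS NULL and survive the filter.
rows = [r[0] for r in conn.execute("""
    SELECT x.name
    FROM pt x
    LEFT JOIN (SELECT 1 AS a UNION ALL SELECT 2) t
           ON x.PATIENT_TYPE_C = t.a
    WHERE t.a IS NULL
    ORDER BY x.name
""")]
print(rows)  # ['Emp', 'NullPt']
```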
 
LVL 23

Expert Comment

by:Pawan Kumar
ID: 41799051
In a nutshell ...

NOT IN is pathetic in terms of performance...
 

Author Comment

by:Becky Edwards
ID: 41801531
Thanks everyone.  I will try these when I get to a computer.
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41801582
@Megan Brooks

You need to provide the specific code and test(s) you ran.  An "IS NULL OR" condition does not prevent index seeks.
 

Author Comment

by:Becky Edwards
ID: 41802059
Jim, you are correct.  I am not worried so much about performance issues for this one because the database has thousands of tiny tables rather than large ones.  I typically have to join 15 to 20 tables to get any decent data.  Pretty sure this table is not indexed either.
 

Author Closing Comment

by:Becky Edwards
ID: 41802077
This worked perfectly.  I am now getting data.  Thank you Eric, and everyone else for your great comments and assistance!   I know a WHOLE lot more about NULL values than ever....  :)
 
LVL 14

Expert Comment

by:Megan Brooks
ID: 41802087
If performance isn't an issue, just use IS NULL OR or ISNULL(<column>, <something not used>) to keep NOT IN from seeing the NULL.

If you do a lot of reporting on the database then you might want to consider extracting to a separate reporting database that is customized for your reporting requirements. Writing reports against a highly normalized or highly hacked-on database can be a lot of trouble, especially if the schema changes periodically, as might be the case if you are querying against a vendor-supplied system, and in particular any EHR system I have seen.
