Link to home
Start Free TrialLog in
Avatar of sifugreg
sifugregFlag for United States of America

asked on

Help me understand how Microsoft SQL handles Nulls

Help me understand something.  I have a database of patients.  A small percentage (15%) will actually have a value in the field "MedicalRecordNumber".  Of that small percent, many will start with the characters "IN" or "IG" which denotes a particular status.  Recently I've been asked to modify a query to omit these users.  Seems simple enough but the results are messing with my head.  

Total records (select * from patientprofile where pstatus = 'A'; (Active patients) = 4,724

Count of all patients with IG or IN as a prefix

select count(*) from patientprofile
        where pstatus = 'A'
        and left(MedicalRecordNumber,2) in ('IG','IN')
Returns 531

Logically speaking, the inverse should be obvious and return the inverse of 4,724 - 531 = 4193 but by simply changing the IN to NOT IN returns 248 which I didn't expect.  This apparently has to do with the NULL values being ignored and I have to add "or MedicalRecordNumber is null" to the value.  Why is that?  And where the heck did the value of 248 come from??  A null value is not in the list values and should therefore be included. (By the way, I get the same behavior if I try to use <>  or != instead of the "IN".   I've worked with Oracle for many years and this doesn't seem to be the case.  I really could use an explanation or understanding of why this scenario occurs.
Avatar of Mike Eghtebas
Mike Eghtebas
Flag of United States of America image

See if you get 248 for

select count(*) from patientprofile
        where pstatus IS NULL

MedicalRecordNumber  is either PK (never is null) or FK (by definition could be null) but highly unlikely in medical business for this field.

Null is empty (missing) entry.
ASKER CERTIFIED SOLUTION
Avatar of Valliappan AN
Valliappan AN
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If there is a patient with MRN = 100 just brought in to emergency and needs to be assigned to a room

1. This new patient's data gets entered into Patient table

A. Just patient info has been entered into table Patient
Patient table                Room table                                Doctor table
==============              ===========================            =================
MRN       Name   etc.       Room#    MRN     Vacant  DrID          DoctorID    DotcorName
---       ---               ---      ---     ---       ---          ---       ---
100       Joe                1      Null      Yes     Null          199     DR. A
                             2       150      No      199           200     DR. B

Open in new window

Because Room# 1 is available has no patient is assigned to it MRN is null so is DrID

B. Now it has been decide to put the new patient in room 1 (so MRN as FK in table Room is no longer null.
Patient table                Room table                                Doctor table
==============              ===========================            =================
MRN       Name   etc.       Room#    MRN     Vacant  DrID          DoctorID    DotcorName
---       ---               ---      ---     ---       ---          ---       ---
100       Joe                1      100      Yes       Null          199     DR. A
                             2       150      No      199           200     DR. B

Open in new window

But still DrID in  Room table is null (meaning no doctor assigned to visit MRN =100

B. Now Dr B assigned to visit the patient.
Patient table                Room table                                Doctor table
==============              ===========================            =================
MRN       Name   etc.       Room#    MRN     Vacant  DrID          DoctorID    DotcorName
---       ---               ---      ---     ---       ---          ---       ---
100       Joe                1      100      Yes       200          199     DR. A
                             2       150      No      199           200     DR. B

Open in new window


Null means not assigned here. Null is not the same as empty string or zero

And we never can say null in one cell is equal to null elsewhere. We cannot compare what doesn't exist.
Also try:

select count(*) from patientprofile
        where pstatus Not IN('IG','IN','A')

to see 248 records. They may have a different status you don't know of.
Avatar of sifugreg

ASKER

Thanks, it makes sense but it is interesting that querying without specifications shows all fields with null values but when you specify a NOT it omits them.  A little confusing.  If ind it interesting that MS states "No two null values are equal." which lends itself toward supporting being included in a query that states "show me all cars that aren't red" to include cars that have not yet been painted.  But what do I know ;-).  I'll tear through the document you provided to see if it makes more sense.
eghtebas,

PSTATUS can't be null and is only A,I,O or X.  IG and IN were the first two characters of the MedicalRecordNumber.
Avatar of PortletPaul
NULL is the absence of value (in that location)

Because NULL has no value it cannot be equal to any compared value.
Nor can NULL be unequal to a compared value because you have to know if it equals something before you can decide it does not equal a value.

MedicalRecordNumber IN ('IG','IN')

is comparing the field to values, locating just those values that are equal to the listed values

Using IN ('IG','IN') it is the equivalent of:

( MedicalRecordNumber  = 'IG' OR MedicalRecordNumber  = 'IN' )

The reverse:
MedicalRecordNumber NOT IN ('IG','IN')

is comparing the field to values, locating just those values that are NOT equal to the listed values
but because NULL cannot be equal to anything it also cannot be NOT equal either

Using NOT IN ('IG','IN') it is the equivalent of:

NOT ( MedicalRecordNumber  = 'IG' OR MedicalRecordNumber  = 'IN' )

You have 531 records that meet IN ('IG','IN') and 248 records that meet NOT IN ('IG','IN')

531 + 248 = 779 records where MedicalRecordNumber  IS NOT NULL

4,724 - 779 = 3945 records where MedicalRecordNumber  IS NULL

try this:

select count(*) from patientprofile
where pstatus = 'A'
        and MedicalRecordNumber  IS NULL


See this blog:
http://weblogs.sqlteam.com/markc/archive/2009/06/08/60929.aspx

----------------------------------------
Note:
Using LEFT() or ISNULL() on MedicalRecordNumber  in the where clause will remove the ability of the query to use an index on the affected field. Try to avoid these "non-sargable" constructs.

sargable: Search ARGument ABLE
http://en.wikipedia.org/wiki/Sargable

Rules of thumb
Avoid applying functions on data values in a sql condition.
Avoid non-sargable predicates and replace them with sargable equivalents.
too late is seems

You have chosen a non-sargable solution regrettably.

compactness does not immediately mean better