sifugreg
asked on
Help me understand how Microsoft SQL handles Nulls
Help me understand something. I have a database of patients. A small percentage (15%) will actually have a value in the field "MedicalRecordNumber". Of that small percent, many will start with the characters "IN" or "IG" which denotes a particular status. Recently I've been asked to modify a query to omit these users. Seems simple enough but the results are messing with my head.
Total records (select * from patientprofile where pstatus = 'A'; (Active patients) = 4,724
Count of all patients with IG or IN as a prefix
Logically speaking, the inverse should be obvious and return the inverse of 4,724 - 531 = 4193 but by simply changing the IN to NOT IN returns 248 which I didn't expect. This apparently has to do with the NULL values being ignored and I have to add "or MedicalRecordNumber is null" to the value. Why is that? And where the heck did the value of 248 come from?? A null value is not in the list values and should therefore be included. (By the way, I get the same behavior if I try to use <> or != instead of the "IN". I've worked with Oracle for many years and this doesn't seem to be the case. I really could use an explanation or understanding of why this scenario occurs.
Total records (select * from patientprofile where pstatus = 'A'; (Active patients) = 4,724
Count of all patients with IG or IN as a prefix
select count(*) from patientprofile
where pstatus = 'A'
and left(MedicalRecordNumber,2 ) in ('IG','IN')
Returns 531where pstatus = 'A'
and left(MedicalRecordNumber,2
Logically speaking, the inverse should be obvious and return the inverse of 4,724 - 531 = 4193 but by simply changing the IN to NOT IN returns 248 which I didn't expect. This apparently has to do with the NULL values being ignored and I have to add "or MedicalRecordNumber is null" to the value. Why is that? And where the heck did the value of 248 come from?? A null value is not in the list values and should therefore be included. (By the way, I get the same behavior if I try to use <> or != instead of the "IN". I've worked with Oracle for many years and this doesn't seem to be the case. I really could use an explanation or understanding of why this scenario occurs.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If there is a patient with MRN = 100 just brought in to emergency and needs to be assigned to a room
1. This new patient's data gets entered into Patient table
A. Just patient info has been entered into table Patient
B. Now it has been decide to put the new patient in room 1 (so MRN as FK in table Room is no longer null.
B. Now Dr B assigned to visit the patient.
Null means not assigned here. Null is not the same as empty string or zero
And we never can say null in one cell is equal to null elsewhere. We cannot compare what doesn't exist.
1. This new patient's data gets entered into Patient table
A. Just patient info has been entered into table Patient
Patient table Room table Doctor table
============== =========================== =================
MRN Name etc. Room# MRN Vacant DrID DoctorID DotcorName
--- --- --- --- --- --- --- ---
100 Joe 1 Null Yes Null 199 DR. A
2 150 No 199 200 DR. B
Because Room# 1 is available has no patient is assigned to it MRN is null so is DrID B. Now it has been decide to put the new patient in room 1 (so MRN as FK in table Room is no longer null.
Patient table Room table Doctor table
============== =========================== =================
MRN Name etc. Room# MRN Vacant DrID DoctorID DotcorName
--- --- --- --- --- --- --- ---
100 Joe 1 100 Yes Null 199 DR. A
2 150 No 199 200 DR. B
But still DrID in Room table is null (meaning no doctor assigned to visit MRN =100 B. Now Dr B assigned to visit the patient.
Patient table Room table Doctor table
============== =========================== =================
MRN Name etc. Room# MRN Vacant DrID DoctorID DotcorName
--- --- --- --- --- --- --- ---
100 Joe 1 100 Yes 200 199 DR. A
2 150 No 199 200 DR. B
Null means not assigned here. Null is not the same as empty string or zero
And we never can say null in one cell is equal to null elsewhere. We cannot compare what doesn't exist.
Also try:
select count(*) from patientprofile
where pstatus Not IN('IG','IN','A')
to see 248 records. They may have a different status you don't know of.
select count(*) from patientprofile
where pstatus Not IN('IG','IN','A')
to see 248 records. They may have a different status you don't know of.
ASKER
Thanks, it makes sense but it is interesting that querying without specifications shows all fields with null values but when you specify a NOT it omits them. A little confusing. If ind it interesting that MS states "No two null values are equal." which lends itself toward supporting being included in a query that states "show me all cars that aren't red" to include cars that have not yet been painted. But what do I know ;-). I'll tear through the document you provided to see if it makes more sense.
ASKER
eghtebas,
PSTATUS can't be null and is only A,I,O or X. IG and IN were the first two characters of the MedicalRecordNumber.
PSTATUS can't be null and is only A,I,O or X. IG and IN were the first two characters of the MedicalRecordNumber.
NULL is the absence of value (in that location)
Because NULL has no value it cannot be equal to any compared value.
Nor can NULL be unequal to a compared value because you have to know if it equals something before you can decide it does not equal a value.
MedicalRecordNumber IN ('IG','IN')
is comparing the field to values, locating just those values that are equal to the listed values
The reverse:
MedicalRecordNumber NOT IN ('IG','IN')
is comparing the field to values, locating just those values that are NOT equal to the listed values
but because NULL cannot be equal to anything it also cannot be NOT equal either
You have 531 records that meet IN ('IG','IN') and 248 records that meet NOT IN ('IG','IN')
531 + 248 = 779 records where MedicalRecordNumber IS NOT NULL
4,724 - 779 = 3945 records where MedicalRecordNumber IS NULL
try this:
select count(*) from patientprofile
where pstatus = 'A'
and MedicalRecordNumber IS NULL
See this blog:
http://weblogs.sqlteam.com/markc/archive/2009/06/08/60929.aspx
-------------------------- ---------- ----
Note:
Using LEFT() or ISNULL() on MedicalRecordNumber in the where clause will remove the ability of the query to use an index on the affected field. Try to avoid these "non-sargable" constructs.
Because NULL has no value it cannot be equal to any compared value.
Nor can NULL be unequal to a compared value because you have to know if it equals something before you can decide it does not equal a value.
MedicalRecordNumber IN ('IG','IN')
is comparing the field to values, locating just those values that are equal to the listed values
Using IN ('IG','IN') it is the equivalent of:
( MedicalRecordNumber = 'IG' OR MedicalRecordNumber = 'IN' )
( MedicalRecordNumber = 'IG' OR MedicalRecordNumber = 'IN' )
The reverse:
MedicalRecordNumber NOT IN ('IG','IN')
is comparing the field to values, locating just those values that are NOT equal to the listed values
but because NULL cannot be equal to anything it also cannot be NOT equal either
Using NOT IN ('IG','IN') it is the equivalent of:
NOT ( MedicalRecordNumber = 'IG' OR MedicalRecordNumber = 'IN' )
NOT ( MedicalRecordNumber = 'IG' OR MedicalRecordNumber = 'IN' )
You have 531 records that meet IN ('IG','IN') and 248 records that meet NOT IN ('IG','IN')
531 + 248 = 779 records where MedicalRecordNumber IS NOT NULL
4,724 - 779 = 3945 records where MedicalRecordNumber IS NULL
try this:
select count(*) from patientprofile
where pstatus = 'A'
and MedicalRecordNumber IS NULL
See this blog:
http://weblogs.sqlteam.com/markc/archive/2009/06/08/60929.aspx
--------------------------
Note:
Using LEFT() or ISNULL() on MedicalRecordNumber in the where clause will remove the ability of the query to use an index on the affected field. Try to avoid these "non-sargable" constructs.
sargable: Search ARGument ABLE
http://en.wikipedia.org/wiki/Sargable
Rules of thumb
Avoid applying functions on data values in a sql condition.
Avoid non-sargable predicates and replace them with sargable equivalents.
http://en.wikipedia.org/wiki/Sargable
Rules of thumb
Avoid applying functions on data values in a sql condition.
Avoid non-sargable predicates and replace them with sargable equivalents.
too late is seems
You have chosen a non-sargable solution regrettably.
compactness does not immediately mean better
You have chosen a non-sargable solution regrettably.
compactness does not immediately mean better
select count(*) from patientprofile
where pstatus IS NULL
MedicalRecordNumber is either PK (never is null) or FK (by definition could be null) but highly unlikely in medical business for this field.
Null is empty (missing) entry.