Expiring Today—Celebrate National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

date of death fact tbl or dimension?

Posted on 2016-08-17
15
Medium Priority
?
104 Views
Last Modified: 2016-08-18
I need to report on date of death.  I think it goes into a fact table since it's something that has happened. But what if the date of death is null? How would I populate it. I don't want a null in fact tbl.   I really don't want to use a dummy date like 9999-12-31. Any suggestions?
0
Comment
Question by:elucero
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
  • 2
  • +1
15 Comments
 
LVL 66

Expert Comment

by:Jim Horn
ID: 41760075
NULL would be appropriate here, as by definition it is the absence of a value.

>I think it goes into a fact table since it's something that has happened.
Not necessarily, as fact tables are measurements, of which date of death would be one.

Unless you want DimDeath, but then your Fact would still join on that table...
1
 

Author Comment

by:elucero
ID: 41760089
Yes, ok NULL does make sense, but won't I have problem with NULL value in a cube?
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 41760102
It would depend almost entirely on your database design.

If it's a traditional OLTP, date_of_death would probably just be a column it the table that describes people.

In a data warehouse, the structure of the fact table(s) usually resembles the OLTP's transaction table(s).  Depending on the nature of the business the transactions may be split over multiple fact tables.  An insurance company would probably have 2 fact tables in their D/W.  One for policies/premiums and another for claims.  The two tables would have almost nothing in common except the policy ID so separate tables has advantages.

Does your D/W record multiple events for people?  Birth, graduation, marriage, etc.?  If so, a death record in the fact table is appropriate.
1
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:elucero
ID: 41760163
It's healthcare industry.  If death occurred based on specific disease.   I think it goes into fact tbl??
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 41760219
the date of death is a fact

NULL (or no row) is a fact too (i.e. not dead, or no death recorded)

death as a dimension would not be very helpful as it is a boolean condition (is dead/is not dead)

a time dimension (I assume you have one) would be relevant to the date of death
(e.g. to answer "how many deaths in fiscal year 2014")
1
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 41760228
I can envision a lot of scenarios and the possibility of several fact tables.

How may fact tables are there?  How may of them record people specific events?
1
 
LVL 49

Expert Comment

by:PortletPaul
ID: 41760338
I can envision a lot of scenarios and the possibility of several fact tables.
agreed!
1
 

Author Comment

by:elucero
ID: 41760375
Thanks for the help.  I have one fact so far with case number, bunch of dates including dt of death which will mapped up to datedim.  I'm worried about putting the dt of death in fact because of how analysis services handles NULLS in a cube.  I don't want it to default to a dummy value like '9999-12-31'.
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 41760478
If the patient survived, don't put a row into the fact table that marks the date of death.  That would seem to solve the problem.
0
 

Author Comment

by:elucero
ID: 41761026
I would have put the row in the fact if patient survived since other facts in fact table are caseID ,diaganosisdt, submitteddt, deathdt. So I still have the null issue.  Weather they died or survived I would still have values for the rest - caseID ,diaganosisdt, submitteddt.
0
 
LVL 46

Accepted Solution

by:
Kent Olsen earned 2000 total points
ID: 41761086
Ah.  So death is a byproduct of another event, not an event unto itself.  (Never thought I'd hear death described that way, but what the heck....)

With that as a backdrop, the choices would seem to be:

1) Accept that an event that doesn't result in death will have a NULL in the date_of_death column.
2) Default the column to a specific non-null value.  (Very bad idea....)
3) Insert another row in the fact table to provide the death details.

So going back to the original problem, if there is a date_of_death column you're going to have to use 1 or 2 above.  It's not a classic star-schema design, but adding a dimension table on the date_of_death column solves your NULL issue as you'll only include rows with a non-null date in the dimension.  One of the byproduct will be that the optimal tests for survived/died will be slightly different as you can test the dimension table for died, but will have to test the fact table (or outer join the died dimension) to test for survived.  Personally, I'd include the NULL values in the dimension but that seems counter to your wants/needs.

Kent
0
 

Author Comment

by:elucero
ID: 41761149
Thanks, so you're suggesting a death dim which would be like this
caseID
DateofDeath

Then I would do an outer join from factCases to deathdim for my schema fact.caseid = deathdim.caseid.  I think this will work than I can avoid the null values.  Why would add the null values in the deathdim? They only need to report deaths not survived.
0
 

Author Comment

by:elucero
ID: 41761156
I'm thinking now that this would be a good junk dimension since I have other yes/no attributes like case closed, died, hospitalized?  So it would be like this

caseid
dateofdeath
death y/n
datehospitalized
hospitalized y/n
dateclosed
caseclosed y/n
0
 
LVL 46

Expert Comment

by:Kent Olsen
ID: 41761265
If there's exactly on row in the fact table with any given caseid value, this can work.  If a case produce multiple rows any counts (or other aggregate function results) that are based on the dimension will be wrong.
1
 

Author Comment

by:elucero
ID: 41761639
Yes, I think will work.  I'm going to go w/junk dimension.  Thanks for your help, Elizabeth
0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When trying to connect from SSMS v17.x to a SQL Server Integration Services 2016 instance or previous version, you get the error “Connecting to the Integration Services service on the computer failed with the following error: 'The specified service …
Ready to get certified? Check out some courses that help you prepare for third-party exams.
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

719 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question