Solved

Dimensional Model: How Can I Improve a Healthcare Call Centre Star Schema?

Posted on 2014-04-11
10
857 Views
Last Modified: 2016-02-18
ERD---Data-Flow---New-Page--1-.png
Hello experts!
The image shows part of a Dimensional Model design a colleague and I are working on. We are building a data warehouse and BI application using SQL Server and SSAS Tabular Model respectively.

Intro

My colleague settled on the above design because:
CCG Population and GP Population contain numeric data that will be used to calculate “Number of calls per 1000 population”
CCG & GP Population tables contains numeric data used in a measures and should be considered as fact tables.
This method avoids a monster dimension

I feel that there is perhaps a better approach because:
Fact Call Log is a business process, the other two fact tables are not.
GP Population and CCG Population should become dimensional attributes within DIM GP Practice.
In terms of super-simple user interface, I wish to have a single dimension holding this data.

The Challenge

DIM GP Practice is a dimension with many rows.

GP Practice Name, GP Population and CCG population all change independently of each other at different times and frequencies.

Consequently, we need to track 3 changing variables in a single dimension – which may lead to an explosion in the number of rows in DIM GP Practice.

We also need to keep track of history, so the measure uses population figures in 2012 when viewing 2012 data and population figures for 2014 when viewing 2014 data and so forth.

Any thoughts on how best to refine this dimensional model?

Priorities are very easy user interface i.e. end-users have the smallest number of dimensions that they slice and dice by and ease of development to ensure rapid updates and changes going forward with minimal messing around with DAX statements.

Looking forward to your feedback!

Cheers!
0
Comment
Question by:JohnAeris
  • 6
  • 4
10 Comments
 

Author Comment

by:JohnAeris
Comment Utility

Definitions

Tables
Fact Call Log: Every call for a medical centre referral is a transaction.
DIM GP Practice:  a Slowly Changing Dimension with > 10,000 rows of Medical Centre data. Updated quarterly.
FACT CCG Population: Population is updated yearly. Sourced from another business unit.
FACT GP Population: Population is updated quarterly. Sourced from another business unit.

Terms
A CCG consists of many GPs in a specific geographic area. So GP > CCG is a natural hierarchy.
CCG population:  total population in the CCG area.
GP Population: number of registered patients a GP serves.
0
 
LVL 45

Expert Comment

by:Kdo
Comment Utility
Hi John,

Some things to start.  :)

A)  A data warehouse is an algebraic expression that is designed to answer a specific question.  It may answer numerous questions, but it's core is a single answer.
2)  Most data warehouses include a time dimension and utilize count() or sum() over time.
D)  A star schema is typically a fact table with multiple dimensions around it.  There can be multiple fact tables and the fact tables can "share" some/all of the dimension tables.  I don't see that shown in your model.


With that as backdrop, it does look like you're trying to design the database around answering some key questions.  :)

Your colleague's approach may well serve your stated needs, but it's not a true star schema.  GP Population and CCG Population are dimensions (as you suggest) and describe attributes of the calls.

It is the call data that should be the fact table.  When a new call comes in, it can be populated into the fact table as long as the dimension tables contain the items that describe the call.  (The time dimension will have to be pre-populated, any "location" dimension will have to have the city, state, zip, for the caller location, etc.)

“Number of calls per 1000 population” is then computed by joining one (or both) of the population dimensions and the time dimension to the fact table (calls).



I believe that your approach is superior to your colleague's, and is much closer to being a true star schema.


Good Luck,
Kent
0
 

Author Comment

by:JohnAeris
Comment Utility
Hi Kent!

Thanks for your feedback.

The current model includes 30 Dimensions (including Time of day and Date) around a central fact table called CALL LOG. I removed this to make the image clearer.

“multiple fact tables and the fact tables can "share" some/all of the dimension tables”.
Agreed.

My colleague appears to be using the Fact CCG Population and Fact GP Population fact tables as “helper/lookup” tables in order to create the measures. As stated, it works, but I feel this is not the best approach.

“It is the call data that should be the fact table” – perhaps I didn’t make this clear, the Call Log Fact Table holds all the call data.

“I believe that your approach is superior to your colleague's, and is much closer to being a true star schema.” Thank you, not quite there yet though. :)

I still require assistance to create an alternative model that we can agree upon.

The key challenge:
CCG Population and GP population are updated at different times and quite frequently.
A CCG is made up of many GPs and this creates a natural hierarchy and so my initial thought is to incorporate all of GP, CCG and all populations in a single dimension.
However, if I do this, then we will end up with a dimension that changes 3 variables every month or so so it will grow exponentially.

Will it make more sense to separate that GP and CCG populations into a separate dimension even though they relate directly to GP and CCGs in the GP Practice dimension?


Any thoughts?
0
 
LVL 45

Expert Comment

by:Kdo
Comment Utility
Can you describe the CCG Population and GP population tables a bit more?  What data is contained in them, what is their relationship to other tables?  What is the link between them?  Etc...
0
 

Author Comment

by:JohnAeris
Comment Utility

GP Population Output

FACT GP Population Sample OutputUpdate Frequency: Monthly

NB > 10,000 rows of data, for about 500 GPs.

CCG Population output

FACT CCG Population Sample OutputUpdate Frequency: Yearly


DIM GP Practice output

DIM GP Practice Sample OutputUpdate Frequency: Weekly


GP_Practice Code joins to GP_Practice Code in FACT GP Population.
GP_Practice_CCG_Code joins to CCG_Code in FACT CCG Population.


More Info

At present, FACT GP Population and FACT CCG Population are not displayed to users and are merely used as “helper / lookup tables” for population calculations, however, it breaks the star schema construct.
Also, Business Users would like to use CCG population as a report header / label. As it is not a dimension, this is not possible.
At present, we are using SSAS Tabular DAX to do the calculations. So it works, but I am looking for a better way to complete the dimensional modelling.
0
Complete VMware vSphere® ESX(i) & Hyper-V Backup

Capture your entire system, including the host, with patented disk imaging integrated with VMware VADP / Microsoft VSS and RCT. RTOs is as low as 15 seconds with Acronis Active Restore™. You can enjoy unlimited P2V/V2V migrations from any source (even from a different hypervisor)

 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 500 total points
Comment Utility
It looks like the GP Population and CCG Population tables are summary tables of other data.  Does the data derive from the CALL fact table or some place else?

--    At present, FACT GP Population and FACT CCG Population are not displayed to users and are merely used as “helper / lookup tables” for population calculations, however, it breaks the star schema construct.

From what you've described, this may be fine.  From my vantage point it looks like you may have multiple stars with a common time dimension.  (I don't know how anything else might relate.)  But I'm still struggling with calling the * Population tables fact tables.  They look like summaries of fact tables, which could make them fact tables or dimension tables, depending on their usage.

--    Also, Business Users would like to use CCG population as a report header / label. As it is not a dimension, this is not possible.

What has to happen to treat the CCG Population table as a dimension?  Does it join to the CALL fact table on anything other than time?
0
 

Author Comment

by:JohnAeris
Comment Utility
Hi Kent,

It looks like the GP Population and CCG Population tables are summary tables of other data.  Does the data derive from the CALL fact table or some place else?

Good call. GP Population and CCG Population are populated by the Central Data Warehouse Teams for other business units. We also require the same data, so we are using these tables.

What has to happen to treat the CCG Population table as a dimension?  Does it join to the CALL fact table on anything other than time?

FACT CCG Population joins to DIM GP Practice table via GP_Practice_CCG_Code.
0
 

Author Comment

by:JohnAeris
Comment Utility
Perhaps, I should look at an Outrigger or mini-dimension approach? Hmmm...
0
 
LVL 45

Accepted Solution

by:
Kdo earned 500 total points
Comment Utility
There is probably a 1-to-many or even a many-to-many relationship between FACT CCG Population and DIM GP Practice.

Several options.  The big 3 would seem to be:

- You could build the FACT CCG Population data into the DIM GP Practice table, but that could explode the number of rows in the dimension.
- Could you create a DIM CCG Population table and link it to the FACT Calls table?
- You could leave it as is and treat it as an auxiliary table.
0
 

Author Closing Comment

by:JohnAeris
Comment Utility
Decided to stick with the original layout. This looks like a many-to-many issue, so, we're sticking with a TYPE 4 mini-dimension until we can figure out a more effect means.

THANKS!!!
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Complex SQL 10 32
Calculating Business Hours 18 58
Do we need servers??? 5 109
MSSQL Frequency of Years From Days Field 2 10
Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to shrink a transaction log file down to a reasonable size.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now