Data Warehousing : star vs snowflake and denormalisation.

Posted on 2010-11-29
Last Modified: 2016-02-15

I've just started building our Data Warehouse schema.

One of the Facts I'm handling is number and value of sales (nothing too original there!).

I have multiple databases which are the same structure, but due to limitations in the original shrink wrapped app, each DB is a different currency/region.

I intend to bring all the data into 1 warehouse and handle base and foreign currency values. That's fairly easy to do.

My question relates to the structure of the schema for the following elements.

Database : This is nothing more than ID and the name of the SQL database which is really the country (UK, Holland, Ireland, Germany, India, Spain, etc.).
Customer : A customer is normally only in 1 database, but if they are in more than 1 database, they are not considered to be the same entity.
Customer Period : This is the customer's year/period/week. We hold these so that we can produce financial reports which match their operation.

I intend to use surrogate keys for all the dimension tables.

Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

I'm stuck trying to NOT normalize the data (Database, Customer, CustomerPeriod).

The fact table would link to a single Customer Period which would in turn link to the Customer and then to the Database.

We will have a LOT of entries for the Customer Period table.

The data warehouse is going to be used by SSAS/OLAP. Something I also know little about, but on page 5 of the Microsoft SQL Server 2005 Analysis Services book it says "... data warehousing is beyond the scope of this book ..." (more or less).

So, pointers, suggestions, etc. all gratefully received.

Question by:Richard Quadling
  • 2
  • 2
LVL 142

Accepted Solution

Guy Hengel [angelIII / a3] earned 167 total points
ID: 34232424
>Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

it depends, as so very often in DW databases.
the Report queries/filters shall dictate if you need to have multiple dimensions or 1/2 dimensions only.

for example, if all the reports are filtering for 1 single customer/database (but multiple periods), a single dimension over all 3 fields is just perfect.

if you have reports to compare the same period, but for different customers, still that 1 single dimension can do, as basically the period is depending of the customer.

so, possibly, you might have 2 dimensions:
database + customer
period + customer + database  

or this combination:
database + customer

your fact table having 2 ID's to those 2 dimensions


Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 333 total points
ID: 34237177
I would suggest two dimensions:

1: customer + database
2: time period

As you say, each customer is unique to a database, therefore it does not really make sense to keep the database as a separate dimension.

Time period is not specific to a customer or database and therefore merits its own dimension.

Both dimensions (customer + database, time period) are generic enough that you can use them later on to connect your sales fact table to any other customer or time period related fact tables.
LVL 40

Author Comment

by:Richard Quadling
ID: 34237558
Thanks to you both.

The periods are customer specific. But, in looking at a period by itself (Year/Period/Week), it seems to have nothing in it which is really customer specific. Multiple customers may be following the same pattern for their accounts. In the main we seem to have 3 patterns. Financial year starts in January, April or October.

I think my main issue is that for OLTP, I've learnt about database normalization. And that assists in the construction of the DB for the purposes of storing the data efficiently with some thought to the data retrieval (views, indexes, etc.).

But with data warehousing, I'm not seeing anything as formal for the design. 2 schemas with what really doesn't seem to be a lot between them. Not exactly a lot of help in themselves.

We have multiple dates (order date, despatch date, invoice date) and at least 2 date types (customer year/period/week, our year/period). Can I use 1 dimension table for the dates? Or, based upon this, do I need to have 5 dimension tables?

I realise I have more questions that can be easily answered without an understanding of the end requirement. No different to any other project really. I need to talk to others here to find out what they really want the system to do.

Thanks so far.

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 333 total points
ID: 34237605
I would in general try not to become the slave of any ideology, whether dimensional modeling or fully normalized 3NF. In practice you find lots of situations where you can fulfill your requirements only by doing some kind of hybrid data model which does not strictly adhere to either "gospel".

In your case I would probably implement a base table which contains the raw sales data in a flat table with no normalization.

In a second step you can make a dimensional model as I suggested above. I would not start mixing dates in one dimension. If you need to analyze your data according to multiple dates, make one dimension per date. Otherwise you risk really blowing up your dimension table.

Having the raw data in the base table you can always go back and create new dimensional data marts on top, or redefine your existing data marts.

Hope this helps
LVL 40

Author Comment

by:Richard Quadling
ID: 34238611
Moving forward ...

Thank you all.

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Can Unique column have more than one Null? 8 53
Unable to save view in SSMS 21 69
RDBMS and No sql database 4 58
Find results from sql within a time span 11 30
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Read about achieving the basic levels of HRIS security in the workplace.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question