Improve company productivity with a Business Account.Sign Up

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 853
  • Last Modified:

Data Warehousing : star vs snowflake and denormalisation.


I've just started building our Data Warehouse schema.

One of the Facts I'm handling is number and value of sales (nothing too original there!).

I have multiple databases which are the same structure, but due to limitations in the original shrink wrapped app, each DB is a different currency/region.

I intend to bring all the data into 1 warehouse and handle base and foreign currency values. That's fairly easy to do.

My question relates to the structure of the schema for the following elements.

Database : This is nothing more than ID and the name of the SQL database which is really the country (UK, Holland, Ireland, Germany, India, Spain, etc.).
Customer : A customer is normally only in 1 database, but if they are in more than 1 database, they are not considered to be the same entity.
Customer Period : This is the customer's year/period/week. We hold these so that we can produce financial reports which match their operation.

I intend to use surrogate keys for all the dimension tables.

Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

I'm stuck trying to NOT normalize the data (Database, Customer, CustomerPeriod).

The fact table would link to a single Customer Period which would in turn link to the Customer and then to the Database.

We will have a LOT of entries for the Customer Period table.

The data warehouse is going to be used by SSAS/OLAP. Something I also know little about, but on page 5 of the Microsoft SQL Server 2005 Analysis Services book it says "... data warehousing is beyond the scope of this book ..." (more or less).

So, pointers, suggestions, etc. all gratefully received.

Richard Quadling
Richard Quadling
  • 2
  • 2
3 Solutions
Guy Hengel [angelIII / a3]Billing EngineerCommented:
>Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

it depends, as so very often in DW databases.
the Report queries/filters shall dictate if you need to have multiple dimensions or 1/2 dimensions only.

for example, if all the reports are filtering for 1 single customer/database (but multiple periods), a single dimension over all 3 fields is just perfect.

if you have reports to compare the same period, but for different customers, still that 1 single dimension can do, as basically the period is depending of the customer.

so, possibly, you might have 2 dimensions:
database + customer
period + customer + database  

or this combination:
database + customer

your fact table having 2 ID's to those 2 dimensions

Christoffer SwanströmPartnerCommented:
I would suggest two dimensions:

1: customer + database
2: time period

As you say, each customer is unique to a database, therefore it does not really make sense to keep the database as a separate dimension.

Time period is not specific to a customer or database and therefore merits its own dimension.

Both dimensions (customer + database, time period) are generic enough that you can use them later on to connect your sales fact table to any other customer or time period related fact tables.
Richard QuadlingSenior Software DeveloperAuthor Commented:
Thanks to you both.

The periods are customer specific. But, in looking at a period by itself (Year/Period/Week), it seems to have nothing in it which is really customer specific. Multiple customers may be following the same pattern for their accounts. In the main we seem to have 3 patterns. Financial year starts in January, April or October.

I think my main issue is that for OLTP, I've learnt about database normalization. And that assists in the construction of the DB for the purposes of storing the data efficiently with some thought to the data retrieval (views, indexes, etc.).

But with data warehousing, I'm not seeing anything as formal for the design. 2 schemas with what really doesn't seem to be a lot between them. Not exactly a lot of help in themselves.

We have multiple dates (order date, despatch date, invoice date) and at least 2 date types (customer year/period/week, our year/period). Can I use 1 dimension table for the dates? Or, based upon this, do I need to have 5 dimension tables?

I realise I have more questions that can be easily answered without an understanding of the end requirement. No different to any other project really. I need to talk to others here to find out what they really want the system to do.

Thanks so far.
Christoffer SwanströmPartnerCommented:
I would in general try not to become the slave of any ideology, whether dimensional modeling or fully normalized 3NF. In practice you find lots of situations where you can fulfill your requirements only by doing some kind of hybrid data model which does not strictly adhere to either "gospel".

In your case I would probably implement a base table which contains the raw sales data in a flat table with no normalization.

In a second step you can make a dimensional model as I suggested above. I would not start mixing dates in one dimension. If you need to analyze your data according to multiple dates, make one dimension per date. Otherwise you risk really blowing up your dimension table.

Having the raw data in the base table you can always go back and create new dimensional data marts on top, or redefine your existing data marts.

Hope this helps
Richard QuadlingSenior Software DeveloperAuthor Commented:
Moving forward ...

Thank you all.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Build your data science skills into a career

Are you ready to take your data science career to the next step, or break into data science? With Springboard’s Data Science Career Track, you’ll master data science topics, have personalized career guidance, weekly calls with a data science expert, and a job guarantee.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now