Data Warehousing : star vs snowflake and denormalisation.

Posted on 2010-11-29
Medium Priority
Last Modified: 2016-02-15

I've just started building our Data Warehouse schema.

One of the Facts I'm handling is number and value of sales (nothing too original there!).

I have multiple databases which are the same structure, but due to limitations in the original shrink wrapped app, each DB is a different currency/region.

I intend to bring all the data into 1 warehouse and handle base and foreign currency values. That's fairly easy to do.

My question relates to the structure of the schema for the following elements.

Database : This is nothing more than ID and the name of the SQL database which is really the country (UK, Holland, Ireland, Germany, India, Spain, etc.).
Customer : A customer is normally only in 1 database, but if they are in more than 1 database, they are not considered to be the same entity.
Customer Period : This is the customer's year/period/week. We hold these so that we can produce financial reports which match their operation.

I intend to use surrogate keys for all the dimension tables.

Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

I'm stuck trying to NOT normalize the data (Database, Customer, CustomerPeriod).

The fact table would link to a single Customer Period which would in turn link to the Customer and then to the Database.

We will have a LOT of entries for the Customer Period table.

The data warehouse is going to be used by SSAS/OLAP. Something I also know little about, but on page 5 of the Microsoft SQL Server 2005 Analysis Services book it says "... data warehousing is beyond the scope of this book ..." (more or less).

So, pointers, suggestions, etc. all gratefully received.

Question by:Richard Quadling
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
LVL 143

Accepted Solution

Guy Hengel [angelIII / a3] earned 668 total points
ID: 34232424
>Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

it depends, as so very often in DW databases.
the Report queries/filters shall dictate if you need to have multiple dimensions or 1/2 dimensions only.

for example, if all the reports are filtering for 1 single customer/database (but multiple periods), a single dimension over all 3 fields is just perfect.

if you have reports to compare the same period, but for different customers, still that 1 single dimension can do, as basically the period is depending of the customer.

so, possibly, you might have 2 dimensions:
database + customer
period + customer + database  

or this combination:
database + customer

your fact table having 2 ID's to those 2 dimensions


Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 1332 total points
ID: 34237177
I would suggest two dimensions:

1: customer + database
2: time period

As you say, each customer is unique to a database, therefore it does not really make sense to keep the database as a separate dimension.

Time period is not specific to a customer or database and therefore merits its own dimension.

Both dimensions (customer + database, time period) are generic enough that you can use them later on to connect your sales fact table to any other customer or time period related fact tables.
LVL 40

Author Comment

by:Richard Quadling
ID: 34237558
Thanks to you both.

The periods are customer specific. But, in looking at a period by itself (Year/Period/Week), it seems to have nothing in it which is really customer specific. Multiple customers may be following the same pattern for their accounts. In the main we seem to have 3 patterns. Financial year starts in January, April or October.

I think my main issue is that for OLTP, I've learnt about database normalization. And that assists in the construction of the DB for the purposes of storing the data efficiently with some thought to the data retrieval (views, indexes, etc.).

But with data warehousing, I'm not seeing anything as formal for the design. 2 schemas with what really doesn't seem to be a lot between them. Not exactly a lot of help in themselves.

We have multiple dates (order date, despatch date, invoice date) and at least 2 date types (customer year/period/week, our year/period). Can I use 1 dimension table for the dates? Or, based upon this, do I need to have 5 dimension tables?

I realise I have more questions that can be easily answered without an understanding of the end requirement. No different to any other project really. I need to talk to others here to find out what they really want the system to do.

Thanks so far.

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 1332 total points
ID: 34237605
I would in general try not to become the slave of any ideology, whether dimensional modeling or fully normalized 3NF. In practice you find lots of situations where you can fulfill your requirements only by doing some kind of hybrid data model which does not strictly adhere to either "gospel".

In your case I would probably implement a base table which contains the raw sales data in a flat table with no normalization.

In a second step you can make a dimensional model as I suggested above. I would not start mixing dates in one dimension. If you need to analyze your data according to multiple dates, make one dimension per date. Otherwise you risk really blowing up your dimension table.

Having the raw data in the base table you can always go back and create new dimensional data marts on top, or redefine your existing data marts.

Hope this helps
LVL 40

Author Comment

by:Richard Quadling
ID: 34238611
Moving forward ...

Thank you all.

Featured Post

Learn how to optimize MySQL for your business need

With the increasing importance of apps & networks in both business & personal interconnections, perfor. has become one of the key metrics of successful communication. This ebook is a hands-on business-case-driven guide to understanding MySQL query parameter tuning & database perf

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
Your data is at risk. Probably more today that at any other time in history. There are simply more people with more access to the Web with bad intentions.
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question