Data Warehousing : star vs snowflake and denormalisation.

Posted on 2010-11-29
Medium Priority
Last Modified: 2016-02-15

I've just started building our Data Warehouse schema.

One of the Facts I'm handling is number and value of sales (nothing too original there!).

I have multiple databases which are the same structure, but due to limitations in the original shrink wrapped app, each DB is a different currency/region.

I intend to bring all the data into 1 warehouse and handle base and foreign currency values. That's fairly easy to do.

My question relates to the structure of the schema for the following elements.

Database : This is nothing more than ID and the name of the SQL database which is really the country (UK, Holland, Ireland, Germany, India, Spain, etc.).
Customer : A customer is normally only in 1 database, but if they are in more than 1 database, they are not considered to be the same entity.
Customer Period : This is the customer's year/period/week. We hold these so that we can produce financial reports which match their operation.

I intend to use surrogate keys for all the dimension tables.

Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

I'm stuck trying to NOT normalize the data (Database, Customer, CustomerPeriod).

The fact table would link to a single Customer Period which would in turn link to the Customer and then to the Database.

We will have a LOT of entries for the Customer Period table.

The data warehouse is going to be used by SSAS/OLAP. Something I also know little about, but on page 5 of the Microsoft SQL Server 2005 Analysis Services book it says "... data warehousing is beyond the scope of this book ..." (more or less).

So, pointers, suggestions, etc. all gratefully received.

Question by:Richard Quadling
  • 2
  • 2
LVL 143

Accepted Solution

Guy Hengel [angelIII / a3] earned 668 total points
ID: 34232424
>Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

it depends, as so very often in DW databases.
the Report queries/filters shall dictate if you need to have multiple dimensions or 1/2 dimensions only.

for example, if all the reports are filtering for 1 single customer/database (but multiple periods), a single dimension over all 3 fields is just perfect.

if you have reports to compare the same period, but for different customers, still that 1 single dimension can do, as basically the period is depending of the customer.

so, possibly, you might have 2 dimensions:
database + customer
period + customer + database  

or this combination:
database + customer

your fact table having 2 ID's to those 2 dimensions


Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 1332 total points
ID: 34237177
I would suggest two dimensions:

1: customer + database
2: time period

As you say, each customer is unique to a database, therefore it does not really make sense to keep the database as a separate dimension.

Time period is not specific to a customer or database and therefore merits its own dimension.

Both dimensions (customer + database, time period) are generic enough that you can use them later on to connect your sales fact table to any other customer or time period related fact tables.
LVL 40

Author Comment

by:Richard Quadling
ID: 34237558
Thanks to you both.

The periods are customer specific. But, in looking at a period by itself (Year/Period/Week), it seems to have nothing in it which is really customer specific. Multiple customers may be following the same pattern for their accounts. In the main we seem to have 3 patterns. Financial year starts in January, April or October.

I think my main issue is that for OLTP, I've learnt about database normalization. And that assists in the construction of the DB for the purposes of storing the data efficiently with some thought to the data retrieval (views, indexes, etc.).

But with data warehousing, I'm not seeing anything as formal for the design. 2 schemas with what really doesn't seem to be a lot between them. Not exactly a lot of help in themselves.

We have multiple dates (order date, despatch date, invoice date) and at least 2 date types (customer year/period/week, our year/period). Can I use 1 dimension table for the dates? Or, based upon this, do I need to have 5 dimension tables?

I realise I have more questions that can be easily answered without an understanding of the end requirement. No different to any other project really. I need to talk to others here to find out what they really want the system to do.

Thanks so far.

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 1332 total points
ID: 34237605
I would in general try not to become the slave of any ideology, whether dimensional modeling or fully normalized 3NF. In practice you find lots of situations where you can fulfill your requirements only by doing some kind of hybrid data model which does not strictly adhere to either "gospel".

In your case I would probably implement a base table which contains the raw sales data in a flat table with no normalization.

In a second step you can make a dimensional model as I suggested above. I would not start mixing dates in one dimension. If you need to analyze your data according to multiple dates, make one dimension per date. Otherwise you risk really blowing up your dimension table.

Having the raw data in the base table you can always go back and create new dimensional data marts on top, or redefine your existing data marts.

Hope this helps
LVL 40

Author Comment

by:Richard Quadling
ID: 34238611
Moving forward ...

Thank you all.

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article shows how to get a list of available printers for display in a drop-down list, and then to use the selected printer to print an Access report or a Word document filled with Access data, using different syntax as needed for working with …
If you try to migrate from Elastix to Issabel, you will face a lot of issues. These problems are inevitable but fortunately, you can fix them. In the guide below, I will explain how I performed the migration while keeping all data and successfully t…
This lesson discusses how to use a Mainform + Subforms in Microsoft Access to find and enter data for payments on orders. The sample data comes from a custom shop that builds and sells movable storage structures that are delivered to your property. …
Planning to migrate your EDB file(s) to a new or an existing Outlook PST file? This video will guide you how to convert EDB file(s) to PST. Besides this, it also describes, how one can easily search any item(s) from multiple folders or mailboxes…
Suggested Courses

601 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question