Solved

Data Warehousing : star vs snowflake and denormalisation.

Posted on 2010-11-29
5
834 Views
Last Modified: 2016-02-15
Hi.

I've just started building our Data Warehouse schema.

One of the Facts I'm handling is number and value of sales (nothing too original there!).

I have multiple databases which are the same structure, but due to limitations in the original shrink wrapped app, each DB is a different currency/region.

I intend to bring all the data into 1 warehouse and handle base and foreign currency values. That's fairly easy to do.


My question relates to the structure of the schema for the following elements.

Database : This is nothing more than ID and the name of the SQL database which is really the country (UK, Holland, Ireland, Germany, India, Spain, etc.).
Customer : A customer is normally only in 1 database, but if they are in more than 1 database, they are not considered to be the same entity.
Customer Period : This is the customer's year/period/week. We hold these so that we can produce financial reports which match their operation.

I intend to use surrogate keys for all the dimension tables.

Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

I'm stuck trying to NOT normalize the data (Database, Customer, CustomerPeriod).

The fact table would link to a single Customer Period which would in turn link to the Customer and then to the Database.

We will have a LOT of entries for the Customer Period table.

The data warehouse is going to be used by SSAS/OLAP. Something I also know little about, but on page 5 of the Microsoft SQL Server 2005 Analysis Services book it says "... data warehousing is beyond the scope of this book ..." (more or less).

So, pointers, suggestions, etc. all gratefully received.

Richard.
0
Comment
Question by:Richard Quadling
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
5 Comments
 
LVL 143

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 167 total points
ID: 34232424
>Based upon the above elements, should I use 1 dimension table with all the elements in? Or should I use multiple dimension tables.

it depends, as so very often in DW databases.
the Report queries/filters shall dictate if you need to have multiple dimensions or 1/2 dimensions only.

for example, if all the reports are filtering for 1 single customer/database (but multiple periods), a single dimension over all 3 fields is just perfect.

if you have reports to compare the same period, but for different customers, still that 1 single dimension can do, as basically the period is depending of the customer.

so, possibly, you might have 2 dimensions:
database + customer
period + customer + database  

or this combination:
database + customer
period

your fact table having 2 ID's to those 2 dimensions

0
 
LVL 8

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 333 total points
ID: 34237177
I would suggest two dimensions:

1: customer + database
2: time period

As you say, each customer is unique to a database, therefore it does not really make sense to keep the database as a separate dimension.

Time period is not specific to a customer or database and therefore merits its own dimension.

Both dimensions (customer + database, time period) are generic enough that you can use them later on to connect your sales fact table to any other customer or time period related fact tables.
0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 34237558
Thanks to you both.

The periods are customer specific. But, in looking at a period by itself (Year/Period/Week), it seems to have nothing in it which is really customer specific. Multiple customers may be following the same pattern for their accounts. In the main we seem to have 3 patterns. Financial year starts in January, April or October.

I think my main issue is that for OLTP, I've learnt about database normalization. And that assists in the construction of the DB for the purposes of storing the data efficiently with some thought to the data retrieval (views, indexes, etc.).

But with data warehousing, I'm not seeing anything as formal for the design. 2 schemas with what really doesn't seem to be a lot between them. Not exactly a lot of help in themselves.


We have multiple dates (order date, despatch date, invoice date) and at least 2 date types (customer year/period/week, our year/period). Can I use 1 dimension table for the dates? Or, based upon this, do I need to have 5 dimension tables?



I realise I have more questions that can be easily answered without an understanding of the end requirement. No different to any other project really. I need to talk to others here to find out what they really want the system to do.

Thanks so far.
0
 
LVL 8

Assisted Solution

by:Christoffer Swanström
Christoffer Swanström earned 333 total points
ID: 34237605
I would in general try not to become the slave of any ideology, whether dimensional modeling or fully normalized 3NF. In practice you find lots of situations where you can fulfill your requirements only by doing some kind of hybrid data model which does not strictly adhere to either "gospel".

In your case I would probably implement a base table which contains the raw sales data in a flat table with no normalization.

In a second step you can make a dimensional model as I suggested above. I would not start mixing dates in one dimension. If you need to analyze your data according to multiple dates, make one dimension per date. Otherwise you risk really blowing up your dimension table.

Having the raw data in the base table you can always go back and create new dimensional data marts on top, or redefine your existing data marts.

Hope this helps
0
 
LVL 40

Author Comment

by:Richard Quadling
ID: 34238611
Moving forward ...

Thank you all.
0

Featured Post

Free eBook: Backup on AWS

Everything you need to know about backup and disaster recovery with AWS, for FREE!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question