[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Data warehouse design

Posted on 2009-04-23
3
Medium Priority
?
500 Views
Last Modified: 2013-11-16
Hello experts,

I'm digging into designing a data warehouse for our ERP system, and I have a question about the fact tables. I know that the data should be split up into several fact tables (grains), depending on which data the user wants to extract from the data warehouse.

But I'm struggling with master-detail tables. I assume it's good practice to split up both the header and the detail table into 2 seperate fact tables, as the granularity is different. Is this true ?
And if so, is it advised to store summerized line information on the header fact table (like total quantity, revenue, ...) ? In my opinion this makes it more easy for the end-user to make his own queries on the fact tables, like average quantity per order per customer per period. I know this query can be based on the detail lines fact table, but this requires a more complex SQL statement as the info has to be grouped per order first.

Kind regards
Andy





0
Comment
Question by:ACAE
  • 2
3 Comments
 
LVL 6

Expert Comment

by:agandau
ID: 24215339
There's no magic bullet here, but I've found that it's best to create a fact table at the smallest grain available.  There are cases where I can't avoid creating an order header fact and an order line fact table.  This decision for me is based entirely on whether or not any measures are stored in the order header table that aren't merely summarized order line data.  If that's not the case I de-normalize any dimension keys from the order header (bill to cust, ship to cust, dates, whatever) down to the order line level and live with it there.

If the order header and order line are split into two tables, then pretty much every query against the orders has to perform the join between the two.  If the volume of data is large (say 50 million order lines) then this join becomes a painful thing and the impact it has on the performance of user queries and ETL isn't worth it.

If you plan on only making the order header level information available, then one week after you get the reports up and running, someone is going to ask for your quarterly customer average quantities to be broken out by product line, and then you'll be back to building the order line fact table.

If you do make order header and order lines fact tables separate, make sure that they're both clustered on the column(s) that they're joined on, so that they'll do a merge join rather than a nested loop join or a hash match.  

Hope this helps.
0
 
LVL 6

Accepted Solution

by:
agandau earned 2000 total points
ID: 24215567
I re-read your post, and left a couple of questions unanswered.

"I assume it's good practice to split up both the header and the detail table into 2 seperate fact tables, as the granularity is different. Is this true ? "
That I commented on above.
And if so, is it advised to store summerized line information on the header fact table (like total quantity, revenue, ...) ?
If you do have both header and line tables, then yes.  I would go ahead and put whatever useful summarizations in the header table, and have the ETL process perform and store the summarizations.  I had said above that my main criterion for deciding whether or not I'd have a header table at all was based on whether or not it had order header level specific measures, but I would have to say that if performance of queries that had to summarize line level information was miserable, that too is a compelling reason to have the order header level table.
In my opinion this makes it more easy for the end-user to make his own queries on the fact tables, like average quantity per order per customer per period. I know this query can be based on the detail lines fact table, but this requires a more complex SQL statement as the info has to be grouped per order first.

In fact, this is probably your most compelling reason to have two fact tables.  I was going to suggest that you simply make a view of the order lines to perform the complicated SQL, but then I thought, "I hate it when people suggest we can always just make a view that does such and such".  What you'd end up with likely is a case where the summarization would happen to all the rows in the order line fact table before any filtration takes place.
I suggest you have two fact tables both clustered on the same columns, and make any keys in the order header cascade down to the order line so that in reports that need order line information you don't have to join to the order header just to know, for example, who the customer is, or what the order date it.  They should be right there in the order line table.
What DBMS are you using by the way?
0
 

Author Comment

by:ACAE
ID: 24215682
Thank you for the complete answer, It seems that we have more or less the same ideas. I already analysed my order lines to copy most of the data from the header (like customer, order type, order date, ..). I would like to create my fact tables from a user point-of-view, and not from a normalized IT view. The reason is that I want them to be able to create as much queries as possible without much IT intervention.

We are using SQL Server 2005 as the DBMS, and are thinking to use SSIS to feed the data warehouse. However, my first tests are not encouraging (annoying DTS_E_INDUCEDTRANSFORMFAILUREONERROR bug), so this is not definitive.

Andy
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
How much do you know about the future of data centers? If you're like 50% of organizations, then it's probably not enough. Read on to get up to speed on this emerging field.
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
Suggested Courses

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question