Data warehouse design

Hello experts,

I'm digging into designing a data warehouse for our ERP system, and I have a question about the fact tables. I know that the data should be split up into several fact tables (grains), depending on which data the user wants to extract from the data warehouse.

But I'm struggling with master-detail tables. I assume it's good practice to split the header and the detail table into two separate fact tables, as the granularity is different. Is this true?
And if so, is it advisable to store summarized line information on the header fact table (like total quantity, revenue, ...)? In my opinion this makes it easier for end users to write their own queries on the fact tables, like average quantity per order per customer per period. I know this query can be based on the detail lines fact table, but that requires a more complex SQL statement, as the data has to be grouped per order first.
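
To make the trade-off concrete, here is a minimal sketch of both variants of the "average quantity per order per customer per period" query. It uses Python's built-in sqlite3 only so that it runs anywhere; the table and column names (fact_order_header, fact_order_line, total_quantity, period) are made up for illustration and are not from our ERP system, and it assumes the line fact already carries the customer and period keys.

```python
import sqlite3

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_order_header (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER,
    period TEXT, total_quantity INTEGER);
CREATE TABLE fact_order_line (
    order_id INTEGER, line_no INTEGER, customer_id INTEGER,
    period TEXT, quantity INTEGER);
INSERT INTO fact_order_header VALUES
    (1, 10, '2008-01', 5), (2, 10, '2008-01', 15), (3, 20, '2008-01', 4);
INSERT INTO fact_order_line VALUES
    (1, 1, 10, '2008-01', 2), (1, 2, 10, '2008-01', 3),
    (2, 1, 10, '2008-01', 15),
    (3, 1, 20, '2008-01', 1), (3, 2, 20, '2008-01', 3);
""")

# Variant 1: header fact with a pre-summarized quantity, a single GROUP BY.
from_header = conn.execute("""
    SELECT customer_id, period, AVG(total_quantity)
    FROM fact_order_header
    GROUP BY customer_id, period
    ORDER BY customer_id, period
""").fetchall()

# Variant 2: line fact only, so lines are summed per order first and the
# per-order totals are averaged in an outer query.
from_lines = conn.execute("""
    SELECT customer_id, period, AVG(order_quantity)
    FROM (SELECT order_id, customer_id, period,
                 SUM(quantity) AS order_quantity
          FROM fact_order_line
          GROUP BY order_id, customer_id, period) AS per_order
    GROUP BY customer_id, period
    ORDER BY customer_id, period
""").fetchall()

print(from_header)
print(from_lines)
```

Both variants return the same rows; the second one just needs the extra per-order grouping step mentioned above.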

Kind regards
Andy

agandau

There's no magic bullet here, but I've found that it's best to create a fact table at the smallest grain available. There are cases where I can't avoid creating both an order header fact and an order line fact table. For me this decision is based entirely on whether the order header table stores any measures that aren't merely summarized order line data. If it doesn't, I de-normalize the dimension keys from the order header (bill to cust, ship to cust, dates, whatever) down to the order line level and live with it there.
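
As a rough sketch of that de-normalization step (again using Python's sqlite3 purely so it is runnable; the src_order_header / src_order_line tables and their columns are assumptions for the example, not anyone's real schema), the load copies the header keys onto every line so that user queries never need the header join:

```python
import sqlite3

# Hypothetical source tables standing in for the ERP header/detail pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_order_header (
    order_id INTEGER PRIMARY KEY, bill_to_cust INTEGER,
    ship_to_cust INTEGER, order_date TEXT);
CREATE TABLE src_order_line (
    order_id INTEGER, line_no INTEGER, product_id INTEGER,
    quantity INTEGER, revenue REAL);
INSERT INTO src_order_header VALUES
    (1, 10, 11, '2008-01-15'), (2, 20, 20, '2008-02-03');
INSERT INTO src_order_line VALUES
    (1, 1, 100, 2, 20.0), (1, 2, 101, 3, 45.0), (2, 1, 100, 4, 40.0);

-- The join happens once, at load time: header keys are pushed down
-- to the order line grain.
CREATE TABLE fact_order_line AS
SELECT l.order_id, l.line_no,
       h.bill_to_cust, h.ship_to_cust, h.order_date,
       l.product_id, l.quantity, l.revenue
FROM src_order_line AS l
JOIN src_order_header AS h ON h.order_id = l.order_id;
""")

# A user query by customer now touches only the line-level fact table.
by_customer = conn.execute("""
    SELECT bill_to_cust, SUM(quantity), SUM(revenue)
    FROM fact_order_line
    GROUP BY bill_to_cust
    ORDER BY bill_to_cust
""").fetchall()

print(by_customer)
```

Because product_id is still on each row, the same table can also be grouped by product without going back to the source system.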

If the order header and order line are split into two tables, then pretty much every query against the orders has to perform the join between the two. If the volume of data is large (say 50 million order lines) then this join becomes a painful thing and the impact it has on the performance of user queries and ETL isn't worth it.

If you plan on only making the order header level information available, then one week after you get the reports up and running, someone is going to ask for your quarterly customer average quantities to be broken out by product line, and then you'll be back to building the order line fact table.

If you do make order header and order lines fact tables separate, make sure that they're both clustered on the column(s) that they're joined on, so that they'll do a merge join rather than a nested loop join or a hash match.
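
For anyone wondering why the sort order matters: when both inputs already arrive ordered on the join key, the join can walk them in a single forward pass. The Python below is only a toy illustration of that idea, not how SQL Server implements it; the tuples and the merge_join function are invented for the example, and it assumes each order_id appears at most once on the header side.

```python
def merge_join(headers, lines):
    """Join two inputs that are both already sorted by order_id.

    headers: (order_id, customer) tuples, one per order.
    lines:   (order_id, line_no, quantity) tuples, many per order.
    """
    joined = []
    i = 0  # cursor into headers; it only ever moves forward
    for line in lines:
        # Skip headers whose key is smaller than the current line's key.
        while i < len(headers) and headers[i][0] < line[0]:
            i += 1
        # On a key match, emit the combined row. The cursor stays put so
        # the next line of the same order matches the same header.
        if i < len(headers) and headers[i][0] == line[0]:
            joined.append(headers[i] + line[1:])
    return joined


headers = [(1, "cust-10"), (2, "cust-20"), (3, "cust-30")]
lines = [(1, 1, 2), (1, 2, 3), (3, 1, 4)]
result = merge_join(headers, lines)
print(result)
```

Neither input is re-scanned or hashed, which is the benefit you get when both tables are physically ordered on the join column(s).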

Hope this helps.

ACAE

ASKER

Thank you for the complete answer. It seems that we have more or less the same ideas. I have already analysed my order lines so that most of the data from the header (like customer, order type, order date, ...) is copied onto them. I would like to design my fact tables from a user point of view, not from a normalized IT view. The reason is that I want users to be able to create as many queries as possible without much IT intervention.

We are using SQL Server 2005 as the DBMS, and are thinking of using SSIS to feed the data warehouse. However, my first tests are not encouraging (an annoying DTS_E_INDUCEDTRANSFORMFAILUREONERROR bug), so this choice is not final.

Andy