As an aside, I'm glad to see others using the "ELT" terminology. I'm a big believer in NOT having all the magic take place in tools like informatica or SSIS. Just get it in the database and do the rest with stored procs. Anyway.
My approach to this is that Geography and Customer are two different things, and therefore two different dimensions.
Combining the two and trying to keep track of who lives where and when is an enormous pain even when you have a nice operational database to source from, but in your case where everything is sourced from a monolithic log file I think the effort is well beyond the point of diminishing returns.
To me it's the *fact* table that's already doing all that work for you. It knows we shipped this iteitem to this customer at this address on this date, and billed it to this customer at this address on this date.
Instead keep the geography, product, and customer dimensions separate, and let the "slowly changing dimensionality" be handled merely by crossjoining these dimensions at query runtime.
I hope this makes sense.
Main Topics
Browse All Topics





by: dportasPosted on 2009-10-29 at 05:55:58ID: 25693055
You'll need to aggregate the customer attributes for the customer table, so that you have one row per change of address but don't aggregate the transaction details for the sales table (quantity, value, etc).
Depending on what loading approach you take the aggregation could be done at the time of the load (E.T.L.) by using the SSIS Data Flow for example, OR the raw data can be loaded and then transformed (E.L.T.) in SQL. There are pros and cons to either approach. SSIS Data Flow transformations require more expertise in SSIS, so the state of knowledge and practice in your organisation or team may be a factor.