
SSAS: dimension tables: 50 million rows x 50 columns

Posted on 2009-04-15
Last Modified: 2013-11-16
SQL SERVER 2005 / 2008 ANALYSIS SERVICES

I have 3 tables (with more to consider later), and all 3 contain approximately 50 million rows. The first table contains details of transactions (and will be the fact table); the other two are dimension tables called 'Client' and 'Policy'. 'Client' and 'Policy' each contain around 50 columns. Many of the columns contain codes that lend themselves to lookup tables; for example, Client contains codes for 'occupation category' and 'employer occupation category', etc.

Would it be better to:
a. simply create dimensions based upon the Client and Policy tables as they are, or
b. create separate dimension tables attached to the Client and Policy tables for all the different codes (i.e. lookups for their descriptions), and therefore create dimensions for these tables as well?
(This would amount to a snowflake design in which the Client and Policy tables have lots of lookups hanging off them.)

Also, there is a direct one-to-many relationship between the Client and Policy tables. Should I also define this relationship between the Client and Policy tables, in addition to the relationships of Client and Policy to the fact table?


Des
Question by:DerekRoberts

Expert Comment

by:agandau
ID: 24150570
Is the Policy table something more like a "Client Has Policy" table, where a record will exist regardless, even if there aren't any records in the transaction table?

Author Comment

by:DerekRoberts
ID: 24155179
There will always be records in the transactions table for every Policy table entry. The transactions table is an amalgamation of a stats (events) table and a ledger table, which records something for everything.
Clients have policies, and policies always have transactions associated with them.

Accepted Solution

by:agandau
ID: 24159293
Based on your description, I would create two dimensions: one Policy dimension and one Client dimension. In each of these, provided the performance isn't miserable, I would snowflake in the lookups, using the codes as member keys and the descriptions as member names, and create an attribute in the dimension for each lookup that is needed for reporting.
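
To make the key/name split concrete, here is a minimal Python sketch, not SSAS code: a code column on the Client table is resolved through a lookup table into an attribute member whose key is the code and whose name is the description (in SSAS terms, roughly an attribute whose KeyColumns are bound to the code and whose NameColumn is bound to the lookup's description column). The lookup contents, the field names, and the assumption that both occupation codes share one lookup are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical lookup table: occupation category code -> description.
OCCUPATION_CATEGORY = {
    "01": "Professional",
    "02": "Clerical",
    "03": "Manual",
}


@dataclass(frozen=True)
class ClientRow:
    client_id: int
    occupation_category: str           # code stored on the Client table
    employer_occupation_category: str  # assumed to use the same lookup


@dataclass(frozen=True)
class AttributeMember:
    key: str   # member key: the code
    name: str  # member name: the description from the lookup


def resolve_member(code: str, lookup: dict) -> AttributeMember:
    """Turn a code into an attribute member keyed by the code and named by its description."""
    # Codes missing from the lookup keep their key but get a placeholder name.
    return AttributeMember(key=code, name=lookup.get(code, "Unknown"))


def client_attributes(row: ClientRow) -> dict:
    """Build the snowflaked attributes for one Client row."""
    return {
        "Occupation Category": resolve_member(row.occupation_category, OCCUPATION_CATEGORY),
        "Employer Occupation Category": resolve_member(
            row.employer_occupation_category, OCCUPATION_CATEGORY
        ),
    }


attrs = client_attributes(
    ClientRow(client_id=1, occupation_category="01", employer_occupation_category="03")
)
print(attrs["Occupation Category"])  # AttributeMember(key='01', name='Professional')
```

Keeping the code as the key means a member's identity does not change if its description is later reworded; falling back to "Unknown" for codes missing from the lookup is just one possible choice.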

Since it sounds like the relationship exists between Client and Transaction, I would avoid using the referenced dimension relationship and instead relate the fact table directly (a regular relationship) to the Client and Policy dimensions separately, for performance reasons, especially considering the number of members you'll likely have in both of these dimensions. So in the DSV (data source view) I wouldn't bother defining the relationship between Client and Policy.
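
The difference between the two relationship types can be illustrated with a plain Python analogy (it is not how SSAS processes cubes internally): with regular relationships each fact row carries both a client key and a policy key, so totals by client come straight off the fact row; with a referenced relationship, Client is reached only through Policy, which adds a lookup for every fact row. All names and values below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionFact:
    client_key: int  # regular relationship to the Client dimension
    policy_key: int  # regular relationship to the Policy dimension
    amount: float


# Hypothetical Policy -> Client mapping (one client has many policies).
POLICY_TO_CLIENT = {10: 1, 11: 1, 20: 2}


def total_by_client_regular(facts):
    """Regular relationship: each fact row already carries client_key."""
    totals = defaultdict(float)
    for f in facts:
        totals[f.client_key] += f.amount
    return dict(totals)


def total_by_client_referenced(facts, policy_to_client):
    """Referenced relationship: Client is reached only through Policy, one extra lookup per row."""
    totals = defaultdict(float)
    for f in facts:
        totals[policy_to_client[f.policy_key]] += f.amount
    return dict(totals)


facts = [
    TransactionFact(client_key=1, policy_key=10, amount=100.0),
    TransactionFact(client_key=1, policy_key=11, amount=50.0),
    TransactionFact(client_key=2, policy_key=20, amount=25.0),
]
print(total_by_client_regular(facts))  # {1: 150.0, 2: 25.0}
```

Both functions return the same totals when the data is consistent; the point of the analogy is the extra per-row hop through Policy, which is the performance concern raised above when both dimensions have tens of millions of members.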

My main concern with what you're describing is what kinds of changes happen in the dimensions. I'm guessing a policy doesn't change. If a client changes a policy, perhaps the existing one is expired and a new one created?

A client, on the other hand, may change demographics (get married or move to another state), or their employer might move from one occupation category to another, etc. What happens to the records in the Client table when that happens? Is a new record cut with the changes, with subsequent transactions pointing to that new record, or is the existing record updated in place? It's the whole "slowly changing dimension" debacle, which has to be handled via ETL in the database.
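
The two options in that question correspond to what are commonly called Type 1 (update in place) and Type 2 (cut a new record) slowly changing dimensions. Below is a minimal Python sketch of both, assuming a hypothetical Client dimension with a surrogate key, a business key, and valid-from/valid-to dates; in practice this logic would live in the ETL, as noted above.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ClientVersion:
    surrogate_key: int               # key the fact table points at
    client_id: int                   # business key from the source system
    marital_status: str
    state: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def scd_type1(rows, client_id, **changes):
    """Type 1: overwrite in place. No history; existing facts see the new values."""
    return [replace(r, **changes) if r.client_id == client_id else r for r in rows]


def scd_type2(rows, client_id, as_of, **changes):
    """Type 2: expire the current version and cut a new row with a new surrogate key.

    Returns the updated rows and the surrogate key that later facts should reference.
    """
    new_key = max(r.surrogate_key for r in rows) + 1
    out, found = [], False
    for r in rows:
        if r.client_id == client_id and r.valid_to is None:
            found = True
            out.append(replace(r, valid_to=as_of))  # close the old version
            out.append(replace(r, surrogate_key=new_key, valid_from=as_of, valid_to=None, **changes))
        else:
            out.append(r)
    if not found:
        raise KeyError(f"no current version for client {client_id}")
    return out, new_key


rows = [ClientVersion(1, 100, "Single", "NY", date(2008, 1, 1))]
rows, key = scd_type2(rows, 100, date(2009, 4, 1), marital_status="Married")
print(key, [(r.surrogate_key, r.marital_status, r.valid_to) for r in rows])
```

Under Type 2, transactions loaded after the change would reference the returned new surrogate key while older transactions keep pointing at the expired version, so history is preserved; under Type 1, existing transactions simply pick up the new values.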

Author Comment

by:DerekRoberts
ID: 24267740
The above is pretty much what I've been thinking (dreading). Any innovative ideas are always welcome, but I suspect that in this case the real requirement is more powerful hardware to accommodate the snowflaking described above.

Fortunately we don't have to deal with the issues surrounding slowly changing dimensions.