Solved

Data warehouse design, fact to dimention row count ratio

Posted on 2007-11-16
7
772 Views
Last Modified: 2013-11-16
We are working on a data warehouse project for one of our customers. We are now towards the end of the project and have real data going into the warehouse.  The database has 13 dimensions and a fact table with 62 attributes. Currently the fact table has 7 million records in it and most of the dimensions are less than 10k rows in size. However, 2 of the dimensions have 400k rows which is about 5.5% of the rows in the fact table i have been monitoring both tables for a month now and conclude that they are always around 5 or 6% of the fact table.

Is this an acceptable growth for a dimension table? I am worried that the growth of these tables will become an issue later down the line as the data in the system now is only a snippet whats to be loaded before production. The only solution would be to merge the 2 big dimensions into the fact table, but this would mean having some non numeric attributes in the call fact table.

Please advise.
0
Comment
Question by:andy_routledge
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
7 Comments
 
LVL 18

Accepted Solution

by:
PFrog earned 113 total points
ID: 20299496
I would say this are too few records to warrent inclusion into the fact table. About 5% for a slowly changing dimension doesn't seem too unreasonable to me.

The most important factor is to make sure it correctly maps your business structure. If you can explain the content of these dimensions and what is changing in them I could advise in more detail.
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20299679
Well the system is for a telecoms company, the fact granularity is at call level e.g. 1 call 1 record. Each call has 4 telephone numebrs assosiated with it, dailer number called number and 2 which relate to routing information most of the time these are the same. One of the dimensions tracks these individual numbers and the info that goes with them national international etc. The other large dimention is a "junk" dimention it contains rarely used information but due to the nature of the buisness all call information has to be recorded for at least 4 years. The junk dimention bascilly stores some technical info for each call which is the same for alot of calls, trunk route, ciricute, bearer etc.
0
 
LVL 12

Assisted Solution

by:RWrigley
RWrigley earned 112 total points
ID: 20303116
From a reporting standpoint, I'd be more more concerned about the ratio between the categories in the different levels in your dimensions, because you'll hit performance problems if you have to write reports that deal with large sets of data.  
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20311266
I'm happy with the first answer seems thanks guys.
0
 
LVL 18

Expert Comment

by:PFrog
ID: 20311272
It looks like your first large dimension is modelled correctly.
I'm not too sure about the junk dimension though. The way you described it, it sounds like it could/should be split up into a number of dimensions - is there a relationship beteeen trunk route, circuit bearer etc? If they are related then you've got it right already, if they're not related then they should be split up.
0

Featured Post

MIM Survival Guide for Service Desk Managers

Major incidents can send mastered service desk processes into disorder. Systems and tools produce the data needed to resolve these incidents, but your challenge is getting that information to the right people fast. Check out the Survival Guide and begin bringing order to chaos.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

These days, all we hear about hacktivists took down so and so websites and retrieved thousands of user’s data. One of the techniques to get unauthorized access to database is by performing SQL injection. This article is quite lengthy which gives bas…
Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

739 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question