Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Data warehouse design, fact to dimention row count ratio

Posted on 2007-11-16
7
Medium Priority
?
791 Views
Last Modified: 2013-11-16
We are working on a data warehouse project for one of our customers. We are now towards the end of the project and have real data going into the warehouse.  The database has 13 dimensions and a fact table with 62 attributes. Currently the fact table has 7 million records in it and most of the dimensions are less than 10k rows in size. However, 2 of the dimensions have 400k rows which is about 5.5% of the rows in the fact table i have been monitoring both tables for a month now and conclude that they are always around 5 or 6% of the fact table.

Is this an acceptable growth for a dimension table? I am worried that the growth of these tables will become an issue later down the line as the data in the system now is only a snippet whats to be loaded before production. The only solution would be to merge the 2 big dimensions into the fact table, but this would mean having some non numeric attributes in the call fact table.

Please advise.
0
Comment
Question by:andy_routledge
  • 2
  • 2
7 Comments
 
LVL 18

Accepted Solution

by:
PFrog earned 452 total points
ID: 20299496
I would say this are too few records to warrent inclusion into the fact table. About 5% for a slowly changing dimension doesn't seem too unreasonable to me.

The most important factor is to make sure it correctly maps your business structure. If you can explain the content of these dimensions and what is changing in them I could advise in more detail.
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20299679
Well the system is for a telecoms company, the fact granularity is at call level e.g. 1 call 1 record. Each call has 4 telephone numebrs assosiated with it, dailer number called number and 2 which relate to routing information most of the time these are the same. One of the dimensions tracks these individual numbers and the info that goes with them national international etc. The other large dimention is a "junk" dimention it contains rarely used information but due to the nature of the buisness all call information has to be recorded for at least 4 years. The junk dimention bascilly stores some technical info for each call which is the same for alot of calls, trunk route, ciricute, bearer etc.
0
 
LVL 12

Assisted Solution

by:RWrigley
RWrigley earned 448 total points
ID: 20303116
From a reporting standpoint, I'd be more more concerned about the ratio between the categories in the different levels in your dimensions, because you'll hit performance problems if you have to write reports that deal with large sets of data.  
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20311266
I'm happy with the first answer seems thanks guys.
0
 
LVL 18

Expert Comment

by:PFrog
ID: 20311272
It looks like your first large dimension is modelled correctly.
I'm not too sure about the junk dimension though. The way you described it, it sounds like it could/should be split up into a number of dimensions - is there a relationship beteeen trunk route, circuit bearer etc? If they are related then you've got it right already, if they're not related then they should be split up.
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

One of the most important things in an application is the query performance. This article intends to give you good tips to improve the performance of your queries.
How much do you know about the future of data centers? If you're like 50% of organizations, then it's probably not enough. Read on to get up to speed on this emerging field.
In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

916 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question