Solved

Data warehouse design, fact to dimention row count ratio

Posted on 2007-11-16
7
761 Views
Last Modified: 2013-11-16
We are working on a data warehouse project for one of our customers. We are now towards the end of the project and have real data going into the warehouse.  The database has 13 dimensions and a fact table with 62 attributes. Currently the fact table has 7 million records in it and most of the dimensions are less than 10k rows in size. However, 2 of the dimensions have 400k rows which is about 5.5% of the rows in the fact table i have been monitoring both tables for a month now and conclude that they are always around 5 or 6% of the fact table.

Is this an acceptable growth for a dimension table? I am worried that the growth of these tables will become an issue later down the line as the data in the system now is only a snippet whats to be loaded before production. The only solution would be to merge the 2 big dimensions into the fact table, but this would mean having some non numeric attributes in the call fact table.

Please advise.
0
Comment
Question by:andy_routledge
  • 2
  • 2
7 Comments
 
LVL 18

Accepted Solution

by:
PFrog earned 113 total points
ID: 20299496
I would say this are too few records to warrent inclusion into the fact table. About 5% for a slowly changing dimension doesn't seem too unreasonable to me.

The most important factor is to make sure it correctly maps your business structure. If you can explain the content of these dimensions and what is changing in them I could advise in more detail.
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20299679
Well the system is for a telecoms company, the fact granularity is at call level e.g. 1 call 1 record. Each call has 4 telephone numebrs assosiated with it, dailer number called number and 2 which relate to routing information most of the time these are the same. One of the dimensions tracks these individual numbers and the info that goes with them national international etc. The other large dimention is a "junk" dimention it contains rarely used information but due to the nature of the buisness all call information has to be recorded for at least 4 years. The junk dimention bascilly stores some technical info for each call which is the same for alot of calls, trunk route, ciricute, bearer etc.
0
 
LVL 12

Assisted Solution

by:RWrigley
RWrigley earned 112 total points
ID: 20303116
From a reporting standpoint, I'd be more more concerned about the ratio between the categories in the different levels in your dimensions, because you'll hit performance problems if you have to write reports that deal with large sets of data.  
0
 
LVL 4

Author Comment

by:andy_routledge
ID: 20311266
I'm happy with the first answer seems thanks guys.
0
 
LVL 18

Expert Comment

by:PFrog
ID: 20311272
It looks like your first large dimension is modelled correctly.
I'm not too sure about the junk dimension though. The way you described it, it sounds like it could/should be split up into a number of dimensions - is there a relationship beteeen trunk route, circuit bearer etc? If they are related then you've got it right already, if they're not related then they should be split up.
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
mysql database, schema and table creation 13 57
Currency in SQL? 2 30
MIcrosoft SQL 2014 Database Copy Question 16 51
subtr returning incorrect value 8 27
Exception Handling is in the core of any application that is able to dignify its name. In this article, I'll guide you through the process of writing a DRY (Don't Repeat Yourself) Exception Handling mechanism, using Aspect Oriented Programming.
Many companies are looking to get out of the datacenter business and to services like Microsoft Azure to provide Infrastructure as a Service (IaaS) solutions for legacy client server workloads, rather than continuing to make capital investments in h…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

856 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question