My first observation would be that you need to get someone to help you with this project. A data warehouse project is no place to experiment with design. If no-one at your organisation has experience of building a DW then I suggest you hire someone.
Secondly, the golden rule of a Data Warehouse is to conform your data. Make sure that the data is conformed to a consistent set of attributes, codes, values etc regardless of source system. You can still present several different views with different selections of rows and attributes to different users. Those different views of the warehouse are called Data Marts. But in the warehouse you should aim conform to a single, subject-oriented and consistent view of the truth without redundant tables, attributes, etc.
Main Topics
Browse All Topics





by: PxPPosted on 2009-10-26 at 22:20:04ID: 25669320
I have designed a small data warehouse for project management financial, budget and forecast reporting. The dw is built and works fine and is fast. The total current number of records in each of the 4 main fact table does not exceed 120k. I expect to carry 5 yrs worth of data so each fact table will eventually get to probably 1 million records each.
I designed the dw in a Snowflake "like" design that essentially has every single field (attribute) in the fact tables broken out into separate Dimension tables. The reason why I did it this way was b.c it was fast and easy to design and the proof-of-concept application I am working on is due in a few weeks.
I have 4 important data sources that feed my dw. All 4 come from different sources and I developed a quick ETL process to load the flat files extracts from the sources into import tables in the DW and then I run stored procs against them to format the imported data and create derived fields, etc.
Once the data is formated properly I begin to separate the data from the import tables into the dimension tables (or lookup tables) depending on how you want to define it. As I said, I am essentially separating every field into its own dimension (lookup) b.c the data changes and it was easy to keep track of everything this way as I didn't have time to truly plan and design something a little more business logical. I really am going overboard probably in my normalization of the dw.
Another main reason why I did this was b.c all of the 4 main fact tables overlap about 50% of their fields, so one dimension table with multiple attributes (fields) might work well for one fact table, but it won't for the other and I would have to create multiple lookup tables (as I guess I already am doing) that would constitute a dimension table for one fact table, but then I would have to repeat the step using some of the same lookup tables and different lookup tables to create a dimension table for another fact table.
Ok :-), only two of the sources actually load data frequently, one is weekly and the other is monthly (but will probably also become weekly), and the total amount of records in the weekly extract is only about 25k while the monthly is only about 15k.
So as you can imagine the business people I support want to see the 4 sources separately, but in the reporting app the data is combined. So it is very easy to keep the 4 sources separate.
So, I have 4 important fact tables (names are more descriptive for this message): Employee_TimeWorked, Financial_TimeWorked, Budget_TimeWorked and Forecast_TimeWorked. These fact tables are essentially the import tables except all of the fields that aren't some employee metric (hrs worked, cost/hr,etc.) are integers and foreign keys related to the surrounding dimension tables' primary surrogate keys.
So I import the data, have the data be lookedup against the dimension tables and then I insert the import data into the fact table with all foreign keys that relate back to the dimension tables.
Getting a little tired and and I am going to skip to questions...
I guess my first q is: Is it acceptable practice to insert data into fact tables with primary keys of the dimension tables (foreign keys) for a data warehouse?
I guess another question is if it would be acceptable practice to have Dimension tables that actually have overlapping data in them to accomodate each of the 4 fact tables?
So instead of having 20+ dimension tables alone, I could, from a conceptual standpoint, keep 10 of the dimension tables and "conceptually" rename tham as Lookup tables to create 5 true dimension tables that have say 3 to 7 attributes (fields) and many of those fields would have repetitive data. But in the grand scheme of things, 4 of the 5 the dimension tables would really only have absolutely no more 1000 records and the remaining dimension tables would have something like 15k records (employees). I would have to join the Employees dimension with the EmployeeDescription dimension (includes type of employee, compensation code, onshore, off, company employee, etc) and the EmployeeDescription dimension table wouldn't be more than 1000 records.
Any info you can provide would be helpful.
Thanks,