eviglotti
asked on
Star schema daily updates
Hello,
We have an enterprise data warehouse with a number of dimensions and facts. The biggest dimension is about 1.6 million rows and the biggest fact is about 10 million rows. Our primary update happens overnight and involves a truncate/insert of the data in some fact tables, inserts only into some periodic snapshot tables, and insert/update and other steps for the dimensions. So far we have gotten away with all of this happening on the single production database, but that means the database can return questionable results should anyone run queries during the load. So we want to do an "offline" load and then swap to make it online. We know that many folks will use temp tables within the production database and then just issue rename/drop commands on the individual tables.
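For illustration, the per-table rename/drop swap described above might look like the following sketch, using SQLite so it is self-contained; the table and column names are hypothetical, and the same idea applies per-table in other engines that allow transactional DDL:

```python
import sqlite3

# In-memory database stands in for the warehouse; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO fact_sales VALUES (1, 10.0)")

# Load tonight's data into a staging copy of the table.
conn.execute("CREATE TABLE fact_sales_stage (id INTEGER, amount REAL)")
conn.execute("INSERT INTO fact_sales_stage VALUES (1, 10.0), (2, 25.5)")

# Swap: rename the old table out of the way, promote the staged table,
# then drop the old copy. SQLite runs these DDL statements inside the
# surrounding transaction, so readers never see a half-renamed state.
with conn:
    conn.execute("ALTER TABLE fact_sales RENAME TO fact_sales_old")
    conn.execute("ALTER TABLE fact_sales_stage RENAME TO fact_sales")
    conn.execute("DROP TABLE fact_sales_old")

rows = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(rows)  # 2
```

The catch the question raises is real: each rename is per-table, so foreign-key references between facts and dimensions have to be managed around every swap.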
However, we are concerned that managing the DW primary/foreign keys between the facts and dimensions becomes extremely complicated, and for periodic snapshot tables it seems even more challenging. So we are proceeding by creating two identical copies of the full database, one called production and one called stage. We do all our ETL on the stage copy and then issue a database command at the end to "swap" production with stage, so users instantly see all of the production data with no risk of key joins breaking.
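As a minimal sketch of that whole-database swap, assuming file-based databases (paths and table names are hypothetical); a server DBMS would use its own database-rename command rather than filesystem moves:

```python
import os
import sqlite3
import tempfile

# Two hypothetical file-based copies standing in for production and stage.
workdir = tempfile.mkdtemp()
prod_path = os.path.join(workdir, "warehouse_prod.db")
stage_path = os.path.join(workdir, "warehouse_stage.db")

def load(path, rows):
    """Create a tiny fact table with the given rows, then close the file."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE fact_orders (id INTEGER)")
    db.executemany("INSERT INTO fact_orders VALUES (?)", [(r,) for r in rows])
    db.commit()
    db.close()

load(prod_path, [1])      # yesterday's production load
load(stage_path, [1, 2])  # tonight's ETL result, built offline

# Swap: keep the old production copy as a fallback, then promote stage.
# Each os.replace is atomic on its own, but the two-step swap leaves a
# brief window with no production file, so queries should be paused here.
os.replace(prod_path, prod_path + ".old")
os.replace(stage_path, prod_path)

db = sqlite3.connect(prod_path)
count = db.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
db.close()
print(count)  # 2
```

Because the swap is all-or-nothing at the database level, every fact-to-dimension key join stays internally consistent, which is the appeal of this approach over per-table renames.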
We believe this should work, but different pieces of the data update at different times, which complicates the approach.
Does anyone have links or information on best practices for handling this under different project constraints?
Thanks!
ASKER
Thank you.