keplan
asked on
Fact table incremental insert/update best practice
Hi,
I would like to ask question from expert in data warehousing. Please give some good solution to update/insert fact table in a data warehouse where there are no seuential number from the source system to tract the new rows. I'm kind of looking at some date column or any best practice in the inductry.
I would like to ask question from expert in data warehousing. Please give some good solution to update/insert fact table in a data warehouse where there are no seuential number from the source system to tract the new rows. I'm kind of looking at some date column or any best practice in the inductry.
Hi Keplan, are you asking the best way to uniquely identify a row in a fact table when you're being provided a business/natural key from the source system?
If so then there are a few questions here, firstly I would be very surprised if the source system had absolutely no way of uniquely identifying each row, they just may not be supplying it to you, can you speak to the owner of the source system?
The main approach to determine if a fact row is unique is usually the combination of its dimension member values, so could you not simply check each dimension value in the fact and check to see if it exists?
As for the previous posters suggestion of "upsert", I think he is referring to the MERGE statement in SQL Server. This would do the task but may not best your ETL process depending on how it is implemented
Thanks
If so then there are a few questions here, firstly I would be very surprised if the source system had absolutely no way of uniquely identifying each row, they just may not be supplying it to you, can you speak to the owner of the source system?
The main approach to determine if a fact row is unique is usually the combination of its dimension member values, so could you not simply check each dimension value in the fact and check to see if it exists?
As for the previous posters suggestion of "upsert", I think he is referring to the MERGE statement in SQL Server. This would do the task but may not best your ETL process depending on how it is implemented
Thanks
ASKER
This is just an explain little bit on my requirment. The source system has a key called transaction number which has a unique key for the source. However, the transaction number is not the attribute we store in the data warehouse enviroment. So there is no tracking based on that.
I've added last load time on the fact table and also onto a staging enviroment.
But I need to trust this last load time is more consistence or reliable to rely on for new records.
Any help on this for best approach
I've added last load time on the fact table and also onto a staging enviroment.
But I need to trust this last load time is more consistence or reliable to rely on for new records.
Any help on this for best approach
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
http://www.databasejournal.com/features/mssql/article.php/3739131/UPSERT-Functionality-in-SQL-Server-2008.htm
You could find more material through google or bing.