[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 2709
  • Last Modified:

Fact table incremental insert/update best practice

Hi,

I would like to ask question from expert in data warehousing. Please give some good solution to update/insert fact table in a data warehouse where there are no seuential number from the source system to tract the new rows. I'm kind of looking at some date column or any best practice in the inductry.
0
keplan
Asked:
keplan
  • 2
  • 2
2 Solutions
 
sachitjainCommented:
SQL Server 2008 has got a feature named as 'Upsert', explore more around it.

http://www.databasejournal.com/features/mssql/article.php/3739131/UPSERT-Functionality-in-SQL-Server-2008.htm

You could find more material through google or bing.
0
 
DavidMorrisonCommented:
Hi Keplan,  are you asking the best way to uniquely identify a row in a fact table when you're being provided a business/natural key from the source system?

If so then there are a few questions here, firstly I would be very surprised if the source system had absolutely no way of uniquely identifying each row, they just may not be supplying it to you, can you speak to the owner of the source system?

The main approach to determine if a fact row is unique is usually the combination of its dimension member values, so could you not simply check each dimension value in the fact and check to see if it exists?

As for the previous posters suggestion of "upsert", I think he is referring to the MERGE statement in SQL Server. This would do the task but may not best your ETL process depending on how it is implemented


Thanks
0
 
keplanAuthor Commented:
This is just an explain little bit on my requirment. The source system has a key called transaction number which has a unique key for the source. However, the transaction number is not the attribute we store in the data warehouse enviroment. So there is no tracking based on that.
I've added last load time on the fact table and also onto a staging enviroment.
But I need to trust this last load time is more consistence or reliable to rely on for new records.
Any help on this for best approach
0
 
DavidMorrisonCommented:
Hi Kaplan, ok so then you shouldnt need the date inserted should you? once a row comes in your ETL, you check it against the fact table on all of its dimension values (not your measures), if that row matches an existing row based on all those dimension values then its an update to that existing row, if it does not match on all dimension values then it is a new row and should be inserted

0
 
sachitjainCommented:
So Kaplan, this is what I could understand about your design, correct me if I am wrong. You have a source db lying on instance and you have staging db (with same schema as that of source db) and dwh db lying on some other SQL instance. Now first you pull all updated records from source db to staging db then you update your facts tables in dwh db from staging db. Is that correct?

Can't you do one thing, flag every record in staging tables based on its status in source db. You could easily do it here as you would be having transaction numbers in both source and staging dbs. Like if record is newly inserted into source or updated after last batch run (from source to staging), flag it as 'U'. Then as you run the other batch to refresh your dwh db from staging db, consider all Us in staging db and reflag them to 'N' then again with next batch run from source to staging, some of them would get reflagged to 'U' with some more new Us from staging and so on. Does it make sense?
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now