Fact table incremental insert/update best practice

Posted on 2011-10-17
Last Modified: 2013-11-10

I would like to ask question from expert in data warehousing. Please give some good solution to update/insert fact table in a data warehouse where there are no seuential number from the source system to tract the new rows. I'm kind of looking at some date column or any best practice in the inductry.
Question by:keplan
    LVL 12

    Expert Comment

    SQL Server 2008 has got a feature named as 'Upsert', explore more around it.

    You could find more material through google or bing.
    LVL 5

    Expert Comment

    Hi Keplan,  are you asking the best way to uniquely identify a row in a fact table when you're being provided a business/natural key from the source system?

    If so then there are a few questions here, firstly I would be very surprised if the source system had absolutely no way of uniquely identifying each row, they just may not be supplying it to you, can you speak to the owner of the source system?

    The main approach to determine if a fact row is unique is usually the combination of its dimension member values, so could you not simply check each dimension value in the fact and check to see if it exists?

    As for the previous posters suggestion of "upsert", I think he is referring to the MERGE statement in SQL Server. This would do the task but may not best your ETL process depending on how it is implemented


    Author Comment

    This is just an explain little bit on my requirment. The source system has a key called transaction number which has a unique key for the source. However, the transaction number is not the attribute we store in the data warehouse enviroment. So there is no tracking based on that.
    I've added last load time on the fact table and also onto a staging enviroment.
    But I need to trust this last load time is more consistence or reliable to rely on for new records.
    Any help on this for best approach
    LVL 5

    Assisted Solution

    Hi Kaplan, ok so then you shouldnt need the date inserted should you? once a row comes in your ETL, you check it against the fact table on all of its dimension values (not your measures), if that row matches an existing row based on all those dimension values then its an update to that existing row, if it does not match on all dimension values then it is a new row and should be inserted

    LVL 12

    Accepted Solution

    So Kaplan, this is what I could understand about your design, correct me if I am wrong. You have a source db lying on instance and you have staging db (with same schema as that of source db) and dwh db lying on some other SQL instance. Now first you pull all updated records from source db to staging db then you update your facts tables in dwh db from staging db. Is that correct?

    Can't you do one thing, flag every record in staging tables based on its status in source db. You could easily do it here as you would be having transaction numbers in both source and staging dbs. Like if record is newly inserted into source or updated after last batch run (from source to staging), flag it as 'U'. Then as you run the other batch to refresh your dwh db from staging db, consider all Us in staging db and reflag them to 'N' then again with next batch run from source to staging, some of them would get reflagged to 'U' with some more new Us from staging and so on. Does it make sense?

    Featured Post

    How to run any project with ease

    Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
    - Combine task lists, docs, spreadsheets, and chat in one
    - View and edit from mobile/offline
    - Cut down on emails

    Join & Write a Comment

    In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
    International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
    Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
    Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

    728 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    19 Experts available now in Live!

    Get 1:1 Help Now