Link to home
Start Free TrialLog in
Avatar of Morpheus7
Morpheus7

asked on

Deterimine old rows to delete

Hi, I am currently building a data warehouse and have an issue with records to delete from the fact table.
The source data does not update records but rather creates a new row for the change and uses an audit number together with other columns to form a composite key. I am using the following code to get the most recent record as the row may have been changed several times since the last load:

SELECT       * FROM (
 SELECT      *  ROW_NUMBER() OVER (PARTITION BY [SampleCode] ORDER BY [AuditNumber] desc) rn
 FROM [DataMart].[Dim_Sample]
) tmp WHERE rn = 1
AND AuditNumber IS NOT NULL;

(In the live code the column are listed)

If this returns the 4th version of the row, I need to be able to identify the previous 3 and remove them from the fact table.

Anyone any ideas?

Many thanks
ASKER CERTIFIED SOLUTION
Avatar of Máté Farkas
Máté Farkas
Flag of Hungary image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
A completely different idea is to keep audit data separate anyway, have one main table with current data and a history table with audit data. That way you don't need the initial query for rn=1 at all, you have a table with current rows and no others. For whatever reason you choose to not go that route, it'd enable you to keep much more audit data, as that will have no effect on the performance of queries of the main table with only current data.

I can understand a single audit table if you also sometimes want to query older states. SQL Server also offers a new feature of temporal tables since version 2016, they work with that very concept of a separate history table: https://www.mssqltips.com/sqlservertip/4674/managing-temporal-table-history-in-sql-server-2016/

You may consider using that replacing your own concept.

And last not least,you might like some new query clauses for temporal tables,especially querying data AS OF a certain date:
https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table

Bye, Olaf.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Question abandoned
Provided solution