Solved

Delete duplicates from table and leave one record with the latest date_modified

Posted on 2006-06-16
2
275 Views
Last Modified: 2008-02-01
I have a table with lots of duplicates. I need to delete them based on [record id#]. I use the code below for this.
I have a few columns in the table. My previous procedure instead of updating records, appended new records to the table creating duplicates. I have a column [Date Modified] and I need to delete all duplicates (record id should be unique) but leave one record with the latest [Date Modified] (i.e. I have 50 records with the same record id but different [Date Modified]. I need only one record for this Record ID but it should have the latest [Date Modified].
I would really appreciate your advice.

Code to delete records without checking for the latest [Date Modified]

DECLARE @id INT

DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [Record Id#], COUNT(*) AS numDups
FROM [infoteam].[Invoice_Recon_Detail_Original]
GROUP BY [Record Id#]
HAVING COUNT(*) > 1
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @id, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
     SET @numDups = @numDups - 1
     SET ROWCOUNT @numDups
     DELETE FROM [infoteam].[Invoice_Recon_Detail_Original]
     WHERE [Record Id#] = @id
     FETCH NEXT FROM dupsCsr INTO @id, @numDups
END
CLOSE dupsCsr
DEALLOCATE dupsCsr
SET ROWCOUNT 0
0
Comment
Question by:InfoTeam
2 Comments
 
LVL 1

Expert Comment

by:asduth
ID: 16921797
If there is only one record with the max time-stamp, then you can do the whole thing in a single delete statement, rather then using a cursor :
The basic idea is to do a sub-query which returns the latest records, and then delete from the main table all rows which do not join to this sub-query, hence :


delete [all]
from [infoteam].[Invoice_Recon_Detail_Original] [all]
left join (
      select [Record Id#], max([Date Modified]) [Date Modified]
      from [infoteam].[Invoice_Recon_Detail_Original]
      group by  [Record Id#]
) [latest]
on [all].[Record Id#] = [latest].[Record Id#] and [all].[Date Modified] = [latest].[Date Modified]
where [latest].[Record ID#] is NULL

0
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 500 total points
ID: 16921805
just this 1 statement needed....

delete from [infoteam].[Invoice_Recon_Detail_Original]
  where exists (select x.[record id#] from [infoteam].[Invoice_Recon_Detail_Original] as x
               where x.[record id#] =[infoteam].[Invoice_Recon_Detail_Original].[record id#]
                  and x.[Date Modified] > [infoteam].[Invoice_Recon_Detail_Original].[date modified]
                        )


ie Delete the row if the same recordid exists with a higher modification date.


0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
The Delta outage: 650 cancelled flights, more than 1200 delayed flights, thousands of frustrated customers, tens of millions of dollars in damages – plus untold reputational damage to one of the world’s most trusted airlines. All due to a catastroph…
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Via a live example, show how to setup several different housekeeping processes for a SQL Server.

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question