Improve company productivity with a Business Account.Sign Up

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 284
  • Last Modified:

Delete duplicates from table and leave one record with the latest date_modified

I have a table with lots of duplicates. I need to delete them based on [record id#]. I use the code below for this.
I have a few columns in the table. My previous procedure instead of updating records, appended new records to the table creating duplicates. I have a column [Date Modified] and I need to delete all duplicates (record id should be unique) but leave one record with the latest [Date Modified] (i.e. I have 50 records with the same record id but different [Date Modified]. I need only one record for this Record ID but it should have the latest [Date Modified].
I would really appreciate your advice.

Code to delete records without checking for the latest [Date Modified]

DECLARE @id INT

DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT [Record Id#], COUNT(*) AS numDups
FROM [infoteam].[Invoice_Recon_Detail_Original]
GROUP BY [Record Id#]
HAVING COUNT(*) > 1
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @id, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
     SET @numDups = @numDups - 1
     SET ROWCOUNT @numDups
     DELETE FROM [infoteam].[Invoice_Recon_Detail_Original]
     WHERE [Record Id#] = @id
     FETCH NEXT FROM dupsCsr INTO @id, @numDups
END
CLOSE dupsCsr
DEALLOCATE dupsCsr
SET ROWCOUNT 0
0
InfoTeam
Asked:
InfoTeam
1 Solution
 
asduthCommented:
If there is only one record with the max time-stamp, then you can do the whole thing in a single delete statement, rather then using a cursor :
The basic idea is to do a sub-query which returns the latest records, and then delete from the main table all rows which do not join to this sub-query, hence :


delete [all]
from [infoteam].[Invoice_Recon_Detail_Original] [all]
left join (
      select [Record Id#], max([Date Modified]) [Date Modified]
      from [infoteam].[Invoice_Recon_Detail_Original]
      group by  [Record Id#]
) [latest]
on [all].[Record Id#] = [latest].[Record Id#] and [all].[Date Modified] = [latest].[Date Modified]
where [latest].[Record ID#] is NULL

0
 
LowfatspreadCommented:
just this 1 statement needed....

delete from [infoteam].[Invoice_Recon_Detail_Original]
  where exists (select x.[record id#] from [infoteam].[Invoice_Recon_Detail_Original] as x
               where x.[record id#] =[infoteam].[Invoice_Recon_Detail_Original].[record id#]
                  and x.[Date Modified] > [infoteam].[Invoice_Recon_Detail_Original].[date modified]
                        )


ie Delete the row if the same recordid exists with a higher modification date.


0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

What Kind of Coding Program is Right for You?

There are many ways to learn to code these days. From coding bootcamps like Flatiron School to online courses to totally free beginner resources. The best way to learn to code depends on many factors, but the most important one is you. See what course is best for you.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now