• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 309
  • Last Modified:

How to copy Rows with duplicate IDs to a separate table and then delete them from the current table.

What SQL Statement would find duplicate SID rows, copy the entire row of data to a separate table, then delete them from the current table?

9767 Rows
105 =  count (SID) > 1  

Sample Data:

        SID                RD           TN   TU  DOS  ODB      IDB
      2400200        6/28/2012      1      0      16      22              0
      2400200        7/12/2012      1      0      30      22              0
      2400202        7/12/2012      1      71      30      0              0
      2400206        7/12/2012      1      0      30      1037      0
      2400208        7/12/2012      1      52      30      0              0
      2400210        7/12/2012      1      66      30      0              0
0
H2Omike
Asked:
H2Omike
1 Solution
 
Helen FeddemaCommented:
I think it might be best to do this in code.  Set up a recordset, sorted by SID, and go record by record, using code (something like the code sample below, though you would need to modify it) to determine whether the current SID is a duplicate.  If it is a duplicate, add a record to the other table, and delete the current record.  
   Set rstSource = CurrentDb.OpenRecordset(strQuery, dbOpenDynaset)
   Set rstTarget = CurrentDb.OpenRecordset(strTable, dbOpenDynaset)
   strPrevLastName = ""
   lngLastNameCount = 1
   
   With rstSource
       Do While Not .EOF
         strThisLastName = ![LastName]
         strFullName = ![FullName]
         Debug.Print "Current LastName: " & strThisLastName
         Debug.Print "Current FullName: " & strFullName
         
         If strThisLastName <> strPrevLastName Then
            'New LastName
            Debug.Print "On new LastName record"
            
         ElseIf strThisLastName = strPrevLastName Then
           'Same LastName
            Debug.Print "On another record for same last name"
            
            With rstTarget
               .AddNew
               ![FullName] = strFullName
               lngOutputID = ![OutputID]
               .Update
            End With
            
            'Delete current record
            .Delete
          End If
         
         lngLastNameCount = lngLastNameCount + 1
         strPrevLastName = strThisLastName
         .MoveNext
      Loop
   End With
   
   rstSource.Close
   rstTarget.Close
   Set rstSource = Nothing
   Set rstTarget = Nothing

Open in new window

0
 
H2OmikeAuthor Commented:
I found a way, not sure it's the most efficient- three separate statements:

SELECT SID,
 COUNT(SID) AS DUPS
INTO T1_1206_NUM_DUPS
FROM T1_1206
GROUP BY SID
HAVING ( COUNT(SID) > 1 )

SELECT T1_1206.*
INTO T1_1206_DUPS
FROM T1_1206_NUM_DUPS
Left JOIN T1_1206
ON T1_1206_NUM_DUPS.SID=T1_1206.SID

DELETE FROM T1_1206
WHERE EXISTS
      (SELECT T1_1206_NUM_DUPS.SID
       FROM T1_1206_NUM_DUPS
      WHERE T1_1206_NUM_DUPS.SID = T1_1206.SID)
0
 
Scott PletcherSenior DBACommented:
You're deleting all duplicates.  Is it possible you want to retain one of them in the original table?

Also, for the DELETE, I'd check that the row exists in the export table,  T1_1206_DUPS, rather than the keys-only table, T1_1206_NUM_DUPS, to insure that I had a full copy of the data before it was deleted.
0
 
chaauCommented:
You can combine first two queries into one:
SELECT T1_1206.*
INTO T1_1206_DUPS
FROM T1_1206
INNER JOIN (SELECT SID
FROM T1_1206
GROUP BY SID
HAVING ( COUNT(SID) > 1 ) T1_1206_NUM_DUPS
ON T1_1206_NUM_DUPS.SID=T1_1206.SID

Open in new window

Now, when you are deleting the rows you would want to retain at least one of the rows that was duplicate. Looking at the sample data you have provided I can see that for duplicate entries you have different RD dates. So, assuming that the record with the latest RD is the one that needs to be retained modify your delete statement to this:
DELETE FROM T1_1206
WHERE EXISTS
      (SELECT T1_1206_NUM_DUPS.SID
       FROM T1_1206_NUM_DUPS
      WHERE T1_1206_NUM_DUPS.SID = T1_1206.SID)
AND  T1_1206.RD <> 
    (SELECT MAX(RD) FROM T1_1206_NUM_DUPS
      WHERE T1_1206_NUM_DUPS.SID = T1_1206.SID)

Open in new window

0
 
PatHartmanCommented:
As others have already mentioned, you probably want to keep at least one of the duplicates.  The problem is determining which one to keep.  And the second problem is keeping duplicates from reoccurring.  A simple way to eliminate duplicates as long as you don't care which of the "duplicate" records survive is to rename the existing table.  Create a new table using the original name and copying just the schema but no data.  Then modify the new version to have a unique index on SID (or make it the primary key).  Then run an append query.  At the end, you'll get a message telling you that some records weren't added.  Just say OK continue and the unique values will be added and the duplicates will be discarded.

Using the above method also solves the second problem since your new table now has a unique index to prevent a recurrence.

Warning - DANGER!  DANGER!  Whenever you do something like this, your first step will always be to make one or more backups of the database being modified.  In this particular case, your second step MUST be to turn off Name Auto Correct or you will be really unhappy with the results.
0

Featured Post

NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now