How to delete duplicate rows

Hi all:
I have some duplicated rows in my table (tktrans). The only field that differs between duplicated records is (Sn). I would like to delete one of the duplicate rows.
khaled salem, Software Developer, asked:

Guy Hengel [angelIII / a3], Billing Engineer, commented:
you posted in both the Oracle and MS SQL Server topics, please clarify which you are using ...

you may want to read up on this article first; it doesn't explain the DELETE part, but it helps to understand the basic concept nevertheless:

so, you need to decide which of the 2(+) rows you want to keep (let's say the one with the lowest value of Sn), and which field(s) are to be taken as "key" fields to consider the records "duplicated"

the code below assumes it's 1 field that determines the "duplication"; you can add more conditions if it's several columns

DELETE n
  FROM yourtable n
 WHERE EXISTS ( SELECT 1
                  FROM yourtable t
                 WHERE n.Sn > t.Sn
                   AND n.keyfield = t.keyfield )


before the delete, you may want to run some SELECT along the same lines
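The advice above can be sketched end to end with SQLite in Python. This is a minimal, invented example — the table, columns, and data are assumptions (tktno stands in for the "key" field, and duplicates differ only in sn); it shows the verification SELECT first, then the same predicate as a DELETE that keeps the lowest sn per group.

```python
import sqlite3

# In-memory table with invented sample data: tktno "A" and "B" each
# appear twice, differing only in sn.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tktrans (sn INTEGER, tktno TEXT)")
con.executemany("INSERT INTO tktrans VALUES (?, ?)",
                [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "C")])

# Check first with a SELECT: which rows would the DELETE remove?
doomed = con.execute("""
    SELECT n.sn, n.tktno
      FROM tktrans n
     WHERE EXISTS (SELECT 1 FROM tktrans t
                    WHERE n.sn > t.sn AND n.tktno = t.tktno)
""").fetchall()
print(doomed)  # -> [(2, 'A'), (4, 'B')]

# Same predicate as a DELETE: rows with a lower-sn twin go, so the
# lowest sn per tktno survives.
con.execute("""
    DELETE FROM tktrans
     WHERE EXISTS (SELECT 1 FROM tktrans t
                    WHERE tktrans.sn > t.sn AND tktrans.tktno = t.tktno)
""")
print(con.execute("SELECT sn, tktno FROM tktrans ORDER BY sn").fetchall())
# -> [(1, 'A'), (3, 'B'), (5, 'C')]
```

Running the SELECT before the DELETE, as suggested, is cheap insurance: both share the same WHERE clause, so what the SELECT shows is exactly what the DELETE removes.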

This should do the trick:

WITH CTE AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY Col1, ..., Coln ORDER BY Sn)
    FROM tktrans
)
DELETE CTE WHERE RN > 1;

Col1,..., Coln are all columns in your table except SN.
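A runnable sketch of this ROW_NUMBER() technique, using SQLite (3.25+ for window functions) from Python. Note the difference in dialect: SQL Server lets you DELETE directly from the CTE as shown above, while SQLite cannot, so this sketch deletes by rowid instead. Columns and sample data are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tktrans (sn INTEGER, tktno TEXT, issuedate TEXT)")
con.executemany("INSERT INTO tktrans VALUES (?, ?, ?)", [
    (1, "A", "2014-01-01"), (2, "A", "2014-01-01"),  # duplicates except sn
    (3, "A", "2014-02-01"),                          # different issuedate: not a dup
    (4, "B", "2014-03-01"),
])

# Number each row within its group of identical columns (all but sn),
# lowest sn first, then delete everything numbered above 1.
con.execute("""
    DELETE FROM tktrans WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY tktno, issuedate
                                      ORDER BY sn) AS rn
              FROM tktrans)
         WHERE rn > 1)
""")
print(con.execute("SELECT sn, tktno FROM tktrans ORDER BY sn").fetchall())
# -> [(1, 'A'), (3, 'A'), (4, 'B')]
```

The ORDER BY inside the window decides which copy gets RN = 1 and survives; ordering by Sn ascending keeps the lowest Sn, descending would keep the highest.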
awking00, Information Technology Specialist, commented:
Just a little more info. Can there be multiple duplicates (i.e. triplicates, quadruplicates, etc.)? If so, do you just want to delete one, or do you want to delete all but one? We would need to know the criteria for the one(s) you want to keep. Perhaps you can provide some sample data and what you would want it to look like after the deletes.

awking00, Information Technology Specialist, commented:
>>the only different field between duplicated record is (Sn).<<
This is also unclear to me. Do you mean that the Sn value could be different, but if all the other fields are the same, it would be considered a duplicate to be deleted? Or do you mean that, if the same Sn value appears more than once, then the repeated values are considered duplicate records? Again, some sample data and expected results would help greatly in clarifying your intent.
The advantage of the method in my previous post is that even if all columns in the duplicate rows are identical it'll still get rid of all duplicates.
awking00, Information Technology Specialist, commented:
ProjectChampion, I agree that your method would remove all duplicate rows, but the asker stated he wanted to remove one of the duplicate rows. If he means that literally (which I doubt but still would like clarification), your method would also work by changing to "where rn = 1". The other unclear issue is just how the Sn field relates to the duplicates. Additionally, your method would perform much better if there were a primary key (or keys) to partition on rather than every column, which is why I wanted to see some sample data.
Changing to  "where rn = 1" would delete all unique rows and one copy of each duplicate row, leaving only the second, third, ... copies of the duplicates. The WHERE clause should remain as "RN > 1".

Also, partitioning on the PK (if any) would be pointless, as PKs are by definition unique; any partitioning (or GROUP BY) must include the identical columns, which according to khaledsalem's post ("the only different field between duplicated record is (Sn)") means all columns except Sn.
awking00, Information Technology Specialist, commented:
First let me say I wasn't by any means trying to be critical of your proposed solutions. Actually, I think they're precisely what I would propose if I had the same confidence in what the asker's intent is as you seem to. However, because I didn't have that confidence, I simply wanted to get more clarification. :-)

Sorry, I meant to say "where rn = 2" if the intent was to only delete one row.

>>"the only different field between duplicated record is (Sn)" means all columns except Sn.<<
I'm glad you understand what that meant, but I felt it was extremely vague and just wanted a further explanation (in fact, I still do).
khaled salem, Software Developer (Author), commented:
Dear Guy Hengel [angelIII / a3],
I am trying your answer; it looks 100% correct, but the results differ.
I found 17 duplicates using this code:
select  tktno, count(tktno)
from tktrans
where airlineid=148 and issuedate>='2014-01-01' AND tktstatus='VO'
group by tktno
having count(tktno) >1


When I tried your code, it found 54 duplicate rows:
select t.sn, t.tktno, t.tktstatus, t.issuedate
  from tktrans t
 where t.issuedate >= '2014-01-01' and t.tktstatus = 'VO' and t.airlineid = 148
   and t.sn in ( select i.sn
                   from tktrans i
                  where i.tktno = t.tktno and i.tktstatus = 'VO' )
 order by t.tktno


Guy Hengel [angelIII / a3], Billing Engineer, commented:
your first query returns how many distinct values have duplicates (or more)
your second query is missing the AND i.airlineid = 148 ... and is returning all the rows that have the same sn value, which might be several rows ...
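The gap between the two counts (17 vs. 54) comes down to what is being counted: GROUP BY ... HAVING COUNT(*) > 1 counts duplicated *values*, while a row-level query counts every *row* belonging to those values. A tiny invented example (one tktno duplicated three times) makes the difference visible:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tktrans (sn INTEGER, tktno TEXT)")
con.executemany("INSERT INTO tktrans VALUES (?, ?)",
                [(1, "A"), (2, "A"), (3, "A"), (4, "B")])

# One result per duplicated VALUE: "A" appears once here.
dup_values = con.execute("""
    SELECT tktno FROM tktrans GROUP BY tktno HAVING COUNT(*) > 1
""").fetchall()

# One result per ROW in a duplicated group: all three "A" rows appear.
dup_rows = con.execute("""
    SELECT sn FROM tktrans t
     WHERE (SELECT COUNT(*) FROM tktrans i WHERE i.tktno = t.tktno) > 1
""").fetchall()

print(len(dup_values), len(dup_rows))  # -> 1 3
```

So 17 duplicated tktno values can easily correspond to 54 rows once triplicates and quadruplicates are in play.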
what about this ?

insert into tktrans_tmp
select tktno, issuedate, tktstatus, ... , min(sn)   -- or max(sn), whichever you want to keep
  from tktrans
 group by tktno, issuedate, tktstatus, ...

rename table tktrans to tktrans_with_dups, tktrans_tmp to tktrans;
drop table tktrans_with_dups;

you can try the select query alone first in order to make sure it gives the data you expect
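The copy-then-swap approach sketched above, runnable against SQLite from Python. Table names, columns, and data are invented; MIN(sn) keeps the lowest sn per group. Note the rename syntax varies by engine (SQLite uses ALTER TABLE ... RENAME TO, MySQL has RENAME TABLE, SQL Server uses sp_rename).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tktrans (sn INTEGER, tktno TEXT, tktstatus TEXT)")
con.executemany("INSERT INTO tktrans VALUES (?, ?, ?)",
                [(1, "A", "VO"), (2, "A", "VO"), (3, "B", "OK")])

# Copy one row per group (all columns except sn) into a fresh table,
# keeping the lowest sn.
con.execute("CREATE TABLE tktrans_tmp (sn INTEGER, tktno TEXT, tktstatus TEXT)")
con.execute("""
    INSERT INTO tktrans_tmp
    SELECT MIN(sn), tktno, tktstatus
      FROM tktrans
     GROUP BY tktno, tktstatus
""")

# Swap the tables, then drop the old copy with the duplicates.
con.execute("ALTER TABLE tktrans RENAME TO tktrans_with_dups")
con.execute("ALTER TABLE tktrans_tmp RENAME TO tktrans")
con.execute("DROP TABLE tktrans_with_dups")

print(con.execute("SELECT sn, tktno FROM tktrans ORDER BY sn").fetchall())
# -> [(1, 'A'), (3, 'B')]
```

As Guy notes, running the SELECT alone first confirms the deduplicated result before anything is renamed or dropped.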
Eric Mulovhedzi commented:
If there is a duplicate, that means there must be a field on table (tktrans) that you do not want appearing in more than one row of the resulting query. Assuming the field name is (Sn):

In MySQL you would use:

DELETE tk1
  FROM `tktrans` tk1, `tktrans` tk2
 WHERE tk1.`id` < tk2.`id` AND tk1.`Sn` = tk2.`Sn`


But it's much safer to double-check this by doing a SELECT first instead of a DELETE, i.e.:

SELECT *
  FROM `tktrans` tk1, `tktrans` tk2
 WHERE tk1.`id` < tk2.`id` AND tk1.`Sn` = tk2.`Sn`


This is so provided your table (tktrans) has a unique field called 'id', which most people will usually make a unique, auto-increment field by default, but let's leave that topic for another day.
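The multi-table DELETE above is MySQL-specific, but the same "keep the row with the highest id per Sn" rule can be expressed with a subquery, sketched here in SQLite via Python. The 'id' column and sample data are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tktrans (id INTEGER PRIMARY KEY, sn INTEGER)")
# ids are auto-assigned 1, 2, 3; sn 10 is duplicated.
con.executemany("INSERT INTO tktrans (sn) VALUES (?)", [(10,), (10,), (20,)])

# Delete tk1 whenever a tk2 with the same Sn and a larger id exists,
# mirroring WHERE tk1.id < tk2.id AND tk1.Sn = tk2.Sn.
con.execute("""
    DELETE FROM tktrans
     WHERE EXISTS (SELECT 1 FROM tktrans tk2
                    WHERE tktrans.id < tk2.id AND tktrans.sn = tk2.sn)
""")
print(con.execute("SELECT id, sn FROM tktrans ORDER BY id").fetchall())
# -> [(2, 10), (3, 20)]
```

Flipping the comparison to tk1.id > tk2.id would instead keep the oldest copy of each Sn.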
@guy: don't really care about points, but i'm interested in what you think is wrong about the solution i posted
Guy Hengel [angelIII / a3], Billing Engineer, commented:
Your technique would surely work, but has the side effect of "dropping" the table. I would only apply this on really large tables with many rows to be deleted, where the simple delete would eventually fail...
duplicating the data would be a side effect, which would also happen with other techniques in many cases, but for some columns only because the server requires a temporary table to handle the order/group by clause

i hardly see why dropping would be viewed as a side effect, however. this approach has the above mentioned disadvantage, but replacing a table with another is an instantaneous and atomic operation, and avoids scattered row locking. if rows are timestamped, it can also be devised in such a way that the second table can be built incrementally so you'd actually lock the table for a fraction of a second only. depending on the use cases, (namely the frequency and duration of write operations on the table) either approach can be better
Guy Hengel [angelIII / a3], Billing Engineer, commented:
dropping a table has the following "side effects":
A) it drops plenty of other things assigned to the table:
  * constraints
  * indexes
  * permissions
  * statistics
  * replication
  * partitions

B) you may not be able to "drop/replace" the table if
 * it's linked in a foreign key as the "master" table
 * a schema trigger is not allowing the drop
 * the table is "schema bound" to views

surely all of those things can be "listed" on the existing table and be recreated, but it's just tedious, hence that technique should really only be used in very rare cases.
interesting point, thanks for sharing.

actually, apart from statistics,

- A is not really an issue because most db engines will allow the rest to be copied using a single "create table ... like ..." query (possibly with an extra INCLUDING clause, for example in Postgres)

- constraints on the other hand would require some extra scripting : this table's constraint will be kept, but foreign keys referencing columns from the subject table will not (you'd need to add a "cascade constraints" clause to the drop statement)

- (infrequent) cases where views would be an issue would require remapping the views to the new table before performing the update. Note that a regular view would not need any modification unless it is referenced by foreign constraints, which is not even feasible in many db engines and definitely very bad practice.

thanks again for pointing out. actually, if some of the table columns are foreign keys of other tables, i would not recommend it either without further digging