Solved

SQL Remove duplicate records based on two fields

Posted on 2014-11-09
5
2,628 Views
Last Modified: 2014-11-14
Hi

I have a table that I am using for testing purposes. There are two columns called "ValA" and "Article". I
want to use these two together to create a primary key, but first want to delete any extra records
that are duplicates based on thecombination of these two columns. How do I do this
0
Comment
Question by:Murray Brown
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
5 Comments
 
LVL 66

Accepted Solution

by:
Jim Horn earned 167 total points
ID: 40431572
A couple of ways to pull this off, but below is the code to create a subquery to pick off all of the ValA / Article combinations that have more than one row, then the main query deletes everything in the subquery.
DELETE 
FROM YourTable yt
  JOIN (
      SELECT ValA, Article
      FROM YourTable
      GROUP BY ValA, Article
      HAVING COUNT(Article) > 1) dups ON yt.ValA = dups.ValA AND yt.Article = dups.Article

Open in new window

0
 
LVL 15

Assisted Solution

by:Haris Djulic
Haris Djulic earned 167 total points
ID: 40431580
This is the code that will find you the duplicates :

select rtrim(ValA)+rtrim(Article),count( *)
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1

Open in new window


and I pressume you have some kind of additional ID field in your table you can use it to delete those with minimum IDs..

delete from your_table_name
where YOUR_ID_FIELD in (
  select min(YOUR_ID_FIELD ) from your_table_name where rtrim(ValA)+rtrim(Article) in  
  (select value from (
    
    select rtrim(ValA)+rtrim(Article) value,count( *) num
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1)a
    )
  )

Open in new window

0
 
LVL 25

Assisted Solution

by:chaau
chaau earned 166 total points
ID: 40431628
This code will remove duplicates leaving at least one record from the duplicate records intact:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

If you have a column that you can use as a sort order to determine which record is more preferred to be kept than you can use it. E.g. imagine you have a dateAdded column that tells when the record was added. If you wish to keep the most recent records you would use this query:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article ORDER BY dateAdded DESC) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

0
 

Author Closing Comment

by:Murray Brown
ID: 40442665
Thank you all
0
 
LVL 66

Expert Comment

by:Jim Horn
ID: 40442690
Thanks for the split.  Good luck with your project.  -Jim
0

Featured Post

Salesforce Has Never Been Easier

Improve and reinforce salesforce training & adoption using WalkMe's digital adoption platform. Start saving on costly employee training by creating fast intuitive Walk-Thrus for Salesforce. Claim your Free Account Now

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.

630 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question