Solved

SQL Remove duplicate records based on two fields

Posted on 2014-11-09
5
2,242 Views
Last Modified: 2014-11-14
Hi

I have a table that I am using for testing purposes. There are two columns called "ValA" and "Article". I
want to use these two together to create a primary key, but first want to delete any extra records
that are duplicates based on thecombination of these two columns. How do I do this
0
Comment
Question by:murbro
5 Comments
 
LVL 65

Accepted Solution

by:
Jim Horn earned 167 total points
ID: 40431572
A couple of ways to pull this off, but below is the code to create a subquery to pick off all of the ValA / Article combinations that have more than one row, then the main query deletes everything in the subquery.
DELETE 
FROM YourTable yt
  JOIN (
      SELECT ValA, Article
      FROM YourTable
      GROUP BY ValA, Article
      HAVING COUNT(Article) > 1) dups ON yt.ValA = dups.ValA AND yt.Article = dups.Article

Open in new window

0
 
LVL 15

Assisted Solution

by:Haris Djulic
Haris Djulic earned 167 total points
ID: 40431580
This is the code that will find you the duplicates :

select rtrim(ValA)+rtrim(Article),count( *)
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1

Open in new window


and I pressume you have some kind of additional ID field in your table you can use it to delete those with minimum IDs..

delete from your_table_name
where YOUR_ID_FIELD in (
  select min(YOUR_ID_FIELD ) from your_table_name where rtrim(ValA)+rtrim(Article) in  
  (select value from (
    
    select rtrim(ValA)+rtrim(Article) value,count( *) num
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1)a
    )
  )

Open in new window

0
 
LVL 25

Assisted Solution

by:chaau
chaau earned 166 total points
ID: 40431628
This code will remove duplicates leaving at least one record from the duplicate records intact:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

If you have a column that you can use as a sort order to determine which record is more preferred to be kept than you can use it. E.g. imagine you have a dateAdded column that tells when the record was added. If you wish to keep the most recent records you would use this query:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article ORDER BY dateAdded DESC) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

0
 

Author Closing Comment

by:murbro
ID: 40442665
Thank you all
0
 
LVL 65

Expert Comment

by:Jim Horn
ID: 40442690
Thanks for the split.  Good luck with your project.  -Jim
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
I have a large data set and a SSIS package. How can I load this file in multi threading?
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties

749 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question