SQL Remove duplicate records based on two fields

Hi

I have a table that I am using for testing purposes. There are two columns called "ValA" and "Article". I
want to use these two together to create a primary key, but first want to delete any extra records
that are duplicates based on thecombination of these two columns. How do I do this
Murray BrownMicrosoft Cloud Azure/Excel Solution DeveloperAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
A couple of ways to pull this off, but below is the code to create a subquery to pick off all of the ValA / Article combinations that have more than one row, then the main query deletes everything in the subquery.
DELETE 
FROM YourTable yt
  JOIN (
      SELECT ValA, Article
      FROM YourTable
      GROUP BY ValA, Article
      HAVING COUNT(Article) > 1) dups ON yt.ValA = dups.ValA AND yt.Article = dups.Article

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Haris DulicCommented:
This is the code that will find you the duplicates :

select rtrim(ValA)+rtrim(Article),count( *)
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1

Open in new window


and I pressume you have some kind of additional ID field in your table you can use it to delete those with minimum IDs..

delete from your_table_name
where YOUR_ID_FIELD in (
  select min(YOUR_ID_FIELD ) from your_table_name where rtrim(ValA)+rtrim(Article) in  
  (select value from (
    
    select rtrim(ValA)+rtrim(Article) value,count( *) num
from your_table_name
group by rtrim(ValA)+rtrim(Article)
having count(*)>1)a
    )
  )

Open in new window

0
chaauCommented:
This code will remove duplicates leaving at least one record from the duplicate records intact:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

If you have a column that you can use as a sort order to determine which record is more preferred to be kept than you can use it. E.g. imagine you have a dateAdded column that tells when the record was added. If you wish to keep the most recent records you would use this query:
with cte as (
select *, ROW_NUMBER() OVER (PARTITION BY ValA, Article ORDER BY dateAdded DESC) AS rn
from yourTable)
DELETE FROM cte WHERE rn > 1

Open in new window

0
Murray BrownMicrosoft Cloud Azure/Excel Solution DeveloperAuthor Commented:
Thank you all
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Thanks for the split.  Good luck with your project.  -Jim
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.