Solved

URGENT-->  SQL query needed... Remove Duplicate records

Posted on 2004-04-08
9
1,221 Views
Last Modified: 2012-06-21
SQL 2000 DB

I need a query that finds all duplicate records within a table of ~6 million records and deletes all but one of the duplicates.  I just need to search on 1 field (BCNUM), which is indexed but not unique, obviously... basically I want no more than 1 record per BCNUM value, so that BCNUM becomes unique.

In addition, how would I need to run the query above if I wanted 3 fields to match as opposed to just the one (i.e. find duplicates where BCNUM, CUSTNUM, and INVNUM all match another record with the same 3 values)?

Tks!

J
0
Comment
Question by:jimtxas
9 Comments
 
LVL 5

Expert Comment

by:millsoft
ID: 10787007
Hi jimtxas,

Getting the BCNUMs that are duplicated is easy:

-- duplicate BCNUMs
select BCNUM, count(BCNUM)
from myTable
group by BCNUM
having count(BCNUM)>1

However, to delete them you need to decide which record you want to keep.  In other words, you'll need to find another field that you can use to keep one of the records.

Do you have a primary key on the table?


Cheers!
Brad
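
(For reference, once a unique key exists — say an IDNUM identity column, which is an assumption here — a keep-one delete on BCNUM alone can be sketched as:

```sql
-- Sketch only: assumes an IDNUM identity column exists on myTable.
-- Keeps the row with the lowest IDNUM for each BCNUM and deletes the rest.
DELETE FROM myTable
WHERE IDNUM NOT IN (SELECT MIN(IDNUM)
                    FROM myTable
                    GROUP BY BCNUM)
```

On ~6 million rows a NOT IN over a grouped subquery can be slow; a correlated EXISTS form generally fares better.)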
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787127
adding a temporary identity column will help a lot.

once you do this you can distinguish between the duplicate rows and delete them
0
 

Author Comment

by:jimtxas
ID: 10787161
I do have an IDNUM field, which is an identity column, in the DB already...
0
 
LVL 4

Accepted Solution

by:
ATAHAC earned 500 total points
ID: 10787187
With an identity column you can delete with this query (assuming ind is the identity column):

delete t from myTable as t
where exists (select 1 from myTable as m
              where m.BCNUM = t.BCNUM
              and m.CUSTNUM = t.CUSTNUM and m.INVNUM = t.INVNUM and m.ind < t.ind)

It deletes every record that has the same BCNUM, CUSTNUM, and INVNUM as another record, keeping only the row with the lowest ind in each group.
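
The single-field version asked for in the question follows the same pattern (a sketch, assuming IDNUM is the identity column the author mentioned):

```sql
-- Sketch: delete every row that shares a BCNUM with an earlier row,
-- keeping the row with the lowest IDNUM per BCNUM (IDNUM assumed unique)
DELETE t
FROM myTable AS t
WHERE EXISTS (SELECT 1
              FROM myTable AS m
              WHERE m.BCNUM = t.BCNUM
                AND m.IDNUM < t.IDNUM)
```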
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787208
This'll get ya somewhat started


declare @id int

declare c1 cursor for
select min(IDNUM)
from myTable
group by BCNUM
having count(*) > 1

open c1

fetch next from c1 into @id

while (@@fetch_status = 0)
begin

-- keep row @id, delete the other rows in its duplicate group
delete from myTable
where IDNUM <> @id
and BCNUM = (select BCNUM from myTable where IDNUM = @id)

fetch next from c1 into @id

end

close c1
deallocate c1
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787217
atahac's solution works better :)
0
 
LVL 4

Expert Comment

by:BulZeyE
ID: 10788919
or you could try:

DELETE T1
  FROM MYTABLE T1
INNER JOIN (SELECT BCNUM, CUSTNUM, INVNUM, MAX(IDNUM) as IDNUM
                    from mytable
                    group by BCNUM, CUSTNUM, INVNUM) T2
  ON T1.BCNUM = T2.BCNUM
  AND T1.CUSTNUM = T2.CUSTNUM
  AND T1.INVNUM = T2.INVNUM
  AND T1.IDNUM < T2.IDNUM
 
0
 
LVL 2

Expert Comment

by:stockcowboy
ID: 10794788
I use this one all day long

This removes duplicate rows where the email_address column has dups in it

It deletes the duplicates indiscriminately (essentially it just keeps the row with the highest USER_ID for each address)


--ADD IDENTITY COLUMN TO TABLE FOR A GUARANTEED UNIQUE COLUMN
ALTER TABLE TEMP_FILE_IMPORT ADD
USER_ID INT IDENTITY
GO



--REMOVE DUPS FROM SELF
DELETE from TEMP_FILE_IMPORT where user_id in
(
select USER_ID FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
)

AND USER_ID NOT IN (
select max(USER_ID) FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
group by a.EMAIL_ADDRESS
)
GO
0
 

Expert Comment

by:Amitabh
ID: 11107258
Okay, so there are already plenty of methods here to remove duplicate records... here is one more, done in 3 steps. It will ensure that every row in your main table ends up unique.


1) Select col1, col2, col3, coln into temp_table   -- create a new table holding one copy of each duplicated row
    from main_table                                 -- from your main table
    group by col1, col2, col3, coln                 -- grouping guarantees each duplicate combination lands in temp_table only once
    having count(*) > 1

2) Delete main_table                                -- delete every duplicated record
    from main_table m, temp_table t                 -- from the main table
    where m.col1 = t.col1 and m.col2 = t.col2       -- match on the grouped columns
      and m.col3 = t.col3 and m.coln = t.coln       -- (temp_table has no key column to join on)

3) Insert into main_table                           -- put one copy of each back
    Select * from temp_table                        -- this assumes col1..coln cover all columns of main_table


hope this solves your problem.
0
