Solved

URGENT-->  SQL query needed... Remove Duplicate records

Posted on 2004-04-08
9
1,224 Views
Last Modified: 2012-06-21
SQL 2000 DB

I need a query that finds all duplicate records within a table of ~6million records and deletes all but one of the duplicates.  I just need to search on 1 field (BCNUM) (which is indexed but not unique, obviously)... basically I want no more than 1 record with the BCNUM becoming unique.

In addition, how would I need to run the query above if I wanted 3 fields to match as opposed to just the one (i.e. find duplicates where BCNUM, CUSTNUM,INVNUM all match another record w/ the same 3 values)

Tks!

J
0
Comment
Question by:jimtxas
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
9 Comments
 
LVL 5

Expert Comment

by:millsoft
ID: 10787007
Hi jimtxas,

Getting BCNUMs that are duplicates are easy:

-- duplicate BCNUMs
select BCNUM, count(BCNUM)
from myTable
group by BCNUM
having count(BCNUM)>1

However, to delete them you need to decide which record you want to keep.  In other words, you'll need to find another field that you can use to keep one of the records.

Do you have a primary key on the table?


Cheers!
Brad
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787127
adding a temporary identity column will help allot.

once you do this you can distinguish the different columns and delete them
0
 

Author Comment

by:jimtxas
ID: 10787161
I do have a IDNUM field which is an identity colum in the DB already...
0
Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

 
LVL 4

Accepted Solution

by:
ATAHAC earned 500 total points
ID: 10787187
With identity column you can delete with that query if ind is indentity column

delete myTable from myTable as t
where exists (select 1 from myTable where BCNUM = t.BCNUM
and CUSTNUM = t.CUSTNUM and INVNUM = t.INVNUM AND ind < t.ind)

Its deleted all records which have identical BCNUM, CUSTNUM,INVNUM
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787208
This'll get ya somewhat started


declare @id as int
declare @field1 as varchar(40)
declare @temp_Var as varchar(40)

declare c1 cursor for
select IDNUM
from myTable
group by BCNUM, IDNUM
having count(BCNUM)>1

open c1

fetch c1 into @id

while(@@fetch_status = 0)
begin

delete from [table] where idnum <> @id

end

close c1
deallocate c1
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787217
atahac's solution works better :)
0
 
LVL 4

Expert Comment

by:BulZeyE
ID: 10788919
or you could try:

DELETE T1
  FROM MYTABLE T1
INNER JOIN (SELECT BCNUM, CUSTNUM, INVNUM, MAX(IDNUM) as indnum
                    from mytable
                    group by BCNUM, CUSTNUM, INVNUM) T2
  ON T1.BCNUM = T2.BCNUM
  AND T1.CUSTNUM = T2.CUSTNUM
  AND T1.INVNUM = T2.INVNUM
  AND T1.INDNUM < T2.INDNUM
 
0
 
LVL 2

Expert Comment

by:stockcowboy
ID: 10794788
I use this one all day long

This removes duplicate rows where the email_address column has dups in it

It deletes the duplicates indiscriminately (essentially anyways)


--ADD AUTONUMBER COLUMN TO TABLE FOR A GARUNTEED UNIQUE COLUMN
ALTER TABLE TEMP_FILE_IMPORT ADD
USER_ID INT IDENTITY
GO



--REMOVE DUPS FROM SELF
DELETE from TEMP_FILE_IMPORT where user_id in
(
select USER_ID FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
)

AND USER_ID NOT IN (
select max(USER_ID) FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
group by a.EMAIL_ADDRESS
)
GO
0
 

Expert Comment

by:Amitabh
ID: 11107258
Okay so here are so many methods to remove duplicate records...one more...this is done in 3 steps. This will ensure that you have all rows in your main table as unique.


1) Select col1, col2, col3, coln into temp_table -- creating  a new table to store duplicates
    from main_table                                       --from you main table
    group by col1, col2, col3, coln                     --ensuring that only duplicate values goes to temp_table and as unique
    having count(*) > 1

2) Delete main_table                                                       --Duplicate records are deleted
    from main_table m, temp_table t                                 --from the main table
    where m.primary_key_id_field=t.primary_key_id_field

3) Insert into main_table
    Select * from temp_table


hope this solves your problem.
0

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

International Data Corporation (IDC) prognosticates that before the current the year gets over disbursing on IT framework products to be sent in cloud environs will be $37.1B.
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question