Solved

URGENT-->  SQL query needed... Remove Duplicate records

Posted on 2004-04-08
9
1,217 Views
Last Modified: 2012-06-21
SQL 2000 DB

I need a query that finds all duplicate records within a table of ~6million records and deletes all but one of the duplicates.  I just need to search on 1 field (BCNUM) (which is indexed but not unique, obviously)... basically I want no more than 1 record with the BCNUM becoming unique.

In addition, how would I need to run the query above if I wanted 3 fields to match as opposed to just the one (i.e. find duplicates where BCNUM, CUSTNUM,INVNUM all match another record w/ the same 3 values)

Tks!

J
0
Comment
Question by:jimtxas
9 Comments
 
LVL 5

Expert Comment

by:millsoft
ID: 10787007
Hi jimtxas,

Getting BCNUMs that are duplicates are easy:

-- duplicate BCNUMs
select BCNUM, count(BCNUM)
from myTable
group by BCNUM
having count(BCNUM)>1

However, to delete them you need to decide which record you want to keep.  In other words, you'll need to find another field that you can use to keep one of the records.

Do you have a primary key on the table?


Cheers!
Brad
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787127
adding a temporary identity column will help allot.

once you do this you can distinguish the different columns and delete them
0
 

Author Comment

by:jimtxas
ID: 10787161
I do have a IDNUM field which is an identity colum in the DB already...
0
 
LVL 4

Accepted Solution

by:
ATAHAC earned 500 total points
ID: 10787187
With identity column you can delete with that query if ind is indentity column

delete myTable from myTable as t
where exists (select 1 from myTable where BCNUM = t.BCNUM
and CUSTNUM = t.CUSTNUM and INVNUM = t.INVNUM AND ind < t.ind)

Its deleted all records which have identical BCNUM, CUSTNUM,INVNUM
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 3

Expert Comment

by:jayrod
ID: 10787208
This'll get ya somewhat started


declare @id as int
declare @field1 as varchar(40)
declare @temp_Var as varchar(40)

declare c1 cursor for
select IDNUM
from myTable
group by BCNUM, IDNUM
having count(BCNUM)>1

open c1

fetch c1 into @id

while(@@fetch_status = 0)
begin

delete from [table] where idnum <> @id

end

close c1
deallocate c1
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787217
atahac's solution works better :)
0
 
LVL 4

Expert Comment

by:BulZeyE
ID: 10788919
or you could try:

DELETE T1
  FROM MYTABLE T1
INNER JOIN (SELECT BCNUM, CUSTNUM, INVNUM, MAX(IDNUM) as indnum
                    from mytable
                    group by BCNUM, CUSTNUM, INVNUM) T2
  ON T1.BCNUM = T2.BCNUM
  AND T1.CUSTNUM = T2.CUSTNUM
  AND T1.INVNUM = T2.INVNUM
  AND T1.INDNUM < T2.INDNUM
 
0
 
LVL 2

Expert Comment

by:stockcowboy
ID: 10794788
I use this one all day long

This removes duplicate rows where the email_address column has dups in it

It deletes the duplicates indiscriminately (essentially anyways)


--ADD AUTONUMBER COLUMN TO TABLE FOR A GARUNTEED UNIQUE COLUMN
ALTER TABLE TEMP_FILE_IMPORT ADD
USER_ID INT IDENTITY
GO



--REMOVE DUPS FROM SELF
DELETE from TEMP_FILE_IMPORT where user_id in
(
select USER_ID FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
)

AND USER_ID NOT IN (
select max(USER_ID) FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
group by a.EMAIL_ADDRESS
)
GO
0
 

Expert Comment

by:Amitabh
ID: 11107258
Okay so here are so many methods to remove duplicate records...one more...this is done in 3 steps. This will ensure that you have all rows in your main table as unique.


1) Select col1, col2, col3, coln into temp_table -- creating  a new table to store duplicates
    from main_table                                       --from you main table
    group by col1, col2, col3, coln                     --ensuring that only duplicate values goes to temp_table and as unique
    having count(*) > 1

2) Delete main_table                                                       --Duplicate records are deleted
    from main_table m, temp_table t                                 --from the main table
    where m.primary_key_id_field=t.primary_key_id_field

3) Insert into main_table
    Select * from temp_table


hope this solves your problem.
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Suggested Solutions

Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to shrink a transaction log file down to a reasonable size.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now