• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1232
  • Last Modified:

URGENT--> SQL query needed... Remove Duplicate records

SQL 2000 DB

I need a query that finds all duplicate records within a table of ~6million records and deletes all but one of the duplicates.  I just need to search on 1 field (BCNUM) (which is indexed but not unique, obviously)... basically I want no more than 1 record with the BCNUM becoming unique.

In addition, how would I need to run the query above if I wanted 3 fields to match as opposed to just the one (i.e. find duplicates where BCNUM, CUSTNUM,INVNUM all match another record w/ the same 3 values)

Tks!

J
0
jimtxas
Asked:
jimtxas
1 Solution
 
millsoftCommented:
Hi jimtxas,

Getting BCNUMs that are duplicates are easy:

-- duplicate BCNUMs
select BCNUM, count(BCNUM)
from myTable
group by BCNUM
having count(BCNUM)>1

However, to delete them you need to decide which record you want to keep.  In other words, you'll need to find another field that you can use to keep one of the records.

Do you have a primary key on the table?


Cheers!
Brad
0
 
jayrodCommented:
adding a temporary identity column will help allot.

once you do this you can distinguish the different columns and delete them
0
 
jimtxasAuthor Commented:
I do have a IDNUM field which is an identity colum in the DB already...
0
Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
ATAHACCommented:
With identity column you can delete with that query if ind is indentity column

delete myTable from myTable as t
where exists (select 1 from myTable where BCNUM = t.BCNUM
and CUSTNUM = t.CUSTNUM and INVNUM = t.INVNUM AND ind < t.ind)

Its deleted all records which have identical BCNUM, CUSTNUM,INVNUM
0
 
jayrodCommented:
This'll get ya somewhat started


declare @id as int
declare @field1 as varchar(40)
declare @temp_Var as varchar(40)

declare c1 cursor for
select IDNUM
from myTable
group by BCNUM, IDNUM
having count(BCNUM)>1

open c1

fetch c1 into @id

while(@@fetch_status = 0)
begin

delete from [table] where idnum <> @id

end

close c1
deallocate c1
0
 
jayrodCommented:
atahac's solution works better :)
0
 
BulZeyECommented:
or you could try:

DELETE T1
  FROM MYTABLE T1
INNER JOIN (SELECT BCNUM, CUSTNUM, INVNUM, MAX(IDNUM) as indnum
                    from mytable
                    group by BCNUM, CUSTNUM, INVNUM) T2
  ON T1.BCNUM = T2.BCNUM
  AND T1.CUSTNUM = T2.CUSTNUM
  AND T1.INVNUM = T2.INVNUM
  AND T1.INDNUM < T2.INDNUM
 
0
 
stockcowboyCommented:
I use this one all day long

This removes duplicate rows where the email_address column has dups in it

It deletes the duplicates indiscriminately (essentially anyways)


--ADD AUTONUMBER COLUMN TO TABLE FOR A GARUNTEED UNIQUE COLUMN
ALTER TABLE TEMP_FILE_IMPORT ADD
USER_ID INT IDENTITY
GO



--REMOVE DUPS FROM SELF
DELETE from TEMP_FILE_IMPORT where user_id in
(
select USER_ID FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
)

AND USER_ID NOT IN (
select max(USER_ID) FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
group by a.EMAIL_ADDRESS
)
GO
0
 
AmitabhCommented:
Okay so here are so many methods to remove duplicate records...one more...this is done in 3 steps. This will ensure that you have all rows in your main table as unique.


1) Select col1, col2, col3, coln into temp_table -- creating  a new table to store duplicates
    from main_table                                       --from you main table
    group by col1, col2, col3, coln                     --ensuring that only duplicate values goes to temp_table and as unique
    having count(*) > 1

2) Delete main_table                                                       --Duplicate records are deleted
    from main_table m, temp_table t                                 --from the main table
    where m.primary_key_id_field=t.primary_key_id_field

3) Insert into main_table
    Select * from temp_table


hope this solves your problem.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Microsoft Windows 7 Basic

This introductory course to Windows 7 environment will teach you about working with the Windows operating system. You will learn about basic functions including start menu; the desktop; managing files, folders, and libraries.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now