Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

URGENT-->  SQL query needed... Remove Duplicate records

Posted on 2004-04-08
9
Medium Priority
?
1,228 Views
Last Modified: 2012-06-21
SQL 2000 DB

I need a query that finds all duplicate records within a table of ~6million records and deletes all but one of the duplicates.  I just need to search on 1 field (BCNUM) (which is indexed but not unique, obviously)... basically I want no more than 1 record with the BCNUM becoming unique.

In addition, how would I need to run the query above if I wanted 3 fields to match as opposed to just the one (i.e. find duplicates where BCNUM, CUSTNUM,INVNUM all match another record w/ the same 3 values)

Tks!

J
0
Comment
Question by:jimtxas
9 Comments
 
LVL 5

Expert Comment

by:millsoft
ID: 10787007
Hi jimtxas,

Getting BCNUMs that are duplicates are easy:

-- duplicate BCNUMs
select BCNUM, count(BCNUM)
from myTable
group by BCNUM
having count(BCNUM)>1

However, to delete them you need to decide which record you want to keep.  In other words, you'll need to find another field that you can use to keep one of the records.

Do you have a primary key on the table?


Cheers!
Brad
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787127
adding a temporary identity column will help allot.

once you do this you can distinguish the different columns and delete them
0
 

Author Comment

by:jimtxas
ID: 10787161
I do have a IDNUM field which is an identity colum in the DB already...
0
Fill in the form and get your FREE NFR key NOW!

Veeam is happy to provide a FREE NFR server license to certified engineers, trainers, and bloggers.  It allows for the non‑production use of Veeam Agent for Microsoft Windows. This license is valid for five workstations and two servers.

 
LVL 4

Accepted Solution

by:
ATAHAC earned 2000 total points
ID: 10787187
With identity column you can delete with that query if ind is indentity column

delete myTable from myTable as t
where exists (select 1 from myTable where BCNUM = t.BCNUM
and CUSTNUM = t.CUSTNUM and INVNUM = t.INVNUM AND ind < t.ind)

Its deleted all records which have identical BCNUM, CUSTNUM,INVNUM
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787208
This'll get ya somewhat started


declare @id as int
declare @field1 as varchar(40)
declare @temp_Var as varchar(40)

declare c1 cursor for
select IDNUM
from myTable
group by BCNUM, IDNUM
having count(BCNUM)>1

open c1

fetch c1 into @id

while(@@fetch_status = 0)
begin

delete from [table] where idnum <> @id

end

close c1
deallocate c1
0
 
LVL 3

Expert Comment

by:jayrod
ID: 10787217
atahac's solution works better :)
0
 
LVL 4

Expert Comment

by:BulZeyE
ID: 10788919
or you could try:

DELETE T1
  FROM MYTABLE T1
INNER JOIN (SELECT BCNUM, CUSTNUM, INVNUM, MAX(IDNUM) as indnum
                    from mytable
                    group by BCNUM, CUSTNUM, INVNUM) T2
  ON T1.BCNUM = T2.BCNUM
  AND T1.CUSTNUM = T2.CUSTNUM
  AND T1.INVNUM = T2.INVNUM
  AND T1.INDNUM < T2.INDNUM
 
0
 
LVL 2

Expert Comment

by:stockcowboy
ID: 10794788
I use this one all day long

This removes duplicate rows where the email_address column has dups in it

It deletes the duplicates indiscriminately (essentially anyways)


--ADD AUTONUMBER COLUMN TO TABLE FOR A GARUNTEED UNIQUE COLUMN
ALTER TABLE TEMP_FILE_IMPORT ADD
USER_ID INT IDENTITY
GO



--REMOVE DUPS FROM SELF
DELETE from TEMP_FILE_IMPORT where user_id in
(
select USER_ID FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
)

AND USER_ID NOT IN (
select max(USER_ID) FROM (
SELECT EMAIL_ADDRESS
FROM TEMP_FILE_IMPORT
GROUP BY EMAIL_ADDRESS
HAVING Count(USER_ID) > 1) a, TEMP_FILE_IMPORT b
WHERE a.EMAIL_ADDRESS = b.EMAIL_ADDRESS
group by a.EMAIL_ADDRESS
)
GO
0
 

Expert Comment

by:Amitabh
ID: 11107258
Okay so here are so many methods to remove duplicate records...one more...this is done in 3 steps. This will ensure that you have all rows in your main table as unique.


1) Select col1, col2, col3, coln into temp_table -- creating  a new table to store duplicates
    from main_table                                       --from you main table
    group by col1, col2, col3, coln                     --ensuring that only duplicate values goes to temp_table and as unique
    having count(*) > 1

2) Delete main_table                                                       --Duplicate records are deleted
    from main_table m, temp_table t                                 --from the main table
    where m.primary_key_id_field=t.primary_key_id_field

3) Insert into main_table
    Select * from temp_table


hope this solves your problem.
0

Featured Post

Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Suggested Courses

580 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question