• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 280
  • Last Modified:

SQL CLEARING DUPES

I have a table with lots of duplications and I need to clean out the dupes .  I can determine the sets of dupe rows within using SQL like the following:

select key1, key2, key3, sum(1) as RecCount
from myTable
group by key1, key2, key3
having sum(1) > 1

Giving back something like:
A11, B12, C13, 2
A21, B22, C23, 14
A31, B32, C33, 50




And I hoped that I could kill the rows using something like the following:

Delete myTable T
Where T.key1, T.key2, T.key3 in
(
select key1, key2, key3, sum(1) as RecCount
from myTable
group By key1, key2, key3
having sum(1) > 1
)

Unfortunately this nor any variation I have come up with will allow me to delete the dupes.

If you think about it,  the first query identifies all the rows that are dupes and groups them with their group counts.  I need to get all the counts to 1.



0
Howard Bash
Asked:
Howard Bash
7 Solutions
 
ThomasianCommented:
DECLARE @myTable TABLE (key1 varchar(10), key2 varchar(10), key3 varchar(10))

INSERT @myTable
SELECT 'A11','B12','C13'
UNION ALL SELECT 'A11','B12','C13'
UNION ALL SELECT 'A21','B22','C23'
UNION ALL SELECT 'A21','B22','C23'
UNION ALL SELECT 'A21','B22','C23'
UNION ALL SELECT 'A21','B22','C23'
UNION ALL SELECT 'A31','B32','C33'
UNION ALL SELECT 'A31','B32','C33'
UNION ALL SELECT 'A31','B32','C33'

;WITH CTE AS (
	SELECT rn=ROW_NUMBER() OVER (PARTITION BY key1,key2,key3 ORDER BY key1)
	FROM @myTable 
)
DELETE CTE
WHERE rn>1

SELECT *
FROM @myTable

Open in new window

0
 
mcs0506Commented:
Create Table emp
(
id int identity,
empname varchar(20),
empdept varchar(10)
)

insert into emp values(‘A’,'X’)
insert into emp values(‘A’,'X’)
insert into emp values(‘A’,'X’)
insert into emp values(‘B’,'X’)
insert into emp values(‘B’,'X’)
insert into emp values(‘C’,'X’)

select * from emp

WITH Dublicates_Emp(empname, empdept,id)
AS
(
SELECT empname,empdept,max(id)
FROM EMP
GROUP BY empname,empdept
HAVING COUNT(empname)>1
)

delete from emp where id in
(
select emp.id
from emp inner join Dublicates_Emp as d
on emp.empname = d.empname
and emp.empdept = d.empdept
and emp.id d.id
)
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
here the technique given my MS:
http://support.microsoft.com/kb/139444

another technique is to use a CTE, generating the ROW_NUMBER() per "partition", and delete all rows which are rn > 1:
http://www.mytechmantra.com/LearnSQLServer/How_to_Delete_Duplicate_Records_from_an_SQL_Server_Table.html

used both already.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Howard BashSenior Software EngineerAuthor Commented:
I need to do this for an Oracle table.  We use Oracle and MS SQL.  So, not sure about the CTE technique.

Regarding the first link, the sql copied:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

must be a typo and should be:

SELECT col1, col2, col3, count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
0
 
Scott PletcherSenior DBACommented:
Is there a unique sequence# or some other value in the row?

If so, it's fairly easy, just select the MIN() or MAX() of that row in your COUNT() query, then delete all rows for the matching keys that are > min() or < max() value you got from the COUNT() query.
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
oracle can do the CTE technique also
0
 
Howard BashSenior Software EngineerAuthor Commented:
The table has no primary key.
0
 
Howard BashSenior Software EngineerAuthor Commented:
I am thinking that this CTE technique and the related SQL cannot be run in a query window but as a stored proc.  Is that the case?
0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
no, you can run CTE also in a query window. ... at least in most applications I know ... some might be more restrictive in that regards,
0
 
Howard BashSenior Software EngineerAuthor Commented:
I have found an neater technique for deleting dupes which I will paste for your information:

DELETE FROM DATA_PLAN 
       WHERE rowid NOT IN ( SELECT min(rowid) 
                            FROM DATA_PLAN
                            
                            GROUP BY                             
                                PLAN                        ,
                                CUR_EFFECTIVE_DATE          ,
                                CUR_VISIT_REASON            ,
                                CUR_TRANS_ID                ,
                                CHANGED_TIME                ,
                                PICKUP_TIME                 ,
                                BATCH_STATUS               
                            
                           );

Open in new window

0
 
Guy Hengel [angelIII / a3]Billing EngineerCommented:
neater ... well, I have run into troubles with that one, actually.
because the rowid values might change during the query (row movement) as the subquery might be evaluated as temp result, before actually running the delete part ...
0
 
Howard BashSenior Software EngineerAuthor Commented:
I suppose you could copy all the data to a temp table do the delete, and then truncate the old table and insert the newly cleaned temp rows.  Or depending on the situation,  during a maintainence window,  have a DBA run the SQL when the database is not available.

But,  I understand your point.
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Tackle projects and never again get stuck behind a technical roadblock.
Join Now