asked on

SQL CLEARING DUPES

I have a table with lots of duplications and I need to clean out the dupes . I can determine the sets of dupe rows within using SQL like the following:

select key1, key2, key3, sum(1) as RecCount
from myTable
group by key1, key2, key3
having sum(1) > 1

Giving back something like:
A11, B12, C13, 2
A21, B22, C23, 14
A31, B32, C33, 50
…
…

And I hoped that I could kill the rows using something like the following:

Delete myTable T
Where T.key1, T.key2, T.key3 in
(
select key1, key2, key3, sum(1) as RecCount
from myTable
group By key1, key2, key3
having sum(1) > 1
)

Unfortunately this nor any variation I have come up with will allow me to delete the dupes.

If you think about it, the first query identifies all the rows that are dupes and groups them with their group counts. I need to get all the counts to 1.

ASKER CERTIFIED SOLUTION

Thomasian

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

mcs0506

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Howard Bash

ASKER

I need to do this for an Oracle table. We use Oracle and MS SQL. So, not sure about the CTE technique.

Regarding the first link, the sql copied:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

must be a typo and should be:

SELECT col1, col2, col3, count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

SOLUTION

Scott Pletcher

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Howard Bash

ASKER

The table has no primary key.

Howard Bash

ASKER

I am thinking that this CTE technique and the related SQL cannot be run in a query window but as a stored proc. Is that the case?

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Howard Bash

ASKER

I have found an neater technique for deleting dupes which I will paste for your information:

DELETE FROM DATA_PLAN 
       WHERE rowid NOT IN ( SELECT min(rowid) 
                            FROM DATA_PLAN
                            
                            GROUP BY                             
                                PLAN                        ,
                                CUR_EFFECTIVE_DATE          ,
                                CUR_VISIT_REASON            ,
                                CUR_TRANS_ID                ,
                                CHANGED_TIME                ,
                                PICKUP_TIME                 ,
                                BATCH_STATUS               
                            
                           );

Open in new window

SOLUTION

Guy Hengel [angelIII / a3]

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Howard Bash

ASKER

I suppose you could copy all the data to a temp table do the delete, and then truncate the old table and insert the newly cleaned temp rows. Or depending on the situation, during a maintainence window, have a DBA run the SQL when the database is not available.

But, I understand your point.