Link to home
Start Free TrialLog in
Avatar of Howard Bash
Howard BashFlag for United States of America

asked on

SQL CLEARING DUPES

I have a table with lots of duplications and I need to clean out the dupes .  I can determine the sets of dupe rows within using SQL like the following:

select key1, key2, key3, sum(1) as RecCount
from myTable
group by key1, key2, key3
having sum(1) > 1

Giving back something like:
A11, B12, C13, 2
A21, B22, C23, 14
A31, B32, C33, 50




And I hoped that I could kill the rows using something like the following:

Delete myTable T
Where T.key1, T.key2, T.key3 in
(
select key1, key2, key3, sum(1) as RecCount
from myTable
group By key1, key2, key3
having sum(1) > 1
)

Unfortunately this nor any variation I have come up with will allow me to delete the dupes.

If you think about it,  the first query identifies all the rows that are dupes and groups them with their group counts.  I need to get all the counts to 1.



ASKER CERTIFIED SOLUTION
Avatar of Thomasian
Thomasian
Flag of Philippines image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Avatar of mcs0506
mcs0506
Flag of Pakistan image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Howard Bash

ASKER

I need to do this for an Oracle table.  We use Oracle and MS SQL.  So, not sure about the CTE technique.

Regarding the first link, the sql copied:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1

must be a typo and should be:

SELECT col1, col2, col3, count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
The table has no primary key.
I am thinking that this CTE technique and the related SQL cannot be run in a query window but as a stored proc.  Is that the case?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I have found an neater technique for deleting dupes which I will paste for your information:

DELETE FROM DATA_PLAN 
       WHERE rowid NOT IN ( SELECT min(rowid) 
                            FROM DATA_PLAN
                            
                            GROUP BY                             
                                PLAN                        ,
                                CUR_EFFECTIVE_DATE          ,
                                CUR_VISIT_REASON            ,
                                CUR_TRANS_ID                ,
                                CHANGED_TIME                ,
                                PICKUP_TIME                 ,
                                BATCH_STATUS               
                            
                           );

Open in new window

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I suppose you could copy all the data to a temp table do the delete, and then truncate the old table and insert the newly cleaned temp rows.  Or depending on the situation,  during a maintainence window,  have a DBA run the SQL when the database is not available.

But,  I understand your point.