Link to home
Start Free TrialLog in
Avatar of mypatek
mypatek

asked on

remove duplicate rows (mysql) / preserve newest row

I use the following queries to delete duplicate rows:
$dbh->do( "ALTER IGNORE TABLE data ADD UNIQUE INDEX dupidx ( field1, field2, field3, field4 )" );
$dbh->do( "ALTER TABLE leads DROP INDEX dupidx" );

However, this way it preserves the oldest entry and deletes the following rows with same data. IS there a way to delete the old one and keep the newest row of the duplicates?
Avatar of Lee Wadwell
Lee Wadwell
Flag of Australia image

Hi mypatek,

how do you know which one is "newer" than the others?

lwadwell
Avatar of mypatek
mypatek

ASKER

Hi lwadwell

There is an integer field in the table, "transid" with increasing numbers

i.e.

400000
400001
400002
...

and so on.. as higher the number the newer the row
mypatek,

Your best bet here is to use a DELETE statement.  Something like ...

delete t1 from yourtable t1, yourtable t2
 where t1.dup_col1 = t2.dup_col1
   and t1.dup_col2 = t2.dup_col2
   and t1.dup_col3 = t2.dup_col3
   and t1.transid  < t2.transid;

lwadwell
Avatar of mypatek

ASKER

is this not achievable without creating a second table? as there are about 200k rows in this table I'm a bit concerned about performance issues
mypatek,

there is no second table in the DELETE statement.  The same table ... the one you wish to delete from ... is referenced twice.

lwadwell
Avatar of mypatek

ASKER

ok, thanks for the hint.

howeber, i'm not fully satisfied yet with your solution as mysql crashes with your proposal:

DELETE t2 FROM tablename as t1, tablename as t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.col5 = t2.col5 and t1.transid < t2.transid



The job is done in about 2-3 seconds with the following statement:

ALTER IGNORE TABLE tablename ADD UNIQUE INDEX dupidx ( col1, col2, col3, col4, col5 )

but it keeps the oldest of multiple identical rows. Is it possible to add a ORDER BY statement to this to delete the older rows first?
mypatek,

in your last post you had my proposal as "DELETE t2 ..." that should be "DELETE t1 ..." to meet your requirement, I assume that was just a typo.

To the best of my knowledge ... there is nothing you can do with the ALTER statement.  There is no order by you can add.  mysql is just adding the rows to the index ... one row at a time and removing rows that would be duplicate.  It is not trying to keep the oldest, it just keeps the one it added to the index first - which in this case happen to be the oldest.  To change this behaviour you would need to change mysql or reorganize the table in the database - this would need a second table; and even then I cannot guarentee it would work.

lwadwell
If your table is not too large, you can write a small script like this :
use table as recordset order by unquie field, transid desc
rs.movefirst
do while not rs.eof()
 fld=unique field
rs.movenext
if rs.fields(unique field)=fld
  delete
end if
loop

The above  script will delete those records which are duplicate, if your table has triplicate or multi records .. then it will not work, Try at your own risk,

Regards,

Bm Keshav



Avatar of mypatek

ASKER

Thank you, Bm Keshav. Unfortunately the table does not only contain duplicates, but also triplicates etc.

however, the syntax from Iwadwell is working now after creating a corresponding index for that table. no performance issues anymore.

one last thing needs to be resolved:
how can i get out the values of a field of all deleted rows? the name of the field is 'transid' and i need it to delete other data outside of the database corresponding with the deleted row.

it is running within perl..

i wrote the following (not tested yet) is it correct, or is there a simpler way to achieve this?:

my $sth = $dbh->prepare("SELECT transid FROM tablename as t1, tablename as t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.col5 = t2.col5 and t1.transid < t2.transid'");

$sth->execute;
$rows = $sth->rows;

if ($rows) {
   $sth->bind_columns(undef, \$transid);
   while (my @row=$sth->fetchrow_array) {
      my $rowref=$dbh->selectall_arrayref("delete from... LIMIT 1");
   }
   $sth->finish;
}
No idea about Perl???????

Bm Keshav

ASKER CERTIFIED SOLUTION
Avatar of Lee Wadwell
Lee Wadwell
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of mypatek

ASKER

It works now like a charme, Thank you!