Go Premium for a chance to win a PS4. Enter to Win

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 760
  • Last Modified:

remove duplicate rows (mysql) / preserve newest row

I use the following queries to delete duplicate rows:
$dbh->do( "ALTER IGNORE TABLE data ADD UNIQUE INDEX dupidx ( field1, field2, field3, field4 )" );
$dbh->do( "ALTER TABLE leads DROP INDEX dupidx" );

However, this way it preserves the oldest entry and deletes the following rows with same data. IS there a way to delete the old one and keep the newest row of the duplicates?
0
mypatek
Asked:
mypatek
  • 5
  • 5
  • 2
1 Solution
 
lwadwellCommented:
Hi mypatek,

how do you know which one is "newer" than the others?

lwadwell
0
 
mypatekAuthor Commented:
Hi lwadwell

There is an integer field in the table, "transid" with increasing numbers

i.e.

400000
400001
400002
...

and so on.. as higher the number the newer the row
0
 
lwadwellCommented:
mypatek,

Your best bet here is to use a DELETE statement.  Something like ...

delete t1 from yourtable t1, yourtable t2
 where t1.dup_col1 = t2.dup_col1
   and t1.dup_col2 = t2.dup_col2
   and t1.dup_col3 = t2.dup_col3
   and t1.transid  < t2.transid;

lwadwell
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
mypatekAuthor Commented:
is this not achievable without creating a second table? as there are about 200k rows in this table I'm a bit concerned about performance issues
0
 
lwadwellCommented:
mypatek,

there is no second table in the DELETE statement.  The same table ... the one you wish to delete from ... is referenced twice.

lwadwell
0
 
mypatekAuthor Commented:
ok, thanks for the hint.

howeber, i'm not fully satisfied yet with your solution as mysql crashes with your proposal:

DELETE t2 FROM tablename as t1, tablename as t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.col5 = t2.col5 and t1.transid < t2.transid



The job is done in about 2-3 seconds with the following statement:

ALTER IGNORE TABLE tablename ADD UNIQUE INDEX dupidx ( col1, col2, col3, col4, col5 )

but it keeps the oldest of multiple identical rows. Is it possible to add a ORDER BY statement to this to delete the older rows first?
0
 
lwadwellCommented:
mypatek,

in your last post you had my proposal as "DELETE t2 ..." that should be "DELETE t1 ..." to meet your requirement, I assume that was just a typo.

To the best of my knowledge ... there is nothing you can do with the ALTER statement.  There is no order by you can add.  mysql is just adding the rows to the index ... one row at a time and removing rows that would be duplicate.  It is not trying to keep the oldest, it just keeps the one it added to the index first - which in this case happen to be the oldest.  To change this behaviour you would need to change mysql or reorganize the table in the database - this would need a second table; and even then I cannot guarentee it would work.

lwadwell
0
 
BALMUKUND KESHAVCommented:
If your table is not too large, you can write a small script like this :
use table as recordset order by unquie field, transid desc
rs.movefirst
do while not rs.eof()
 fld=unique field
rs.movenext
if rs.fields(unique field)=fld
  delete
end if
loop

The above  script will delete those records which are duplicate, if your table has triplicate or multi records .. then it will not work, Try at your own risk,

Regards,

Bm Keshav



0
 
mypatekAuthor Commented:
Thank you, Bm Keshav. Unfortunately the table does not only contain duplicates, but also triplicates etc.

however, the syntax from Iwadwell is working now after creating a corresponding index for that table. no performance issues anymore.

one last thing needs to be resolved:
how can i get out the values of a field of all deleted rows? the name of the field is 'transid' and i need it to delete other data outside of the database corresponding with the deleted row.

it is running within perl..

i wrote the following (not tested yet) is it correct, or is there a simpler way to achieve this?:

my $sth = $dbh->prepare("SELECT transid FROM tablename as t1, tablename as t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.col5 = t2.col5 and t1.transid < t2.transid'");

$sth->execute;
$rows = $sth->rows;

if ($rows) {
   $sth->bind_columns(undef, \$transid);
   while (my @row=$sth->fetchrow_array) {
      my $rowref=$dbh->selectall_arrayref("delete from... LIMIT 1");
   }
   $sth->finish;
}
0
 
BALMUKUND KESHAVCommented:
No idea about Perl???????

Bm Keshav

0
 
lwadwellCommented:
BMKESHAV,

As long as this is done before the data is deleted from the main table ... your concept is right.  I have simplified the code and have shown it below.  Note: I am showing two different DELETE's to show how multiple tables can be done in the one loop; remove or add as needed.

I do need to ask why you were specifiying "LIMIT 1" on your DELETE statement?  As transid is the FK in the other tables ... would you not want to delete all rows with that FK?

Otherwise ... if you have deleted the rows from the main table, you could accomplish this in a single DELETE per table:
DELETE FROM othertablename as d WHERE NOT EXISTS (SELECT 1 FROM tablename as t1 WHERE t1.transid = d.transid);

N.B. as always, for optimal performance - appropriate indexes need to exist.

lwadwell
my $sth = $dbh->prepare("SELECT t1.transid FROM tablename as t1, tablename as t2 where t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3 and t1.col4 = t2.col4 and t1.col5 = t2.col5 and t1.transid < t2.transid");
my $dh1 = $dbh->prepare("DELETE FROM othertablename WHERE transid=?");
my $dh2 = $dbh->prepare("DELETE FROM anothertable WHERE transid=?"); 
$sth->execute(); 
while (my @row=$sth->fetchrow_array) { 
    $dh1->execute($row[0]);
    $dh2->execute($row[0]);
}

Open in new window

0
 
mypatekAuthor Commented:
It works now like a charme, Thank you!
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 5
  • 5
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now