We help IT Professionals succeed at work.

Delete duplicate values leave one row

tyuret
tyuret asked
on
Hi
I want to clean the duplicate values from the my table and only leave the last inserted row of the duplicates.
With the query below I can get the duplicates (some have the same row 4 times)
Anybody can help?
thank you


SELECT A. *
FROM CONTACTFIELDVALUE A
LEFT JOIN CONTACTFIELDVALUE B ON B.ORID = A.ORID
AND A.FIELDID = B.FIELDID
AND A.CONTACTID = B.CONTACTID
WHERE A.ID != B.ID
ORDER BY A.ORID, A.CONTACTID, A.FIELDID
Comment
Watch Question

Nem SchlechtIT Supervisor
Top Expert 2009

Commented:
Just to clarify... ID *is* unique to the rows, but the other fields, ORID, FIELDID, and CONTACTID are all duplicated, yes?

Author

Commented:
exactly
Nem SchlechtIT Supervisor
Top Expert 2009

Commented:
Well, the problem is you want to find rows in a table and then delete them from the same table.  Unfortunately, MySQL doesn't let you use the table you're deleting from in a sub-query on the same DELETE statement, so you need to use a temporary table.

Here's my solution - make SURE to back everything up before you run this!  In fact, you should copy your data and run this on the copy, just to make sure you don't destroy your data (a 'mysqldump' beforehand would be a good idea as well!)
CREATE TEMPORARY TABLE ids_to_delete
SELECT ID
FROM CONTACTFIELDVALUE 
WHERE ID NOT IN (
    SELECT MAX(ID)
    FROM CONTACTFIELDVALUE
    GROUP BY ORID, FIELDID, CONTACTID
  )
;

DELETE CONTACTFIELDVALUE
FROM CONTACTFIELDVALUE, ids_to_delete
WHERE CONTACTFIELDVALUE.ID = ids_to_delete.ID; 
;

Open in new window

Author

Commented:
There is a problem,
 the table ids_to_contain like 180 rows, but it should contain more (I have 550 duplicates and most of them duplicated 3-4 times)
IT Supervisor
Top Expert 2009
Commented:
I should have mentioned ... your first query doesn't work correctly - it produces duplicates of the duplicates.  Double-check your counts by looking at the table itself and the rows produced in the temporary table.

Here are my test inserts:
-- Fields are ID, ORID, FIELDID, and CONTACTID, in that order
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 1, 2, 2);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 1, 2, 2);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 1, 2, 2);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 2, 4, 4);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 2, 4, 4);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 3, 5, 5);
INSERT INTO CONTACTFIELDVALUE VALUES (NULL, 4, 6, 6);

Open in new window


Looking at this, the ORID of 1 (field 2) is duplicated a total of 3 times.  ORID 2 is duplicated twice.  My query returns IDs of 1, 2, and 4 (the 1st, 2nd, and 4th inserts) which is correct.

Your query returns this for my test data:
+----+------+---------+-----------+
| ID | ORID | FIELDID | CONTACTID |
+----+------+---------+-----------+
|  2 |    1 |       2 |         2 |
|  3 |    1 |       2 |         2 |
|  1 |    1 |       2 |         2 |
|  3 |    1 |       2 |         2 |
|  1 |    1 |       2 |         2 |
|  2 |    1 |       2 |         2 |
|  5 |    2 |       4 |         4 |
|  4 |    2 |       4 |         4 |
+----+------+---------+-----------+

Open in new window


Which lists ORID 1 *six* times instead of just 3.  The problem is this, the rows have IDs of 1, 2, and 3.  You're just checking to make sure the IDs between the two tables in your join aren't the same, which is true, but in more cases that you'd like.  When A.ID is 1, it doesn't match B.ID of 2 *and* 3.  When A.ID is 2, it doesn't match B.ID of 1 *and* 3, and when A.ID is 3, it doesn't match B.ID of 1 *and* 2.

Here is your output again, but printing out A's ID and B's ID in separate columns:
+------+------+----+------+---------+-----------+
| id_A | id_B | ID | ORID | FIELDID | CONTACTID |
+------+------+----+------+---------+-----------+
|    1 |    2 |  1 |    1 |       2 |         2 |
|    1 |    3 |  1 |    1 |       2 |         2 |
|    2 |    1 |  2 |    1 |       2 |         2 |
|    2 |    3 |  2 |    1 |       2 |         2 |
|    3 |    1 |  3 |    1 |       2 |         2 |
|    3 |    2 |  3 |    1 |       2 |         2 |
|    4 |    5 |  4 |    2 |       4 |         4 |
|    5 |    4 |  5 |    2 |       4 |         4 |
+------+------+----+------+---------+-----------+

Open in new window


Just compare the first 2 columns and you'll see what I mean.

Author

Commented:
worked! thank you

Author

Commented:
Please check this question related, thank you

http://www.experts-exchange.com/Database/MySQL/Q_27707060.html