How can I identify and delete these duplicates?

I've got a table that looks like this

id    contestant_id   voter_id   contest_end_date   ballots

In some instances, I'm getting entries like this:

1      12345          0000          2012-09-30   1
2      12345          0000          2012-09-30   1
3      67890          1111          2012-09-30   4

Entry #3 is gold, but the first two represent duplicates that I need to delete. So, in light of the fact that the duplicates are defined in terms of two columns rather than just one, how do I identify them and then how do I delete them in the context of a loop / delete dynamic?
brucegustPHP DeveloperAsked:
Who is Participating?
 
Cornelia YoderConnect With a Mentor ArtistCommented:
Something like this?







   $previd = '';
   $prevcontestant_id = '';
   $prevvoter_id = '';
   $prevcontest_end_date = '';
   $prevballots = '';


$query = "SELECT * FROM Table";
$result = mysql_query($query);

while ($row = mysql_fetch_array($result))
{
   $id = $row['id'];
   $contestant_id = $row['contestant_id'];
   $voter_id = $row['voter_id'];
   $contest_end_date = $row['contest_end_date'];
   $ballots = $row['ballots'];

   if ($contestant_id == $prevcontestant_id &&
      $voter_id == $prevvoter_id &&
      $contest_end_date == $prevcontest_end_date &&
      $ballots == $prevballots)
   {
      mysql_query("DELETE FROM Table WHERE id='$id'");
   }
   else
   {
      $previd = $row['id'];
      $prevcontestant_id = $row['contestant_id'];
      $prevvoter_id = $row['voter_id'];
      $prevcontest_end_date = $row['contest_end_date'];
      $prevballots = $row['ballots'];
   }

}
0
 
GaryCommented:
Are you saying delete all rows that contain duplicates and not even keep one row?
What are the two columns that need to be duplicates?
0
 
dimmergeekConnect With a Mentor Commented:
-- Find Duplicate Rows:
SELECT MAX(ID) as ID, contestant_id, voter_id, contest_end_date, ballots
FROM tableName
GROUP BY contestant_id, voter_id
HAVING COUNT(*) > 1;
 
-- Delete Duplicate Rows
DELETE FROM tableName
WHERE ID IN
( SELECT MAX(ID) FROM tableName
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1);
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

 
Ray PaseurCommented:
Do you want to delete all of the duplicates or all but one of the duplicates?

To access two columns in the array, you would probably want to concatenate the data elements.
0
 
brucegustPHP DeveloperAuthor Commented:
All but one of the duplicates would be ideal.
0
 
GaryCommented:
Don't think you can get simpler than this, though it does create a new table.

CREATE TABLE new_table as
SELECT * FROM mytable WHERE 1 GROUP BY concat(contestant_id,voter_id);

Don't run on live data until you've verified it is working correctly
0
 
dimmergeekCommented:
Running the first section of my original post will give you the records that have duplicates, and you can use that to determine which ones to delete.
0
 
Ray PaseurCommented:
@GaryC123: Since your script at ID: 38378088 creates a new table, there is no real risk of running it on live data ;-)
0
 
GaryCommented:
doh lmao

but better safe than sorry....
0
 
Ray PaseurCommented:
Roger that!
0
 
Ray PaseurConnect With a Mentor Commented:
Here is a purely PHP solution showing the general logic for eliminating redundancies over multiple columns.  If you were looking for redundancies over all columns you might be able to use array_unique().

<?php // RAY_temp_brucegust.php
error_reporting(E_ALL);
echo '<pre>';

// ELIMINATE ANY SUB-ARRAYS THAT ARE REDUNDANT ON THE num1 AND num2 FIELDS

// TEST DATA
$dat = array
( array
  ( 1
  , 'num1' => '12345'
  , 'num2' => '0000'
  , '2012-09-30'
  , 1
  )
  ,
  array
  ( 2
  , 'num1' => '12345'
  , 'num2' => '0000'
  , '2012-09-30'
  , 1
  )
  ,
  array
  ( 3
  , 'num1' => '67890'
  , 'num2' => '1111'
  , '2012-09-30'
  , 4
  ,
  )
)
;

// CREATE A SIGNAL STRING FROM THE IMPORTANT FIELDS
foreach ($dat as $key => $sub)
{
    $sub['sig'] = $sub['num1'] . $sub['num2'];
    $dat[$key]  = $sub;
}

// USER SORT FOR THE ARRAY ON THE SIGNAL STRING
function signal_sort($a, $b)
{
    if ($a["sig"] == $b["sig"]) return 0;
    return ($a["sig"] < $b["sig"]) ? -1 : 1;
}
usort($dat, 'signal_sort');

// ELIMINATE REDUNDANCIES
$sig = '?';
$out = array();
$bad = array();
foreach ($dat as $key => $sub)
{
    if ($sub['sig'] == $sig)
    {
        $bad[] = $sub;
    }
    else
    {
        $sig = $sub['sig'];
        unset($sub['sig']);
        $out[] = $sub;
    }
}

// SHOW THE WORK PRODUCTS
echo 'THE VALUES WE KEPT: ' . PHP_EOL;
print_r($out);
echo PHP_EOL;
echo 'THE VALUES WE DISCARDED AS REDUNDANT: ' . PHP_EOL;
print_r($bad);
echo PHP_EOL;

Open in new window

HTH, ~Ray
0
 
brucegustPHP DeveloperAuthor Commented:
I got it! Thanks for the feedback!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.