[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 369
  • Last Modified:

How can I identify and delete these duplicates?

I've got a table that looks like this

id    contestant_id   voter_id   contest_end_date   ballots

In some instances, I'm getting entries like this:

1      12345          0000          2012-09-30   1
2      12345          0000          2012-09-30   1
3      67890          1111          2012-09-30   4

Entry #3 is gold, but the first two represent duplicates that I need to delete. So, in light of the fact that the duplicates are defined in terms of two columns rather than just one, how do I identify them and then how do I delete them in the context of a loop / delete dynamic?
0
brucegust
Asked:
brucegust
  • 4
  • 3
  • 2
  • +2
3 Solutions
 
GaryCommented:
Are you saying delete all rows that contain duplicates and not even keep one row?
What are the two columns that need to be duplicates?
0
 
dimmergeekCommented:
-- Find Duplicate Rows:
SELECT MAX(ID) as ID, contestant_id, voter_id, contest_end_date, ballots
FROM tableName
GROUP BY contestant_id, voter_id
HAVING COUNT(*) > 1;
 
-- Delete Duplicate Rows
DELETE FROM tableName
WHERE ID IN
( SELECT MAX(ID) FROM tableName
GROUP BY CustName, Pincode
HAVING COUNT(*) > 1);
0
 
Ray PaseurCommented:
Do you want to delete all of the duplicates or all but one of the duplicates?

To access two columns in the array, you would probably want to concatenate the data elements.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
brucegustAuthor Commented:
All but one of the duplicates would be ideal.
0
 
Cornelia YoderArtistCommented:
Something like this?







   $previd = '';
   $prevcontestant_id = '';
   $prevvoter_id = '';
   $prevcontest_end_date = '';
   $prevballots = '';


$query = "SELECT * FROM Table";
$result = mysql_query($query);

while ($row = mysql_fetch_array($result))
{
   $id = $row['id'];
   $contestant_id = $row['contestant_id'];
   $voter_id = $row['voter_id'];
   $contest_end_date = $row['contest_end_date'];
   $ballots = $row['ballots'];

   if ($contestant_id == $prevcontestant_id &&
      $voter_id == $prevvoter_id &&
      $contest_end_date == $prevcontest_end_date &&
      $ballots == $prevballots)
   {
      mysql_query("DELETE FROM Table WHERE id='$id'");
   }
   else
   {
      $previd = $row['id'];
      $prevcontestant_id = $row['contestant_id'];
      $prevvoter_id = $row['voter_id'];
      $prevcontest_end_date = $row['contest_end_date'];
      $prevballots = $row['ballots'];
   }

}
0
 
GaryCommented:
Don't think you can get simpler than this, though it does create a new table.

CREATE TABLE new_table as
SELECT * FROM mytable WHERE 1 GROUP BY concat(contestant_id,voter_id);

Don't run on live data until you've verified it is working correctly
0
 
dimmergeekCommented:
Running the first section of my original post will give you the records that have duplicates, and you can use that to determine which ones to delete.
0
 
Ray PaseurCommented:
@GaryC123: Since your script at ID: 38378088 creates a new table, there is no real risk of running it on live data ;-)
0
 
GaryCommented:
doh lmao

but better safe than sorry....
0
 
Ray PaseurCommented:
Roger that!
0
 
Ray PaseurCommented:
Here is a purely PHP solution showing the general logic for eliminating redundancies over multiple columns.  If you were looking for redundancies over all columns you might be able to use array_unique().

<?php // RAY_temp_brucegust.php
error_reporting(E_ALL);
echo '<pre>';

// ELIMINATE ANY SUB-ARRAYS THAT ARE REDUNDANT ON THE num1 AND num2 FIELDS

// TEST DATA
$dat = array
( array
  ( 1
  , 'num1' => '12345'
  , 'num2' => '0000'
  , '2012-09-30'
  , 1
  )
  ,
  array
  ( 2
  , 'num1' => '12345'
  , 'num2' => '0000'
  , '2012-09-30'
  , 1
  )
  ,
  array
  ( 3
  , 'num1' => '67890'
  , 'num2' => '1111'
  , '2012-09-30'
  , 4
  ,
  )
)
;

// CREATE A SIGNAL STRING FROM THE IMPORTANT FIELDS
foreach ($dat as $key => $sub)
{
    $sub['sig'] = $sub['num1'] . $sub['num2'];
    $dat[$key]  = $sub;
}

// USER SORT FOR THE ARRAY ON THE SIGNAL STRING
function signal_sort($a, $b)
{
    if ($a["sig"] == $b["sig"]) return 0;
    return ($a["sig"] < $b["sig"]) ? -1 : 1;
}
usort($dat, 'signal_sort');

// ELIMINATE REDUNDANCIES
$sig = '?';
$out = array();
$bad = array();
foreach ($dat as $key => $sub)
{
    if ($sub['sig'] == $sig)
    {
        $bad[] = $sub;
    }
    else
    {
        $sig = $sub['sig'];
        unset($sub['sig']);
        $out[] = $sub;
    }
}

// SHOW THE WORK PRODUCTS
echo 'THE VALUES WE KEPT: ' . PHP_EOL;
print_r($out);
echo PHP_EOL;
echo 'THE VALUES WE DISCARDED AS REDUNDANT: ' . PHP_EOL;
print_r($bad);
echo PHP_EOL;

Open in new window

HTH, ~Ray
0
 
brucegustAuthor Commented:
I got it! Thanks for the feedback!
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 4
  • 3
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now