Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 159
  • Last Modified:

Remove Duplicate Records From SQL 2008 Table

I want to delete duplicate records from this sql table (Contacts) if :

Cust#, First and Last Name are the same and the Contact_ID <> Main_Contact_ID

* upper/lower case does not matter.

* do not want to delete the duplicate that has the same Contact_ID and Main_Contact_ID

* keep all non-duplicate records

* last record in my .png should be deleted


ALSO:

If none of the duplicate records Contact_ID's match the Main_Contact_ID then I need to keep 1 (any one) of those duplicate records.
Capture.PNG
0
thayduck
Asked:
thayduck
  • 10
  • 7
  • 5
1 Solution
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
I didn't test it but I think this can be the solution for your problem:
DELETE FROM Contacts
WHERE EXISTS (SELECT 1 FROM Contacts C2
                             WHERE C2.CustNo = Contacts.CustNo
                                  AND C2.FirstName = Contacts.FirstName
                                  AND C2.LastName = Contacts.LastName
                                  AND C2.Contact_ID <> Contacts.Contact_ID)

Open in new window


Cheers
0
 
thayduckAuthor Commented:
You are not doing any checking against the Main_Contact_ID.
So I am not sure how your code will solve my problem.

Also it looks like your code would also delete non duplicate records.

Can you explain your code, maybe I just don't understand.
0
 
awking00Commented:
delete from contacts c
where exists
(select 1 from
 (select custno, firstname, lastname, contact_id, main_contact_id,
  case when contact_id = main_contact_id then 'yes'
       else 'no'
  end keep
  from contacts) x
 inner join
 (select upper(custno) custno, upper(firstname) firstname, upper(lastname) lastname, count(*) cnt
  from contacts
  group by upper(custno), upper(firstname), upper(lastname)
  having count(*) > 1) y
 on upper(x.custno) = y.custno
 and upper(x.firstname) = y.firstname
 and upper(x.lastname) = y.lastname
 where x.keep = 'no'
 and c.custno = x.custno
 and c.firstname = x.firstname
 and c.lastname = x.lastname)
;
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 
thayduckAuthor Commented:
Getting error:


Msg 102, Level 15, State 1, Line 66
Incorrect syntax near 'c'.
Capture.PNG
0
 
awking00Commented:
I keep forgetting sql server requires aliases to be declared with the AS keyword.
Try changing the following lines:
delete from contacts c to delete from contacts as c
from contacts) x to  from contacts) as x
having count(*) > 1) y to having count(*) > 1) as y
0
 
thayduckAuthor Commented:
Your code left only this record for this customer (which is correct because it is not a dup).

cust                          first         last          contact_id Main_Contact_id

F747090A      Elizabeth      Adams      144943      129838


But, I also needed the below dup record to be left because the Contact_ID = Main_Contact_ID)

F747090A      LISA      DIONNE      129838      129838
 
This below dup rec was not left which is correct because the Contact_ID <> Main_Contact_ID)

F747090A      Lisa      Dionne      129837      129838



Your code works this way:  (got error when I changed to   AS C  )

delete from contacts
where exists
(select 1 from
 (select custno, firstname, lastname, contact_id, main_contact_id,
  case when contact_id = main_contact_id then 'yes'
       else 'no'
  end keep
  from contacts) as x
 inner join
 (select upper(custno) custno, upper(firstname) firstname, upper(lastname) lastname, count(*) cnt
  from contacts
  group by upper(custno), upper(firstname), upper(lastname)
  having count(*) > 1) as y
 on upper(x.custno) = y.custno
 and upper(x.firstname) = y.firstname
 and upper(x.lastname) = y.lastname
 where x.keep = 'no'
 and contacts.custno = x.custno
 and contacts.firstname = x.firstname
 and contacts.lastname = x.lastname)
 
 
 
 select * from contacts   order by custno
0
 
awking00Commented:
What does this produce?
select x.custno, x.firstname, x.lastname, x.contact_id, x.main_contact_id from
(select custno, firstname, lastname, contact_id, main_contact_id,
 case when contact_id = main_contact_id then 'yes'
      else 'no'
 end keep
 from contacts) as x
 inner join
(select upper(custno) custno, upper(firstname) firstname, upper(lastname) lastname, count(*) cnt
 from contacts
 group by upper(custno), upper(firstname), upper(lastname)
 having count(*) > 1) as y
on upper(x.custno) = y.custno
and upper(x.firstname) = y.firstname
and upper(x.lastname) = y.lastname
where x.keep = 'no'

It should only produce

CUSTNO   FIRSTNAME    LASTNAME     CONTACT_ID MAIN_CONTACT_ID
-------- ------------ ------------ ---------- ---------------
F747090A Lisa         Dionne           129837          129838
0
 
thayduckAuthor Commented:
produces:


F747090A      Lisa      Dionne      129837      129838
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Sorry, was copy/paste issue.
Should be this statement:
DELETE FROM Contacts
WHERE EXISTS (SELECT 1 FROM Contacts C2
                             WHERE C2.CustNo = Contacts.CustNo
                                  AND C2.FirstName = Contacts.FirstName
                                  AND C2.LastName = Contacts.LastName
                                  AND C2.Contact_ID <> Contacts.Main_Contact_ID)

Open in new window


This should delete all records from table Contacts where those criteria meets.
For assure that it going to delete the records you want, just substitute the DELETE for SELECT *, so it will returns the records that will be deleted.

Hope this helps you.
Cheer
0
 
thayduckAuthor Commented:
Your code is also deleting non-duplicate records where the contact_id <> master_contact_id.

I want to keep all non-duplicate records.
0
 
thayduckAuthor Commented:
Before running your code:

F747090A      LISA      DIONNE      129838      129838    **Duplicate**
F747090A      Lisa      Dionne      129837      129838    **Duplicate**
F747090A      Elizabeth      Adams      144943      129838    **Non-Duplicate**

After running your code:


This is what I want to be there after running your code:

F747090A      LISA      DIONNE      129838      129838
F747090A      Elizabeth      Adams      144943      129838
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Ok. It's better with an example :)

So try this code and see if resolve your problem:

DELETE FROM Contacts
WHERE EXISTS (SELECT CustNo, FirstName, LastName, Main_Contact_ID, COUNT(Main_Contact_ID) Duplicates FROM Contacts C2
				WHERE C2.CustNo = Contacts.CustNo
					AND C2.FirstName = Contacts.FirstName
                    AND C2.LastName = Contacts.LastName
                    AND C2.Main_Contact_ID <> Contacts.Contact_ID
              GROUP BY CustNo, FirstName, LastName, Main_Contact_ID
              HAVING COUNT(Main_Contact_ID) > 1)

Open in new window

0
 
awking00Commented:
>>produces:
F747090A      Lisa      Dionne      129837      129838 <<
Then that should be the only record where the custno, firstname, and lastname from the contacts table matches with the subquery aliased as x and, therefore, be deleted leaving the other two records.
0
 
thayduckAuthor Commented:
awking00:

When I run your code to DELETE, my result:

custno      firstname      lastname      contact_id      main_contact_id
F746166F      DERRICK      HOCKLEY      129836      129836
F747090AElizabeth      Adams      144943      129838
F747125ALashawn      Baker      159329      159935

F747090A      LISA      DIONNE      129838      129838       **Is missing**





Vitor Montalvão:

Your latest code worked.


custno      firstname      lastname      contact_id      main_contact_id
F746166F      Robbie      Newby      157541      129836
F747090AElizabeth      Adams      144943      129838
F747090ALISA      DIONNE      129838      129838
F747125AEvelyn      Batista      159330      159935



The 2 - F747090 records are the ones I wanted.
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
Good.
The first query had an error on it but was easy to find it (swap the tables from Contact_ID with Main_Contact_ID in WHERE criteria):

  C2.Contact_ID <> Contacts.Main_Contact_ID ---->>> C2.Main_Contact_ID <> Contacts.Contact_ID

But changing that will let only the record:
F747090A      LISA      DIONNE      129838      129838

So I added a filter to acts only on duplicated records (more than 1 record with same data):
HAVING COUNT(Main_Contact_ID) > 1

Hope this explanation helps you to understand the code.

Cheers
0
 
awking00Commented:
The only thing I can figure is happening is somehow comparisons are being made without regard to case. Otherwise, it would not delete the record where contacts.Lisa = x.LISA and contacts.Dionne = x.DIONNE. Additionally, the group by having count(*) > 1 would produce no rows since the row with firstname Lisa would have a count of 1 and the row with firstname LISA would have a count of 1.
SQL> select custno, firstname, lastname, count(*) cnt
  2  from contacts
  3  group by custno, firstname, lastname;

CUSTNO   FIRSTNAME    LASTNAME            CNT
-------- ------------ ------------ ----------
F747090A Elizabeth    Adams                 1
F747090A Lisa         Dionne                1
F747090A LISA         DIONNE                1
0
 
awking00Commented:
Hit submit a little too soon. I meant to add, "which is why I used the upper function to determine duplicates."
0
 
thayduckAuthor Commented:
Do you want more time to see if you can get it to work, otherwise, I will accept the solution of Vitor since it does work.
0
 
thayduckAuthor Commented:
Thanks for help guys.
0
 
awking00Commented:
Out of curiosity, what does this produce?
SELECT CustNo, FirstName, LastName, Main_Contact_ID, COUNT(Main_Contact_ID) Duplicates FROM Contacts C2
GROUP BY CustNo, FirstName, LastName, Main_Contact_ID
HAVING COUNT(Main_Contact_ID) > 1
0
 
thayduckAuthor Commented:
CustNo      FirstName      LastName      Main_Contact_ID      Duplicates
F711035A      Accounts Payable      (AP)      173557      2
F747090A      LISA      DIONNE      129838      2
P000872W      Joseph      Goldrick      151085      2
0
 
Vitor MontalvãoMSSQL Senior EngineerCommented:
@awking00,

that query returns only duplicates records so the main query will only act on them and not on the single records.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 10
  • 7
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now