Solved

Delete duplicate records

Posted on 2013-01-15
1
280 Views
Last Modified: 2013-01-16
Hello Experts,

I am trying to clean up some sloppy data by deleting duplicate records from a table like this one:

CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

For my purposes, "duplicate" means that:
1. The locationID, dDate, and pieType are identical (notes can be different)
2. There are no intervening records for that locationID/dDate/pieType combination

So in this example, the duplicate records are:1&6, 2&3, and 4&5. 6&7 are NOT duplicates because the date is different, and &10 are NOT duplicates because of intervening record 9.

Of the duplicates, if only one record has non-NULL notes, only that one should be kept; otherwise only the first of the two should be kept.  So in this example, I want to delete records 2, 5, and 6 and keep all the rest.

How can I do this?

Thank you!
0
Comment
Question by:EffinGood
1 Comment
 
LVL 26

Accepted Solution

by:
Chris Luttrell earned 500 total points
ID: 38781343
Interesting problem and excellent work providing structure for the sample table and data to do a proof with.
I used a 2 part CTE to first assign a Row Number within each group of Location and Date values, ordering by Notes DESC then ID to put any with Notes first in the assignment.
Then in the second recursive part it starts with the first row of each group, and looks at the next rows in order and sets a flag if the Types are the same to mark those to remove.
It then uses the results of the CTE in the final delete statement joining back to the original table on ID and deleting thost flaged as duplicates.
This code worked on your sample data, but test all scenarios before trusting on real data.
CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

SELECT * FROM #tblPies TP;

;WITH TableWithRN AS (
SELECT TP.ID,
		TP.locationID,
		TP.dDate,
		TP.pieType,
		TP.notes, 
		ROW_NUMBER() OVER (PARTITION BY TP.locationID, TP.dDate ORDER BY TP.notes DESC, TP.ID) RN
FROM #tblPies TP
), 
RecursiveTableMarkingDups AS
(
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		0 AS DuplicateBit
FROM TableWithRN
WHERE TableWithRN.RN = 1 -- Start with first of series
UNION ALL
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		CASE WHEN RecursiveTableMarkingDups.pieType = TableWithRN.pieType THEN 1 ELSE 0 END AS DuplicateBit -- AssignDuplicateBit if Types are the same
FROM RecursiveTableMarkingDups
INNER JOIN TableWithRN 
		ON RecursiveTableMarkingDups.locationID = TableWithRN.locationID 
		AND RecursiveTableMarkingDups.dDate = TableWithRN.dDate 
		AND RecursiveTableMarkingDups.RN + 1 = TableWithRN.RN
)
-- switch to this select to see the results with flags set
--SELECT *
--FROM RecursiveTableMarkingDups
--ORDER BY RecursiveTableMarkingDups.ID
DELETE FROM tp
FROM #tblPies tP
INNER JOIN RecursiveTableMarkingDups ON tP.ID = RecursiveTableMarkingDups.ID
WHERE DuplicateBit = 1

-- see the results
SELECT *
FROM #tblPies TP

-- clean up
DROP TABLE #tblPies;

Open in new window

Yeilded this before and after of the table #tblPies
Delete_Dups_Problem_Before_and_After
0

Featured Post

VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
T-SQL - How to use like '%%' operator to search an integer 2 25
TOOLS - convert T-SQL TO PL/SQL 3 23
MS SQL Delete Query 9 30
While in ##Table - Help 4 12
Audit has been really one of the more interesting, most useful, yet difficult to maintain topics in the history of SQL Server. In earlier versions of SQL people had very few options for auditing in SQL Server. It typically meant using SQL Trace …
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Windows 10 is mostly good. However the one thing that annoys me is how many clicks you have to do to dial a VPN connection. You have to go to settings from the start menu, (2 clicks), Network and Internet (1 click), Click VPN (another click) then fi…
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question