Solved

Delete duplicate records

Posted on 2013-01-15
274 Views
Last Modified: 2013-01-16
Hello Experts,

I am trying to clean up some sloppy data by deleting duplicate records from a table like this one:

CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

For my purposes, "duplicate" means that:
1. The locationID, dDate, and pieType are identical (notes can be different)
2. There are no intervening records for that locationID/dDate/pieType combination

So in this example, the duplicate records are 1 & 6, 2 & 3, and 4 & 5. Records 6 & 7 are NOT duplicates because the date is different, and records 8 & 10 are NOT duplicates because of intervening record 9.

Of the duplicates, if only one record has non-NULL notes, only that one should be kept; otherwise only the first of the two should be kept.  So in this example, I want to delete records 2, 5, and 6 and keep all the rest.

How can I do this?

Thank you!
Question by:EffinGood
1 Comment
 

Accepted Solution

by:
Chris Luttrell earned 500 total points
ID: 38781343
Interesting problem, and excellent work providing the table structure and sample data for a proof.
I used a two-part CTE. The first part assigns a row number within each group of locationID and dDate values, ordering by notes DESC then ID so that any rows with notes come first in the assignment.
The second, recursive part starts with the first row of each group, then walks the following rows in order and sets a flag on each row whose pieType matches the previous row's, marking it for removal.
The final DELETE statement then joins the CTE results back to the original table on ID and deletes the rows flagged as duplicates.
This code worked on your sample data, but test all scenarios before trusting it on real data.
CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

SELECT * FROM #tblPies TP;

;WITH TableWithRN AS (
SELECT TP.ID,
		TP.locationID,
		TP.dDate,
		TP.pieType,
		TP.notes, 
		ROW_NUMBER() OVER (PARTITION BY TP.locationID, TP.dDate ORDER BY TP.notes DESC, TP.ID) RN
FROM #tblPies TP
), 
RecursiveTableMarkingDups AS
(
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		0 AS DuplicateBit
FROM TableWithRN
WHERE TableWithRN.RN = 1 -- Start with first of series
UNION ALL
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		CASE WHEN RecursiveTableMarkingDups.pieType = TableWithRN.pieType THEN 1 ELSE 0 END AS DuplicateBit -- set DuplicateBit when the pieType matches the previous row's
FROM RecursiveTableMarkingDups
INNER JOIN TableWithRN 
		ON RecursiveTableMarkingDups.locationID = TableWithRN.locationID 
		AND RecursiveTableMarkingDups.dDate = TableWithRN.dDate 
		AND RecursiveTableMarkingDups.RN + 1 = TableWithRN.RN
)
-- switch to this select to see the results with flags set
--SELECT *
--FROM RecursiveTableMarkingDups
--ORDER BY RecursiveTableMarkingDups.ID
DELETE FROM TP
FROM #tblPies TP
INNER JOIN RecursiveTableMarkingDups ON TP.ID = RecursiveTableMarkingDups.ID
WHERE RecursiveTableMarkingDups.DuplicateBit = 1

-- see the results
SELECT *
FROM #tblPies TP

-- clean up
DROP TABLE #tblPies;


This yielded the following before and after of table #tblPies:
[Screenshot: Delete_Dups_Problem_Before_and_After]
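
For anyone who would rather avoid a recursive CTE: on SQL Server 2012 or later, the same consecutive-duplicate rule can also be expressed with LAG. This is just a sketch against the sample schema above, not part of the accepted answer, so verify it on your own data before use:

```sql
;WITH Seq AS (
    -- for each row, pull in the immediately preceding row (by ID)
    -- within the same locationID/dDate group
    SELECT ID, pieType, notes,
           LAG(ID)      OVER (PARTITION BY locationID, dDate ORDER BY ID) AS prevID,
           LAG(pieType) OVER (PARTITION BY locationID, dDate ORDER BY ID) AS prevType,
           LAG(notes)   OVER (PARTITION BY locationID, dDate ORDER BY ID) AS prevNotes
    FROM #tblPies
)
DELETE FROM #tblPies
WHERE ID IN (
    -- a row whose predecessor shares its pieType forms a duplicate pair;
    -- drop the earlier row only when the later one is the sole holder of notes,
    -- otherwise drop the later row
    SELECT CASE WHEN notes IS NOT NULL AND prevNotes IS NULL
                THEN prevID
                ELSE ID END
    FROM Seq
    WHERE pieType = prevType
);
```

On the sample data this deletes IDs 2, 5, and 6, matching the expected result. Note that runs of three or more consecutive duplicates can behave slightly differently from the recursive version, so test that case if it can occur in your data.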
