Solved

Delete duplicate records

Posted on 2013-01-15
1
289 Views
Last Modified: 2013-01-16
Hello Experts,

I am trying to clean up some sloppy data by deleting duplicate records from a table like this one:

CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

For my purposes, "duplicate" means that:
1. The locationID, dDate, and pieType are identical (notes can be different)
2. There are no intervening records for that locationID/dDate/pieType combination

So in this example, the duplicate records are:1&6, 2&3, and 4&5. 6&7 are NOT duplicates because the date is different, and &10 are NOT duplicates because of intervening record 9.

Of the duplicates, if only one record has non-NULL notes, only that one should be kept; otherwise only the first of the two should be kept.  So in this example, I want to delete records 2, 5, and 6 and keep all the rest.

How can I do this?

Thank you!
0
Comment
Question by:EffinGood
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
1 Comment
 
LVL 27

Accepted Solution

by:
Chris Luttrell earned 500 total points
ID: 38781343
Interesting problem and excellent work providing structure for the sample table and data to do a proof with.
I used a 2 part CTE to first assign a Row Number within each group of Location and Date values, ordering by Notes DESC then ID to put any with Notes first in the assignment.
Then in the second recursive part it starts with the first row of each group, and looks at the next rows in order and sets a flag if the Types are the same to mark those to remove.
It then uses the results of the CTE in the final delete statement joining back to the original table on ID and deleting thost flaged as duplicates.
This code worked on your sample data, but test all scenarios before trusting on real data.
CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

SELECT * FROM #tblPies TP;

;WITH TableWithRN AS (
SELECT TP.ID,
		TP.locationID,
		TP.dDate,
		TP.pieType,
		TP.notes, 
		ROW_NUMBER() OVER (PARTITION BY TP.locationID, TP.dDate ORDER BY TP.notes DESC, TP.ID) RN
FROM #tblPies TP
), 
RecursiveTableMarkingDups AS
(
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		0 AS DuplicateBit
FROM TableWithRN
WHERE TableWithRN.RN = 1 -- Start with first of series
UNION ALL
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		CASE WHEN RecursiveTableMarkingDups.pieType = TableWithRN.pieType THEN 1 ELSE 0 END AS DuplicateBit -- AssignDuplicateBit if Types are the same
FROM RecursiveTableMarkingDups
INNER JOIN TableWithRN 
		ON RecursiveTableMarkingDups.locationID = TableWithRN.locationID 
		AND RecursiveTableMarkingDups.dDate = TableWithRN.dDate 
		AND RecursiveTableMarkingDups.RN + 1 = TableWithRN.RN
)
-- switch to this select to see the results with flags set
--SELECT *
--FROM RecursiveTableMarkingDups
--ORDER BY RecursiveTableMarkingDups.ID
DELETE FROM tp
FROM #tblPies tP
INNER JOIN RecursiveTableMarkingDups ON tP.ID = RecursiveTableMarkingDups.ID
WHERE DuplicateBit = 1

-- see the results
SELECT *
FROM #tblPies TP

-- clean up
DROP TABLE #tblPies;

Open in new window

Yeilded this before and after of the table #tblPies
Delete_Dups_Problem_Before_and_After
0

Featured Post

Get HTML5 Certified

Want to be a web developer? You'll need to know HTML. Prepare for HTML5 certification by enrolling in July's Course of the Month! It's free for Premium Members, Team Accounts, and Qualified Experts.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This is basically a blog post I wrote recently. I've found that SARGability is poorly understood, and since many people don't read blogs, I figured I'd post it here as an article. SARGable is an adjective in SQL that means that an item can be fou…
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
If you’ve ever visited a web page and noticed a cool font that you really liked the look of, but couldn’t figure out which font it was so that you could use it for your own work, then this video is for you! In this Micro Tutorial, you'll learn yo…
Visualize your data even better in Access queries. Given a date and a value, this lesson shows how to compare that value with the previous value, calculate the difference, and display a circle if the value is the same, an up triangle if it increased…

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question