Solved

Delete duplicate records

Posted on 2013-01-15
1
288 Views
Last Modified: 2013-01-16
Hello Experts,

I am trying to clean up some sloppy data by deleting duplicate records from a table like this one:

CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

For my purposes, "duplicate" means that:
1. The locationID, dDate, and pieType are identical (notes can be different)
2. There are no intervening records for that locationID/dDate/pieType combination

So in this example, the duplicate records are:1&6, 2&3, and 4&5. 6&7 are NOT duplicates because the date is different, and &10 are NOT duplicates because of intervening record 9.

Of the duplicates, if only one record has non-NULL notes, only that one should be kept; otherwise only the first of the two should be kept.  So in this example, I want to delete records 2, 5, and 6 and keep all the rest.

How can I do this?

Thank you!
0
Comment
Question by:EffinGood
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
1 Comment
 
LVL 26

Accepted Solution

by:
Chris Luttrell earned 500 total points
ID: 38781343
Interesting problem and excellent work providing structure for the sample table and data to do a proof with.
I used a 2 part CTE to first assign a Row Number within each group of Location and Date values, ordering by Notes DESC then ID to put any with Notes first in the assignment.
Then in the second recursive part it starts with the first row of each group, and looks at the next rows in order and sets a flag if the Types are the same to mark those to remove.
It then uses the results of the CTE in the final delete statement joining back to the original table on ID and deleting thost flaged as duplicates.
This code worked on your sample data, but test all scenarios before trusting on real data.
CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

SELECT * FROM #tblPies TP;

;WITH TableWithRN AS (
SELECT TP.ID,
		TP.locationID,
		TP.dDate,
		TP.pieType,
		TP.notes, 
		ROW_NUMBER() OVER (PARTITION BY TP.locationID, TP.dDate ORDER BY TP.notes DESC, TP.ID) RN
FROM #tblPies TP
), 
RecursiveTableMarkingDups AS
(
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		0 AS DuplicateBit
FROM TableWithRN
WHERE TableWithRN.RN = 1 -- Start with first of series
UNION ALL
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		CASE WHEN RecursiveTableMarkingDups.pieType = TableWithRN.pieType THEN 1 ELSE 0 END AS DuplicateBit -- AssignDuplicateBit if Types are the same
FROM RecursiveTableMarkingDups
INNER JOIN TableWithRN 
		ON RecursiveTableMarkingDups.locationID = TableWithRN.locationID 
		AND RecursiveTableMarkingDups.dDate = TableWithRN.dDate 
		AND RecursiveTableMarkingDups.RN + 1 = TableWithRN.RN
)
-- switch to this select to see the results with flags set
--SELECT *
--FROM RecursiveTableMarkingDups
--ORDER BY RecursiveTableMarkingDups.ID
DELETE FROM tp
FROM #tblPies tP
INNER JOIN RecursiveTableMarkingDups ON tP.ID = RecursiveTableMarkingDups.ID
WHERE DuplicateBit = 1

-- see the results
SELECT *
FROM #tblPies TP

-- clean up
DROP TABLE #tblPies;

Open in new window

Yeilded this before and after of the table #tblPies
Delete_Dups_Problem_Before_and_After
0

Featured Post

Instantly Create Instructional Tutorials

Contextual Guidance at the moment of need helps your employees adopt to new software or processes instantly. Boost knowledge retention and employee engagement step-by-step with one easy solution.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…

732 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question