Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Delete duplicate records

Posted on 2013-01-15
1
Medium Priority
?
290 Views
Last Modified: 2013-01-16
Hello Experts,

I am trying to clean up some sloppy data by deleting duplicate records from a table like this one:

CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

For my purposes, "duplicate" means that:
1. The locationID, dDate, and pieType are identical (notes can be different)
2. There are no intervening records for that locationID/dDate/pieType combination

So in this example, the duplicate records are:1&6, 2&3, and 4&5. 6&7 are NOT duplicates because the date is different, and &10 are NOT duplicates because of intervening record 9.

Of the duplicates, if only one record has non-NULL notes, only that one should be kept; otherwise only the first of the two should be kept.  So in this example, I want to delete records 2, 5, and 6 and keep all the rest.

How can I do this?

Thank you!
0
Comment
Question by:EffinGood
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
1 Comment
 
LVL 27

Accepted Solution

by:
Chris Luttrell earned 2000 total points
ID: 38781343
Interesting problem and excellent work providing structure for the sample table and data to do a proof with.
I used a 2 part CTE to first assign a Row Number within each group of Location and Date values, ordering by Notes DESC then ID to put any with Notes first in the assignment.
Then in the second recursive part it starts with the first row of each group, and looks at the next rows in order and sets a flag if the Types are the same to mark those to remove.
It then uses the results of the CTE in the final delete statement joining back to the original table on ID and deleting thost flaged as duplicates.
This code worked on your sample data, but test all scenarios before trusting on real data.
CREATE TABLE #tblPies
(ID INT NOT NULL,
locationID INT NOT NULL,
dDate datetime NOT NULL,
pieType varchar(20) NOT NULL,
notes varchar(50))

INSERT INTO #tblPies
VALUES (1,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (2,1002,'1-Jan-2013','Apple',NULL)
INSERT INTO #tblPies
VALUES (3,1002,'1-Jan-2013','Apple','notes')
INSERT INTO #tblPies
VALUES (4,1003,'1-Jan-2013','Cherry', 'notes')
INSERT INTO #tblPies
VALUES (5,1003,'1-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (6,1001,'1-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (7,1001,'2-Jan-2013','Apple', NULL)
INSERT INTO #tblPies
VALUES (8,1002,'2-Jan-2013','Apple', 'notes')
INSERT INTO #tblPies
VALUES (9,1002,'2-Jan-2013','Cherry', NULL)
INSERT INTO #tblPies
VALUES (10,1002,'2-Jan-2013','Apple', NULL)

SELECT * FROM #tblPies TP;

;WITH TableWithRN AS (
SELECT TP.ID,
		TP.locationID,
		TP.dDate,
		TP.pieType,
		TP.notes, 
		ROW_NUMBER() OVER (PARTITION BY TP.locationID, TP.dDate ORDER BY TP.notes DESC, TP.ID) RN
FROM #tblPies TP
), 
RecursiveTableMarkingDups AS
(
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		0 AS DuplicateBit
FROM TableWithRN
WHERE TableWithRN.RN = 1 -- Start with first of series
UNION ALL
SELECT TableWithRN.ID,
		TableWithRN.locationID,
		TableWithRN.dDate,
		TableWithRN.pieType,
		TableWithRN.notes,
		TableWithRN.RN,
		CASE WHEN RecursiveTableMarkingDups.pieType = TableWithRN.pieType THEN 1 ELSE 0 END AS DuplicateBit -- AssignDuplicateBit if Types are the same
FROM RecursiveTableMarkingDups
INNER JOIN TableWithRN 
		ON RecursiveTableMarkingDups.locationID = TableWithRN.locationID 
		AND RecursiveTableMarkingDups.dDate = TableWithRN.dDate 
		AND RecursiveTableMarkingDups.RN + 1 = TableWithRN.RN
)
-- switch to this select to see the results with flags set
--SELECT *
--FROM RecursiveTableMarkingDups
--ORDER BY RecursiveTableMarkingDups.ID
DELETE FROM tp
FROM #tblPies tP
INNER JOIN RecursiveTableMarkingDups ON tP.ID = RecursiveTableMarkingDups.ID
WHERE DuplicateBit = 1

-- see the results
SELECT *
FROM #tblPies TP

-- clean up
DROP TABLE #tblPies;

Open in new window

Yeilded this before and after of the table #tblPies
Delete_Dups_Problem_Before_and_After
0

Featured Post

Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There have been several questions about Large Transaction Log Files in SQL Server 2008, and how to get rid of them when disk space has become critical. This article will explain how to disable full recovery and implement simple recovery that carries…
Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…

704 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question