[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 227
  • Last Modified:

Help with Duplicate Stored Procedure

Two of my stored procedures are not properly removing and updating records for duplicates.  

The first stored procedure, [dbo].[stp_CleanAsrun], deals with data that has the following fields from a .csv file:

Production, Media, Schedule_Date, Date Inserted

The process should run as follows:

Data is pulled from the CSV files within the dtsx package and inserted within the tmp_logs_asrun table.  The stored procedure:

INSERT INTO TempDupeAsrun (Title, PrdNumber, dateInserted, schedule_date)
SELECT Title, PrdNumber, dateInserted, schedule_date
from 
(
select *, row_number() over (partition by PrdNumber order by Schedule_date desc) rn from tmp_logs_vciasrun
) a
where rn > 1


DELETE FROM tmp_logs_vciasrun 
FROM tmp_logs_vciasrun
INNER JOIN TempDupeAsrun
ON tmp_logs_vciasrun.PrdNumber= TempDupeAsrun.PrdNumber
AND 
tmp_logs_vciasrun.Schedule_date= TempDupeAsrun.Schedule_date
WHERE tmp_logs_vciasrun.Schedule_date IN
	(select max(schedule_date) from tmp_logs_vciasrun group by prdNumber) 

Open in new window



1) Insert duplicate data into a temporary table called tempdupeasrun based on the duplicate records from the table tmp_logs_vciasrun.  The duplicates are based on schedule date and media.  If the schedule date is 3/7/2009 and the media number is JCKA0904, this record should appear only once in the tmp_logs_vciasrun table.

2) Duplicates are then removed from the tmp_logs_vciasrun  with a join on the TempDupeAsrun table and based on Schedule_Date and Media Number.

2) Duplicate data will remain in the tempdupedata table for future review

However, after checking my process, I notice that it is still not correct.  I have an sample data set below:

This is wrong:

Production      Media            Schedule_Date      Date Inserted
SOUND CHECK       JCKA0904       03/07/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       03/07/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       03/07/2009       5/11/2011 5:27:00 PM

SOUND CHECK       JCKA0904       04/25/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       04/25/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       04/25/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       07/25/2009       5/11/2011 5:27:00 PM

The data should appear as the following:



Production      Media            Schedule_Date      Date Inserted
SOUND CHECK       JCKA0904       03/07/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       04/25/2009       5/11/2011 5:27:00 PM
SOUND CHECK       JCKA0904       07/25/2009       5/11/2011 5:27:00 PM

What am I doing wrong in my stored procedure?
0
ayoZen
Asked:
ayoZen
  • 5
  • 4
  • 2
1 Solution
 
lcohanDatabase AnalystCommented:
Can you check the data type on the following columns as I think they are not unique data just because of the time portion of it not showing in your question:
Schedule_Date      Date Inserted
This is why they show on your list and you would have to drop the time portion by doing something like


select DATEADD(dd, DATEDIFF(dd,0,getdate()),0);
--vs.
select getdate();
0
 
lcohanDatabase AnalystCommented:
If that's not the case you could simply try

....SELECT DISTINCT Title, PrdNumber, dateInserted, schedule_date....

instead of

SELECT Title, PrdNumber, dateInserted, schedule_date




0
 
ayoZenAuthor Commented:
Hello Icohan,

Thank you for your reply.  I think the distinct selection will work.  I will do the following then:

SELECT DISTINCT Title, PrdNumber, schedule_date from tmp_logs_vciasrun

so, my completed duplicate remover stored procedure would be:

ALTER PROCEDURE [dbo].[stp_CleanAsrun]
	
AS
BEGIN
	SET NOCOUNT ON;

INSERT INTO TempDupeAsrun (Title, PrdNumber, schedule_date, dateInserted)
SELECT DISTINCT Title, PrdNumber, schedule_date, dateInserted
from 
(
select *, row_number() over (partition by PrdNumber order by Schedule_date desc) rn from tmp_logs_vciasrun
) a
where rn > 1


DELETE FROM tmp_logs_vciasrun 
FROM tmp_logs_vciasrun
INNER JOIN TempDupeAsrun
ON tmp_logs_vciasrun.PrdNumber= TempDupeAsrun.PrdNumber
AND 
tmp_logs_vciasrun.Schedule_date= TempDupeAsrun.Schedule_date
WHERE tmp_logs_vciasrun.Schedule_date IN
	(select max(schedule_date) from tmp_logs_vciasrun group by prdNumber) 

END
GO

Open in new window



can you tell me if this is correct?


0
NEW Veeam Agent for Microsoft Windows

Backup and recover physical and cloud-based servers and workstations, as well as endpoint devices that belong to remote users. Avoid downtime and data loss quickly and easily for Windows-based physical or public cloud-based workloads!

 
SharathData EngineerCommented:
No need to insert into another temp table. You can delete using CTE.
;with CTE as (
select *, row_number() over (partition by PrdNumber,DATEADD(dd,0,DATEDIFF(DD,0,Schedule_date)) order by Schedule_date desc) rn 
  from tmp_logs_vciasrun)
delete from CTE where rn > 1

Open in new window

0
 
ayoZenAuthor Commented:


My updated stored procedure to remove duplicates, is not properly removing duplicates.  I am still getting:

Production      Media            Schedule_Date      Date Inserted
SOUND CHECK       JCKA0904       03/07/2009       5/12/2011 5:27:00 PM
SOUND CHECK       JCKA0904       03/07/2009       5/12/2011 5:27:00 PM
SOUND CHECK       JCKA0904       03/07/2009       5/12/2011 5:27:00 PM

SOUND CHECK       JCKA0904       04/25/2009       5/12/2011 5:27:00 PM
SOUND CHECK       JCKA0904       04/25/2009       5/12/2011 5:27:00 PM
SOUND CHECK       JCKA0904       04/25/2009       5/12/2011 5:27:00 PM
SOUND CHECK       JCKA0904       07/25/2009       5/12/2011 5:27:00 PM

My updated stored procedure is:

INSERT INTO TempDupeAsrun (Title, PrdNumber, schedule_date, dateInserted)
SELECT DISTINCT Title, PrdNumber, schedule_date, dateInserted
from 
(
select *, row_number() over (partition by PrdNumber order by Schedule_date desc) rn from tmp_logs_vciasrun
) a
where rn > 1


DELETE FROM tmp_logs_vciasrun 
FROM tmp_logs_vciasrun
INNER JOIN TempDupeAsrun
ON tmp_logs_vciasrun.PrdNumber= TempDupeAsrun.PrdNumber
AND 
tmp_logs_vciasrun.Schedule_date= TempDupeAsrun.Schedule_date
WHERE tmp_logs_vciasrun.Schedule_date IN
	(select max(schedule_date) from tmp_logs_vciasrun group by prdNumber) 

Open in new window


I think my issue may be with the latter part of the procedure, but I am not sure.
0
 
SharathData EngineerCommented:
Did you try my suggestion?
0
 
ayoZenAuthor Commented:
Yes,  I tried the following, but bigger problems of duplicity surfaced.

ALTER PROCEDURE [dbo].[stp_CleanAsrun]
	
AS
BEGIN
	SET NOCOUNT ON;

;with CTE as (
select *, row_number() over (partition by PrdNumber,DATEADD(dd,0,DATEDIFF(DD,0,Schedule_date)) order by Schedule_date desc) rn 
  from tmp_logs_vciasrun)
delete from CTE where rn > 1

END 

GO

Open in new window

0
 
SharathData EngineerCommented:
My bad, it should be this.
;with CTE as (
select *, row_number() over (partition by Media,DATEADD(dd,0,DATEDIFF(DD,0,Schedule_date)) order by Schedule_date desc) rn 
  from tmp_logs_vciasrun)
delete from CTE where rn > 1

Open in new window

0
 
ayoZenAuthor Commented:
Hello Sharath_123,

The update seems to work, but could you explain it?  Is CTE a temporary table that is created?
0
 
SharathData EngineerCommented:
CTE is Common Table Expression and used as a temp table here. If you perform delete on CTE, it deletes the records from the underlying table.
0
 
ayoZenAuthor Commented:
Thank you!
0

Featured Post

Microsoft Certification Exam 74-409

VeeamĀ® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

  • 5
  • 4
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now