Need to remove duplicated records that each have a unique ID

A few days ago I put in an archive function that had a join problem, and I ended up with some duplicated records that I need to purge. There is a unique ID field, and the CreateDt is different for each duplicated record. There should be just one record for each sorter, wave, order, chute combination. The attached file shows 38 rows of sample data; there should be only 3 rows. What is the best way to keep the earliest record and delete the other duplicates?
OrderData.xlsx
Asked by coperations07

Jan Louwerens (Software Engineer) commented:
This should give you the IDs you want to delete.

SELECT ID FROM
(
   SELECT ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx FROM <table_name>
) AS T
WHERE
   Idx > 1

Dustin Saunders (Director of Operations) commented:
Be sure to make a backup before you start cleaning up data, but you can delete the duplicates with:
DELETE DUPES
FROM 
(
	SELECT ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx FROM <table_name>
) DUPES
WHERE DUPES.Idx > 1
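
The backup advice above could look something like the following. This is only a sketch: dbo.ArchiveHeader stands in for <table_name>, and dbo.ArchiveHeader_Backup is a hypothetical name for a copy that does not exist yet. Running the delete inside a transaction gives a chance to look at the row count before anything becomes permanent.

```sql
-- Assumption: dbo.ArchiveHeader stands in for <table_name>;
-- dbo.ArchiveHeader_Backup is a hypothetical table that must not exist yet.
SELECT *
INTO   dbo.ArchiveHeader_Backup
FROM   dbo.ArchiveHeader;

BEGIN TRANSACTION;

DELETE DUPES
FROM (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute
                              ORDER BY ID) AS Idx
    FROM   dbo.ArchiveHeader
) AS DUPES
WHERE DUPES.Idx > 1;

-- Number of rows removed by the DELETE just above.
SELECT @@ROWCOUNT AS RowsDeleted;

-- Undo the delete for now; run COMMIT TRANSACTION instead
-- once the row count looks right.
ROLLBACK TRANSACTION;
```

For the 38-row sample in the question, which should end up as 3 rows, one would expect 35 rows to be reported as deleted.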

And you can preview it with the code Jan posted above if you want to see what will be deleted first.
SELECT ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx FROM <table_name> AS Idx
WHERE
   Idx > 1

coperations07 (Author) commented:
Thanks guys! I will give this a try.

coperations07 (Author) commented:
I've tried

    SELECT  ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx FROM db_sort00.dbo.tbl_arch_seq_header AS Idx
WHERE
   Idx > 1

It is telling me Idx is an invalid column. I've tried a few different things to tweak it but haven't got it to work for me yet.
Jan Louwerens (Software Engineer) commented:
You have 2 things named "AS Idx".

SELECT  ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx FROM db_sort00.dbo.tbl_arch_seq_header
WHERE
   Idx > 1

Looking back, it looks like the typo originated in my previous comment. Sorry about that.
coperations07 (Author) commented:
I've tried again today with the edit, but Idx is still not recognized as a column.

    SELECT  ID, (ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID)) AS Idx 
    FROM db_sort00.dbo.tbl_arch_seq_header
	WHERE Idx > 1

I was able to get this to return results. I haven't deleted anything yet though:

;WITH CTE
            AS (
    SELECT ID,
                Row_number()OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY (SELECT 1) ) AS Rn
        FROM   db_sort00.dbo.tbl_arch_seq_header)
       SELECT *
    FROM   CTE
    WHERE  Rn > 1
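
As a cross-check on the CTE preview above, a plain GROUP BY can list each duplicated combination with its number of copies. This is a sketch only, and it assumes CreateDt (mentioned in the question) is a column of the same table; the Copies and EarliestCreateDt aliases are arbitrary names.

```sql
-- One row per duplicated sorter/wave/order/chute combination.
-- Assumption: CreateDt lives in this table, as the question suggests.
SELECT SorterID,
       WaveID,
       OrderID,
       chute,
       COUNT(*)      AS Copies,
       MIN(CreateDt) AS EarliestCreateDt
FROM   db_sort00.dbo.tbl_arch_seq_header
GROUP  BY SorterID, WaveID, OrderID, chute
HAVING COUNT(*) > 1;
```

Summing Copies - 1 over these rows should match the number of rows the CTE returns with Rn > 1.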

Jan Louwerens (Software Engineer) commented:
The Idx column has to be defined within the inner select, and then filtered in an outer select.

SELECT ID FROM
(
   SELECT  ID, ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY ID) AS Idx  FROM db_sort00.dbo.tbl_arch_seq_header
)
WHERE
   Idx > 1
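
One likely reason the query above can still fail on SQL Server: a derived table in the FROM clause must be given an alias, and a window function alias such as Idx is only visible to the query that wraps it. A minimal sketch of the same idea with an alias added (the name d is arbitrary):

```sql
SELECT d.ID
FROM (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute
                              ORDER BY ID) AS Idx
    FROM   db_sort00.dbo.tbl_arch_seq_header
) AS d              -- SQL Server requires an alias on a derived table
WHERE d.Idx > 1;    -- Idx can be filtered here because it was defined one level down
```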

coperations07 (Author) commented:
The Idx column is still not being recognized for some reason.

I ended up getting the dupes cleared up with this:

-- Find duplicates
;WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY (SELECT 1)) AS Rn
    FROM   db_sort00.dbo.tbl_arch_seq_header
)
SELECT *
FROM   CTE
WHERE  Rn > 1

-- Delete duplicates
;WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SorterID, WaveID, OrderID, chute ORDER BY (SELECT 1)) AS Rn
    FROM   db_sort00.dbo.tbl_arch_seq_header
)
DELETE FROM CTE
WHERE  Rn > 1
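
A note on which row survives: ORDER BY (SELECT 1) leaves the numbering within each group unspecified, so the row that gets Rn = 1 is not guaranteed to be the earliest one. If the goal from the original question (keep the earliest record) matters, a variant along these lines might be safer. It is a sketch that assumes CreateDt is a column of this table, with ID used only to break ties.

```sql
;WITH CTE AS (
    SELECT ID,
           ROW_NUMBER() OVER (
               PARTITION BY SorterID, WaveID, OrderID, chute
               ORDER BY CreateDt, ID   -- earliest CreateDt gets Rn = 1; ID breaks ties
           ) AS Rn
    FROM   db_sort00.dbo.tbl_arch_seq_header
)
DELETE FROM CTE
WHERE  Rn > 1;   -- removes every row except the earliest in each group
```

Swapping DELETE FROM CTE for SELECT * FROM CTE gives a preview of the same rows before anything is removed.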

coperations07 (Author) commented:
I accepted my own solution because it is what ended up working for me.