henrikatwork
asked on
Deleting duplicate entries based on three columns
Hi again,
I have a table, where entries (according to the source) are supposed to be uniquely identified by the columns Distrikt, Kund and ProdNr.
The script below is modified (taken from a post from scottpletcher). I wonder, does this script remove entries that have the same value on those three columns? When I run it, it seems to delete way too many entries (original table had 1271369 records (that are supposed to be somewhat OK) and only 21837 entries remain after this script has run. ProdNr is actually the ordernumber, but my client says it's not unique unless combined with district and customer (Distrikt, Kund).
Best regards,
henrik
DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT Distrikt, Kund,ProdNr, COUNT(*) AS numDups FROM tbakOrder
GROUP BY Distrikt, Kund,ProdNr HAVING COUNT(*) > 1 -- Change if needed
DECLARE @Distrikt int
DECLARE @Kund int
DECLARE @ProdNr int
DECLARE @numDups INT
OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
SET @numDups = @numDups - 1 --Chg if needed: delete all but [1] of the duplicates
SET ROWCOUNT @numDups
DELETE FROM tbakOrder
WHERE Distrikt = @Distrikt AND (Kund = @Kund AND ProdNr = @ProdNr)
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr
SET ROWCOUNT 0 --restore default
I have a table, where entries (according to the source) are supposed to be uniquely identified by the columns Distrikt, Kund and ProdNr.
The script below is modified (taken from a post from scottpletcher). I wonder, does this script remove entries that have the same value on those three columns? When I run it, it seems to delete way too many entries (original table had 1271369 records (that are supposed to be somewhat OK) and only 21837 entries remain after this script has run. ProdNr is actually the ordernumber, but my client says it's not unique unless combined with district and customer (Distrikt, Kund).
Best regards,
henrik
DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT Distrikt, Kund,ProdNr, COUNT(*) AS numDups FROM tbakOrder
GROUP BY Distrikt, Kund,ProdNr HAVING COUNT(*) > 1 -- Change if needed
DECLARE @Distrikt int
DECLARE @Kund int
DECLARE @ProdNr int
DECLARE @numDups INT
OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
SET @numDups = @numDups - 1 --Chg if needed: delete all but [1] of the duplicates
SET ROWCOUNT @numDups
DELETE FROM tbakOrder
WHERE Distrikt = @Distrikt AND (Kund = @Kund AND ProdNr = @ProdNr)
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr
SET ROWCOUNT 0 --restore default
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Hi
If you need to delete duplicated rows you can do something like that:
SELECT INTO tempTable DISTINCT Distrikt, Kund, ProdNr from yourTable
TRUNCATE table yourTable
INSERT INTO yourTable SELECT * FROM tempTable
If you need to delete duplicated rows you can do something like that:
SELECT INTO tempTable DISTINCT Distrikt, Kund, ProdNr from yourTable
TRUNCATE table yourTable
INSERT INTO yourTable SELECT * FROM tempTable
ASKER
Great - auke&steve I'll split those 125 pts btw you two.
I called my client and they now informed me that the uniqueness is only guaranteed when using 4 columns (thanks for telling me that now). Anyway, those three columns correspond to one order. Now I would like to give all orders in the system an ID, i.e. add an OrderID for each row in the table, with those 3 columns in common. Which is the most effective way of doing this?
cheers,
henrik
I called my client and they now informed me that the uniqueness is only guaranteed when using 4 columns (thanks for telling me that now). Anyway, those three columns correspond to one order. Now I would like to give all orders in the system an ID, i.e. add an OrderID for each row in the table, with those 3 columns in common. Which is the most effective way of doing this?
cheers,
henrik
when reading about the deleted records i didnt even think about the good records :)
my solution is more creative though :)
Steve