asked on

Deleting duplicate entries based on three columns

Hi again,

I have a table, where entries (according to the source) are supposed to be uniquely identified by the columns Distrikt, Kund and ProdNr.

The script below is modified (taken from a post from scottpletcher). I wonder, does this script remove entries that have the same value on those three columns? When I run it, it seems to delete way too many entries (original table had 1271369 records (that are supposed to be somewhat OK) and only 21837 entries remain after this script has run. ProdNr is actually the ordernumber, but my client says it's not unique unless combined with district and customer (Distrikt, Kund).

Best regards,

henrik

DECLARE dupsCsr CURSOR READ_ONLY FOR
SELECT Distrikt, Kund,ProdNr, COUNT(*) AS numDups FROM tbakOrder
GROUP BY Distrikt, Kund,ProdNr HAVING COUNT(*) > 1 -- Change if needed

DECLARE @Distrikt int
DECLARE @Kund int
DECLARE @ProdNr int
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
SET @numDups = @numDups - 1 --Chg if needed: delete all but [1] of the duplicates
SET ROWCOUNT @numDups
DELETE FROM tbakOrder
WHERE Distrikt = @Distrikt AND (Kund = @Kund AND ProdNr = @ProdNr)
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default

SOLUTION

stevetheski

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

ASKER CERTIFIED SOLUTION

auke_t

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

stevetheski

damn i must be tired auke_t
when reading about the deleted records i didnt even think about the good records :)
my solution is more creative though :)

Steve

worldserga

Hi

If you need to delete duplicated rows you can do something like that:

SELECT INTO tempTable DISTINCT Distrikt, Kund, ProdNr from yourTable
TRUNCATE table yourTable
INSERT INTO yourTable SELECT * FROM tempTable

henrikatwork

ASKER

Great - auke&steve I'll split those 125 pts btw you two.

I called my client and they now informed me that the uniqueness is only guaranteed when using 4 columns (thanks for telling me that now). Anyway, those three columns correspond to one order. Now I would like to give all orders in the system an ID, i.e. add an OrderID for each row in the table, with those 3 columns in common. Which is the most effective way of doing this?

cheers,

henrik