Link to home
Start Free TrialLog in
Avatar of henrikatwork
henrikatworkFlag for Sweden

asked on

Deleting duplicate entries based on three columns

Hi again,

I have a table, where entries (according to the source) are supposed to be uniquely identified by the columns Distrikt, Kund and ProdNr.

The script below is modified (taken from a post from scottpletcher). I wonder, does this script remove entries that have the same value on those three columns? When I run it, it seems to delete way too many entries (original table had 1271369 records (that are supposed to be somewhat OK) and only 21837 entries remain after this script has run. ProdNr is actually the ordernumber, but my client says it's not unique unless combined with district and customer (Distrikt, Kund).

Best regards,

henrik


DECLARE dupsCsr CURSOR READ_ONLY FOR
      SELECT Distrikt, Kund,ProdNr, COUNT(*) AS numDups FROM tbakOrder
      GROUP BY Distrikt, Kund,ProdNr HAVING COUNT(*) > 1  -- Change if needed

DECLARE @Distrikt int
DECLARE @Kund int
DECLARE @ProdNr int
DECLARE @numDups INT

OPEN dupsCsr
FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
WHILE @@FETCH_STATUS = 0
BEGIN
     SET @numDups = @numDups - 1 --Chg if needed: delete all but [1] of the duplicates
     SET ROWCOUNT @numDups
     DELETE FROM tbakOrder
     WHERE Distrikt = @Distrikt AND (Kund = @Kund AND ProdNr = @ProdNr)
     FETCH NEXT FROM dupsCsr INTO @Distrikt, @Kund, @ProdNr, @numDups
END --WHILE
CLOSE dupsCsr
DEALLOCATE dupsCsr

SET ROWCOUNT 0 --restore default
SOLUTION
Avatar of stevetheski
stevetheski
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Avatar of auke_t
auke_t
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
damn i must be tired auke_t
when reading about the deleted records i didnt even think about the good records :)
my solution is more creative though :)

Steve
Avatar of worldserga
worldserga

Hi

If you need to delete duplicated rows you can do something like that:

SELECT INTO tempTable DISTINCT Distrikt, Kund, ProdNr from yourTable
TRUNCATE table yourTable
INSERT INTO yourTable SELECT * FROM tempTable
Avatar of henrikatwork

ASKER

Great - auke&steve I'll split those 125 pts btw you two.

I called my client and they now informed me that the uniqueness is only guaranteed when using 4 columns (thanks for telling me that now). Anyway, those three columns correspond to one order. Now I would like to give all orders in the system an ID, i.e. add an OrderID for each row in the table, with those 3 columns in common. Which is the most effective way of doing this?

cheers,

henrik