SQL Update statement "apparently" duplicating rows

Posted on 2012-09-13
Last Modified: 2012-09-14
This has had me scratching my head for a while.

I have two tables ~50 columns x 350k+ rows in MSSQL Server 2000.

The two tables are almost identical in structure.

There are 3 important columns: The primary key column (“ABC_ID”), an “id” column (“id”) created by “others” and a datetime field with a getdate() default (“Created_On”). The remaining columns are a mix of char, decimal, text, datetime etc.

Table A contains “correct” data and Table B needs to be made a copy of Table A.

First I perform an Insert for all rows in Table A that are not in Table B – This works correctly.
Then I perform an Update for rows where the “id” columns are equal.

The behaviour that I am seeing is:

1.      After the Insert there are no duplicate rows with the same values for “id”
2.      After the Update there ARE duplicate rows with the same “id” but unique ABC_IDs
3.      The duplicate rows have the SAME datetime value in “Created_On”

So, can anyone suggest what’s going on here? Naturally, I don’t want these duplications. The duplicate rows look identical with the exception of the PK.

For completeness I should mention that I also perform a Delete for rows in Table B but missing from Table A. Although I have excluded this operation from having any influence on the observed behaviour.

If I have missed some vital info I apologise.

Thanks in advance!
Question by:Austrian_Des
    LVL 12

    Expert Comment

    It kind of sounds like your update statement is updating where ABC_IDs match in the two tables when it should be matching on ABC_IDs and "id".

    Author Comment

    Hi Jared,

    Thanks for your reply.

    The match is on the "id" for both tables. The ABC_ID only exists in Table B (I created it to help me track "when" the rows were being created. Also the "Created_On" field for a similar purpose.

    Could you clarify one part of your answer: Are you saying that it is possible for an incorrect join in an update statement to result in the creation of new rows? I'm interested to understand this as my understanding is that this wouldn't happen - but perhaps that's why I'm not an expert ;-)

    LVL 12

    Assisted Solution

    How are your row counts for Table B affected by the update?

    Can you post your update statement?
    LVL 67

    Accepted Solution

    You can't insert new rows by using an UPDATE, so you probably update more than one row per ID - though I have no idea why, without seeing your SQL.

    Your sequence (INSERT missing rows, UPDATE all rows with same ID) is ok if you only insert basic data (like the key columns). Otherwise you update too many rows, though nothing has changed, as you will update the new records in addition.

    Author Closing Comment

    Jared: Your question about row counts at first seemed irrelevant, but as I had hit a wall, I figured “Don’t ask a question and then ignore the help!” So, I started digging into this and the answers were very revealing. I truncated Table B and then ran the Insert code. The row count was 352992. I then checked for duplicate “id”s and there were none. I then ran the Update code and the row count was 352992 (the same) but now I had 500+ duplicate “id”s … How could this be?

    To determine duplicates I used the following code:

    SELECT     id, COUNT(id), field_x
    FROM         Table B
    GROUP BY id, field_x
    HAVING      (COUNT(id) > 1)

    For the sake of this example let’s say field_x is the invoice number. So my code was designed to show any invoice that contained duplicate “id”s. After the truncate, Insert process the above code returned Zero rows. After the update it showed 500+.

    What I had failed to realise was that in Table A there were 500+ duplicate “id”s but with DIFFERENT invoice numbers. After the Update the invoice numbers (field_x) had been modified to show the same number.

    Starting from scratch, I ran the truncate, Insert cycle and then used:

    SELECT     id, COUNT(id)
    FROM         Table B
    GROUP BY id
    HAVING      (COUNT(id) > 1)

    And sure enough, up came the duplicates!

    Qlemo: I’m glad that you were able to confirm that I can’t insert new rows using an Update. Also, I will modify my code to execute in the following sequence: Delete, Update, Insert – A good point about unnecessary updates! Finally, your comment, “you probably update more than one row per ID” was spot on, I just couldn’t see it!

    Hope you are both OK with a split of the points?

    Many thanks,


    Featured Post

    How to run any project with ease

    Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
    - Combine task lists, docs, spreadsheets, and chat in one
    - View and edit from mobile/offline
    - Cut down on emails

    Join & Write a Comment

    Let's review the features of new SQL Server 2012 (Denali CTP3). It listed as below: PERCENT_RANK(): PERCENT_RANK() function will returns the percentage value of rank of the values among its group. PERCENT_RANK() function value always in be…
    Introduction In my previous article ( I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
    Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
    Via a live example, show how to setup several different housekeeping processes for a SQL Server.

    728 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    20 Experts available now in Live!

    Get 1:1 Help Now