I have a very simple import in SQL Server from table1 to table2, where table2 is a structurally identical copy of table1 and starts with 0 rows. Table1 has about 225 million rows. It's a straight 1-to-1 transfer using the SSIS wizard right within SQL Server; no joins, lookups, etc. I ran it and checked on the progress a few hours later, and SSIS reported it had inserted 350 million rows and was still going. Thinking this must be a mistake, I stopped the SSIS package, ran UPDATE STATISTICS on both tables, and then ran a count query. Sure enough, SSIS had transferred 350 million rows to the new table, while the source has only 225 million.
Thinking again that I must have made a mistake, I truncated table2, re-ran the package, and got the same result. The only difference between the two tables is that on table2 I created only the clustered index up front; there are several non-clustered indexes I was going to defer creating until after all the data had been inserted. I can't imagine how that would affect this, though.
The table is huge, so I haven't yet begun analyzing what the duplicates are, but how in the world could this be possible?
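For reference, when I do dig into the dupes, something like this is what I'd plan to run (a sketch only; `id` is a placeholder for whatever the clustered index key column actually is):

```sql
-- Find key values that were inserted more than once into table2.
-- "id" is a hypothetical name for the clustered key column.
SELECT id, COUNT(*) AS copies
FROM dbo.table2
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```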