Solved

Removing Duplicate Rows

Posted on 2004-04-13
19
551 Views
Last Modified: 2008-02-01
Using SQL Server 2000...

I have a table that is getting some duplicate records.  I found the following code to remove dupes at msn support:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe

When I run it, I get the following error messages:

Server: Msg 213, Level 16, State 4, Line 12
Insert Error: Column name or number of supplied values does not match table definition.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateIssued' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.TravelDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Traveler' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.PNRLocator' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.IATA' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Endorsements' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.NameData' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AccountNbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AgentSine1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AgentSine2' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Col012' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Airlines' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Branch' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Status' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateUpdated' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.ExpirationDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Email' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateAdded' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


Can anyone tell me how to fix this?
0
Comment
Question by:forrest321
  • 4
  • 4
  • 4
  • +5
19 Comments
 
LVL 17

Expert Comment

by:walterecook
ID: 10814406
SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

This error:
Insert Error: Column name or number of supplied values does not match table definition.
Is because the fields returned by * do not match your INTO table

Column 'edata_test.DateIssued' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
These error are because you have to put all the fields returned by * in your group by clause.

Hope that helps
Walt
0
 
LVL 17

Expert Comment

by:walterecook
ID: 10814417
So if * returns
field1, fielda, someOtherField
in your results, your GROUP BY must contain those

GROUP BY field1, fielda, someotherField

Make sense?

Walt
0
 
LVL 6

Expert Comment

by:lausz
ID: 10814513
Try this, but you are deleting all the records when count(ticketnumber )>1

SELECT TICKETNUMBER
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe
0
 
LVL 2

Author Comment

by:forrest321
ID: 10814562
The problem with that is that I need all of the columns to be reinserted into the original table after the duped rows are deleted.  That method would only reinsert the TicketNumber.
0
 
LVL 6

Expert Comment

by:lausz
ID: 10815922
Ok, you have to explain each field in the select ... And take care because you are deletin not only dup rows, you are deleting all (when the count(*)>1)

You can solve this problem adding this query ...


SELECT TICKETNUMBER , field1, field2, etc
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)
+++++++++++++++++++++

insert into edata_test
select field1,field2, max(some field to obtaine only one record for each ticketnumber)
from edata_dupe
group by field1, field2, etc

+++++++++++++++++++++

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe
0
 
LVL 7

Expert Comment

by:ChrisFretwell
ID: 10817881
If the entire row, not just ticketnumber is duplicated then you can do it much simpler....

select distinct {field list here}
      INTO edata_dupe
      FROM edata_test

drop table edata_test

exec sp_rename 'edata_dupe', 'edata_test'


The select distinct would of course return only distinct rows, then you have a clean dup-less table. drop your original and rename the new one.
0
 
LVL 8

Expert Comment

by:annamalai77
ID: 10819966
hi forrest

i would suggest that u delete the duplicate rows rather than re-inserting the distinct records. the below query will delete the duplicate rows from the table.

there's one problem with the delete, if the no.of records in the table is very high then it will take a longer time to delete
in that case u can go for cursors.

i will give the query in both ways.

<-- SINGLE QUERY METHOD -->
delete from edata_test a
where a.rowid < (select max(b.rowid)
                          from edata_test b
                          where a.ticketnumber = b.ticketnumber);

commit;

<-- USING CURSOR METHOD-->
declare
cursor c1 is
  select ticketnumber, max(rowid) rid
  from edata_test
  group by ticketnumber
  having count(*) > 1;
begin
  for i in c1
  loop
    delete from edata_test
    where ticketnumber = i.ticketnumber
    and   rowid != i.rid;
    commit;
end loop;
end;
/

the cursor method will run faster since there's a commit after every record;

regards
annamalai
0
 
LVL 18

Expert Comment

by:diasroshan
ID: 10820954
hi,
why don't u try the following to delete duplicate rows from a table...

As there is no such RowID type in Sql Server like Oracle, so you have to create an Identity Column with other fields as under:-

Create Table myTable
(myID int identity,
name varchar(10),
add1 varchar(10),
add2 varchar(10))

go


Further, if in case of sql server, you don't have identity column in your table, then you can add it without deleting the existing data with the following sql, before running the above Delete Query.

alter table mytable
add myid int identity


Insert into myTable
values('Jacky','Room 25','Drig Road')

go

Insert into myTable
values('Jack2','Room 29','Drig Road')

go

Insert into myTable
values('Jacky','Room 25','Drig Road')

go

Delete from myTable
where myID not in
(select min(myID) from myTable
group by name,add1,add2)

go


...need any further clarification ... do ask...


Cheers,
Rosh
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10824409
Can you try this

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber int, @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0


HTH

Hilaire

0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 

Expert Comment

by:jacobymatt
ID: 10901066
Okay...I'm picking up this problem from here.  In the edata_test table, I created a copy of 2 different records.  With one of the records, I left the copy as a complete duplicate.  With the other copy, I changed one field in the row (not the TicketNumber) and left the rest the same as the original.  I did this because there may be times when the we get records with one field that is slightly different but all we care about is that the ticket number is the same and one of them needs to be removed, it doesnt matter which.  All of the errors that were posted by Forrest321 were occuring in the first portion of the statement:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

So to fix the errors I used some suggestions mentioned above and now the statement looks like

SELECT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  
      HAVING COUNT(TicketNumber) > 1

Now all the fields are listed in the Group By clause.  This command is completing without errors and it copies one instance of the identical pair of records into the edata_dupe table but thats all.  There are 2 problems I cant understand.  One is that since I am no longer using disinct in my select than I thought this statement would copy both instances of the identical pair of records into edata_dupe, but it doesnt.  Also, the record that I changed only one field in still has the same ticket number so why wouldn't that pair of nearly duplicate records also be copied into edata_dupe?  So edata_dupe should contain 4 records as far as I can see, but I only have one.  Please help me.


0
 
LVL 2

Author Comment

by:forrest321
ID: 10901144
jacobymatt has taken over this project for me....I will award points at his request.

Thanks for the help!
0
 
LVL 2

Author Comment

by:forrest321
ID: 10928771
Okay...I'm picking up this problem from here.  In the edata_test table, I created a copy of 2 different records.  With one of the records, I left the copy as a complete duplicate.  With the other copy, I changed one field in the row (not the TicketNumber) and left the rest the same as the original.  I did this because there may be times when the we get records with one field that is slightly different but all we care about is that the ticket number is the same and one of them needs to be removed, it doesnt matter which.  All of the errors that were posted by Forrest321 were occuring in the first portion of the statement:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

So to fix the errors I used some suggestions mentioned above and now the statement looks like

SELECT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  
      HAVING COUNT(TicketNumber) > 1

Now all the fields are listed in the Group By clause.  This command is completing without errors and it copies one instance of the identical pair of records into the edata_dupe table but thats all.  There are 2 problems I cant understand.  One is that since I am no longer using disinct in my select than I thought this statement would copy both instances of the identical pair of records into edata_dupe, but it doesnt.  Also, the record that I changed only one field in still has the same ticket number so why wouldn't that pair of nearly duplicate records also be copied into edata_dupe?  So edata_dupe should contain 4 records as far as I can see, but I only have one.  Please help me.
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10928974
You're going nowhere with this code
DISTINCT and GROUP BY are exactly the same in the above statement.

"Nearly equal" does not exist in SQL : either it's equal, or not
So if you group by all the fields, you'll get no duplicates unless ALL the fields are equal

The HAVING Clause filters records than have at least one dup

run the following code, it will help you figure out

SELECT count(*) as #occurences , *
FROM edata_test
GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  

The "Having count(*) > 1" just removes the records where #occurences is <= 1

If you just need to make the TicketNumber UNIQUE (ie keep onlly one record per ticketnumber),
just run the code I posted above and you'll get a clean table

Hilaire
0
 

Expert Comment

by:jacobymatt
ID: 10933632
I tried to run the above statement in Edata_test using SQL Query Analyzer and all it did was return all of my records along with an extra field called Occurrences.  On every record the occurence is 1.  How do I modify this to get the records where the occurence is two.  There has got to be a way to remove the dupes from tables.
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10935950
Did you try the code in my first post ?
Paste the following code and run it in query analyser and it will remove duplicates, leaving one row per distinct TicketNumber

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber int, @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0
0
 

Expert Comment

by:jacobymatt
ID: 10939396
Yes I tried that code as well and this is the error I got:

Server: Msg 248, Level 16, State 1, Line 7
The conversion of the varchar value '0017513606834' overflowed an int column. Maximum integer value exceeded.

0
 
LVL 26

Accepted Solution

by:
Hilaire earned 150 total points
ID: 10939580
OK, can you try this one now ?
(changed datatype of my @TicketNumber variable to fit your datatypes)

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber varchar(50), @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0
0
 

Expert Comment

by:jacobymatt
ID: 10940035
Wow!  It looks like you found the answer.  I have no idea how any of it works but it seems to do the trick.  I'm telling Forrest321 to award points to you.  Thanks a bunch!
0
 
LVL 2

Author Comment

by:forrest321
ID: 10940047
Thanks for the help everyone!
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Database tuning – How to start and what to tune. This question is frequently asked by many people, both online and offline. There is no hard and fast rule-of-thumb for performance tuning, however, before beginning the tuning process one should a…
CCModeler offers a way to enter basic information like entities, attributes and relationships and export them as yEd or erviz diagram. It also can import existing Access or SQL Server tables with relationships.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now