[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

Removing Duplicate Rows

Posted on 2004-04-13
19
Medium Priority
?
569 Views
Last Modified: 2008-02-01
Using SQL Server 2000...

I have a table that is getting some duplicate records.  I found the following code to remove dupes at msn support:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe

When I run it, I get the following error messages:

Server: Msg 213, Level 16, State 4, Line 12
Insert Error: Column name or number of supplied values does not match table definition.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateIssued' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.TravelDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Traveler' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.PNRLocator' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.IATA' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Endorsements' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.NameData' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AccountNbr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AgentSine1' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.AgentSine2' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Col012' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Airlines' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Branch' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Status' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateUpdated' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.ExpirationDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.Email' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Server: Msg 8120, Level 16, State 1, Line 1
Column 'edata_test.DateAdded' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.


Can anyone tell me how to fix this?
0
Comment
Question by:forrest321
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
  • 4
  • +5
19 Comments
 
LVL 17

Expert Comment

by:walterecook
ID: 10814406
SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

This error:
Insert Error: Column name or number of supplied values does not match table definition.
Is because the fields returned by * do not match your INTO table

Column 'edata_test.DateIssued' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
These error are because you have to put all the fields returned by * in your group by clause.

Hope that helps
Walt
0
 
LVL 17

Expert Comment

by:walterecook
ID: 10814417
So if * returns
field1, fielda, someOtherField
in your results, your GROUP BY must contain those

GROUP BY field1, fielda, someotherField

Make sense?

Walt
0
 
LVL 6

Expert Comment

by:lausz
ID: 10814513
Try this, but you are deleting all the records when count(ticketnumber )>1

SELECT TICKETNUMBER
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe
0
NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

 
LVL 2

Author Comment

by:forrest321
ID: 10814562
The problem with that is that I need all of the columns to be reinserted into the original table after the duped rows are deleted.  That method would only reinsert the TicketNumber.
0
 
LVL 6

Expert Comment

by:lausz
ID: 10815922
Ok, you have to explain each field in the select ... And take care because you are deletin not only dup rows, you are deleting all (when the count(*)>1)

You can solve this problem adding this query ...


SELECT TICKETNUMBER , field1, field2, etc
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

   DELETE edata_test
      WHERE TicketNumber
      IN (SELECT TicketNumber
             FROM edata_dupe)
+++++++++++++++++++++

insert into edata_test
select field1,field2, max(some field to obtaine only one record for each ticketnumber)
from edata_dupe
group by field1, field2, etc

+++++++++++++++++++++

   INSERT edata_test
      SELECT *
         FROM edata_dupe

     DROP TABLE edata_dupe
0
 
LVL 7

Expert Comment

by:ChrisFretwell
ID: 10817881
If the entire row, not just ticketnumber is duplicated then you can do it much simpler....

select distinct {field list here}
      INTO edata_dupe
      FROM edata_test

drop table edata_test

exec sp_rename 'edata_dupe', 'edata_test'


The select distinct would of course return only distinct rows, then you have a clean dup-less table. drop your original and rename the new one.
0
 
LVL 8

Expert Comment

by:annamalai77
ID: 10819966
hi forrest

i would suggest that u delete the duplicate rows rather than re-inserting the distinct records. the below query will delete the duplicate rows from the table.

there's one problem with the delete, if the no.of records in the table is very high then it will take a longer time to delete
in that case u can go for cursors.

i will give the query in both ways.

<-- SINGLE QUERY METHOD -->
delete from edata_test a
where a.rowid < (select max(b.rowid)
                          from edata_test b
                          where a.ticketnumber = b.ticketnumber);

commit;

<-- USING CURSOR METHOD-->
declare
cursor c1 is
  select ticketnumber, max(rowid) rid
  from edata_test
  group by ticketnumber
  having count(*) > 1;
begin
  for i in c1
  loop
    delete from edata_test
    where ticketnumber = i.ticketnumber
    and   rowid != i.rid;
    commit;
end loop;
end;
/

the cursor method will run faster since there's a commit after every record;

regards
annamalai
0
 
LVL 18

Expert Comment

by:diasroshan
ID: 10820954
hi,
why don't u try the following to delete duplicate rows from a table...

As there is no such RowID type in Sql Server like Oracle, so you have to create an Identity Column with other fields as under:-

Create Table myTable
(myID int identity,
name varchar(10),
add1 varchar(10),
add2 varchar(10))

go


Further, if in case of sql server, you don't have identity column in your table, then you can add it without deleting the existing data with the following sql, before running the above Delete Query.

alter table mytable
add myid int identity


Insert into myTable
values('Jacky','Room 25','Drig Road')

go

Insert into myTable
values('Jack2','Room 29','Drig Road')

go

Insert into myTable
values('Jacky','Room 25','Drig Road')

go

Delete from myTable
where myID not in
(select min(myID) from myTable
group by name,add1,add2)

go


...need any further clarification ... do ask...


Cheers,
Rosh
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10824409
Can you try this

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber int, @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0


HTH

Hilaire

0
 

Expert Comment

by:jacobymatt
ID: 10901066
Okay...I'm picking up this problem from here.  In the edata_test table, I created a copy of 2 different records.  With one of the records, I left the copy as a complete duplicate.  With the other copy, I changed one field in the row (not the TicketNumber) and left the rest the same as the original.  I did this because there may be times when the we get records with one field that is slightly different but all we care about is that the ticket number is the same and one of them needs to be removed, it doesnt matter which.  All of the errors that were posted by Forrest321 were occuring in the first portion of the statement:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

So to fix the errors I used some suggestions mentioned above and now the statement looks like

SELECT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  
      HAVING COUNT(TicketNumber) > 1

Now all the fields are listed in the Group By clause.  This command is completing without errors and it copies one instance of the identical pair of records into the edata_dupe table but thats all.  There are 2 problems I cant understand.  One is that since I am no longer using disinct in my select than I thought this statement would copy both instances of the identical pair of records into edata_dupe, but it doesnt.  Also, the record that I changed only one field in still has the same ticket number so why wouldn't that pair of nearly duplicate records also be copied into edata_dupe?  So edata_dupe should contain 4 records as far as I can see, but I only have one.  Please help me.


0
 
LVL 2

Author Comment

by:forrest321
ID: 10901144
jacobymatt has taken over this project for me....I will award points at his request.

Thanks for the help!
0
 
LVL 2

Author Comment

by:forrest321
ID: 10928771
Okay...I'm picking up this problem from here.  In the edata_test table, I created a copy of 2 different records.  With one of the records, I left the copy as a complete duplicate.  With the other copy, I changed one field in the row (not the TicketNumber) and left the rest the same as the original.  I did this because there may be times when the we get records with one field that is slightly different but all we care about is that the ticket number is the same and one of them needs to be removed, it doesnt matter which.  All of the errors that were posted by Forrest321 were occuring in the first portion of the statement:

SELECT DISTINCT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(TicketNumber) > 1

So to fix the errors I used some suggestions mentioned above and now the statement looks like

SELECT *
      INTO edata_dupe
      FROM edata_test
      GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  
      HAVING COUNT(TicketNumber) > 1

Now all the fields are listed in the Group By clause.  This command is completing without errors and it copies one instance of the identical pair of records into the edata_dupe table but thats all.  There are 2 problems I cant understand.  One is that since I am no longer using disinct in my select than I thought this statement would copy both instances of the identical pair of records into edata_dupe, but it doesnt.  Also, the record that I changed only one field in still has the same ticket number so why wouldn't that pair of nearly duplicate records also be copied into edata_dupe?  So edata_dupe should contain 4 records as far as I can see, but I only have one.  Please help me.
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10928974
You're going nowhere with this code
DISTINCT and GROUP BY are exactly the same in the above statement.

"Nearly equal" does not exist in SQL : either it's equal, or not
So if you group by all the fields, you'll get no duplicates unless ALL the fields are equal

The HAVING Clause filters records than have at least one dup

run the following code, it will help you figure out

SELECT count(*) as #occurences , *
FROM edata_test
GROUP BY DateIssued, TicketNumber, TravelDate, Traveler, PNRLocator, IATA, Endorsements, NameData, AccountNbr, AgentSine1, AgentSine2, Col012, Airlines, Branch, Status, DateUpdated, ExpirationDate, Email, id, DateAdded  

The "Having count(*) > 1" just removes the records where #occurences is <= 1

If you just need to make the TicketNumber UNIQUE (ie keep onlly one record per ticketnumber),
just run the code I posted above and you'll get a clean table

Hilaire
0
 

Expert Comment

by:jacobymatt
ID: 10933632
I tried to run the above statement in Edata_test using SQL Query Analyzer and all it did was return all of my records along with an extra field called Occurrences.  On every record the occurence is 1.  How do I modify this to get the records where the occurence is two.  There has got to be a way to remove the dupes from tables.
0
 
LVL 26

Expert Comment

by:Hilaire
ID: 10935950
Did you try the code in my first post ?
Paste the following code and run it in query analyser and it will remove duplicates, leaving one row per distinct TicketNumber

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber int, @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0
0
 

Expert Comment

by:jacobymatt
ID: 10939396
Yes I tried that code as well and this is the error I got:

Server: Msg 248, Level 16, State 1, Line 7
The conversion of the varchar value '0017513606834' overflowed an int column. Maximum integer value exceeded.

0
 
LVL 26

Accepted Solution

by:
Hilaire earned 600 total points
ID: 10939580
OK, can you try this one now ?
(changed datatype of my @TicketNumber variable to fit your datatypes)

declare c1 cursor for SELECT TicketNumber, count(*) -1 as ToBeRemoved
      FROM edata_test
      GROUP BY TicketNumber
      HAVING COUNT(*) > 1
declare @ticketnumber varchar(50), @removecnt int
open c1
fetch c1 into @ticketnumber, @removecnt
while @@fetch_status = 0
begin
    set rowcount @removecnt
    delete edata_test where ticketnumber = @ticketnumber
    fetch c1 into @ticketnumber, @removecnt
end
close c1
deallocate c1
set rowcount 0
0
 

Expert Comment

by:jacobymatt
ID: 10940035
Wow!  It looks like you found the answer.  I have no idea how any of it works but it seems to do the trick.  I'm telling Forrest321 to award points to you.  Thanks a bunch!
0
 
LVL 2

Author Comment

by:forrest321
ID: 10940047
Thanks for the help everyone!
0

Featured Post

Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, Microsoft released a best-practice guide for securing Active Directory. It's a whopping 300+ pages long. Those of us tasked with securing our company’s databases and systems would, ideally, have time to devote to learning the ins and outs…
This article shows how to get a list of available printers for display in a drop-down list, and then to use the selected printer to print an Access report or a Word document filled with Access data, using different syntax as needed for working with …
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …
Suggested Courses

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question