Remove the Duplicate Data

Posted on 2005-05-11
Medium Priority
Last Modified: 2010-03-19
Dear Friend

Can any one please tell me how can i remove the duplicates data and i want only 1 unique records .

i have the table having these structure

Create table xyz(srno numeric,sname varchar(12),age numeric)

1      manish      26
1      manish      26
1      manish      26
2      Rohan      27
2      Rohan      27
3      Riyaz      28
3      Riyaz      28
3      Riyaz      28
4      shabbir      29
4      shabbir      29
4      shabbir      29
4      shabbir      29

Thank & Regards
Manish Kaushik
Question by:manishkaushik

Accepted Solution

worldserga earned 100 total points
ID: 13983830

Follow these three steps:

1) alter table xyz
     add seq_num int identity

    FROM  xyz a JOIN (SELECT     srno, sname, age, MAX(seq_num) max_seq_num FROM xyz GROUP BY srno, sname, age
    HAVING COUNT(*) > 1) b ON a.srno = b.srno AND a.sname = b.sname AND a.age = b.age AND a.seq_num < b.max_seq_num

3) alter table xyz
     drop column seq_num

This will definitely solve your problem.


Assisted Solution

Julianva earned 100 total points
ID: 13984016
run this scripts - Please test it first before you run it on live db

Declare @srno int,
        @sname varchar (50),
        @cnt int,
        @age numeric

Declare getallrecords cursor local static For
 Select count (1), srno, sname, age
   from xyz (nolock)
   group by srno, sname,age having count(1)>1
Open getallrecords

Fetch next from getallrecords into @cnt,@srno,@sname,@age
--Cursor to check with all other records
While @@fetch_status=0
  Set @cnt= @cnt-1
  Set rowcount @cnt

  -- Deleting the duplicate records. Observe that all fields are mentioned at the where condition
  Delete from employee where srno=@srno and sname=@sname
  and age=@age
  Set rowcount 0

  Fetch next from getallrecords into @cnt,@srno,@sname,@age


Close getallrecords
Deallocate getallrecords
LVL 17

Expert Comment

ID: 13984396
either of the above methods will work OK, a simpler solution may be to simply extract out the unique records, and reinsert them - it really depends on how many duplicates you have. If you only have a few, the most efficient way will be Julianva's, but if you have a lot of dupliactes, all the deletes will be time-consuming.
instead, try the following :

select into #temp_xyz distinct srno, sname, age from xyz
truncate table xyz
insert into xyz select * from #temp_xyz
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.


Expert Comment

ID: 13984540
Hi everybody,

I cannot agree with BillAn1.

A SET based solution is almost always faster and efficient than an iterative or loop based solution.
LVL 17

Expert Comment

ID: 13986095
worldserga, I appologise, I did not mean to point to Julianva's answer over yours, I should have said
"...most efficient way will be Julianva's or Worldserga's" (the merits / demerits of the 2 approaches again depends on the data. It is often found that an approach like Julianva's is preferred, since it is ittertative, and does not have such an impact on other processes in a live environment. Also, the overhead of adding an extra column (an identity) and then dropping it again, calculating the max etc can be quite resource consuming, so it will still depend heavily on the data which approach is more efficient )

What I really wanted to do was to offer an alternative approach from deleting the offending rows at all, no matter how you did it, and instead to reload the table with the correct rows, which is often the more efficient solution - again depending on the data.

Author Comment

ID: 14008079
Dear all Friends

Thank you to all of your for spending time for me.

i got various concepts of doing the task in my one of the huge ( live ) database having 22457892 records.

Thank u so much.

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have a large data set and a SSIS package. How can I load this file in multi threading?
In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Via a live example combined with referencing Books Online, show some of the information that can be extracted from the Catalog Views in SQL Server.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question