Solved

Remove Duplicate rows in Sql

Posted on 2002-06-26
18
603 Views
Last Modified: 2008-02-01
I need a little help with a sql script

I have a huge table of addresses with numerous duplicates, I need a script that will look at the table and if it see's more than one row with identical data it will delete any of the extra rows and only leave one row of that data.
My table name is "Person_Address", I want the script to look at the "master_name_key","house_number" and "Street" field. If there is more than one row with the identical three peices of data (or identical nulls) in all three fields then delete all the extras leaving me just one.
0
Comment
Question by:DavidNPD
  • 8
  • 5
  • 4
  • +1
18 Comments
 
LVL 75

Expert Comment

by:Anthony Perkins
Comment Utility
Please maintain these open questions:
Logon Scirpts not mapping drives in XP Date: 03/17/2002 06:29AM PST
http://www.experts-exchange.com/jsp/qShow.jsp?ta=winxp&qid=20277927
Outlook on a SLOW network connection Date: 03/28/2002 08:12PM PST
http://www.experts-exchange.com/jsp/qShow.jsp?ta=msoutlook&qid=20282814
Crystal Reports RDC making ADO connection in VB Date: 04/17/2002 07:38PM PST  
http://www.experts-exchange.com/jsp/qShow.jsp?ta=crystal&qid=20290747
Formatting dates in Crystal Dictionaries Date: 04/28/2002 03:04PM PST
http://www.experts-exchange.com/jsp/qShow.jsp?ta=crystal&qid=20294676
Exporting crystal to a PDF file from a VB Application Date: 04/29/2002 01:02PM PST
http://www.experts-exchange.com/jsp/qShow.jsp?ta=crystal&qid=20295092

Thanks,
Anthony

 
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
I can see two approaches.  

(1) Create a statement that finds and deletes one duplicate.  Loop thru it repeatedly until no more duplicates exist.  For example:

SET ROWCOUNT 1
WHILE 1 = 1
BEGIN
    DELETE Person_Address
    FROM Person_Address
    INNER JOIN
    (SELECT TOP 1 master_key_name, house_number, street
    FROM Person_Address
    GROUP BY master_key_name, house_number, street
    HAVING COUNT(*) > 1)
    AS dup ON person_address.master_key_name = dup.master_key_name AND
              person_address.house_number = dup.house_number AND
              person_address.street = dup.street
    IF @@ROWCOUNT = 0
        BREAK
END --WHILE

(2) Create a cursor to find the duplicates.  Fetch thru the cursor deleting the duplicates.  For example:
DECLARE csrDupAddr CURSOR FOR
SELECT master_key_name, house_number, street, COUNT(*)
FROM Person_Address
GROUP BY master_key_name, house_number, street
HAVING COUNT(*) > 1
FOR READ ONLY
DECLARE @master_key_name VARCHAR(??) --match to orig col
DECLARE @house_number VARCHAR(??) --match to orig col
DECLARE @street VARCHAR(??) --match to orig col


0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Oops, here is the full version of (2):

(2)
DECLARE csrDupAddr CURSOR FOR
SELECT master_key_name, house_number, street, COUNT(*)
FROM Person_Address
GROUP BY master_key_name, house_number, street
HAVING COUNT(*) > 1
FOR READ ONLY
DECLARE @master_key_name VARCHAR(5) --match to orig col
DECLARE @house_number VARCHAR(5) --match to orig col
DECLARE @street VARCHAR(5) --match to orig col
DECLARE @num_dups INT
FETCH NEXT FROM csrDupAddr INTO @master_key_name, @house_number, @street, @num_dups
WHILE @@FETCH_STATUS <> -1
BEGIN
    SET @num_dups = @num_dups - 1  --delete all dups except 1
    SET ROWCOUNT @num_dups
    DELETE FROM Person_Address
    WHERE master_key_name = @master_key_name
    AND house_number = @house_number
    AND street = @street
    FETCH NEXT FROM csrDupAddr INTO @master_key_name, @house_number, @street, @num_dups
END --WHILE

If this is a very large table and/or there are a lot of duplicates, the second method will probably perform better.  If you don't have an INDEX on those columns, for a large table it would probably be worth creating one just while running the DELETE process.
0
 

Author Comment

by:DavidNPD
Comment Utility
Scott, Thanks for the info. I'm not really good, with SQL just learning how to do some basic scripts. Can I run this as a script through Query Analyzer?
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Yes, you should be able to run either from Query Analyzer.  Of course be sure to run it on a test system and back up the table first just in case.
0
 

Author Comment

by:DavidNPD
Comment Utility
Great I will run it tonight and let you know
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
As I said, be sure to run a test first on a test server.  And before the test, back up the test table.

DEFINITELY back up the production table before attempting to delete any rows.
0
 

Author Comment

by:DavidNPD
Comment Utility
Scott I am having a little trouble with both of these. I created a new table named person)address2 to test with. I changed the appropritate table name in your scripts. When I run the first one I get this error:

Server: Msg 208, Level 16, State 1, Line 4
Invalid object name 'person_address2'.
Server: Msg 208, Level 16, State 1, Line 4
Invalid object name 'person_address2'.

I then took the table name out between the delete and the from statement and ran it again and got:

Server: Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'INNER'.
Server: Msg 156, Level 15, State 1, Line 11
Incorrect syntax near the keyword 'AS'.

I then tried the second one, when I ran that I got this error:


Server: Msg 208, Level 16, State 1, Line 2
Invalid object name 'person_address2'.

which is strange because the table name is not in the second line. Any idea what these errors could be?

0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
It looks as if it can't find the new test table you created.  Make sure that you are in the same db in which the table was created and that the owner of the table is 'dbo' (or that you specify the table as owner.table in the SQL commands).
0
Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

 

Author Comment

by:DavidNPD
Comment Utility
The owner is most likely the issue. All my tables are dba. I will change the table name in your scripts to dba.person_address2, that should solve it right? I will check tonight.
0
 
LVL 9

Accepted Solution

by:
miron earned 50 total points
Comment Utility
well,

there is always one more way to do the same thing.

select master_key_name, house_number, street into temp_copy from Person_Address
group by master_key_name, house_number, street

execute sp_rename 'Person_Address', 'old_Person_Address', 'OBJECT'

execute sp_rename 'temp_copy', 'Person_Address', 'OBJECT'
0
 
LVL 9

Expert Comment

by:miron
Comment Utility
well,

there is always one more way to do the same thing.

select master_key_name, house_number, street into temp_copy from Person_Address
group by master_key_name, house_number, street

execute sp_rename 'Person_Address', 'old_Person_Address', 'OBJECT'

execute sp_rename 'temp_copy', 'Person_Address', 'OBJECT'
0
 

Author Comment

by:DavidNPD
Comment Utility
Miron, I am a little confused what will your script do?
0
 
LVL 9

Expert Comment

by:miron
Comment Utility
rather then removing ( sic ) duplicates one by one, it will create set of unique records using some temporary name, then rename old set of records ( table ) sonsiting of some unique records and some duplicates, to some other name, and finally, rename the set of records free of duplicates to the name appropriate.

It's good, is that you are not loosing original set of records, but replacing it with set of unique ( what you were looking for ) records and at the same time keep completely old records un-touched, sort of a backup.
0
 

Author Comment

by:DavidNPD
Comment Utility
This is the weirdest thing, If I run your script against a test table with hundreds of rows pasted in it works fine. If I try to run it against my actual table though it comes up with no errors but doesnt effect any rows. I have imported the data from my good table to my test table and ran the script again. At that point I run the query and it only effects 5 rows. This is out of 105k. I can still see hundreds of rows that should have been done. If I run the script again it then shows "0 Rows Effected" I then copy and paste a row to make sure they are identical and run the script again, still it shows "0 Rows Effected". Any idea what I am doing wrong?

Below is the code I am using now with the correct field names


SET ROWCOUNT 1
WHILE 1 = 1
BEGIN
   DELETE dba.person_address3
   FROM dba.person_address3
   INNER JOIN
   (SELECT TOP 1 master_name_key, house_no, street
   FROM dba.person_address3
   GROUP BY master_name_key, house_no, street
   HAVING COUNT(*) > 1)
   AS dup ON dba.person_address3.master_name_key = dup.master_name_key AND
             dba.person_address3.house_no = dup.house_no AND
             dba.person_address3.street = dup.street
   IF @@ROWCOUNT = 0
       BREAK
END --WHILE
0
 

Author Comment

by:DavidNPD
Comment Utility
OK Miron, that seemed to work pretty good, I just need to rewrite it a little bit to take in the Town State and Zips, If I am creating a new table I will lose that. I will let you know.
0
 
LVL 9

Expert Comment

by:miron
Comment Utility
include Town, State, and Zip columns, expected some troubles with NULL values.

-- create unique key for each row
alter table Person_Address add record_id int identity( 1, 1 )
-- make NULL to be equael to NULL
SET ANSY_NULLS ON

select
   master_key_name
  , house_number
  , street
  , Town
  , State
  , Zip
into temp_copy
from
Person_Address where record_id in
(
select max( record_id )
from Person_Address
-- --optional where clause will fend off NULL values
-- where
-- master_key_name is not NULL
-- OR house_number is not NULL
-- OR street is not  NULL
group by master_key_name, house_number, street
)

alter table Person_Address drop column record_id
execute  sp_rename 'Person_Address', 'old_Person_Address', 'OBJECT'
execute sp_rename 'temp_copy', 'Person_Address', 'OBJECT'


0
 

Author Comment

by:DavidNPD
Comment Utility
Miron, this seems to be working very well, and I like the idea of creating a new table, thanks.

Peter thanks for your help also.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Via a live example, show how to shrink a transaction log file down to a reasonable size.

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now