?
Solved

Flag Duplicate Records in a table

Posted on 2005-05-10
8
Medium Priority
?
1,036 Views
Last Modified: 2008-02-26
I have already asked this question and now realize it did not fully do what I expectd:

http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/Q_21408384.html

What I need to fix is the first part.  Based on my question the solution works great.  I did not give enough info, so I hope you guys can help me.

I have a table with 173 fields in it.  It contains a lot of customer data.  One of the fields is OPHONE.  This OPHONE is not a unique ID and the tmp table I am working with does not have a unique ID.

What happens is I get a new file in with new customer data.  I import that into my TMPIMPORT.  I need to flag a field called PROCFLAG in TMPIMPORT for all duplicates.  As an example

OPHONE          MADDRESS                                         MCITY
2027572481    1600 PENNSYLVANIA AVE NW               WASHINGTON
2027572481    1600 PENNSYLVANIA AVE NW # 105      WASHINGTON
2027572481    1600 PENNSYLVANIA AVE NW #45         WASHINGTON
2027572481    1600 PENNSYLVANIA AVE SW                WASHINGTON

2027572500    1600 PENNSYLVANIA AVE NW # 105      WASHINGTON
2027572500    1600 PENNSYLVANIA AVE NW #45         WASHINGTON
2027572500    1600 PENNSYLVANIA AVE SW                WASHINGTON

I would want to flag 3 records (any 3) from the first number "2027572481" and 2 records from the second number "2027572500".

This will net me the following fields with no flag: (this assumes I did not flag the first record for each phone number)
OPHONE          MADDRESS                                         MCITY
2027572481    1600 PENNSYLVANIA AVE NW               WASHINGTON
2027572500    1600 PENNSYLVANIA AVE NW # 105      WASHINGTON

The answer in the Question above does not allow me to get a full distinct row per phone number as the entire row is not distince, but rather only the phone number.





0
Comment
Question by:rrollinsny
  • 4
  • 2
  • 2
8 Comments
 

Author Comment

by:rrollinsny
ID: 13973455
Just to show the SP i currently use:

/* FLAG DUPS */
update a
set procflag=@dupflag
FROM         dbo.TMPIMPORT a INNER JOIN
(SELECT ophone , max(ophone ) AS Dups
FROM          dbo.TMPIMPORT
GROUP BY ophone
HAVING (COUNT(*) > 1)) b ON a.ophone = b.ophone


This results in a flag on all phone numbers that are duplicate.  It does not leave one blank and flag the others.
0
 
LVL 28

Expert Comment

by:rafrancisco
ID: 13974901
Try this:

update a
set procflag=@dupflag
FROM         dbo.TMPIMPORT a INNER JOIN
(SELECT ophone , max(MAddress ) AS MAddress
FROM          dbo.TMPIMPORT
GROUP BY ophone
HAVING (COUNT(*) > 1)) b ON a.ophone = b.ophone and A.MAddress != B.MAddress
0
 

Author Comment

by:rrollinsny
ID: 13974995
rafrancisco,

Thanks for the info.  I tried that and it worked for those records that the address was different.  The issue is it may be one of any of the 173 fields that are different or the entire row may be an exact duplicate.  

Any other suggestions?
0
Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

 
LVL 8

Assisted Solution

by:Julianva
Julianva earned 1000 total points
ID: 13975933
do this as a test first to see if it will give you the results
then you can modify it to suit your tables.
 
/* Script for Creation of Employee Table*/
CREATE TABLE [dbo].[Employee] (
            [id] [int] NULL ,
            [name] [Varchar] (50) ,
            [salary] [Numeric](18, 2) NULL,
            [Duplicate] [varchar] (5) null            
 ) ON [PRIMARY]
GO

Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (1,'Ram', 1000.00)
Insert into employee values (2,'Joe', 2000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (3,'Mary', 1000.00)
Insert into employee values (4,'Julie', 5000.00)
Insert into employee values (2,'Joe', 1000.00)
Insert into employee values (1,'Ram', 1000.00)

Declare @id int,
        @name varchar (50),
        @cnt int,
        @salary numeric

Declare getallrecords cursor local static For
 Select count (1), id, name, salary
   from employee (nolock)
   group by id, name,salary having count(1)>1
 
Open getallrecords

Fetch next from getallrecords into @cnt,@id,@name,@salary
--Cursor to check with all other records
While @@fetch_status=0
 Begin
  Set @cnt= @cnt-1
  Set rowcount @cnt

 - updating the fields that are duplicates leaving one

  update  employee
set Duplicate = 'Dup'
where id=@id and name=@name and salary=@salary

  Set rowcount 0

  Fetch next from getallrecords into @cnt,@id,@name,@salary
 End

Close getallrecords
Deallocate getallrecords




0
 

Author Comment

by:rrollinsny
ID: 13978039
Julianva,

This solution worked well.  It takes a while to run about 3 minutes on less than 100,000 records.  Here is the modified code I am using:

Declare @idphone nvarchar(10),
        @cnt int

Declare getallrecords cursor local static For
 Select count (1), ophone
   from tmpimport (nolock)
   group by ophone having count(1)>1
 
Open getallrecords

Fetch next from getallrecords into @cnt,@idphone
--Cursor to check with all other records
While @@fetch_status=0
 Begin
  Set @cnt= @cnt-1
  Set rowcount @cnt

 -- updating the fields that are duplicates leaving one

update  tmpimport
set procflag = 'D'
where ophone=@idphone

  Set rowcount 0

  Fetch next from getallrecords into @cnt,@idphone
 End

Close getallrecords
Deallocate getallrecords


Is there any other way this can be done, much faster?  There are days when I need to process close to 1 million new records.
0
 
LVL 28

Accepted Solution

by:
rafrancisco earned 1000 total points
ID: 13978094
How about this:

SELECT *, IDENTITY(INT, 1, 1) AS IdentityColumn
INTO TMPImport2
FROM TMPImport
GO

update a
set procflag=@dupflag
FROM         dbo.TMPIMPORT2 a INNER JOIN
(SELECT ophone , MIN(IdentityColumn) AS IdentityColumn
FROM          dbo.TMPIMPORT2
GROUP BY ophone
HAVING (COUNT(*) > 1)) b ON a.ophone = b.ophone and A.IdentityColumn != B.IdentityColumn
0
 
LVL 8

Expert Comment

by:Julianva
ID: 13978315
rrollinsny

Sorry i dont know any other way to scroll through the records except with cursors.

I agree that cursors are slow on large data.

I initially wrote that code for deleting dups from my database.

Hope experts can help.





0
 

Author Comment

by:rrollinsny
ID: 13994982
Thanks to both of you!  Both solutions worked, the latter was a little faster for me.
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Why is this different from all of the other step by step guides?  Because I make a living as a DBA and not as a writer and I lived through this experience. Defining the name: When I talk to people they say different names on this subject stuff l…
When trying to connect from SSMS v17.x to a SQL Server Integration Services 2016 instance or previous version, you get the error “Connecting to the Integration Services service on the computer failed with the following error: 'The specified service …
Familiarize people with the process of utilizing SQL Server functions from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Microsoft Ac…
Viewers will learn how the fundamental information of how to create a table.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question