How to rewrite a SELECT statement so it does not hang on a table with millions of records

Hi!

I have a SQL Server table with 3 million records. I need to deduplicate it based on a certain group of fields.

I've already written an application in VB to do this. It works as follows:

1. Open a recordset with this statement:

SELECT TOP 1000 *
FROM NACIMIENTO AS TABLA1
WHERE EXISTS (
    SELECT MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
           NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
           NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM, COUNT(*)
    FROM NACIMIENTO AS TABLA2
    WHERE [TABLA1].[MUN_OFI] = [TABLA2].[MUN_OFI]
      AND [TABLA1].[NOMBRE] = [TABLA2].[NOMBRE]
      AND [TABLA1].[PRIMER_AP] = [TABLA2].[PRIMER_AP]
      AND [TABLA1].[SEGUNDO_AP] = [TABLA2].[SEGUNDO_AP]
      AND [TABLA1].[NOMBRE_MADRE] = [TABLA2].[NOMBRE_MADRE]
      AND [TABLA1].[PRIMER_AP_MADRE] = [TABLA2].[PRIMER_AP_MADRE]
      AND [TABLA1].[SEGUNDO_AP_MADRE] = [TABLA2].[SEGUNDO_AP_MADRE]
      AND [TABLA1].[NOMBRE_ABAM] = [TABLA2].[NOMBRE_ABAM]
      AND [TABLA1].[PRIMER_AP_ABAM] = [TABLA2].[PRIMER_AP_ABAM]
      AND [TABLA1].[SEGUNDO_AP_ABAM] = [TABLA2].[SEGUNDO_AP_ABAM]
    GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
             NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
             NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM
    HAVING COUNT(*) > 1)
ORDER BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
         NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
         NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM **

** I added the TOP clause to get the query to run, since without it the application just hung. But it still hangs! I also tried running this query in Query Analyzer, but it doesn't work there either.

2. When I can get the recordset open (with tables of fewer than 300K records), I move through the records looking for duplicates (there can be 2, 3 or more duplicate records for each one). I apply a certain set of criteria to select the record to keep.

3. Then I create a DeletedTable table and copy the duplicate records into it.

4. Once this is done, I run DELETE FROM Table WHERE EXISTS (SELECT * FROM DELETEDTABLE WHERE DELETEDTABLE.KEY1 = TABLE.KEY1 ...). I have around 6 key fields, but usually only one of these is used to deduplicate on.

The question is: is there any way to optimize the SQL in step 1?

Or is there some other way to detect duplicate records with this kind of data?

The fields I'm deduplicating on are all text fields.

Any help will be greatly appreciated.
bethzycb asked:
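
As a possible starting point for step 1, and only as a sketch that has not been run against this table: the correlated EXISTS may end up re-evaluating the grouped subquery for many outer rows, whereas a single GROUP BY ... HAVING pass lists each duplicated combination once with its row count. The column names come from the query in the question; the alias TOTAL is made up for illustration.

```sql
-- One pass over the table: each duplicated combination appears once,
-- together with the number of rows that share it.
SELECT MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
       NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
       NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM,
       COUNT(*) AS TOTAL   -- TOTAL is a hypothetical alias
FROM NACIMIENTO
GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
         NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
         NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM
HAVING COUNT(*) > 1
ORDER BY TOTAL DESC;
```

One difference to keep in mind: GROUP BY puts NULL values in the same group, while the "=" comparisons in the original query never match two NULLs, so the two queries can disagree on rows with empty fields.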
imran_fast commented:
>> what happens to the recordset if I delete some of the records while I'm looping through it?
In the statement I posted there is no looping and no recordset; it is a single direct DELETE statement.

If you want to keep a copy, then you have to execute two statements.


First, to record the duplicate rows
===================
select * Into YourbackupTable from NACIMIENTO
where keyfield not in
(
-- lowest keyfield of every group; a row with no duplicates is a group of one
-- and therefore is not copied
select min(keyfield) from NACIMIENTO
GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP, NOMBRE_MADRE,
PRIMER_AP_MADRE, SEGUNDO_AP_MADRE, NOMBRE_ABAM, PRIMER_AP_ABAM,
SEGUNDO_AP_ABAM
)


Then, to delete them
==============
delete from NACIMIENTO
where keyfield not in
(
-- lowest keyfield of every group; a row with no duplicates is a group of one
-- and therefore is not deleted
select min(keyfield) from NACIMIENTO
GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP, NOMBRE_MADRE,
PRIMER_AP_MADRE, SEGUNDO_AP_MADRE, NOMBRE_ABAM, PRIMER_AP_ABAM,
SEGUNDO_AP_ABAM
)
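
Since the backup and the delete are two separate statements, the table could change between them, and the grouping is computed twice. The following is a rough sketch of one way around both points, not a tested script: it computes the surviving keys once into a temporary table and runs both steps inside one transaction. It assumes keyfield stands for a unique, non-NULL key column, that YourbackupTable does not exist yet, and that locking the whole table for the duration is acceptable; #keep is a made-up name.

```sql
BEGIN TRANSACTION;

-- Compute the survivors once: the lowest keyfield of every group.
-- TABLOCKX + HOLDLOCK keeps other sessions out of the table until COMMIT.
SELECT MIN(keyfield) AS keyfield
INTO #keep   -- hypothetical temp table name
FROM NACIMIENTO WITH (TABLOCKX, HOLDLOCK)
GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
         NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
         NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM;

-- Copy every row that is not a survivor.
SELECT N.*
INTO YourbackupTable
FROM NACIMIENTO AS N
WHERE NOT EXISTS (SELECT 1 FROM #keep AS K WHERE K.keyfield = N.keyfield);

-- Delete exactly the rows that were just copied.
DELETE N
FROM NACIMIENTO AS N
WHERE NOT EXISTS (SELECT 1 FROM #keep AS K WHERE K.keyfield = N.keyfield);

COMMIT TRANSACTION;

DROP TABLE #keep;
```

There is no error handling here; in practice you would probably want to check for errors after each statement and roll back instead of committing if anything fails.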
imran_fast commented:
/*
 This is the most direct way to find duplicate records and delete them.
 You don't need to put the records in a temporary table to delete them.  */

delete from NACIMIENTO
where keyfield not in
(
-- lowest keyfield of every group; a row with no duplicates is a group of one
-- and therefore is not deleted
select min(keyfield) from NACIMIENTO
GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP, NOMBRE_MADRE,
PRIMER_AP_MADRE, SEGUNDO_AP_MADRE, NOMBRE_ABAM, PRIMER_AP_ABAM,
SEGUNDO_AP_ABAM
)
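
Before deleting anything, it may help to look at which rows would go. The sketch below is one way to preview them, assuming SQL Server 2005 or later (ROW_NUMBER is not available in SQL Server 2000) and assuming keyfield stands for a unique key column. The ORDER BY inside OVER decides which row of each group counts as the one to keep, so it is also the place where other "which record survives" criteria could go; the names ranked and rn are made up.

```sql
-- Number the rows inside each group of identical field values.
-- Row 1 of each group is the one that would be kept; rows 2, 3, ...
-- are the duplicates that the DELETE above would remove.
SELECT *
FROM (
    SELECT N.*,
           ROW_NUMBER() OVER (
               PARTITION BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
                            NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
                            NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM
               ORDER BY keyfield   -- lowest keyfield is kept, as with min()
           ) AS rn                 -- rn is a hypothetical alias
    FROM NACIMIENTO AS N
) AS ranked
WHERE rn > 1;
```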
bethzycb (author) commented:
The problem is that I have to keep a copy of the deleted records.

And what happens to the recordset if I delete some of the records while I'm looping through it? Does their AbsolutePosition change?
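
On keeping a copy of the deleted records: if the server is SQL Server 2005 or later (an assumption; the OUTPUT clause does not exist in SQL Server 2000), a DELETE can write the rows it removes into another table as part of the same statement. This is only a sketch. NACIMIENTO_BORRADOS is a made-up table name, keyfield stands for a unique, non-NULL key column, and the copy table is assumed to have no triggers or foreign keys. If NACIMIENTO has an identity column, the column lists would probably have to be spelled out instead of using *.

```sql
-- Empty table with the same columns as NACIMIENTO (WHERE 1 = 0 copies no rows).
SELECT *
INTO NACIMIENTO_BORRADOS   -- hypothetical name for the copy table
FROM NACIMIENTO
WHERE 1 = 0;

-- Delete the duplicates and write each deleted row to the copy table
-- as part of the same statement.
DELETE FROM NACIMIENTO
OUTPUT deleted.* INTO NACIMIENTO_BORRADOS
WHERE keyfield NOT IN
(
    -- lowest keyfield of every group is the row that stays
    SELECT MIN(keyfield)
    FROM NACIMIENTO
    GROUP BY MUN_OFI, NOMBRE, PRIMER_AP, SEGUNDO_AP,
             NOMBRE_MADRE, PRIMER_AP_MADRE, SEGUNDO_AP_MADRE,
             NOMBRE_ABAM, PRIMER_AP_ABAM, SEGUNDO_AP_ABAM
);
```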
Question has a verified solution.
