Link to home
Start Free TrialLog in
Avatar of Powerhousecomputing
Powerhousecomputing

asked on

Duplicate Records Query

I need to run a query (I guess) to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.

How do I do this?
Avatar of hampus_b
hampus_b
Flag of Sweden image

Avatar of Powerhousecomputing
Powerhousecomputing

ASKER

what is the exact syntax I need to use?
this article is no good at all - I have one table and this info does not relate to what I am trying to do.
Have you tried the "find duplicates" wizard in Access? This will help you to visualize the duplicated data better... Once you get it to display the correct records, you can then view the sql and get a better understanding of how the syntax works.

You can't just delete all of the records produced by that query, though, or none of the records would be left behind!

Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?

Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.

Post some more info, though, and we should have no problem sorting this out!
Thanks - but as I said in my question, I need to run a query to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.
Avatar of Rey Obrero (Capricorn1)
if you have a unique id field

select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]

if not, create a backup copy of table

add an autonumber field ( ID )
you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
sorry, very confused.....

I do have a unique ID in the table.
what is the name of the field with unique ID? what is the name of the table?

All I want  to do is delete records that are duplicates of others - I have created a duplicates query using the wizard - can I turn that into a delete query and then use it to delete the duplicate records only?
you will be doing your self a favor if you will just create a new table * without duplicates* and delete the old table with duplicates
and this will do it for your


you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)



I think this could too easily screw up the rest of of my DB - can I not do it the other way using a query?
you see I will need to run this process frequently.
that is a query i posted above...

(I think this could too easily screw up the rest of of my DB) why do you think so??

create a backup copy of your db.

close your db,  locate the db, right click on it and select copy and paste
I am really unsure how to do this
I was hoping for an easy query I could use to run this process.
(you see I will need to run this process frequently.) what???

after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
I am importing data almost daily and I will need to check and remove duplicates daily.
The first thing you need to create is a query that helps you to see the duplicated records. Here is a query that I just created using the access "Find Duplicates" wizard, against an actual database of mine, which actually shows duplicated records... It is very similar to yours...

SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1  And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;


Running this on my db produces results like this:

ClientNo      DateEntered       First            Middle      Last
14768      8/23/2001      Amelia      B      Herrington
4141      9/2/1998            Amelia      B      Herrington
19048      11/21/2005      Andrea      R.      Holmes
4815      4/14/2003      Andrea      R.      Holmes
15588      11/16/2001      Bernetta      B      Anderson
16603      4/1/2002            Bernetta      B      Anderson
15545      11/16/2001      Bobbie      M      Thomas
15823      11/28/2001      Bobbie      M      Thomas
13642      4/3/2008        Carmen (Garrett)      K      Clay
28382      4/3/2008        Carmen (Garrett)      K      Clay

Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.

To convert that query to your database, might look something like this:

SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1  And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];

Give it a try.
Ok I have created a duplicates query using the wizard.  It shows me 3 records with all the same first name, surname and date of birth.  So is there now an easy way to automatically delete records 2 and 3 so i can just keep the original one?
Hmmm, of course,  as somebody else is suggesting, the better route is to PREVENT the duplicates from happening in the import... One way of doing this would be to create a composite primary key on the table that you're importing to, and then the duplicated records would just fail to import!
select the 2 and 3 records and hit the delete key
yeah doing it manually is not ideal and is what I knew how to do in the ifrst place so I am no further forward.
I would probably use VBA to intelligently remove those records... post your SQL so that I can get field names right, and I cold write you a bit of code to put into a command button on a form to remove those extra records... remember that we have to have access to the unique record id, and the field which tells us the date that the record was entered.
field names are:

[First Name]
[Surname]
[Date Of Birth]
and the table name? unique identifier? date added to the db?

If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
try this, make a back up of your table first


delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
[Audition Applications] is the form name
[ID] is the unique field
Sweeeeeeet... I suspected there was a pure sql solution, just wasn't sure what it was!
so?...
Check out Capricorns possible solution! As he says, backup the database first.
nope doesn't like the syntax:

Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
Powerhousecomputing,

please take a good look at what you are doing...

what is the name of the table? is it [Audition Applications] ?



delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
oops!

It still doesn't like it though - it's the first line.
and you don't run it like that, it is a delete query

copy and paste

delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])

to a SQL view of a query and click { ! }
Nope, you need to load that query into a new query, that's all, as opposed to running it directly from a button click....

Although, you might be able to execute it like this:

(Code not tested)
Private Sub Test_Click()
 Dim strSql as string
 strSql = "delete A.*, A.ID " & _
   "from [Audition Applications] AS A " & _
   "where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])"
 debug.pring strSql
 docmd.runsql strsql
End Sub

Open in new window

oh - I thought this was going into a button! sorry
will try.....
OK Capricorn1 it worked - fab - is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
Yup, sorry, a typo above: "debug.pring" should be ".print".
I have followed Capricorn1 and it works, but- is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
ASKER CERTIFIED SOLUTION
Avatar of Rey Obrero (Capricorn1)
Rey Obrero (Capricorn1)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial