Powerhousecomputing
asked on
Duplicate Records Query
I need to run a query (I guess) to delete duplicate records in my table. This should be based on 3 fields being true [first name], [surname] and [date of birth]. Thus if more than 1 record exists that matches this same data then any later records should be deleted.
How do I do this?
How do I do this?
Try http://www.cryer.co.uk/brian/sql/sql_delete_duplicates.htm
ASKER
what is the exact syntax I need to use?
ASKER
this article is no good at all - I have one table and this info does not relate to what I am trying to do.
Have you tried the "find duplicates" wizard in Access? This will help you to visualize the duplicated data better... Once you get it to display the correct records, you can then view the sql and get a better understanding of how the syntax works.
You can't just delete all of the records produced by that query, though, or none of the records would be left behind!
Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?
Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.
Post some more info, though, and we should have no problem sorting this out!
You can't just delete all of the records produced by that query, though, or none of the records would be left behind!
Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?
Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.
Post some more info, though, and we should have no problem sorting this out!
ASKER
Thanks - but as I said in my question, I need to run a query to delete duplicate records in my table. This should be based on 3 fields being true [first name], [surname] and [date of birth]. Thus if more than 1 record exists that matches this same data then any later records should be deleted.
if you have a unique id field
select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]
if not, create a backup copy of table
add an autonumber field ( ID )
select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]
if not, create a backup copy of table
add an autonumber field ( ID )
you can create a new table without duplicates with
select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
ASKER
sorry, very confused.....
I do have a unique ID in the table.
I do have a unique ID in the table.
what is the name of the field with unique ID? what is the name of the table?
ASKER
All I want to do is delete records that are duplicates of others - I have created a duplicates query using the wizard - can I turn that into a delete query and then use it to delete the duplicate records only?
you will be doing your self a favor if you will just create a new table * without duplicates* and delete the old table with duplicates
and this will do it for your
you can create a new table without duplicates with
select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
and this will do it for your
you can create a new table without duplicates with
select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
ASKER
I think this could too easily screw up the rest of of my DB - can I not do it the other way using a query?
ASKER
you see I will need to run this process frequently.
that is a query i posted above...
(I think this could too easily screw up the rest of of my DB) why do you think so??
create a backup copy of your db.
close your db, locate the db, right click on it and select copy and paste
(I think this could too easily screw up the rest of of my DB) why do you think so??
create a backup copy of your db.
close your db, locate the db, right click on it and select copy and paste
ASKER
I am really unsure how to do this
ASKER
I was hoping for an easy query I could use to run this process.
(you see I will need to run this process frequently.) what???
after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
ASKER
I am importing data almost daily and I will need to check and remove duplicates daily.
The first thing you need to create is a query that helps you to see the duplicated records. Here is a query that I just created using the access "Find Duplicates" wizard, against an actual database of mine, which actually shows duplicated records... It is very similar to yours...
SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1 And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;
Running this on my db produces results like this:
ClientNo DateEntered First Middle Last
14768 8/23/2001 Amelia B Herrington
4141 9/2/1998 Amelia B Herrington
19048 11/21/2005 Andrea R. Holmes
4815 4/14/2003 Andrea R. Holmes
15588 11/16/2001 Bernetta B Anderson
16603 4/1/2002 Bernetta B Anderson
15545 11/16/2001 Bobbie M Thomas
15823 11/28/2001 Bobbie M Thomas
13642 4/3/2008 Carmen (Garrett) K Clay
28382 4/3/2008 Carmen (Garrett) K Clay
Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.
To convert that query to your database, might look something like this:
SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1 And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];
Give it a try.
SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1 And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;
Running this on my db produces results like this:
ClientNo DateEntered First Middle Last
14768 8/23/2001 Amelia B Herrington
4141 9/2/1998 Amelia B Herrington
19048 11/21/2005 Andrea R. Holmes
4815 4/14/2003 Andrea R. Holmes
15588 11/16/2001 Bernetta B Anderson
16603 4/1/2002 Bernetta B Anderson
15545 11/16/2001 Bobbie M Thomas
15823 11/28/2001 Bobbie M Thomas
13642 4/3/2008 Carmen (Garrett) K Clay
28382 4/3/2008 Carmen (Garrett) K Clay
Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.
To convert that query to your database, might look something like this:
SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1 And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];
Give it a try.
ASKER
Ok I have created a duplicates query using the wizard. It shows me 3 records with all the same first name, surname and date of birth. So is there now an easy way to automatically delete records 2 and 3 so i can just keep the original one?
Hmmm, of course, as somebody else is suggesting, the better route is to PREVENT the duplicates from happening in the import... One way of doing this would be to create a composite primary key on the table that you're importing to, and then the duplicated records would just fail to import!
select the 2 and 3 records and hit the delete key
ASKER
yeah doing it manually is not ideal and is what I knew how to do in the ifrst place so I am no further forward.
I would probably use VBA to intelligently remove those records... post your SQL so that I can get field names right, and I cold write you a bit of code to put into a command button on a form to remove those extra records... remember that we have to have access to the unique record id, and the field which tells us the date that the record was entered.
ASKER
field names are:
[First Name]
[Surname]
[Date Of Birth]
[First Name]
[Surname]
[Date Of Birth]
and the table name? unique identifier? date added to the db?
If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
try this, make a back up of your table first
delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
ASKER
[Audition Applications] is the form name
[ID] is the unique field
[ID] is the unique field
Sweeeeeeet... I suspected there was a pure sql solution, just wasn't sure what it was!
ASKER
so?...
Check out Capricorns possible solution! As he says, backup the database first.
ASKER
nope doesn't like the syntax:
Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
Powerhousecomputing,
please take a good look at what you are doing...
what is the name of the table? is it [Audition Applications] ?
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
please take a good look at what you are doing...
what is the name of the table? is it [Audition Applications] ?
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
ASKER
oops!
It still doesn't like it though - it's the first line.
It still doesn't like it though - it's the first line.
and you don't run it like that, it is a delete query
copy and paste
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
to a SQL view of a query and click { ! }
copy and paste
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
to a SQL view of a query and click { ! }
Nope, you need to load that query into a new query, that's all, as opposed to running it directly from a button click....
Although, you might be able to execute it like this:
(Code not tested)
Although, you might be able to execute it like this:
(Code not tested)
Private Sub Test_Click()
Dim strSql as string
strSql = "delete A.*, A.ID " & _
"from [Audition Applications] AS A " & _
"where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])"
debug.pring strSql
docmd.runsql strsql
End Sub
ASKER
oh - I thought this was going into a button! sorry
will try.....
will try.....
ASKER
OK Capricorn1 it worked - fab - is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
Yup, sorry, a typo above: "debug.pring" should be ".print".
ASKER
I have followed Capricorn1 and it works, but- is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.