[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1380
  • Last Modified:

Duplicate Records Query

I need to run a query (I guess) to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.

How do I do this?
0
Powerhousecomputing
Asked:
Powerhousecomputing
  • 20
  • 12
  • 9
  • +1
1 Solution
 
PowerhousecomputingAuthor Commented:
what is the exact syntax I need to use?
0
 
PowerhousecomputingAuthor Commented:
this article is no good at all - I have one table and this info does not relate to what I am trying to do.
0
Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

 
CoastalDataCommented:
Have you tried the "find duplicates" wizard in Access? This will help you to visualize the duplicated data better... Once you get it to display the correct records, you can then view the sql and get a better understanding of how the syntax works.

You can't just delete all of the records produced by that query, though, or none of the records would be left behind!

Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?

Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.

Post some more info, though, and we should have no problem sorting this out!
0
 
PowerhousecomputingAuthor Commented:
Thanks - but as I said in my question, I need to run a query to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.
0
 
Rey Obrero (Capricorn1)Commented:
if you have a unique id field

select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]

if not, create a backup copy of table

add an autonumber field ( ID )
0
 
Rey Obrero (Capricorn1)Commented:
you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
0
 
PowerhousecomputingAuthor Commented:
sorry, very confused.....

I do have a unique ID in the table.
0
 
Rey Obrero (Capricorn1)Commented:
what is the name of the field with unique ID? what is the name of the table?

0
 
PowerhousecomputingAuthor Commented:
All I want  to do is delete records that are duplicates of others - I have created a duplicates query using the wizard - can I turn that into a delete query and then use it to delete the duplicate records only?
0
 
Rey Obrero (Capricorn1)Commented:
you will be doing your self a favor if you will just create a new table * without duplicates* and delete the old table with duplicates
and this will do it for your


you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)



0
 
PowerhousecomputingAuthor Commented:
I think this could too easily screw up the rest of of my DB - can I not do it the other way using a query?
0
 
PowerhousecomputingAuthor Commented:
you see I will need to run this process frequently.
0
 
Rey Obrero (Capricorn1)Commented:
that is a query i posted above...

(I think this could too easily screw up the rest of of my DB) why do you think so??

create a backup copy of your db.

close your db,  locate the db, right click on it and select copy and paste
0
 
PowerhousecomputingAuthor Commented:
I am really unsure how to do this
0
 
PowerhousecomputingAuthor Commented:
I was hoping for an easy query I could use to run this process.
0
 
Rey Obrero (Capricorn1)Commented:
(you see I will need to run this process frequently.) what???

after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
0
 
PowerhousecomputingAuthor Commented:
I am importing data almost daily and I will need to check and remove duplicates daily.
0
 
CoastalDataCommented:
The first thing you need to create is a query that helps you to see the duplicated records. Here is a query that I just created using the access "Find Duplicates" wizard, against an actual database of mine, which actually shows duplicated records... It is very similar to yours...

SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1  And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;


Running this on my db produces results like this:

ClientNo      DateEntered       First            Middle      Last
14768      8/23/2001      Amelia      B      Herrington
4141      9/2/1998            Amelia      B      Herrington
19048      11/21/2005      Andrea      R.      Holmes
4815      4/14/2003      Andrea      R.      Holmes
15588      11/16/2001      Bernetta      B      Anderson
16603      4/1/2002            Bernetta      B      Anderson
15545      11/16/2001      Bobbie      M      Thomas
15823      11/28/2001      Bobbie      M      Thomas
13642      4/3/2008        Carmen (Garrett)      K      Clay
28382      4/3/2008        Carmen (Garrett)      K      Clay

Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.

To convert that query to your database, might look something like this:

SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1  And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];

Give it a try.
0
 
PowerhousecomputingAuthor Commented:
Ok I have created a duplicates query using the wizard.  It shows me 3 records with all the same first name, surname and date of birth.  So is there now an easy way to automatically delete records 2 and 3 so i can just keep the original one?
0
 
CoastalDataCommented:
Hmmm, of course,  as somebody else is suggesting, the better route is to PREVENT the duplicates from happening in the import... One way of doing this would be to create a composite primary key on the table that you're importing to, and then the duplicated records would just fail to import!
0
 
Rey Obrero (Capricorn1)Commented:
select the 2 and 3 records and hit the delete key
0
 
PowerhousecomputingAuthor Commented:
yeah doing it manually is not ideal and is what I knew how to do in the ifrst place so I am no further forward.
0
 
CoastalDataCommented:
I would probably use VBA to intelligently remove those records... post your SQL so that I can get field names right, and I cold write you a bit of code to put into a command button on a form to remove those extra records... remember that we have to have access to the unique record id, and the field which tells us the date that the record was entered.
0
 
PowerhousecomputingAuthor Commented:
field names are:

[First Name]
[Surname]
[Date Of Birth]
0
 
CoastalDataCommented:
and the table name? unique identifier? date added to the db?

If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
0
 
Rey Obrero (Capricorn1)Commented:
try this, make a back up of your table first


delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
0
 
PowerhousecomputingAuthor Commented:
[Audition Applications] is the form name
[ID] is the unique field
0
 
CoastalDataCommented:
Sweeeeeeet... I suspected there was a pure sql solution, just wasn't sure what it was!
0
 
PowerhousecomputingAuthor Commented:
so?...
0
 
Rey Obrero (Capricorn1)Commented:


see my post at http:#a22694541
0
 
CoastalDataCommented:
Check out Capricorns possible solution! As he says, backup the database first.
0
 
PowerhousecomputingAuthor Commented:
nope doesn't like the syntax:

Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
0
 
Rey Obrero (Capricorn1)Commented:
Powerhousecomputing,

please take a good look at what you are doing...

what is the name of the table? is it [Audition Applications] ?



delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
0
 
PowerhousecomputingAuthor Commented:
oops!

It still doesn't like it though - it's the first line.
0
 
Rey Obrero (Capricorn1)Commented:
and you don't run it like that, it is a delete query

copy and paste

delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])

to a SQL view of a query and click { ! }
0
 
CoastalDataCommented:
Nope, you need to load that query into a new query, that's all, as opposed to running it directly from a button click....

Although, you might be able to execute it like this:

(Code not tested)
Private Sub Test_Click()
 Dim strSql as string
 strSql = "delete A.*, A.ID " & _
   "from [Audition Applications] AS A " & _
   "where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])"
 debug.pring strSql
 docmd.runsql strsql
End Sub

Open in new window

0
 
PowerhousecomputingAuthor Commented:
oh - I thought this was going into a button! sorry
will try.....
0
 
PowerhousecomputingAuthor Commented:
OK Capricorn1 it worked - fab - is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
CoastalDataCommented:
Yup, sorry, a typo above: "debug.pring" should be ".print".
0
 
PowerhousecomputingAuthor Commented:
I have followed Capricorn1 and it works, but- is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
Rey Obrero (Capricorn1)Commented:
just change "delete" to "select" and saved as another query
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 20
  • 12
  • 9
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now