Solved

Duplicate Records Query

Posted on 2008-10-11
42
1,348 Views
Last Modified: 2008-10-11
I need to run a query (I guess) to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.

How do I do this?
0
Comment
Question by:Powerhousecomputing
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 20
  • 12
  • 9
  • +1
42 Comments
 
LVL 6

Expert Comment

by:hampus_b
ID: 22693481
0
 

Author Comment

by:Powerhousecomputing
ID: 22693761
what is the exact syntax I need to use?
0
 

Author Comment

by:Powerhousecomputing
ID: 22693779
this article is no good at all - I have one table and this info does not relate to what I am trying to do.
0
Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

 
LVL 3

Expert Comment

by:CoastalData
ID: 22693828
Have you tried the "find duplicates" wizard in Access? This will help you to visualize the duplicated data better... Once you get it to display the correct records, you can then view the sql and get a better understanding of how the syntax works.

You can't just delete all of the records produced by that query, though, or none of the records would be left behind!

Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?

Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.

Post some more info, though, and we should have no problem sorting this out!
0
 

Author Comment

by:Powerhousecomputing
ID: 22693838
Thanks - but as I said in my question, I need to run a query to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22693951
if you have a unique id field

select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]

if not, create a backup copy of table

add an autonumber field ( ID )
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22693964
you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
0
 

Author Comment

by:Powerhousecomputing
ID: 22694260
sorry, very confused.....

I do have a unique ID in the table.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694296
what is the name of the field with unique ID? what is the name of the table?

0
 

Author Comment

by:Powerhousecomputing
ID: 22694303
All I want  to do is delete records that are duplicates of others - I have created a duplicates query using the wizard - can I turn that into a delete query and then use it to delete the duplicate records only?
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694342
you will be doing your self a favor if you will just create a new table * without duplicates* and delete the old table with duplicates
and this will do it for your


you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)



0
 

Author Comment

by:Powerhousecomputing
ID: 22694413
I think this could too easily screw up the rest of of my DB - can I not do it the other way using a query?
0
 

Author Comment

by:Powerhousecomputing
ID: 22694427
you see I will need to run this process frequently.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694434
that is a query i posted above...

(I think this could too easily screw up the rest of of my DB) why do you think so??

create a backup copy of your db.

close your db,  locate the db, right click on it and select copy and paste
0
 

Author Comment

by:Powerhousecomputing
ID: 22694441
I am really unsure how to do this
0
 

Author Comment

by:Powerhousecomputing
ID: 22694444
I was hoping for an easy query I could use to run this process.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694446
(you see I will need to run this process frequently.) what???

after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
0
 

Author Comment

by:Powerhousecomputing
ID: 22694451
I am importing data almost daily and I will need to check and remove duplicates daily.
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694497
The first thing you need to create is a query that helps you to see the duplicated records. Here is a query that I just created using the access "Find Duplicates" wizard, against an actual database of mine, which actually shows duplicated records... It is very similar to yours...

SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1  And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;


Running this on my db produces results like this:

ClientNo      DateEntered       First            Middle      Last
14768      8/23/2001      Amelia      B      Herrington
4141      9/2/1998            Amelia      B      Herrington
19048      11/21/2005      Andrea      R.      Holmes
4815      4/14/2003      Andrea      R.      Holmes
15588      11/16/2001      Bernetta      B      Anderson
16603      4/1/2002            Bernetta      B      Anderson
15545      11/16/2001      Bobbie      M      Thomas
15823      11/28/2001      Bobbie      M      Thomas
13642      4/3/2008        Carmen (Garrett)      K      Clay
28382      4/3/2008        Carmen (Garrett)      K      Clay

Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.

To convert that query to your database, might look something like this:

SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1  And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];

Give it a try.
0
 

Author Comment

by:Powerhousecomputing
ID: 22694504
Ok I have created a duplicates query using the wizard.  It shows me 3 records with all the same first name, surname and date of birth.  So is there now an easy way to automatically delete records 2 and 3 so i can just keep the original one?
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694520
Hmmm, of course,  as somebody else is suggesting, the better route is to PREVENT the duplicates from happening in the import... One way of doing this would be to create a composite primary key on the table that you're importing to, and then the duplicated records would just fail to import!
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694521
select the 2 and 3 records and hit the delete key
0
 

Author Comment

by:Powerhousecomputing
ID: 22694525
yeah doing it manually is not ideal and is what I knew how to do in the ifrst place so I am no further forward.
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694524
I would probably use VBA to intelligently remove those records... post your SQL so that I can get field names right, and I cold write you a bit of code to put into a command button on a form to remove those extra records... remember that we have to have access to the unique record id, and the field which tells us the date that the record was entered.
0
 

Author Comment

by:Powerhousecomputing
ID: 22694527
field names are:

[First Name]
[Surname]
[Date Of Birth]
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694535
and the table name? unique identifier? date added to the db?

If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694541
try this, make a back up of your table first


delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
0
 

Author Comment

by:Powerhousecomputing
ID: 22694548
[Audition Applications] is the form name
[ID] is the unique field
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694551
Sweeeeeeet... I suspected there was a pure sql solution, just wasn't sure what it was!
0
 

Author Comment

by:Powerhousecomputing
ID: 22694558
so?...
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694557


see my post at http:#a22694541
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694561
Check out Capricorns possible solution! As he says, backup the database first.
0
 

Author Comment

by:Powerhousecomputing
ID: 22694568
nope doesn't like the syntax:

Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694578
Powerhousecomputing,

please take a good look at what you are doing...

what is the name of the table? is it [Audition Applications] ?



delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
0
 

Author Comment

by:Powerhousecomputing
ID: 22694585
oops!

It still doesn't like it though - it's the first line.
0
 
LVL 120

Expert Comment

by:Rey Obrero (Capricorn1)
ID: 22694583
and you don't run it like that, it is a delete query

copy and paste

delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])

to a SQL view of a query and click { ! }
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694604
Nope, you need to load that query into a new query, that's all, as opposed to running it directly from a button click....

Although, you might be able to execute it like this:

(Code not tested)
Private Sub Test_Click()
 Dim strSql as string
 strSql = "delete A.*, A.ID " & _
   "from [Audition Applications] AS A " & _
   "where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])"
 debug.pring strSql
 docmd.runsql strsql
End Sub

Open in new window

0
 

Author Comment

by:Powerhousecomputing
ID: 22694606
oh - I thought this was going into a button! sorry
will try.....
0
 

Author Comment

by:Powerhousecomputing
ID: 22694610
OK Capricorn1 it worked - fab - is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
LVL 3

Expert Comment

by:CoastalData
ID: 22694612
Yup, sorry, a typo above: "debug.pring" should be ".print".
0
 

Author Comment

by:Powerhousecomputing
ID: 22694619
I have followed Capricorn1 and it works, but- is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
LVL 120

Accepted Solution

by:
Rey Obrero (Capricorn1) earned 500 total points
ID: 22694621
just change "delete" to "select" and saved as another query
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article describes two methods for creating a combo box that can be used to add new items to the row source -- one for simple lookup tables, and one for a more complex row source where the new item needs data for several fields.
In Part II of this series, I will discuss how to identify all open instances of Excel and enumerate the workbooks, spreadsheets, and named ranges within each of those instances.
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question