Solved

Duplicate Records Query

Posted on 2008-10-11
42
1,339 Views
Last Modified: 2008-10-11
I need to run a query (I guess) to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.

How do I do this?
0
Comment
Question by:Powerhousecomputing
  • 20
  • 12
  • 9
  • +1
42 Comments
 
LVL 6

Expert Comment

by:hampus_b
Comment Utility
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
what is the exact syntax I need to use?
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
this article is no good at all - I have one table and this info does not relate to what I am trying to do.
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Have you tried the "find duplicates" wizard in Access? This will help you to visualize the duplicated data better... Once you get it to display the correct records, you can then view the sql and get a better understanding of how the syntax works.

You can't just delete all of the records produced by that query, though, or none of the records would be left behind!

Are all duplicated records 100% equal? In other words, are there maybe related records which may indicate activity on one record versus another?

Often, when deleting duplicated records, additional logic must be used to determine which record to leave behind.

Post some more info, though, and we should have no problem sorting this out!
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
Thanks - but as I said in my question, I need to run a query to delete duplicate records in my table.  This should be based on 3 fields being true [first name], [surname] and [date of birth].  Thus if more than 1 record exists that matches this same data then any later records should be deleted.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
if you have a unique id field

select min([Id]),[first name], [surname], [date of birth]
from tablex
group by [first name], [surname], [date of birth]

if not, create a backup copy of table

add an autonumber field ( ID )
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
sorry, very confused.....

I do have a unique ID in the table.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
what is the name of the field with unique ID? what is the name of the table?

0
 

Author Comment

by:Powerhousecomputing
Comment Utility
All I want  to do is delete records that are duplicates of others - I have created a duplicates query using the wizard - can I turn that into a delete query and then use it to delete the duplicate records only?
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
you will be doing your self a favor if you will just create a new table * without duplicates* and delete the old table with duplicates
and this will do it for your


you can create a new table without duplicates with

select * into Newtable
from Oldtable
where [id] in
(
select min([id])
from oldtable
group by [first name], [surname], [date of birth]
)



0
 

Author Comment

by:Powerhousecomputing
Comment Utility
I think this could too easily screw up the rest of of my DB - can I not do it the other way using a query?
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
you see I will need to run this process frequently.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
that is a query i posted above...

(I think this could too easily screw up the rest of of my DB) why do you think so??

create a backup copy of your db.

close your db,  locate the db, right click on it and select copy and paste
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
I am really unsure how to do this
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
I was hoping for an easy query I could use to run this process.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
(you see I will need to run this process frequently.) what???

after you cleaned your table with the duplicates, prevent this from happening again..
by screening the data being inputted...
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
I am importing data almost daily and I will need to check and remove duplicates daily.
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
The first thing you need to create is a query that helps you to see the duplicated records. Here is a query that I just created using the access "Find Duplicates" wizard, against an actual database of mine, which actually shows duplicated records... It is very similar to yours...

SELECT tblClients.ClientNo, tblClients.DateEntered, tblClients.First, tblClients.Middle, tblClients.Last
FROM tblClients
WHERE (((tblClients.First) In (SELECT [First] FROM [tblClients] As Tmp GROUP BY [First],[Middle],[Last] HAVING Count(*)>1  And [Middle] = [tblClients].[Middle] And [Last] = [tblClients].[Last])))
ORDER BY tblClients.First, tblClients.Middle, tblClients.Last;


Running this on my db produces results like this:

ClientNo      DateEntered       First            Middle      Last
14768      8/23/2001      Amelia      B      Herrington
4141      9/2/1998            Amelia      B      Herrington
19048      11/21/2005      Andrea      R.      Holmes
4815      4/14/2003      Andrea      R.      Holmes
15588      11/16/2001      Bernetta      B      Anderson
16603      4/1/2002            Bernetta      B      Anderson
15545      11/16/2001      Bobbie      M      Thomas
15823      11/28/2001      Bobbie      M      Thomas
13642      4/3/2008        Carmen (Garrett)      K      Clay
28382      4/3/2008        Carmen (Garrett)      K      Clay

Just looking at this, I know that I can then write a loop in VBA (RBAR: Row By Agonizing Row) which would use logic to skip the oldest occurence, and delete all others. In the case of my database, there is usually activity on duplicate records, so dealing with this requires more effort.

To convert that query to your database, might look something like this:

SELECT ClientNo, DateEntered, [first name], [surname], [date of birth]
FROM tblClients
WHERE ((([first name]) In (SELECT [first name] FROM [tblClients] As Tmp GROUP BY [first name],[surname],[date of birth] HAVING Count(*)>1  And [surname] = [tblClients].[surname] And [Last] = [date of birth])))
ORDER BY [first name], [surname], [date of birth];

Give it a try.
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
Ok I have created a duplicates query using the wizard.  It shows me 3 records with all the same first name, surname and date of birth.  So is there now an easy way to automatically delete records 2 and 3 so i can just keep the original one?
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Hmmm, of course,  as somebody else is suggesting, the better route is to PREVENT the duplicates from happening in the import... One way of doing this would be to create a composite primary key on the table that you're importing to, and then the duplicated records would just fail to import!
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
select the 2 and 3 records and hit the delete key
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
yeah doing it manually is not ideal and is what I knew how to do in the ifrst place so I am no further forward.
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
I would probably use VBA to intelligently remove those records... post your SQL so that I can get field names right, and I cold write you a bit of code to put into a command button on a form to remove those extra records... remember that we have to have access to the unique record id, and the field which tells us the date that the record was entered.
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
field names are:

[First Name]
[Surname]
[Date Of Birth]
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
and the table name? unique identifier? date added to the db?

If, by chance, you are not storing the date added, then there is no way to determine which was there first, unless, maybe, you could simply accept the one with the lower unique identifier.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
try this, make a back up of your table first


delete A.*, A.ID
from Table1 AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
[Audition Applications] is the form name
[ID] is the unique field
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Sweeeeeeet... I suspected there was a pure sql solution, just wasn't sure what it was!
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
so?...
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility


see my post at http:#a22694541
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Check out Capricorns possible solution! As he says, backup the database first.
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
nope doesn't like the syntax:

Private Sub Test_Click()
delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])
End Sub
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
Powerhousecomputing,

please take a good look at what you are doing...

what is the name of the table? is it [Audition Applications] ?



delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
oops!

It still doesn't like it though - it's the first line.
0
 
LVL 119

Expert Comment

by:Rey Obrero
Comment Utility
and you don't run it like that, it is a delete query

copy and paste

delete A.*, A.ID
from [Audition Applications] AS A
where (A.ID) Not In (select min([ID]) from [Audition Applications] group by [first name], [surname], [date of birth])

to a SQL view of a query and click { ! }
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Nope, you need to load that query into a new query, that's all, as opposed to running it directly from a button click....

Although, you might be able to execute it like this:

(Code not tested)
Private Sub Test_Click()

 Dim strSql as string

 strSql = "delete A.*, A.ID " & _

   "from [Audition Applications] AS A " & _

   "where (A.ID) Not In (select min([ID]) from Table1 group by [first name], [surname], [date of birth])"

 debug.pring strSql

 docmd.runsql strsql

End Sub

Open in new window

0
 

Author Comment

by:Powerhousecomputing
Comment Utility
oh - I thought this was going into a button! sorry
will try.....
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
OK Capricorn1 it worked - fab - is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
LVL 3

Expert Comment

by:CoastalData
Comment Utility
Yup, sorry, a typo above: "debug.pring" should be ".print".
0
 

Author Comment

by:Powerhousecomputing
Comment Utility
I have followed Capricorn1 and it works, but- is it possible to view what it is going to delete first just to make sure it is not going to doing anything horrid?
0
 
LVL 119

Accepted Solution

by:
Rey Obrero earned 500 total points
Comment Utility
just change "delete" to "select" and saved as another query
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

The first two articles in this short series — Using a Criteria Form to Filter Records (http://www.experts-exchange.com/A_6069.html) and Building a Custom Filter (http://www.experts-exchange.com/A_6070.html) — discuss in some detail how a form can be…
Introduction The Visual Basic for Applications (VBA) language is at the heart of every application that you write. It is your key to taking Access beyond the world of wizards into a world where anything is possible. This article introduces you to…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
In Microsoft Access, learn how to “cascade” or have the displayed data of one combo control depend upon what’s entered in another. Base the dependent combo on a query for its row source: Add a reference to the first combo on the form as criteria i…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now