Solved

How to delete thousands of duplicate records in Access?

Posted on 2014-01-24
Last Modified: 2014-01-25
Dear All,

There has to be an easier way to do this aside from doing it manually.

See, I have a table with about 150k records in it, and it has tonnes of duplicates that would take days to filter and delete manually.

I'm wondering if someone can come up with a reusable routine that will let me delete large numbers of duplicate records in various tables. I am comfortable copying and pasting VBA and modifying variables to suit a particular situation, if that helps.

I've attached a sample database that shows the problem.

Any help appreciated.

Thanks.
Database2.accdb
Question by:discogs
7 Comments
 
LVL 48

Expert Comment

by:Dale Fye
ID: 39807732
I cannot download your database here, but I have a couple of questions.

1.  How do you determine duplicates (1 field, 2 fields, ...)?

2.  Once you have identified duplicates, how do you determine which one to keep?

One option is to use these criteria to select the records you want to keep and push them into a temporary table. Then delete all of the records from the original table and insert the saved records from the temporary table back into it.
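A minimal sketch of that copy-out / delete-all / copy-back approach as a reusable routine, written in Python with the built-in sqlite3 module so it can run outside Access. The helper name dedupe_via_temp_table, the table tblDemo and its columns are hypothetical; the "keep" criterion assumed here is "the row with the largest value in a unique column such as an autonumber ID". SQLite spells Access's SELECT ... INTO as CREATE TABLE ... AS.

```python
import sqlite3


def dedupe_via_temp_table(conn, table, key_cols, keep_col):
    """Keep one row per combination of key_cols: the row with the largest keep_col.

    Hypothetical helper. keep_col should be unique (e.g. an autonumber ID).
    Table and column names are pasted into the SQL, so pass trusted names only.
    """
    keys = ", ".join(key_cols)
    temp = f"{table}_Temp"
    with conn:  # commits at the end; rolls back pending changes on an error
        conn.execute(f"DROP TABLE IF EXISTS {temp}")
        # Step 1: copy the rows to keep into a temporary table.
        conn.execute(
            f"CREATE TABLE {temp} AS "
            f"SELECT t.* FROM {table} AS t "
            f"INNER JOIN (SELECT MAX({keep_col}) AS KeepThisID "
            f"FROM {table} GROUP BY {keys}) AS k "
            f"ON t.{keep_col} = k.KeepThisID"
        )
        # Step 2: delete every row from the original table.
        conn.execute(f"DELETE FROM {table}")
        # Step 3: put the saved rows back, then drop the temporary table.
        conn.execute(f"INSERT INTO {table} SELECT * FROM {temp}")
        conn.execute(f"DROP TABLE {temp}")


# Usage with a hypothetical table: GB/LON appears three times, FR/PAR twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblDemo (ID INTEGER PRIMARY KEY, Code TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO tblDemo VALUES (?, ?, ?)",
    [(1, "GB", "LON"), (2, "GB", "LON"), (3, "FR", "PAR"),
     (4, "GB", "LON"), (5, "FR", "PAR")],
)
dedupe_via_temp_table(conn, "tblDemo", ["Code", "City"], "ID")
rows = conn.execute("SELECT ID, Code, City FROM tblDemo ORDER BY ID").fetchall()
print(rows)  # [(4, 'GB', 'LON'), (5, 'FR', 'PAR')]
```

Passing the key columns and the keeper column as parameters is what makes it reusable across tables; only the "which one to keep" rule is fixed.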
 
LVL 7

Accepted Solution

by:
Christopher Martinez earned 2000 total points
ID: 39807745
Hey bud! What I would suggest is using the Find Duplicates Query Wizard and then an append query!

There is a great article on this over at TechRepublic.com that I won't try to beat with my explanation, but I believe this how-to is what you need; it even has nifty screenshots! I've used this technique many times to clean up my duplicates.

http://www.techrepublic.com/article/eliminate-duplicate-records-with-this-built-in-access-query/
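Before deleting anything, it helps to preview the duplicates. As far as I know, the query the Find Duplicates wizard builds boils down to a GROUP BY on the chosen fields with a HAVING Count(...) > 1 filter, although the exact SQL the wizard writes may differ. A sketch of that kind of query, run through Python's sqlite3 against a hypothetical table tblSample:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; Country + City together define a duplicate here.
conn.execute("CREATE TABLE tblSample (Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO tblSample VALUES (?, ?)",
    [("GB", "LON"), ("GB", "LON"), ("GB", "LON"),
     ("FR", "PAR"), ("US", "NYC"), ("US", "NYC")],
)

# One row per duplicated combination, with the number of copies.
dupes = conn.execute(
    """
    SELECT Country, City, COUNT(*) AS NumberOfDups
    FROM tblSample
    GROUP BY Country, City
    HAVING COUNT(*) > 1
    ORDER BY Country
    """
).fetchall()
print(dupes)  # [('GB', 'LON', 3), ('US', 'NYC', 2)]

# Rows a cleanup would remove: every copy beyond the first in each group.
extra = sum(n - 1 for *_, n in dupes)
print(extra)  # 3
```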
 

Author Comment

by:discogs
ID: 39807761
Thanks for the answers, guys. @fyed, since you were the first to respond, here are my answers to your questions:

1. Fields: pkCountryCode, pkLocation, pkPortCode.
2. It does not matter which record is kept as the records are identical across all fields.
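When every field of the duplicate rows is identical, the "keeper" criterion is not needed at all: SELECT DISTINCT picks one copy of each row. A sketch of the temp-table approach for that case, in Python's sqlite3, using the three fields named in the reply above; the table name tblPorts and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name and data; field names are from the question.
conn.execute("CREATE TABLE tblPorts (pkCountryCode TEXT, pkLocation TEXT, pkPortCode TEXT)")
conn.executemany(
    "INSERT INTO tblPorts VALUES (?, ?, ?)",
    [("GB", "LON", "GBLON"), ("GB", "LON", "GBLON"),
     ("FR", "PAR", "FRPAR"), ("FR", "PAR", "FRPAR"), ("FR", "PAR", "FRPAR"),
     ("US", "NYC", "USNYC")],
)

with conn:
    # 1. Copy one instance of each distinct row into a temporary table.
    conn.execute("CREATE TABLE tblPorts_Temp AS SELECT DISTINCT * FROM tblPorts")
    # 2. Remove every row from the original table.
    conn.execute("DELETE FROM tblPorts")
    # 3. Copy the unique rows back and discard the temporary table.
    conn.execute("INSERT INTO tblPorts SELECT * FROM tblPorts_Temp")
    conn.execute("DROP TABLE tblPorts_Temp")

rows = conn.execute("SELECT * FROM tblPorts ORDER BY pkCountryCode").fetchall()
print(rows)  # [('FR', 'PAR', 'FRPAR'), ('GB', 'LON', 'GBLON'), ('US', 'NYC', 'USNYC')]
```

In Access the first step would presumably be written SELECT DISTINCT * INTO tblPorts_Temp FROM tblPorts, with the other statements run as separate queries.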

Hope this helps.

Ta

Author Comment

by:discogs
ID: 39807762
Would it help if I cut down the file size?
 

Author Closing Comment

by:discogs
ID: 39807885
Cheers.
 
LVL 75
ID: 39808268
It seems that fyed answered the question first.
 
LVL 48

Expert Comment

by:Dale Fye
ID: 39808540
Thanks for the vote of confidence, Joe.  

The technique described in the article was not quite the same as the one I was recommending, but it is a great article with very clear instructions and visuals.

My problem with duplicates is that they are generally not exact duplicates (as in the article): records may have the same street address but not the same name, phone number, and so on, so the challenge is determining which "duplicate" to get rid of.

In instances like this, the "keeper" record in each set of duplicates may be the one with the most recent [date modified], or simply the one with the largest autonumber ID (counter). Those cases are generally easy to handle: you simply create a query that selects the latest [date modified] or the largest [ID] from each group of "duplicates". I didn't have time to write this yesterday, so I will now for the sake of those who follow.
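A sketch of the "most recent [date modified]" rule, using a different technique from the temp-table copy: a single DELETE with a correlated EXISTS subquery that removes any row for which a newer row with the same address exists. It runs in Python's sqlite3; the table tblContacts and its fields are hypothetical, and ties on the date are broken by the larger ID so that exactly one row per address survives. Access should accept a similar correlated subquery in a delete query, but how it performs on a 150k-row table is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: Address defines a "duplicate", DateModified picks the keeper.
conn.execute(
    "CREATE TABLE tblContacts (ID INTEGER PRIMARY KEY, Address TEXT, "
    "Name TEXT, DateModified TEXT)"
)
conn.executemany(
    "INSERT INTO tblContacts VALUES (?, ?, ?, ?)",
    [
        (1, "1 High St", "A. Smith", "2013-05-01"),
        (2, "1 High St", "Andy Smith", "2014-01-10"),
        (3, "2 Low Rd", "B. Jones", "2012-11-30"),
        (4, "2 Low Rd", "Bea Jones", "2012-11-30"),
    ],
)

with conn:
    # Delete a row if another row at the same Address is newer, or equally
    # new but with a larger ID (ISO date strings compare correctly as text).
    conn.execute(
        """
        DELETE FROM tblContacts
        WHERE EXISTS (
            SELECT 1 FROM tblContacts AS newer
            WHERE newer.Address = tblContacts.Address
              AND (newer.DateModified > tblContacts.DateModified
                   OR (newer.DateModified = tblContacts.DateModified
                       AND newer.ID > tblContacts.ID))
        )
        """
    )

rows = conn.execute("SELECT ID, Name FROM tblContacts ORDER BY ID").fetchall()
print(rows)  # [(2, 'Andy Smith'), (4, 'Bea Jones')]
```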

Assume myTable contains duplicate records based on the Bolo_ID and docDate fields, and that the records you want to keep are the ones with the largest ID (primary key, counter/autonumber) in each group. Start with a query that identifies each unique combination of Bolo_ID and docDate in the dataset, along with the maximum ID (counter/autonumber) value within each group.

SELECT Max([ID]) AS KeepThisID
FROM myTable
GROUP BY Bolo_ID, docDate

Then, you simply join this query back to the original table and copy the matching records into a temporary table.

SELECT myTable.* INTO myTable_Temp
FROM myTable
INNER JOIN (
    SELECT Max([ID]) AS KeepThisID
    FROM myTable
    GROUP BY Bolo_ID, docDate) AS KeepThese
ON myTable.ID = KeepThese.KeepThisID
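The query above stops once the keepers are in myTable_Temp; the remaining steps from the first reply are to empty myTable and append the saved rows back. A sketch of the whole sequence in Python's sqlite3, using the same table and field names (the Note column and sample rows are hypothetical). SQLite writes SELECT ... INTO as CREATE TABLE ... AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Bolo_ID INTEGER, "
    "docDate TEXT, Note TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [(1, 100, "2014-01-20", "first"), (2, 100, "2014-01-20", "second"),
     (3, 100, "2014-01-21", "only"),
     (4, 200, "2014-01-20", "old"), (5, 200, "2014-01-20", "new")],
)

with conn:
    # Keeper rows (largest ID per Bolo_ID + docDate) go into the temp table.
    conn.execute(
        """
        CREATE TABLE myTable_Temp AS
        SELECT myTable.* FROM myTable
        INNER JOIN (SELECT MAX(ID) AS KeepThisID FROM myTable
                    GROUP BY Bolo_ID, docDate) AS KeepThese
        ON myTable.ID = KeepThese.KeepThisID
        """
    )
    # Empty the original table, append the keepers back, drop the temp table.
    conn.execute("DELETE FROM myTable")
    conn.execute("INSERT INTO myTable SELECT * FROM myTable_Temp")
    conn.execute("DROP TABLE myTable_Temp")

rows = conn.execute("SELECT ID, Note FROM myTable ORDER BY ID").fetchall()
print(rows)  # [(2, 'second'), (3, 'only'), (5, 'new')]
```

In Access these would presumably be three separate queries (DELETE * FROM myTable, then INSERT INTO myTable SELECT * FROM myTable_Temp, then DROP TABLE myTable_Temp), since an Access query runs one SQL statement at a time.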

If only it were this easy all of the time!