Removing duplicates access db

I have a client db with about 5000 duplicates records. I will query last name and address to find them. How do I go about removing the dupes however. Also, the record with the "customer id" field with the higher number is the one that needs to be kept, since it is the more recent record.

Can someone help?
LVL 1
greghollAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Dale FyeCommented:
Be careful of doing this if the [CustomerID] is used anywhere else in the database, you must first identify what the correct [CustomerID] should be, then change all of the references to the incorrect one in all of the other tables, and then delete the older "duplicate" from your table.

I generally do this by creating another table to work with.  In your case, I would probably run a "Find Duplicates" query based on last name and address, then use that as the basis for a MakeTable query.  The new table (temporary) table should have [CustomerID], [LastName], [Address] fields and you should add a new field [ReplaceWith] and set the values to NULL for that field.

Then you need to fill in the [ReplaceWith] column in the "duplicate" records with the [CustomerID] of the most recent (your determination) value for each of those records.

Then you can run queries against all of the other tables and replace the [CustomerID] values with the [ReplaceWith] values from this temporary table.
0
AccessGuy1763Commented:
I cannot think of a way to do this through a single SQL statement, but that doesn't mean it's not possible that way.  I know this can be done using recordsets and VBA.  Obviously you need to be a bit careful here as a problem with the code could cause you to delete the wrong records.  If there's a way you can run it in a test environment first, please do that.  Now warnings aside, here is some "air code" for you:

Public Sub subDeleteCustomerDups()

   Dim strSQL as string
   Dim objCustomerRS as DAO.Recordset2
   Dim strPrevAddress as string
   Dim strPrevCustomerLastName as string

   'Construct SQL statement
   strSQL = "SELECT * FROM tblCustomers ORDER BY LastName, Address, CustomerID"

   'Instantiate objCustomerRS
   Set objCustomerRS = CurrentDB.OpenRecordset(strSQL, dbopendynaset, dbseechanges)

   'First Record
   objCustomerRS.MoveFirst

   'Loop through Customers
   do while objCustomerRS.EOF = False
	
      'Check to see if this is the same Customer as the previous record
      if (objCustomerRS.Fields("LastName")=strPrevCustomerLastName) AND _
         (objCustomerRS.Fields("Address")=strPrevAddress Then

         'Same Customer as Prev Record... delete the record
	 objCustomerRS.Delete

      Else

         'New Customer... store new values for Previous Last Name and Address
         strPrevAddress = objCustomerRS.Fields("Address")
         strPrevCustomerLastName = objCustomerRS.Fields("LastName")

      End If
      
      'Next Record
      objCustomerRS.MoveNext

   Loop
   
   'Close RS
   objCustomerRS.Close

   'Clean Up
   Set objCustomerRS = Nothing

End Sub

Open in new window


The idea is to have your recordset sorted in such a way that Dups appear right next to each other with the CustomerID in Ascending order.  This way, we'll always be on the "higher" CustomerID when the delete is performed because we're always deleting the "second" instance of the duplicate Customer.

Again, this is untested air code and I highly recommend testing it outside of your Production environment if at all possible.  Good Luck!
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Rey Obrero (Capricorn1)Commented:
you can do this using a delete query... do this in a copy of your db

delete *
from tableA
where CustomerID Not In(select max(t2.CustomerID), t2.[LastName], t2.[Address] From tableA as t2 group by t2.[LastName], t2.[Address])
0
10 Tips to Protect Your Business from Ransomware

Did you know that ransomware is the most widespread, destructive malware in the world today? It accounts for 39% of all security breaches, with ransomware gangsters projected to make $11.5B in profits from online extortion by 2019.

greghollAuthor Commented:
thanks all. I will check these out.

Please note fyed - the customers table is the ONLY table. We're starting a new system from scratch and need the data cleaned up before import to the new system.

Question for capricorn1 and accessguy1763 - in some cases there many be more than one duplicate record. Is this going to cause an issue?
0
AccessGuy1763Commented:
I can't answer for Capricorn's solution.  However, my solution SHOULD delete every duplicate Customer except the "first" one it finds, which should be the lowest CustomerID due to the sorting used by the SQL statement.
0
Dale FyeCommented:
gregholl,

Since you said you want to keep the most recent (highestID) for each lastname/address combination, I think you need to change AccessGuy's SQL statement to:

 strSQL = "SELECT * FROM tblCustomers ORDER BY LastName, Address, CustomerID DESC"

It looks like Capricorn1's should take care of all of the duplicates for each lastname/address combination.

- Another point is are you certain that the "last", most recent, largest CustomerID contains all of the right information?  Although it might be time consuming, I would recommend you consider just using the Wizard to develop a FindDuplicates query on the LastName/Address combination, then add all of the other fields, and then review all of the records.

-Yet another point, what if you have two different people at the same address with the same last name, but different first names?  What if you have two generations living in the same house and you should actually retain data for both generations?  The solutions provided above will not allow this.

- Also, you need to be cognizant of the fact that last name and address fields can (and frequently are) misspelled.  If your data table is that large, I'll bet that you will encounter a number of those as well, which will not be captured by either of the methods mentioned above.
0
AccessGuy1763Commented:
WHOOPS!!

I misread the part about the CustomerID and thought you wanted to keep the lowest one... fyed is absolutely correct about the appropriate SQL statement.

Nice catch fyed!
0
greghollAuthor Commented:
Thanks fyed. There would be 10,000 records to go through. It's not the end of the world if some data gets missed or removed that shouldn't be

also, I realize that this is simply the first step and will be re-checking manually after the bulk of the dupes are removed.

accessguy: Where exactly do I enter in this code you've come up with.

Thanks ALL!!
0
Rey Obrero (Capricorn1)Commented:
gregholl,

did you try the query i posted?
0
AccessGuy1763Commented:
There's a number of ways you could implement it.  The way I wrote it above would be to  create a new, global module, paste the entirety of the code in there, and then either call subDeleteCustomerDups from the Immediate window or a command button.

Honestly though, if you're not real experienced in coding I'd probably recommend just creating a form, sticking a command button on it (skip the wizard), then right click the command button and select Build Event > Code Builder.  This will take you to the VBA Editor and place your cursor within the routine for that command button.  It would look something like this:

Private Sub cmdDeleteCustomerDups_Click()

End Sub

Open in new window


In the above example obviously I gave the command button the Name "cmdDeleteCustomerDups".  If you aren't going to go with the randomly generated command button name make sure to change the name before you create the Click Event.  Now paste everything from my first post inside of the above sample lines, excluding the "Public Sub subDeleteCustomerDups()" and "End Sub" lines.  These are replaced by the lines above which apply to the Click event of your command button.

Now open the form in standard view and click your button each time you want to try out the code.  Don't forget to add the "DESC" part to your SQL statement as described by fyed.
0
greghollAuthor Commented:
thanks to you all for such EXCELLENT feedback!
0
Dale FyeCommented:
I'm with capricorn1 on this one.  His 2 line SQL statement will do exactly what you asked for, and is by far the easiest method to accomplish what you asked for.
0
DatabaseMX (Joe Anderson - Microsoft Access MVP)Database Architect / Systems AnalystCommented:
@gregholl
Did you try the query SQL that Capricorn1 posted ?

mx
0
greghollAuthor Commented:
I awarded points to both.
0
AccessGuy1763Commented:
I agree with fyed.. capricorn's solution is much easier but I'm a programmer at heart and not great at constructing complex SQL statements like those so I went with what I knew :).
0
DatabaseMX (Joe Anderson - Microsoft Access MVP)Database Architect / Systems AnalystCommented:
Just noting that the SQL approach will be infinitely faster than the code approach.

mx
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.