We help IT Professionals succeed at work.

Select distinct records based on ID regardless of other field differences

I am trying to create a query in Access 2016 that will remove duplicate entries from an access table, based off 1 field and regardless of if the other fields are the same or different. I have tried to use different query methods including nested and I can't seem to get it quite right.

Here is an example of the database as well as the results I hope to achieve. There are multiple types of records, single (keep all), duplicate ID with the same name (keep the newest), and duplicate ID with a different name (keep the newest).

Existing Database
ID      FirstName  LastName  Date
111AAA  John       Snow      1/1/2020
111BBB  Arya       Stark     1/1/2020
111BBB  Arya       Stark     1/1/2019
111CCC  Eddard     Stark     1/1/2020
111CCC  Sansa      Stark     1/1/2019

Open in new window


End Result
ID      FirstName  LastName  Date
111AAA  John       Snow      1/1/2020
111BBB  Arya       Stark     1/1/2020
111CCC  Eddard     Stark     1/1/2020

Open in new window

Comment
Watch Question

crystal (strive4peace) - Microsoft MVP, AccessRemote Training and Programming
Top Expert 2015

Commented:
hi Aludaan,

you will need a unique field in the table. Is there an AutoNumber? If not, you'll need to make one. What kind of data is this? It is good for the primary key to be logical ... what you have labeled as ID is TEXT and is not unique so, pardon me, but not a good name either.

Once we know the name of a unique field, we can help you with the SQL to remove duplicate records for what you're calling ID

kind regards,
crystal
Distinguished Expert 2017

Commented:

The simplest way to remove duplicates is to rebuild the table with the appropriate unique index.  Select the rows from the existing table, sort them ascending by ID and descending by date and append them to the new empty table.

Software Team Lead
Commented:

I don't have MS Access with me right now to test, but something like below should work for you


Select a.* from yourTable as a
inner join
(
    Select ID, max(YourDate) YourDate
    from yourTable
    Group By ID
) as b on a.ID = b.ID and a.YourDate = b.YourDate

IF the Date field is unique per user

Distinguished Expert 2017

Commented:

That selects the records he wants but doesn't delete the records he doesn't want.

Ryan ChongSoftware Team Lead

Commented:

Ok, missed out the requirement that have to remove the duplicate entries . Tks for highlighting


the solution will be to generate a unique identifier field, which both Crystal and Pat already mentioned.

Mark EdwardsChief Technology Officer

Commented:
OK, there's a few things to consider:
The [ID] and [Date] columns are the determining factors here, and let's try to do this without having to add an autonumber column, so
Using a query...

IF you have only ONE record with any combination of [ID] and [Date], then the [ID] and [Date] TOGETHER (concatenated) make a unique record id for the records.  We will call this our "Concat" query and our new column the "RecID" column.

NOTE:  IF you have MORE THAN ONE record with a particular [ID] and [DATE] combination, THEN you'll need to add additional column(s) until you create a unique record id, however, you will need to determine WHICH record with the unique [ID] and Max [Date] values will be the one kept.  For the sake of this exercise, we will assume that there is only ONE record with a particular [ID] and Max [Date] values.

Using another query as a Summation query, calculate the Max([Date]) for each unique value of [ID].  These are the only records you want - the ones with these [ID] and [Date] values.  This will be our "Filter" query.

Now create a 3rd query by linking the "Concat" query to the filter query by [ID] and [Date].  This will give you the [RecID] column for just those records we want filtered for the [ID] and Max([Date]) values.  Let's call this query our "Keep" query.

We will now use our "Keep" query for the IN() function in our DELETE query.  (In Access, you can't create a DELETE query with joins).
So our query will look something like this:
DELETE * FROM [Table] WHERE NOT (([ID] & [Date]) IN (SELECT [RecID] FROM [Keep]))

Open in new window

This will delete the unwanted records.

I've attached a working example using your data.  Keep in mind that this does NOT take into account any ID or Date column with missing values, etc.
Keep-ID-and-MaxDate.accdb
Mark EdwardsChief Technology Officer

Commented:
p.s.  If you have more than one record for a particular [ID] and [Date] combination, you'll have to keep them, or do some more thinking as to what additional filtering logic you want to use.  

If it's the "last one added to the table", then adding an autonumber column to the table will do that as the autonumber will give you the "ordinal position" of the records (order appended to the table).

Commented:
From the results you showed when eliminating duplicates the only fields that differed when having the same ID was Date.  Is this always how the duplication occurs?  And are we correct in assuming the most recent year (Date) is the record to keep?

Author

Commented:
Thanks for all the answers.

When I said remove it was bad wording on my part. I just wanted to select the table and get all the records without the duplicates keeping the newest one, I didn't mean to permanently delete any records. I apologize for anyone who worked on my issue thinking delete.

There is no unique identifier, they don't have a table key (bad on them) so it allows them to create duplicates, I couldn't even guarantee the date won't match but from what I see it doesn't seem to.

Also, I did want to keep the newest one.

Based off that, the query that Ryan provided probably is what I need. I ran a quick test and it looks correct. I will do a better test when I return to work on Monday and can run larger tests.
Select DISTINCT a.* from yourTable as a
inner join
(
    Select ID, max(YourDate) YourDate
    from yourTable
    Group By ID
) as b on a.ID = b.ID and a.YourDate = b.YourDate

DISTINCT will eliminate duplicates.

Author

Commented:
Ryan's query was the one I will end up using. The good news is they are looking at fixing their database table so hopefully this won't even be necessary for too long.

Commented:
Thanks for the points, Aludaan..