rye004
asked on
How do I do de-duplication in Excel 2013, but have the ability to filter on the first occurrence?
I have an excel sheet with approximately 200K of rows in it. I have a single column that I want to de-duplicate against. When I use the default “duplicate” conditional filter in Excel 2013, it highlights all rows which are duplicates.
What I am trying to do is filter out all duplicates, however I want to identify the “first occurrence”. Hopefully this makes sense.
Does anyone know how I can go about this?
Many Thanks
What I am trying to do is filter out all duplicates, however I want to identify the “first occurrence”. Hopefully this makes sense.
Does anyone know how I can go about this?
Many Thanks
There is no option under Conditional Formatting to achieve what you require.
An alternative is to add a column that will show the duplicate status of each cell. In that column, apply the following formula and replicate down:
This assumes that:
Column A is the one you are searching for duplicates, if not, change all occurrences of A in the formula to the required column letter.
The data starts at row 1, if not, replace all occurrences of 1 in the formula to the row number where the data starts.
An alternative is to add a column that will show the duplicate status of each cell. In that column, apply the following formula and replicate down:
=IF(COUNTIF(A:A,A1)=1,"Not duplicated","Duplicate number " & COUNTIF(A$1:A1,A1))
This assumes that:
Column A is the one you are searching for duplicates, if not, change all occurrences of A in the formula to the required column letter.
The data starts at row 1, if not, replace all occurrences of 1 in the formula to the row number where the data starts.
Actually, there is a way that you could use Conditional Formatting.
Apply CF such that duplicates are highlighted in a particular colour. Would probably use the formula option and a count like the suggestions above within the CF.
You could then use AutoFilter and Filter on colour, excluding the duplicate colour.
Thanks
Rob H
Apply CF such that duplicates are highlighted in a particular colour. Would probably use the formula option and a count like the suggestions above within the CF.
You could then use AutoFilter and Filter on colour, excluding the duplicate colour.
Thanks
Rob H
ASKER
Glenn,
To answer your question, I am looking for your 1st option.
After I put up the posting, I did do something similar to what you suggested. However, instead of using countif, I used a vlookup. Below is an example of what I did:
=IF(LEN(IFERROR(VLOOKUP([@ [MD5 - Static]], $N$3:N3, 1, FALSE), "")) > 1, "Yes", "No")
It took over an hour to run this in Excel 2013 64 bit. I am assuming your method would be faster since it uses 2 excel functions instead of the 4 that I used.
Can you please let me know your thoughts?
Many Thanks.
To answer your question, I am looking for your 1st option.
After I put up the posting, I did do something similar to what you suggested. However, instead of using countif, I used a vlookup. Below is an example of what I did:
=IF(LEN(IFERROR(VLOOKUP([@
It took over an hour to run this in Excel 2013 64 bit. I am assuming your method would be faster since it uses 2 excel functions instead of the 4 that I used.
Can you please let me know your thoughts?
Many Thanks.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thank you for your help with this. Tweaking this to use less excel formulas did improve on performance time.
* filter out duplicate values after the first occurrence, or
* remove duplicate values after the first occurrence
If the 2nd option, you can use the Remove Duplicates function (Menu: Data tab, Data Tools section, Remove Duplicates).
If the 1st option, you'll need to add a helper column to help identify the subsequent duplicate records and then filter out those. For example, if your data is in column A and starts on row 2, add this formula in an adjacent column on row 2 and copy down:
=IF(COUNTIF($A$2:A2,A2)>1,
Then filter on blank cells in this new column. Only the unique or first-occurrence values will display.
There's another option - Advanced Filter - that will let you copy a list of the unique values in that column to another location.
Let us know which method you need and we'll help out.
-Glenn