rye004

asked on

# How do I do de-duplication in Excel 2013, but have the ability to filter on the first occurrence?

I have an excel sheet with approximately 200K of rows in it. I have a single column that I want to de-duplicate against. When I use the default “duplicate” conditional filter in Excel 2013, it highlights all rows which are duplicates.

What I am trying to do is filter out all duplicates, however I want to identify the “first occurrence”. Hopefully this makes sense.

Does anyone know how I can go about this?

Many Thanks

What I am trying to do is filter out all duplicates, however I want to identify the “first occurrence”. Hopefully this makes sense.

Does anyone know how I can go about this?

Many Thanks

There is no option under Conditional Formatting to achieve what you require.

An alternative is to add a column that will show the duplicate status of each cell. In that column, apply the following formula and replicate down:

This assumes that:

Column A is the one you are searching for duplicates, if not, change all occurrences of A in the formula to the required column letter.

The data starts at row 1, if not, replace all occurrences of 1 in the formula to the row number where the data starts.

An alternative is to add a column that will show the duplicate status of each cell. In that column, apply the following formula and replicate down:

`=IF(COUNTIF(A:A,A1)=1,"Not duplicated","Duplicate number " & COUNTIF(A$1:A1,A1))`

This assumes that:

Column A is the one you are searching for duplicates, if not, change all occurrences of A in the formula to the required column letter.

The data starts at row 1, if not, replace all occurrences of 1 in the formula to the row number where the data starts.

Actually, there is a way that you could use Conditional Formatting.

Apply CF such that duplicates are highlighted in a particular colour. Would probably use the formula option and a count like the suggestions above within the CF.

You could then use AutoFilter and Filter on colour, excluding the duplicate colour.

Thanks

Rob H

Apply CF such that duplicates are highlighted in a particular colour. Would probably use the formula option and a count like the suggestions above within the CF.

You could then use AutoFilter and Filter on colour, excluding the duplicate colour.

Thanks

Rob H

ASKER

Glenn,

To answer your question, I am looking for your 1st option.

After I put up the posting, I did do something similar to what you suggested. However, instead of using countif, I used a vlookup. Below is an example of what I did:

=IF(LEN(IFERROR(VLOOKUP([@[MD5 - Static]], $N$3:N3, 1, FALSE), "")) > 1, "Yes", "No")

It took over an hour to run this in Excel 2013 64 bit. I am assuming your method would be faster since it uses 2 excel functions instead of the 4 that I used.

Can you please let me know your thoughts?

Many Thanks.

To answer your question, I am looking for your 1st option.

After I put up the posting, I did do something similar to what you suggested. However, instead of using countif, I used a vlookup. Below is an example of what I did:

=IF(LEN(IFERROR(VLOOKUP([@

It took over an hour to run this in Excel 2013 64 bit. I am assuming your method would be faster since it uses 2 excel functions instead of the 4 that I used.

Can you please let me know your thoughts?

Many Thanks.

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

ASKER

Thank you for your help with this. Tweaking this to use less excel formulas did improve on performance time.

* filter out duplicate values after the first occurrence, or

* remove duplicate values after the first occurrence

If the 2nd option, you can use the Remove Duplicates function (Menu: Data tab, Data Tools section, Remove Duplicates).

If the 1st option, you'll need to add a helper column to help identify the subsequent duplicate records and then filter out those. For example, if your data is in column A and starts on row 2, add this formula in an adjacent column on row 2 and copy down:

=IF(COUNTIF($A$2:A2,A2)>1,"Duplicate","") Then filter on blank cells in this new column. Only the unique or first-occurrence values will display.

There's another option - Advanced Filter - that will let you copy a list of the unique values in that column to another location.

Let us know which method you need and we'll help out.

-Glenn