Solved

How do I do de-duplication in Excel 2013, but have the ability to filter on the first occurrence?

Posted on 2014-11-06
6
107 Views
Last Modified: 2014-11-10
I have an excel sheet with approximately 200K of rows in it.  I have a single column that I want to de-duplicate against.  When I use the default “duplicate” conditional filter in Excel 2013, it highlights all rows which are duplicates.

What I am trying to do is filter out all duplicates, however I want to identify the “first occurrence”.  Hopefully this makes sense.

Does anyone know how I can go about this?

Many Thanks
0
Comment
Question by:rye004
6 Comments
 
LVL 27

Expert Comment

by:Glenn Ray
ID: 40426884
Do you want to:
* filter out duplicate values after the first occurrence, or
* remove duplicate values after the first occurrence

If the 2nd option, you can use the Remove Duplicates function (Menu:  Data tab, Data Tools section, Remove Duplicates).

If the 1st option, you'll need to add a helper column to help identify the subsequent duplicate records and then filter out those.  For example, if your data is in column A and starts on row 2, add this formula in an adjacent column on row 2 and copy down:
=IF(COUNTIF($A$2:A2,A2)>1,"Duplicate","")

Then filter on blank cells in this new column.  Only the unique or first-occurrence values will display.

There's another option - Advanced Filter - that will let you copy a list of the unique values in that column to another location.


Let us know which method you need and we'll help out.

-Glenn
0
 
LVL 4

Expert Comment

by:AnthonyHamon
ID: 40426926
There is no option under Conditional Formatting to achieve what you require.

An alternative is to add a column that will show the duplicate status of each cell.  In that column, apply the following formula and replicate down:
=IF(COUNTIF(A:A,A1)=1,"Not duplicated","Duplicate number " & COUNTIF(A$1:A1,A1))

Open in new window


This assumes that:
Column A is the one you are searching for duplicates, if not, change all occurrences of A in the formula to the required column letter.
The data starts at row 1, if not, replace all occurrences of 1 in the formula to the row number where the data starts.
0
 
LVL 32

Expert Comment

by:Rob Henson
ID: 40428206
Actually, there is a way that you could use Conditional Formatting.

Apply CF such that duplicates are highlighted in a particular colour. Would probably use the formula option and a count like the suggestions above within the CF.

You could then use AutoFilter and Filter on colour, excluding the duplicate colour.

Thanks
Rob H
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:rye004
ID: 40429234
Glenn,

To answer your question, I am looking for your 1st option.

After I put up the posting, I did do something similar to what you suggested.  However, instead of using countif, I used a vlookup.  Below is an example of what I did:

=IF(LEN(IFERROR(VLOOKUP([@[MD5 - Static]], $N$3:N3, 1, FALSE), "")) > 1, "Yes", "No")

It took over an hour to run this in Excel 2013 64 bit.  I am assuming your method would be faster since it uses 2 excel functions instead of the 4 that I used.

Can you please let me know your thoughts?

Many Thanks.
0
 
LVL 27

Accepted Solution

by:
Glenn Ray earned 500 total points
ID: 40429342
I tried replicating a table with a field titled "MD5 - Static" in column N and the results all returned "Yes", so maybe I'm missing something in your requirement.

If you're looking for the duplicate values in the MD5 - Static column, and the table data begins in row 3 (as it appears from your example) then add this formula in an adjacent column on row 3 also and copy down (if you add this to a new Excel Table column, it should automatically copy it down for you):
=IF(COUNTIF($N$3:N3,N3)>1,"Duplicate","")

I'm only testing on a small table - less than 50 rows - but it's very fast.

EDIT:  I just tried this on a table with randomly-generated values on 20,000 rows and it takes about 4 seconds to process. See attached example.

-Glenn
EE-28552267.xlsx
0
 

Author Closing Comment

by:rye004
ID: 40433130
Thank you for your help with this.  Tweaking this to use less excel formulas did improve on performance time.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
DBF to ... Converter 5 43
Excel 2016 - Black cell borders 11 27
ms office troubleshooting for users 8 36
Incorporate VBA Code to work with Original Workbook 23 35
A little background as to how I came to I design this code: Around 5 years ago I designed an add-in that formatted Excel files to a corporate standard, applying different cell colours and font type depending on whether the cells contained inputs,…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial will demonstrate in Microsoft Excel how to add style and sexy appeal to horizontal bar charts.

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now