Link to home
Start Free TrialLog in
Avatar of nainil
nainilFlag for United States of America

asked on

Find Duplicate Values in Excel using VBA

I am trying to find duplicate values in an excel sheet from multiple columns (via VBA / Macros)

It works for the most part, however, there are a few ambiguities when I am doing a comparison between numeric values. Numbers need to be compared literally rather than the actual value.

Attached is an excel sheet.

1.

Rows 2 and 4 are correctly identified as duplicates.

2.

Rows 6 and 9 are WRONGLY identified as duplicates.

3.

Rows 7 and 8 are CORRECTLY identified as NOT duplicates. When we have a some value in Column D (which may or may not contain any values)

4.

Rows 10 and 11 are WRONGLY identified as duplicates inspite of having unique values (atleast on display)
Any help is appreciated to resolve the inconsistency.
Duplicate.xlsm
Avatar of Martin Liss
Martin Liss
Flag of United States of America image

Sorry but what are you comparing when you say something is or isn't a duplicate? And what do you mean when you say "compared literally rather than the actual value"?
If I understand what you are looking at then it is a matter of the cell formation in Column H. Since it is formated as general then it considers 6 and 9 to be the same regardless of the .000 at the end of 9. The same is true of 10 and 11 with the .300 vs. .3000.

Is this what you are meaning?
Avatar of nainil

ASKER

Yes formatting is probably causing things to break.

.300 is different from .3000 (alteast visually). The same should hold true while comparing.
Yes it is visually different however numerically speaking it is the same.  I think the only way to get the distincition between the two is to alter the format in column H (I think that is what your formula/code is looking at) to text.
Avatar of nainil

ASKER

The problem is that people will paste the data into the columns from external sources. How can we enforce that the data in the column is set to TEXT?
You can compare the Text values which will reflect what you see and not the underlying value

Range("H6").Text
@MartinLiss

Thank you, for some reason I was drawing a blank on that one. I think because I was thinking about the compare of row 6 and row 9. In $H$9 it seems to have dropped the .00 so the comparison will still return true.
I'm sorry to have to ask but do you mean that I helped or do you still have a problem?
Avatar of nainil

ASKER

I still have issues. My aim is to filter out the duplicate values and notify users. If I have all alphabets, it is working like a charm. However, numbers are creating a problem here due to the column formatting.
If you use the Text value as I tried to show in my last post it will ignore the formatting.

So if A1 shows 1.2345 and A2 shows 1.23 only because of the number format (they are both 1.2345 behind the scenes)

If Range("A1").Text = Range("A2").Text Then

will be evaluated as False.
Avatar of nainil

ASKER

is it possible to work that out in my code please? I am a bit lost here...
SOLUTION
Avatar of Martin Liss
Martin Liss
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nainil

ASKER

Thanks, let me try this out.
Avatar of nainil

ASKER

It works great. Except for one small issue. If you can check the attached file, the entry in row 6 & 9 is termed as Duplicate when it should not.

Can you please help?
Duplicate.xlsm
That is because cell $b$9 is formated as number (2 decimal) instead of text like the rest of column B so the .00 is automatically truncated when combined with $d$9 resulting in a match in $h$9 with $h$6.
@nainikl: Is the formatting of $b$9 that pony10us pointed out always going to be different than the rest of the column?
Avatar of nainil

ASKER

There is a possibility that the data can be formatted differently. our vendors will paste data from multiple sources (which we have no control over). So, i am really looking for a better handling of inconsistencies.

If we can restrict pasting of data in a certain format, even better.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of nainil

ASKER

I just realized, the column B will NOT ALWAYS be a number. It can be text or alpha - numeric as well. How can we control that?
See my last post.
Avatar of nainil

ASKER

Great, that helps. Once last help.

Currently, we are highlighting / marking just the first occurrence of the duplicate values. Is it possible to include each and every duplicate value with the color?

There will be alteast 2000 rows which will need to be parsed. So, in that case, it will be helpful to see ANY AND ALL rows which are duplicated highlighted.
Avatar of nainil

ASKER

Any help please?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial