nainil
asked on
Find Duplicate Values in Excel using VBA
I am trying to find duplicate values in an excel sheet from multiple columns (via VBA / Macros)
It works for the most part, however, there are a few ambiguities when I am doing a comparison between numeric values. Numbers need to be compared literally rather than the actual value.
Attached is an excel sheet.
Any help is appreciated to resolve the inconsistency.
Duplicate.xlsm
It works for the most part, however, there are a few ambiguities when I am doing a comparison between numeric values. Numbers need to be compared literally rather than the actual value.
Attached is an excel sheet.
1.
Rows 2 and 4 are correctly identified as duplicates.2.
Rows 6 and 9 are WRONGLY identified as duplicates.3.
Rows 7 and 8 are CORRECTLY identified as NOT duplicates. When we have a some value in Column D (which may or may not contain any values)4.
Rows 10 and 11 are WRONGLY identified as duplicates inspite of having unique values (atleast on display)Any help is appreciated to resolve the inconsistency.
Duplicate.xlsm
Sorry but what are you comparing when you say something is or isn't a duplicate? And what do you mean when you say "compared literally rather than the actual value"?
If I understand what you are looking at then it is a matter of the cell formation in Column H. Since it is formated as general then it considers 6 and 9 to be the same regardless of the .000 at the end of 9. The same is true of 10 and 11 with the .300 vs. .3000.
Is this what you are meaning?
Is this what you are meaning?
ASKER
Yes formatting is probably causing things to break.
.300 is different from .3000 (alteast visually). The same should hold true while comparing.
.300 is different from .3000 (alteast visually). The same should hold true while comparing.
Yes it is visually different however numerically speaking it is the same. I think the only way to get the distincition between the two is to alter the format in column H (I think that is what your formula/code is looking at) to text.
ASKER
The problem is that people will paste the data into the columns from external sources. How can we enforce that the data in the column is set to TEXT?
You can compare the Text values which will reflect what you see and not the underlying value
Range("H6").Text
Range("H6").Text
@MartinLiss
Thank you, for some reason I was drawing a blank on that one. I think because I was thinking about the compare of row 6 and row 9. In $H$9 it seems to have dropped the .00 so the comparison will still return true.
Thank you, for some reason I was drawing a blank on that one. I think because I was thinking about the compare of row 6 and row 9. In $H$9 it seems to have dropped the .00 so the comparison will still return true.
I'm sorry to have to ask but do you mean that I helped or do you still have a problem?
ASKER
I still have issues. My aim is to filter out the duplicate values and notify users. If I have all alphabets, it is working like a charm. However, numbers are creating a problem here due to the column formatting.
If you use the Text value as I tried to show in my last post it will ignore the formatting.
So if A1 shows 1.2345 and A2 shows 1.23 only because of the number format (they are both 1.2345 behind the scenes)
If Range("A1").Text = Range("A2").Text Then
will be evaluated as False.
So if A1 shows 1.2345 and A2 shows 1.23 only because of the number format (they are both 1.2345 behind the scenes)
If Range("A1").Text = Range("A2").Text Then
will be evaluated as False.
ASKER
is it possible to work that out in my code please? I am a bit lost here...
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks, let me try this out.
ASKER
It works great. Except for one small issue. If you can check the attached file, the entry in row 6 & 9 is termed as Duplicate when it should not.
Can you please help?
Duplicate.xlsm
Can you please help?
Duplicate.xlsm
That is because cell $b$9 is formated as number (2 decimal) instead of text like the rest of column B so the .00 is automatically truncated when combined with $d$9 resulting in a match in $h$9 with $h$6.
@nainikl: Is the formatting of $b$9 that pony10us pointed out always going to be different than the rest of the column?
ASKER
There is a possibility that the data can be formatted differently. our vendors will paste data from multiple sources (which we have no control over). So, i am really looking for a better handling of inconsistencies.
If we can restrict pasting of data in a certain format, even better.
If we can restrict pasting of data in a certain format, even better.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I just realized, the column B will NOT ALWAYS be a number. It can be text or alpha - numeric as well. How can we control that?
See my last post.
ASKER
Great, that helps. Once last help.
Currently, we are highlighting / marking just the first occurrence of the duplicate values. Is it possible to include each and every duplicate value with the color?
There will be alteast 2000 rows which will need to be parsed. So, in that case, it will be helpful to see ANY AND ALL rows which are duplicated highlighted.
Currently, we are highlighting / marking just the first occurrence of the duplicate values. Is it possible to include each and every duplicate value with the color?
There will be alteast 2000 rows which will need to be parsed. So, in that case, it will be helpful to see ANY AND ALL rows which are duplicated highlighted.
ASKER
Any help please?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.