Solved

Find Duplicate Values in Excel using VBA

Posted on 2012-04-02
23
793 Views
Last Modified: 2012-04-16
I am trying to find duplicate values in an excel sheet from multiple columns (via VBA / Macros)

It works for the most part, however, there are a few ambiguities when I am doing a comparison between numeric values. Numbers need to be compared literally rather than the actual value.

Attached is an excel sheet.

1.

Rows 2 and 4 are correctly identified as duplicates.

2.

Rows 6 and 9 are WRONGLY identified as duplicates.

3.

Rows 7 and 8 are CORRECTLY identified as NOT duplicates. When we have a some value in Column D (which may or may not contain any values)

4.

Rows 10 and 11 are WRONGLY identified as duplicates inspite of having unique values (atleast on display)
Any help is appreciated to resolve the inconsistency.
Duplicate.xlsm
0
Comment
Question by:nainil
  • 10
  • 9
  • 4
23 Comments
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37796767
Sorry but what are you comparing when you say something is or isn't a duplicate? And what do you mean when you say "compared literally rather than the actual value"?
0
 
LVL 26

Expert Comment

by:pony10us
ID: 37796875
If I understand what you are looking at then it is a matter of the cell formation in Column H. Since it is formated as general then it considers 6 and 9 to be the same regardless of the .000 at the end of 9. The same is true of 10 and 11 with the .300 vs. .3000.

Is this what you are meaning?
0
 

Author Comment

by:nainil
ID: 37797193
Yes formatting is probably causing things to break.

.300 is different from .3000 (alteast visually). The same should hold true while comparing.
0
 
LVL 26

Expert Comment

by:pony10us
ID: 37797392
Yes it is visually different however numerically speaking it is the same.  I think the only way to get the distincition between the two is to alter the format in column H (I think that is what your formula/code is looking at) to text.
0
 

Author Comment

by:nainil
ID: 37797428
The problem is that people will paste the data into the columns from external sources. How can we enforce that the data in the column is set to TEXT?
0
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37797435
You can compare the Text values which will reflect what you see and not the underlying value

Range("H6").Text
0
 
LVL 26

Expert Comment

by:pony10us
ID: 37797470
@MartinLiss

Thank you, for some reason I was drawing a blank on that one. I think because I was thinking about the compare of row 6 and row 9. In $H$9 it seems to have dropped the .00 so the comparison will still return true.
0
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37797482
I'm sorry to have to ask but do you mean that I helped or do you still have a problem?
0
 

Author Comment

by:nainil
ID: 37797577
I still have issues. My aim is to filter out the duplicate values and notify users. If I have all alphabets, it is working like a charm. However, numbers are creating a problem here due to the column formatting.
0
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37797643
If you use the Text value as I tried to show in my last post it will ignore the formatting.

So if A1 shows 1.2345 and A2 shows 1.23 only because of the number format (they are both 1.2345 behind the scenes)

If Range("A1").Text = Range("A2").Text Then

will be evaluated as False.
0
 

Author Comment

by:nainil
ID: 37798195
is it possible to work that out in my code please? I am a bit lost here...
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 45

Assisted Solution

by:Martin Liss
Martin Liss earned 100 total points
ID: 37798431
Sub ValidateDuplicateCodes()

Dim RowCount As Integer
Dim RowCounter As Integer
Dim DupRow As Variant
Dim DuplicateErrorDetails As Variant
Dim r As Range
Dim i As Long
Dim DuplicateErrorCount As Variant

DuplicateErrorCount = 0
DuplicateErrorDetails = ""

    ActiveSheet.Range("K4").ClearContents
    ActiveSheet.Range("K4").ClearComments
'MsgBox ("in ValidateDuplicateLabCodes")

RowCount = ActiveSheet.Range("B:B").End(xlDown).Row - 1
'MsgBox (RowCount)
   
'    For RowCounter = 2 To RowCount + 1
   
'        Range("A" & RowCounter).Interior.ColorIndex = WHITE
'        Range("I" & RowCounter).Value = ""
       
        '''' =COUNTIF($AA$2:$AA$65536,AA2)
        'Range("I" & RowCounter).Formula = "=COUNTIF($H$2:$H$65536,H" & RowCounter & ")"
       
        Set r = Range("H1").End(xlDown).Offset(0, 0)
       
        For RowCounter = 2 To r.Row
            Range("A" & RowCounter).Interior.ColorIndex = WHITE
            Range("I" & RowCounter).Value = ""
            For i = RowCounter + 1 To r.Row
                If Range("H" & RowCounter).Text = Range("H" & i).Text Then
                    DupRow = DupRow + 1
                End If
            Next
            If DupRow >= 1 Then
                DuplicateErrorCount = DuplicateErrorCount + 1
                Range("A" & RowCounter).Interior.ColorIndex = 31
                DuplicateErrorDetails = DuplicateErrorDetails & "Duplicate values found in row " & RowCounter & vbLf
                DupRow = 0
            End If
        Next
       ' DupRow = Range("I" & RowCounter).Value
'        MsgBox ("Value at AB" & RowCounter & " : " & DupRow)
'        If DupRow > 1 Then
'            DuplicateErrorCount = DuplicateErrorCount + 1
'            Range("A" & RowCounter).Interior.ColorIndex = 31
'            DuplicateErrorDetails = DuplicateErrorDetails & "Duplicate values found in row " & RowCounter & vbLf
'        End If
       
'        DupRow = 0
'        Range("I" & RowCounter).Value = ""
   
'    Next RowCounter
   
    If (DuplicateErrorDetails = "") Then
            DuplicateErrorDetails = "No Error"
    End If

    ActiveSheet.Range("K4").HorizontalAlignment = xlCenter
    ActiveSheet.Range("K4").Value = DuplicateErrorCount
    ActiveSheet.Range("K4").AddComment(DuplicateErrorDetails).Shape.TextFrame.AutoSize = True

DuplicateErrorCount = 0
DuplicateErrorDetails = ""


End Sub
0
 

Author Comment

by:nainil
ID: 37800718
Thanks, let me try this out.
0
 

Author Comment

by:nainil
ID: 37801768
It works great. Except for one small issue. If you can check the attached file, the entry in row 6 & 9 is termed as Duplicate when it should not.

Can you please help?
Duplicate.xlsm
0
 
LVL 26

Expert Comment

by:pony10us
ID: 37801816
That is because cell $b$9 is formated as number (2 decimal) instead of text like the rest of column B so the .00 is automatically truncated when combined with $d$9 resulting in a match in $h$9 with $h$6.
0
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37802344
@nainikl: Is the formatting of $b$9 that pony10us pointed out always going to be different than the rest of the column?
0
 

Author Comment

by:nainil
ID: 37802506
There is a possibility that the data can be formatted differently. our vendors will paste data from multiple sources (which we have no control over). So, i am really looking for a better handling of inconsistencies.

If we can restrict pasting of data in a certain format, even better.
0
 
LVL 45

Assisted Solution

by:Martin Liss
Martin Liss earned 100 total points
ID: 37802581
Add this to the code

 How about adding


        Columns("B:B").Select
        Selection.NumberFormat = "@"

just above the following line?

        Set r = Range("H1").End(xlDown).Offset(0, 0)
0
 

Author Comment

by:nainil
ID: 37802894
I just realized, the column B will NOT ALWAYS be a number. It can be text or alpha - numeric as well. How can we control that?
0
 
LVL 45

Expert Comment

by:Martin Liss
ID: 37802898
See my last post.
0
 

Author Comment

by:nainil
ID: 37803449
Great, that helps. Once last help.

Currently, we are highlighting / marking just the first occurrence of the duplicate values. Is it possible to include each and every duplicate value with the color?

There will be alteast 2000 rows which will need to be parsed. So, in that case, it will be helpful to see ANY AND ALL rows which are duplicated highlighted.
0
 

Author Comment

by:nainil
ID: 37815988
Any help please?
0
 
LVL 45

Accepted Solution

by:
Martin Liss earned 100 total points
ID: 37816234
Sub ValidateDuplicateCodes()

Dim RowCount As Integer
Dim RowCounter As Integer
Dim DupRow As Variant
Dim DuplicateErrorDetails As Variant
Dim r As Range
Dim i As Long
Dim DuplicateErrorCount As Variant

DuplicateErrorCount = 0
DuplicateErrorDetails = ""

ActiveSheet.Range("K4").ClearContents
ActiveSheet.Range("K4").ClearComments

RowCount = ActiveSheet.Range("B:B").End(xlDown).Row - 1
Columns("B:B").Select
Selection.NumberFormat = "@"
Set r = Range("H1").End(xlDown).Offset(0, 0)
For RowCounter = 2 To r.Row
    Range("A" & RowCounter).Interior.ColorIndex = WHITE
Next
For RowCounter = 2 To r.Row
    Range("I" & RowCounter).Value = ""
    For i = RowCounter + 1 To r.Row
        If Range("H" & RowCounter).Text = Range("H" & i).Text Then
            DupRow = DupRow + 1
            Range("A" & i).Interior.ColorIndex = 31
        End If
    Next
    If DupRow >= 1 Then
        DuplicateErrorCount = DuplicateErrorCount + 1
'        Range("A" & RowCounter).Interior.ColorIndex = 31
        DuplicateErrorDetails = DuplicateErrorDetails & "Duplicate values found in row " & RowCounter & vbLf
        DupRow = 0
    End If
Next
   
If (DuplicateErrorDetails = "") Then
        DuplicateErrorDetails = "No Error"
End If

ActiveSheet.Range("K4").HorizontalAlignment = xlCenter
ActiveSheet.Range("K4").Value = DuplicateErrorCount
ActiveSheet.Range("K4").AddComment(DuplicateErrorDetails).Shape.TextFrame.AutoSize = True

DuplicateErrorCount = 0
DuplicateErrorDetails = ""


End Sub
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now