Link to home
Start Free TrialLog in
Avatar of FREDARCE
FREDARCE

asked on

How to find duplicate combinations in Excel

Hi,  I have a spreadsheet that contains multiple duplicate combinations of items.  I would like to remove the duplicates but only based on a duplicate combination of the items between 2 separate columns:

example of original:

A             B        
test1      test2
test3      test4
test2      test1

In my case I consider row 1 and row3 to be duplicate because I don't care about the column order in which they appear.  I am more conscerned with eliminating duplicate combinations.  (i.e.-  (test1,test2 is the same as test2,test1)).  Therefore my desired result after removing the duplicate combinations should be the following:

A            B
test1     test2
test3     test4


or


A           B
test3     test4
test2     test1


Is there a formula i can apply to a worksheet that can delete or highlight these duplicate combinations between two separate columns?

thanks
Avatar of Professor J
Professor J

first sort your data  and then select both columns then go to DATA tab and then select remove duplicates  then select only column A i mean the one based on which you want to delete duplicate.

for sorting with macro use below code.  if you want formula solution then in another column put the formula =IFERROR(VLOOKUP(B1,A:A,1,0),"") and then copy and paste as value and then go to data and follow as abovementioned.

for sorting use the below macro  .

see attached file.

Sub test()
    Dim a, i As Long, ii As Long, w, x, n As Long
    With Range("a3").CurrentRegion
        a = .Value
        .ClearContents
        With CreateObject("Scripting.Dictionary")
            For i = 1 To UBound(a, 1)
                If a(i, 1) <> "" Then
                    If Not .exists(a(i, 1)) Then
                        ReDim w(1 To UBound(a, 2))
                        w(1) = a(i, 1): .Item(a(i, 1)) = w
                    End If
                End If
            Next
            For i = 1 To UBound(a, 1)
                If a(i, 2) <> "" Then
                    If Not .exists(a(i, 2)) Then
                        ReDim w(1 To UBound(a, 2))
                    Else
                        w = .Item(a(i, 2))
                    End If
                    For ii = 2 To UBound(a, 2)
                        w(ii) = a(i, ii)
                    Next
                    .Item(a(i, 2)) = w
                End If
            Next
            x = Application.Transpose(Application.Transpose(.items))
            n = .Count
        End With
        .Resize(n).Value = x
    End With
End Sub

Open in new window

EE.xlsm
please also see attached version solution with formula.
EE.xlsm
Or use this formula in C2, and copy down.
=IF(COUNTIFS($A$1:A1,A2,$B$1:B1,B2)+COUNTIFS($B$1:B1,A2,$A$1:A1,B2)>0,"Duplicate","")
Use filter for Duplicates in column C and delete.
Finally delete column C.

If many rows, copy the formulas, and paste special as values.
Sort on column C, and delete the duplicates, and then delete column C.
Avatar of FREDARCE

ASKER

I tried this but did not get the desired result.  All it did was sort the data within each column and removed duplicates within  each column.
can you upload a sample data? how did it not work?
I am attaching the file.  The columns I want to find and remove duplicates for are in  columns A and L

thanks
matrix.xlsx
ASKER CERTIFIED SOLUTION
Avatar of Ejgil Hedegaard
Ejgil Hedegaard
Flag of Denmark image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thank you Ejgil

Did you have to do anything different? from the steps you proposed previously?
Same principle using column A and L instead of column A and B.

But the header lines should not be removed.
So skip the compare if column Q is Comment added to the formula.