Link to home
Start Free TrialLog in
Avatar of Glenn Ray
Glenn RayFlag for United States of America

asked on

Use RemoveDuplicates in VBA with any size range

Is it possible to recode the RemoveDuplicates statement to work with any contiguous range of data, regardless of the number of columns?  All columns would need to be applied to the test array.

The following example was lifted from a recorded macro.  I can determine the active range programmatically and replace that ("$A$1:$CP$269195"), but don't know how to assign the array to pick up every column, regardless of column width.


Sub Remove_Duplicates()
    'This needs to be generalized to work with any size data set, regardless of number of columns
    ActiveSheet.Range("$A$1:$CP$269195").RemoveDuplicates Columns:=Array(1, 2, 3, 4, _
        5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, _
        33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, _
        59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, _
        85, 86, 87, 88, 89, 90, 91, 92, 93, 94), Header:=xlYes
End Sub

Open in new window

Avatar of Norie
Norie

The array is optional, if it's not included Excel/VBA assumes you mean all columns.

Mind you, the mention of column widths makes me wonder if there's more to what you want to do, eg hidden columns.

Is there?
Maybe this macro which I found on YouTube posted by Jie Jenn will meet your requirements?
 
Sub Remove_Duplicates()
    
    Dim LR As Long, i As Long
    LR = Range("A" & Rows.Count).End(xlUp).Row
    For i = LR To 2 Step -1
        If WorksheetFunction.CountIf(Columns("A"), Range("A" & i).Value) > 1 Then Rows(i).Delete
    Next i

End Sub

Open in new window

Regards,
Curt
Avatar of Glenn Ray

ASKER

imnorie:
I mis-stated my original question when I used the phrase "regardless of column widths."  I should have said, "regardless of the number of columns in the range."

epaclm:
That appears to only focus on duplicates in column A.  I will need to remove rows that have duplicate values in all columns, regardless of sort order.

I'll test with the Columns:=Array(....) section removed and report back.
Thanks,
Glenn
I tried the following code without success.

*  I created a test worksheet (three columns, 22 rows plus header, with four known duplicate rows) and was able to remove the duplicates using the "Remove Duplicates" function manually.
*  I recorded a macro while repeating that function, then removed the "Columns:=Array(...)" section, leaving the "Header:=xlYes" section - The macro DID NOT remove duplicates.  
*  Removing the "Header:=xlYes" section had no effect.

Interestingly, online help for Excel seems to suggest exactly what imnorie stated - that omitting the Column and Header parameters should check all columns in the specified range.



Sub Remove_Duplicates()
    Dim strRange As String
    strRange = "$A$1:" & ActiveCell.SpecialCells(xlLastCell).Address
    ActiveSheet.Range(strRange).RemoveDuplicates
End Sub

Open in new window

Glenn,

Yes, the macro I suggested is based on column A.

Try this modified version.
 
Sub Remove_Duplicates()

    Dim i As Long, strConc As String

    Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
    Range("A1").FormulaR1C1 = "Concat"
    
    'construct concatenate string
    strConc = "=CONCATENATE("
    For i = 1 To xlLastCol
        If i > 1 Then strConc = strConc & ","
        strConc = strConc & "RC[" & i & "]"
    Next i
    strConc = strConc & ")"
    
    Range("A2").FormulaR1C1 = strConc
    Range("A2").AutoFill Destination:=Range("A2:A" & xlLastRow), Type:=xlFillDefault

    For i = xlLastRow To 2 Step -1
        If WorksheetFunction.CountIf(Columns("A"), Range("A" & i).Value) > 1 Then Rows(i).Delete
    Next i
    
    Columns("A:A").Delete Shift:=xlToLeft
    Range("A2").Select
End Sub

Function xlLastRow(Optional WorksheetName As String) As Long
     
    'Check for optional worksheetname else use activesheet
    If WorksheetName = vbNullString Then
        WorksheetName = ActiveSheet.Name
    End If
    
     '    find the last populated row in a worksheet
    With Worksheets(WorksheetName)
        xlLastRow = .Cells.Find("*", .Cells(1), xlFormulas, _
        xlWhole, xlByRows, xlPrevious).Row
    End With
     
End Function
 
Function xlLastCol(Optional WorksheetName As String) As Long
     
    'Check for optional worksheetname else use activesheet
    If WorksheetName = vbNullString Then
        WorksheetName = ActiveSheet.Name
    End If
    
     '    find the last populated column in a worksheet
    With Worksheets(WorksheetName)
        xlLastCol = .Cells.Find("*", .Cells(1), xlFormulas, _
        xlWhole, xlByColumns, xlPrevious).Column
    End With
End Function

Open in new window

It looks for duplicated rows. It will only work up to column 255 due to the limit of the Concatenate command.

The 2 functions are used to find the last used row and the last used column

Cheers,
Curt
Curt,  that certainly looks like it would work.  Unfortunately, I'm dealing with a very large data set:  about 250,000 rows and 38 columns.  I'm not sure how quickly this routine would work.  The "Remove Duplicates" function  runs very quickly (< 1 min) when run manually.

BTW, the data comes from a query out of Hyperion Brio, and darned if I can figure out why there are duplicate transactions being returned (otherwise, I'd fix the query instead of doing this in Excel).
Glenn

It must be the data then.

What you regard as duplicates isn't the same as what Excel regards as duplicates.

One thing you might want to consider, is that there really aren't duplicates but there appears to be due to formatting.

That would mainly be for numeric values with a decimal part which are displayed to a fixed decimal width.

Simple example, 2.595 and 2.596 - different obviously but when formatted to 2 decimal places both appear as 2.60.
It isn't an issue with the data, per se, since I can manually run the "Remove Duplicates" function successfully on the original data set and see that xx rows were removed (varies each month..has been anywhere from 10 to 200 rows).

Additionally - as I noted earlier - I created a very small test workbook (22 rows of data, three columns) with four duplicate rows.  The code did not remove the rows while the manual "Removed Duplicates" function did.

SO...I'm attaching another test workbook with the macro included.  I've highlighted two duplicate rows.  If one manually runs the RD function, the two rows will be removed.  However, the macro will not remove them.

I would prefer to use the programmatic method of .RemoveDuplicates if at all possible.  Otherwise, I guess I found a "feature" and will have to bypass this step. EE-Remove-Duplicates.xlsm
Well I'm confused, I was using this approach a couple of weeks ago.

I would usually use Advanced Filter for this anyway.

So it looks as though I was wrong, and so was MS - it says in Help that columns is optional and if it's not used then all columns are assumed.
Glenn,

I think I may have found the solution by combing a few ideas.

 
Sub Remove_Duplicates()
    Dim i As Long
    Dim strRange As String
    
    strRange = "$A$1:" & ActiveCell.SpecialCells(xlLastCell).Address
    i = xlLastRow
    ActiveSheet.Range(strRange).CurrentRegion.RemoveDuplicates Columns:=i, Header:=xlYes
End Sub

Function xlLastRow(Optional WorksheetName As String) As Long
     
    'Check for optional worksheetname else use activesheet
    If WorksheetName = vbNullString Then
        WorksheetName = ActiveSheet.Name
    End If
    
     '    find the last populated row in a worksheet
    With Worksheets(WorksheetName)
        xlLastRow = .Cells.Find("*", .Cells(1), xlFormulas, _
        xlWhole, xlByRows, xlPrevious).Row
    End With
     
End Function

Open in new window


Try in attached file using the sheet2 Test2 as test data

Cheers,
Curt
EE-Remove-Duplicates-mod1.xlsm
ASKER CERTIFIED SOLUTION
Avatar of Norie
Norie

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Looks like you got it right imnorie. I don't know how I managed to screw it up like I did using rows for columns and somehow (believe it or not!) I had a macro that worked but when I try it now it doesn't work at all of course. I guess the key is that Columns:= needs an array with the column numbers.

Cheers,
Curt
Curt

I actually found an explanation and I kind of understand it, but not fully.

Certainly not enough to explain right now- it's all to do with objects, by reference, by val etc

The main thing is the brackets, which I've seen used with other functions that have array arguments.

Still don't understand why the Help topic clearly states that it should work if you don't use the Columns argument.

But hey, that's another Microsoft 'feature' I suppose.:)
AND...Yes!

I had been messing around with my code and had tried a similar approach, but I wasn't aware of the use of parenthesis as an array identifier (i.e., (cols) in the code).  Nor had I properly assigned the array values.

I tried this with the Header parameter removed and also with Header:=xlYes and both instances also worked.

I then tested this against my original dataset (94 columns, 269K rows, 7 known duplicate rows) and it worked perfectly.

What is really great about this solution is that one can now programmatically use the RemoveDuplicates method selectively on any or all columns in a given range, even using header values against a lookup table.  All one has to do is properly assign an array with the specific column numbers to use for filtering and then build it as was done here.

Nice work, imnnorie!



 

Ingenious use of array identifier (parenthesis) to properly apply to columns parameter in RemoveDuplicates method.
Glenn

It isn't being used as an array identifier as far as I'm aware.

As far as I know it's to do with passing the argument ByRef or ByVal.
Meant to ad:

Whatever it is, it's working and that's the important thing.:)
Another d.
I did a bit of reading to educate myself about arrays and ByVal and ByRef.
I think the brackets means that it protects the array variable to be changed by the "ByRef"  and keep it as ByVal.
Extracted from http://msdn.microsoft.com
"The calling code cannot override a ByVal mechanism, but if an argument is declared with ByRef, the calling code can force the mechanism to pass by value by enclosing the argument name in parentheses in the call."

Regarding arrays I found that you can actually check whether a variable is an array or not by using IsArray. Just for fun I added this test to Imnorie's macro.

Cheers,
Curt
Sub DeDupeCols()
    Dim rng As Range
    Dim cols() As Variant
    Dim I As Integer
    Dim arrayCheck As Boolean

    Set rng = Worksheets("Original").Range("A1").CurrentRegion
    ReDim cols(0 To rng.Columns.Count - 1)
    For I = 0 To UBound(cols)
        cols(I) = I + 1
    Next I
    'TypeName Returns a String value containing data-type information about a variable
    'IsArray() checks if the variable is an array or not
    arrayCheck = IsArray(cols)
    MsgBox "The variable cols is of type " & TypeName(cols) & " and it is " & arrayCheck & " that it is an array"
    rng.RemoveDuplicates Columns:=(cols), Header:=xlNo
End Sub

Open in new window