Glenn Ray
asked on
Use RemoveDuplicates in VBA with any size range
Is it possible to recode the RemoveDuplicates statement to work with any contiguous range of data, regardless of the number of columns? All columns would need to be applied to the test array.
The following example was lifted from a recorded macro. I can determine the active range programmatically and replace that ("$A$1:$CP$269195"), but don't know how to assign the array to pick up every column, regardless of column width.
The following example was lifted from a recorded macro. I can determine the active range programmatically and replace that ("$A$1:$CP$269195"), but don't know how to assign the array to pick up every column, regardless of column width.
Sub Remove_Duplicates()
'This needs to be generalized to work with any size data set, regardless of number of columns
ActiveSheet.Range("$A$1:$CP$269195").RemoveDuplicates Columns:=Array(1, 2, 3, 4, _
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, _
33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, _
59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, _
85, 86, 87, 88, 89, 90, 91, 92, 93, 94), Header:=xlYes
End Sub
Maybe this macro which I found on YouTube posted by Jie Jenn will meet your requirements?
Curt
Sub Remove_Duplicates()
Dim LR As Long, i As Long
LR = Range("A" & Rows.Count).End(xlUp).Row
For i = LR To 2 Step -1
If WorksheetFunction.CountIf(Columns("A"), Range("A" & i).Value) > 1 Then Rows(i).Delete
Next i
End Sub
Regards,Curt
ASKER
imnorie:
I mis-stated my original question when I used the phrase "regardless of column widths." I should have said, "regardless of the number of columns in the range."
epaclm:
That appears to only focus on duplicates in column A. I will need to remove rows that have duplicate values in all columns, regardless of sort order.
I'll test with the Columns:=Array(....) section removed and report back.
Thanks,
Glenn
I mis-stated my original question when I used the phrase "regardless of column widths." I should have said, "regardless of the number of columns in the range."
epaclm:
That appears to only focus on duplicates in column A. I will need to remove rows that have duplicate values in all columns, regardless of sort order.
I'll test with the Columns:=Array(....) section removed and report back.
Thanks,
Glenn
ASKER
I tried the following code without success.
* I created a test worksheet (three columns, 22 rows plus header, with four known duplicate rows) and was able to remove the duplicates using the "Remove Duplicates" function manually.
* I recorded a macro while repeating that function, then removed the "Columns:=Array(...)" section, leaving the "Header:=xlYes" section - The macro DID NOT remove duplicates.
* Removing the "Header:=xlYes" section had no effect.
Interestingly, online help for Excel seems to suggest exactly what imnorie stated - that omitting the Column and Header parameters should check all columns in the specified range.
* I created a test worksheet (three columns, 22 rows plus header, with four known duplicate rows) and was able to remove the duplicates using the "Remove Duplicates" function manually.
* I recorded a macro while repeating that function, then removed the "Columns:=Array(...)" section, leaving the "Header:=xlYes" section - The macro DID NOT remove duplicates.
* Removing the "Header:=xlYes" section had no effect.
Interestingly, online help for Excel seems to suggest exactly what imnorie stated - that omitting the Column and Header parameters should check all columns in the specified range.
Sub Remove_Duplicates()
Dim strRange As String
strRange = "$A$1:" & ActiveCell.SpecialCells(xlLastCell).Address
ActiveSheet.Range(strRange).RemoveDuplicates
End Sub
Glenn,
Yes, the macro I suggested is based on column A.
Try this modified version.
The 2 functions are used to find the last used row and the last used column
Cheers,
Curt
Yes, the macro I suggested is based on column A.
Try this modified version.
Sub Remove_Duplicates()
Dim i As Long, strConc As String
Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
Range("A1").FormulaR1C1 = "Concat"
'construct concatenate string
strConc = "=CONCATENATE("
For i = 1 To xlLastCol
If i > 1 Then strConc = strConc & ","
strConc = strConc & "RC[" & i & "]"
Next i
strConc = strConc & ")"
Range("A2").FormulaR1C1 = strConc
Range("A2").AutoFill Destination:=Range("A2:A" & xlLastRow), Type:=xlFillDefault
For i = xlLastRow To 2 Step -1
If WorksheetFunction.CountIf(Columns("A"), Range("A" & i).Value) > 1 Then Rows(i).Delete
Next i
Columns("A:A").Delete Shift:=xlToLeft
Range("A2").Select
End Sub
Function xlLastRow(Optional WorksheetName As String) As Long
'Check for optional worksheetname else use activesheet
If WorksheetName = vbNullString Then
WorksheetName = ActiveSheet.Name
End If
' find the last populated row in a worksheet
With Worksheets(WorksheetName)
xlLastRow = .Cells.Find("*", .Cells(1), xlFormulas, _
xlWhole, xlByRows, xlPrevious).Row
End With
End Function
Function xlLastCol(Optional WorksheetName As String) As Long
'Check for optional worksheetname else use activesheet
If WorksheetName = vbNullString Then
WorksheetName = ActiveSheet.Name
End If
' find the last populated column in a worksheet
With Worksheets(WorksheetName)
xlLastCol = .Cells.Find("*", .Cells(1), xlFormulas, _
xlWhole, xlByColumns, xlPrevious).Column
End With
End Function
It looks for duplicated rows. It will only work up to column 255 due to the limit of the Concatenate command.The 2 functions are used to find the last used row and the last used column
Cheers,
Curt
ASKER
Curt, that certainly looks like it would work. Unfortunately, I'm dealing with a very large data set: about 250,000 rows and 38 columns. I'm not sure how quickly this routine would work. The "Remove Duplicates" function runs very quickly (< 1 min) when run manually.
BTW, the data comes from a query out of Hyperion Brio, and darned if I can figure out why there are duplicate transactions being returned (otherwise, I'd fix the query instead of doing this in Excel).
BTW, the data comes from a query out of Hyperion Brio, and darned if I can figure out why there are duplicate transactions being returned (otherwise, I'd fix the query instead of doing this in Excel).
Glenn
It must be the data then.
What you regard as duplicates isn't the same as what Excel regards as duplicates.
One thing you might want to consider, is that there really aren't duplicates but there appears to be due to formatting.
That would mainly be for numeric values with a decimal part which are displayed to a fixed decimal width.
Simple example, 2.595 and 2.596 - different obviously but when formatted to 2 decimal places both appear as 2.60.
It must be the data then.
What you regard as duplicates isn't the same as what Excel regards as duplicates.
One thing you might want to consider, is that there really aren't duplicates but there appears to be due to formatting.
That would mainly be for numeric values with a decimal part which are displayed to a fixed decimal width.
Simple example, 2.595 and 2.596 - different obviously but when formatted to 2 decimal places both appear as 2.60.
ASKER
It isn't an issue with the data, per se, since I can manually run the "Remove Duplicates" function successfully on the original data set and see that xx rows were removed (varies each month..has been anywhere from 10 to 200 rows).
Additionally - as I noted earlier - I created a very small test workbook (22 rows of data, three columns) with four duplicate rows. The code did not remove the rows while the manual "Removed Duplicates" function did.
SO...I'm attaching another test workbook with the macro included. I've highlighted two duplicate rows. If one manually runs the RD function, the two rows will be removed. However, the macro will not remove them.
I would prefer to use the programmatic method of .RemoveDuplicates if at all possible. Otherwise, I guess I found a "feature" and will have to bypass this step. EE-Remove-Duplicates.xlsm
Additionally - as I noted earlier - I created a very small test workbook (22 rows of data, three columns) with four duplicate rows. The code did not remove the rows while the manual "Removed Duplicates" function did.
SO...I'm attaching another test workbook with the macro included. I've highlighted two duplicate rows. If one manually runs the RD function, the two rows will be removed. However, the macro will not remove them.
I would prefer to use the programmatic method of .RemoveDuplicates if at all possible. Otherwise, I guess I found a "feature" and will have to bypass this step. EE-Remove-Duplicates.xlsm
Well I'm confused, I was using this approach a couple of weeks ago.
I would usually use Advanced Filter for this anyway.
So it looks as though I was wrong, and so was MS - it says in Help that columns is optional and if it's not used then all columns are assumed.
I would usually use Advanced Filter for this anyway.
So it looks as though I was wrong, and so was MS - it says in Help that columns is optional and if it's not used then all columns are assumed.
Glenn,
I think I may have found the solution by combing a few ideas.
Try in attached file using the sheet2 Test2 as test data
Cheers,
Curt
EE-Remove-Duplicates-mod1.xlsm
I think I may have found the solution by combing a few ideas.
Sub Remove_Duplicates()
Dim i As Long
Dim strRange As String
strRange = "$A$1:" & ActiveCell.SpecialCells(xlLastCell).Address
i = xlLastRow
ActiveSheet.Range(strRange).CurrentRegion.RemoveDuplicates Columns:=i, Header:=xlYes
End Sub
Function xlLastRow(Optional WorksheetName As String) As Long
'Check for optional worksheetname else use activesheet
If WorksheetName = vbNullString Then
WorksheetName = ActiveSheet.Name
End If
' find the last populated row in a worksheet
With Worksheets(WorksheetName)
xlLastRow = .Cells.Find("*", .Cells(1), xlFormulas, _
xlWhole, xlByRows, xlPrevious).Row
End With
End Function
Try in attached file using the sheet2 Test2 as test data
Cheers,
Curt
EE-Remove-Duplicates-mod1.xlsm
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Looks like you got it right imnorie. I don't know how I managed to screw it up like I did using rows for columns and somehow (believe it or not!) I had a macro that worked but when I try it now it doesn't work at all of course. I guess the key is that Columns:= needs an array with the column numbers.
Cheers,
Curt
Cheers,
Curt
Curt
I actually found an explanation and I kind of understand it, but not fully.
Certainly not enough to explain right now- it's all to do with objects, by reference, by val etc
The main thing is the brackets, which I've seen used with other functions that have array arguments.
Still don't understand why the Help topic clearly states that it should work if you don't use the Columns argument.
But hey, that's another Microsoft 'feature' I suppose.:)
I actually found an explanation and I kind of understand it, but not fully.
Certainly not enough to explain right now- it's all to do with objects, by reference, by val etc
The main thing is the brackets, which I've seen used with other functions that have array arguments.
Still don't understand why the Help topic clearly states that it should work if you don't use the Columns argument.
But hey, that's another Microsoft 'feature' I suppose.:)
ASKER
AND...Yes!
I had been messing around with my code and had tried a similar approach, but I wasn't aware of the use of parenthesis as an array identifier (i.e., (cols) in the code). Nor had I properly assigned the array values.
I tried this with the Header parameter removed and also with Header:=xlYes and both instances also worked.
I then tested this against my original dataset (94 columns, 269K rows, 7 known duplicate rows) and it worked perfectly.
What is really great about this solution is that one can now programmatically use the RemoveDuplicates method selectively on any or all columns in a given range, even using header values against a lookup table. All one has to do is properly assign an array with the specific column numbers to use for filtering and then build it as was done here.
Nice work, imnnorie!
I had been messing around with my code and had tried a similar approach, but I wasn't aware of the use of parenthesis as an array identifier (i.e., (cols) in the code). Nor had I properly assigned the array values.
I tried this with the Header parameter removed and also with Header:=xlYes and both instances also worked.
I then tested this against my original dataset (94 columns, 269K rows, 7 known duplicate rows) and it worked perfectly.
What is really great about this solution is that one can now programmatically use the RemoveDuplicates method selectively on any or all columns in a given range, even using header values against a lookup table. All one has to do is properly assign an array with the specific column numbers to use for filtering and then build it as was done here.
Nice work, imnnorie!
ASKER
Ingenious use of array identifier (parenthesis) to properly apply to columns parameter in RemoveDuplicates method.
Glenn
It isn't being used as an array identifier as far as I'm aware.
As far as I know it's to do with passing the argument ByRef or ByVal.
It isn't being used as an array identifier as far as I'm aware.
As far as I know it's to do with passing the argument ByRef or ByVal.
Meant to ad:
Whatever it is, it's working and that's the important thing.:)
Whatever it is, it's working and that's the important thing.:)
Another d.
I did a bit of reading to educate myself about arrays and ByVal and ByRef.
I think the brackets means that it protects the array variable to be changed by the "ByRef" and keep it as ByVal.
Extracted from http://msdn.microsoft.com
"The calling code cannot override a ByVal mechanism, but if an argument is declared with ByRef, the calling code can force the mechanism to pass by value by enclosing the argument name in parentheses in the call."
Regarding arrays I found that you can actually check whether a variable is an array or not by using IsArray. Just for fun I added this test to Imnorie's macro.
Cheers,
Curt
I think the brackets means that it protects the array variable to be changed by the "ByRef" and keep it as ByVal.
Extracted from http://msdn.microsoft.com
"The calling code cannot override a ByVal mechanism, but if an argument is declared with ByRef, the calling code can force the mechanism to pass by value by enclosing the argument name in parentheses in the call."
Regarding arrays I found that you can actually check whether a variable is an array or not by using IsArray. Just for fun I added this test to Imnorie's macro.
Cheers,
Curt
Sub DeDupeCols()
Dim rng As Range
Dim cols() As Variant
Dim I As Integer
Dim arrayCheck As Boolean
Set rng = Worksheets("Original").Range("A1").CurrentRegion
ReDim cols(0 To rng.Columns.Count - 1)
For I = 0 To UBound(cols)
cols(I) = I + 1
Next I
'TypeName Returns a String value containing data-type information about a variable
'IsArray() checks if the variable is an array or not
arrayCheck = IsArray(cols)
MsgBox "The variable cols is of type " & TypeName(cols) & " and it is " & arrayCheck & " that it is an array"
rng.RemoveDuplicates Columns:=(cols), Header:=xlNo
End Sub
Mind you, the mention of column widths makes me wonder if there's more to what you want to do, eg hidden columns.
Is there?