Link to home
Start Free TrialLog in
Avatar of ExpertHelp79
ExpertHelp79

asked on

Macro to delete duplicate rows in the sheet

Hello

I need to delete duplicate rows in the sheet on the below condition

I have resourcestartdate, resourceenddate and employeeID columns. Now for duplicate employeeids we need to check the dates and keep only the row in datatable which has the most recent resourcestartdate and delete rest of the rows for that employeeid.
 SOS
Please help

TIA
Avatar of SiddharthRout
SiddharthRout
Flag of India image

Try this.

I am assuming that the Col A is "employeeID", Col B is "resourcestartdate" and Col C is "resourceenddate"

Sample file attached. Please run the Sub Sample in the module.

Code Used

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range
    Dim oSht As Worksheet
    Dim strSearch As String
    Dim dt As Date
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("A" & Rows.Count).End(xlUp).Row
    
    For i = lastrow To 2 Step -1
        strSearch = oSht.Range("A" & i).Value
        dt = oSht.Range("B" & i).Value
        
        Set acell = oSht.Range("A1:A" & lastrow).Find(What:=strSearch, LookIn:=xlValues, _
        LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False)
    
        If Not acell Is Nothing Then
            Set bcell = acell
            If acell.Offset(, 1).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
            ExitLoop = False
            Do While ExitLoop = False
                
                Set acell = oSht.Range("A1:A" & lastrow).FindNext(After:=acell)
                
                If Not acell Is Nothing Then
                    On Error Resume Next
                    If acell.Address = bcell.Address Then Exit Do
                    On Error GoTo 0
                    If acell.Offset(, 1).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
                Else
                    ExitLoop = True
                End If
            Loop
        End If
    Next i
End Sub

Open in new window


Sid
Duplicates-Example.xls
Avatar of mvs10000
mvs10000

An alternative approach without resorting to VBA (assumes you are running Excel 2007):
- Select your data
- Data - Sort & Filter - Sort
- Add a level.  First level sort = employeeID (Order = Smallest to Largest).  Second level sort = start date (Order = Newest to Oldest).
- Select data
- Data - Data Tools - Remove Duplicates
- Unselect all columns except for the employeeID - OK

Excel will keep the first row in each duplicate "cluster" and since it's been sorted by date you will have the correct "parent" record (the one with the most recent date).

Please give SiddharthRout first consideration for any point/answer credit.

Regards,

mvs
Avatar of ExpertHelp79

ASKER

Hello Siddharth
It ran good... but i missed one more filter i apologies

I have a col ProjectNumber

I have resourcestartdate, resourceenddate, ProjectNumber and employeeID columns. Now for duplicate employeeids we need to check the project number and dates and keep only the row in datatable which has the most recent Projectnumber and resourcestartdate and delete rest of the rows for that employeeid where the combination of projectnumber and date are not recent. ie if there are two project number for one employee with multiple date entry then it will check the recent date for the combination of employeeid and projectnumber.
Can you upload an example file?

Sid
This is the final requirement

Combination of Employeeid,Projectnumber,bk   and recent enddate to be kept rest to be deleted.

 Duplicates-Example--2-.xls
I am writing the logic again

UNIQUE employeeid,ProjectNumber,BK     and resourceenddate is most recent then keep the row and deleterest of thee rows

Sorry Sid i changed the logic.. hope i get help
Hi,

try this


Kris
Sub kTest()
Dim ka, k(), c As Long, i As Long, n As Long, t()

ka = Sheets("Sheet1").Range("a1").CurrentRegion.Resize(, 6)

ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))

With CreateObject("scripting.dictionary")
    .comparemode = 1
    For i = 2 To UBound(ka, 1)
        If Not .exists(ka(i, 4)) Then
            n = n + 1
            For c = 1 To UBound(ka, 2): k(n, c) = ka(i, c): Next
            .Add ka(i, 4), Array(n, UBound(ka, 2))
        Else
            t = .Item(ka(i, 4))
            If CDate(ka(i, 6)) > k(t(0), 6) Then k(t(0), 6) = ka(i, 6)
        End If
    Next
End With
If n Then
    With Sheets("Sheet1").Range("a1")
        .CurrentRegion.Resize(, 6).Offset(1).ClearContents
        .Offset(1).Resize(n, UBound(k, 2)).Value = k
    End With
End If
End Sub

Open in new window

please explain me the below lines as i have to implement the macro in a larger file where

Total Columns is 29
Project number is at col 5
Employeeid at col 19
BK at 21


ka = Sheets("Sheet1").Range("a1").CurrentRegion.Resize(, 6)

ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))
If Not .exists(ka(i, 4)) Then
.Offset(1).Resize(n, UBound(k, 2)).Value = k

TIA
combination at col 22
i modified it accordingly but getting error
Please check

 
Sub kTest()
Dim ka, k(), c As Long, i As Long, n As Long, t()

ka = Sheets("Sheet1").Range("a1").CurrentRegion.Resize(, 29)

ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))

With CreateObject("scripting.dictionary")
    .comparemode = 1
    For i = 2 To UBound(ka, 1)
        If Not .exists(ka(i, 22)) Then
            n = n + 1
            For c = 1 To UBound(ka, 2): k(n, c) = ka(i, c): Next
            .Add ka(i, 22), Array(n, UBound(ka, 2))
        Else
            t = .Item(ka(i, 22))
            If CDate(ka(i, 29)) > k(t(0), 29) Then k(t(0), 29) = ka(i, 29)
        End If
    Next
End With
If n Then
    With Sheets("Sheet1").Range("a1")
        .CurrentRegion.Resize(, 29).Offset(1).ClearContents
        .Offset(1).Resize(n, UBound(k, 2)).Value = k
    End With
End If
End Sub

Open in new window

So you want to keep this

1      100002      99      110000299      1/06/2010      1/13/2010

and delete rest?

Sid
employeeID       ProjectNumber      BK      Combination      resourcestartdate      resourceenddate Status
1                      100001              0      11000010      1/2/2010                      1/13/2010                Del
1                      100001              0      11000010      1/8/2010                      1/13/2011
1                      100001              0      11000010      1/8/2010                      1/13/2010                Del
1                      100001             99      110000199      1/8/2010                      1/13/2010
1                      100002             99      110000299      1/6/2010                      1/13/2010
1                      100002             99      110000299      1/2/2010                      1/7/2010                  Del
hey sid please find the status above
Try this

Sample Attached.

Sid

Code Used

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range
    Dim oSht As Worksheet
    Dim strSearch As String
    Dim dt As Date
    Dim Pn As Long, bk As Long
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("A" & Rows.Count).End(xlUp).Row
    
    For i = lastrow To 2 Step -1
        strSearch = oSht.Range("A" & i).Value
        dt = oSht.Range("F" & i).Value
        Pn = oSht.Range("B" & i).Value
        bk = oSht.Range("C" & i).Value
        
        Set acell = oSht.Range("A1:A" & lastrow).Find(What:=strSearch, LookIn:=xlValues, _
        LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False)
    
        If Not acell Is Nothing Then
            Set bcell = acell
            If acell.Offset(, 1).Value = Pn And acell.Offset(, 2).Value = bk Then
                If acell.Offset(, 5).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
            End If
            ExitLoop = False
            Do While ExitLoop = False
                
                Set acell = oSht.Range("A1:A" & lastrow).FindNext(After:=acell)
                
                If Not acell Is Nothing Then
                    On Error Resume Next
                    If acell.Address = bcell.Address Then Exit Do
                    On Error GoTo 0
                    If acell.Offset(, 1).Value = Pn And acell.Offset(, 2).Value = bk Then
                        If acell.Offset(, 5).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
                    End If
                Else
                    ExitLoop = True
                End If
            Loop
        End If
    Next i
End Sub

Open in new window


Duplicates-Example.xls
The actual file is like this

Total Columns is 29
Project number is at col 5
Employeeid at col 19
BK at 21
combination at col 22

how can i implement the same code

In which line you get error ? and what's the error ?
2 mins. I will amend the code as per that :)

Sid

As far I can see the code you modified is ok. What's the error and at which line ?
resourceenddate  is in which col?

Sid
resourceenddate AA col ... sorry i missed
there are around 29000 rows from which it will delete as per condition
Try this

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range
    Dim oSht As Worksheet
    Dim strSearch As String, Comb As String
    Dim dt As Date
    Dim Pn As Long, bk As Long
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("S" & Rows.Count).End(xlUp).Row
    
    For i = lastrow To 2 Step -1
        strSearch = oSht.Range("S" & i).Value '<~~ Emp ID
        dt = oSht.Range("AA" & i).Value '<~~ resourceenddate
        Pn = oSht.Range("E" & i).Value '<~~ ProjectNumber
        bk = oSht.Range("U" & i).Value '<~~ BK
        Comb = oSht.Range("V" & i).Value '<~~ combination
        
        Set acell = oSht.Range("S1:S" & lastrow).Find(What:=strSearch, LookIn:=xlValues, _
        LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False)
    
        If Not acell Is Nothing Then
            Set bcell = acell
            If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
            And acell.Offset(, 3).Value = Comb Then
                If acell.Offset(, 8).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
            End If
            ExitLoop = False
            Do While ExitLoop = False
                
                Set acell = oSht.Range("A1:A" & lastrow).FindNext(After:=acell)
                
                If Not acell Is Nothing Then
                    On Error Resume Next
                    If acell.Address = bcell.Address Then Exit Do
                    On Error GoTo 0
                    If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
                    And acell.Offset(, 3).Value = Comb Then
                        If acell.Offset(, 8).Value > dt Then oSht.Rows(i).Delete Shift:=xlUp
                    End If
                Else
                    ExitLoop = True
                End If
            Loop
        End If
    Next i
End Sub

Open in new window


Sid
Error: unable to get the FindNext property of the range class

at
Set acell = oSht.Range("A1:A" & lastrow).FindNext(After:=acell)

acel = 1495207
Hi,

Try this one.


Kris
Sub kTest()
Dim ka, k(), c As Long, i As Long, n As Long, t()

ka = Sheets("Sheet1").Range("a1").CurrentRegion.Resize(, 29)

Const CombCol       As Long = 22    '<== adjust the column #
Const ResEndDtCol   As Long = 27    '<== adjust the column #

ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))

With CreateObject("scripting.dictionary")
    .comparemode = 1
    For i = 2 To UBound(ka, 1)
        If Not .exists(ka(i, CombCol)) Then
            n = n + 1
            For c = 1 To UBound(ka, 2): k(n, c) = ka(i, c): Next
            .Add ka(i, CombCol), Array(n, UBound(ka, 2))
        Else
            t = .Item(ka(i, CombCol))
            If CDate(ka(i, ResEndDtCol)) > k(t(0), ResEndDtCol) Then
                For c = 1 To UBound(ka, 2)
                    k(t(0), c) = ka(i, c)
                Next
            End If
        End If
    Next
End With
If n Then
    With Sheets("Sheet1").Range("a1")
        .CurrentRegion.Resize(, 29).Offset(1).ClearContents
        .Offset(1).Resize(n, UBound(k, 2)).Value = k
    End With
End If
End Sub

Open in new window

Oops a typo

Change

Set acell = oSht.Range("A1:A" & lastrow).FindNext(After:=acell)

to

Set acell = oSht.Range("S1:S" & lastrow).FindNext(After:=acell)

Sid
Also one more point
combination is concatenation of Employeeid+ projectno+BK
That's ok. Did you try the code after the change?

Sid
trying its taking sometime as there are 29000 rows
i hope it is taking much time.... :(
Once you confirm that it is working then I will tweak it to make it much faster :)

Sid
i hope it is working as in 5 mins it reduced from 29000 to 27962 the exact number is 3715 rows less
i am not able to acess
In the meantime can you upload a copy of that file here so that i can test it?

Sid
ahh sorry Sid some confidential data cannot share   :(
oOk. Once it is working fine, I will give you a much faster version of the above code. :)

Sid
sure Sid

i am trying again now

krishnakrkc:the code worked very fast but i need to acess one thing
did you implement the logic
UNIQUE employeeid,ProjectNumber,BK     and resourceenddate is most recent then keep the row and delete rest of the rows
as the count of deletion is less
If you are starting all over gain then try this code.

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range, delRange As Range
    Dim oSht As Worksheet
    Dim strSearch As String, Comb As String
    Dim dt As Date
    Dim Pn As Long, bk As Long
    Dim startedAt As String, EndedAt As String
    
    startedAt = Str(Now)
    
    Application.ScreenUpdating = False
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("S" & Rows.Count).End(xlUp).Row
    
    For i = lastrow To 2 Step -1
        strSearch = oSht.Range("S" & i).Value '<~~ Emp ID
        dt = oSht.Range("AA" & i).Value '<~~ resourceenddate
        Pn = oSht.Range("E" & i).Value '<~~ ProjectNumber
        bk = oSht.Range("U" & i).Value '<~~ BK
        Comb = oSht.Range("V" & i).Value '<~~ combination
        
        Set acell = oSht.Range("S1:S" & lastrow).Find(What:=strSearch, LookIn:=xlValues, _
        LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False)
    
        If Not acell Is Nothing Then
            Set bcell = acell
            If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
            And acell.Offset(, 3).Value = Comb Then
                If delange Is Nothing Then
                    Set delRange = oSht.Rows(i)
                Else
                    Set delRange = Union(delRange, oSht.Rows(i))
                End If
                If acell.Offset(, 8).Value > dt Then oSht.Rows(i).Delete shift:=xlUp
            End If
            ExitLoop = False
            Do While ExitLoop = False
                
                Set acell = oSht.Range("S1:S" & lastrow).FindNext(After:=acell)
                
                If Not acell Is Nothing Then
                    On Error Resume Next
                    If acell.Address = bcell.Address Then Exit Do
                    On Error GoTo 0
                    If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
                    And acell.Offset(, 3).Value = Comb Then
                        If delange Is Nothing Then
                            Set delRange = oSht.Rows(i)
                        Else
                            Set delRange = Union(delRange, oSht.Rows(i))
                        End If
                    End If
                Else
                    ExitLoop = True
                End If
            Loop
        End If
    Next i
    
    delRange.Delete shift:=xlUp
    Application.ScreenUpdating = True
    EndedAt = Str(Now)
    
    MsgBox "Process Started at " & startedAt & " and Ended at " & EndedAt
End Sub

Open in new window


Sid
error at
If delange Is Nothing Then

object required
Hmph a typo...

Change delange to

delRange

Sid
Do that at Line 51 as well.

Sid
Hi,

The code takes the unique string from combination column. It'll keep only one record for each combination.

this part of the code

If CDate(ka(i, ResEndDtCol)) > k(t(0), ResEndDtCol) Then

confirms the recent date record.

HTH

sid my excel is dead its i hope 20 mins now the time span is too much
Strange...

Did it work for the sample files above?

Sid
krishnakrkc: i will update you tomorrow morning about the validity of the macro... also thanks a lot for helping
Sid .. yes it worked for the sample file but when i am implementing on the real file  with 28000 rows the excel is dead
Can you do me a favor... just create a sample of dummy data but leave the Row 1 intact and upload it. let me test it for you.

Sid
sure
hello Sid i have created the file .. please play with the Resource Start and End date to satisfy the logic

of the combination of (Employeeid,WbsCode,BillinkKey) + Recent Resource End Date
Book1.xlsx
New conditions? (WbsCode,BillinkKey)?

Sid
So in the above example only the 11th row will be deleted?

Sid
ya.. but i will like to see the result for different dates

also again
 (Employeeid,WbsCode,BillingKey) + Recent Resource End Date
wbscode is projectnumber and billingkey is bk
Then try this?

Sid

Code Used

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range, delRange As Range
    Dim oSht As Worksheet
    Dim strSearch As String, Comb As String
    Dim dt As Date
    Dim Pn As Long, bk As Long
    Dim startedAt As String, EndedAt As String
    
    startedAt = Str(Now)
    
    Application.ScreenUpdating = False
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("S" & Rows.Count).End(xlUp).Row
    
    For i = lastrow To 2 Step -1
        strSearch = oSht.Range("S" & i).Value '<~~ Emp ID
        dt = oSht.Range("AA" & i).Value '<~~ resourceenddate
        Pn = oSht.Range("E" & i).Value '<~~ ProjectNumber
        bk = oSht.Range("U" & i).Value '<~~ BK
        Comb = oSht.Range("V" & i).Value '<~~ combination
        
        Set acell = oSht.Range("S1:S" & lastrow).Find(What:=strSearch, LookIn:=xlValues, _
        LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
        MatchCase:=False, SearchFormat:=False)
    
        If Not acell Is Nothing Then
            Set bcell = acell
            If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
            And acell.Offset(, 3).Value = Comb Then
                If acell.Offset(, 8).Value > dt Then
                    If delRange Is Nothing Then
                        Set delRange = oSht.Rows(i)
                    Else
                        Set delRange = Union(delRange, oSht.Rows(i))
                    End If
                End If
            End If
            ExitLoop = False
            Do While ExitLoop = False
                
                Set acell = oSht.Range("S1:S" & lastrow).FindNext(After:=acell)
                
                If Not acell Is Nothing Then
                    On Error Resume Next
                    If acell.Address = bcell.Address Then Exit Do
                    On Error GoTo 0
                    If acell.Offset(, -14).Value = Pn And acell.Offset(, 2).Value = bk _
                    And acell.Offset(, 3).Value = Comb Then
                        If acell.Offset(, 8).Value > dt Then
                            If delRange Is Nothing Then
                                Set delRange = oSht.Rows(i)
                            Else
                                Set delRange = Union(delRange, oSht.Rows(i))
                            End If
                        End If
                    End If
                Else
                    ExitLoop = True
                End If
            Loop
        End If
    Next i
    
    delRange.Delete shift:=xlUp
    Application.ScreenUpdating = True
    EndedAt = Str(Now)
    
    MsgBox "Process Started at " & startedAt & " and Ended at " & EndedAt
End Sub

Open in new window

Duplicates-Example.xls
the code took 40 minutes to execute and returned the same row as  krishnakrkc: which takes few second. I am still evaluating the result and let you know asap.
But sid can you make this code work as fast
The above code that I gave you should not take more than 10 seconds.

Can you post the screenshot of the message box that you get at the end of the code.

Sid
its already 20 mins running... i will share the result soon
Strange....

Let me see if I can use ant other alternative...

Sid
Would it matter if I sorted the data first?

Sid
ahhh i dont know may be it can help... but what is the approach krishnakrkc:  made is doing fast
How much time does this take?

Sub Sample()
    Dim i As Long, lastrow As Long
    Dim acell As Range, bcell As Range, delRange As Range
    Dim oSht As Worksheet
    Dim strSearch As String, Comb As String
    Dim dt As Date
    Dim Pn As Long, bk As Long
    Dim startedAt As String, EndedAt As String
    
    startedAt = Str(Now)
    
    Application.ScreenUpdating = False
    
    Set oSht = Sheets("Sheet1")
    
    lastrow = oSht.Range("S" & Rows.Count).End(xlUp).Row
    
    oSht.Range("A2:AC" & lastrow).Sort Key1:=oSht.Range("S2"), Key2:=oSht.Range("E2"), _
    Key3:=oSht.Range("U2"), Order1:=xlAscending, Header:=xlGuess, _
    OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
    DataOption1:=xlSortNormal
    For i = lastrow To 2 Step -1
        If oSht.Range("S" & i).Value = oSht.Range("S" & i - 1).Value Then
            If oSht.Range("E" & i).Value = oSht.Range("E" & i - 1).Value Then
                If oSht.Range("U" & i).Value = oSht.Range("U" & i - 1).Value Then
                    If oSht.Range("V" & i).Value = oSht.Range("V" & i - 1).Value Then
                        If oSht.Range("AA" & i - 1).Value > oSht.Range("AA" & i).Value Then
                            If delRange Is Nothing Then
                                Set delRange = oSht.Rows(i)
                            Else
                                Set delRange = Union(delRange, oSht.Rows(i))
                            End If
                        Else
                            If delRange Is Nothing Then
                                Set delRange = oSht.Rows(i - 1)
                            Else
                                Set delRange = Union(delRange, oSht.Rows(i - 1))
                            End If
                        End If
                    End If
                End If
            End If
        End If
    Next i
    delRange.Delete shift:=xlUp
    Application.ScreenUpdating = True
    EndedAt = Str(Now)
    
    MsgBox "Process Started at " & startedAt & " and Ended at " & EndedAt
End Sub

Open in new window


Sid
ASKER CERTIFIED SOLUTION
Avatar of SiddharthRout
SiddharthRout
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Tracy
This question has been classified as abandoned and is being closed as part of the Cleanup Program. See my comment at the end of the question for more details.