Solved

find duplicates and delete, excel 2010

Posted on 2014-09-11
31
162 Views
Last Modified: 2014-10-02
I have a spreadsheet, unsubs-4.csv, attached.

In Col A is listed about 5,600 email addresses.

I need a function, that  goes through all spreadsheets (.csv, .xls, .xlsx, .xlsm, etc.) found in folder:
C:\Users\mbehr.WEALTHVEST\Dropbox\WealthVest\clearslide\email_list
finds all matches to the emails in unsubs-4.csv and deletes them (from all spreadsheets,
regardless of extension) that are found in the 'email_list' subfolder.

Any questions, please feel free to ask.

unsubs-4.csv
0
Comment
Question by:mabehr
  • 16
  • 15
31 Comments
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Weired task that you ask here. May I ask if you will be doing this ONCE or on a regular basis ? Like you need to make a cleanup once or need this to be implemented as a routine ?

Reason for my question is that you have plenty of emails and you want to go on a folder and open each file and look for a certain email and if found then delete it from that file then save and close the file this type of process is very CPU demanding especially if you have say 200+ file and all these emails imagine the opening/closing/saving that you will be doing.

If you state the why you need to do this then we may understand better your need and then maybe propose an alternative.

gowflow
0
 

Author Comment

by:mabehr
Comment Utility
I'm planning on this being a one time operation but it may be used again but maybe only a few times.

If I don't have a function I'd have to search for each individual email address and delete them. The attached list is a list of unsubscribed emails that need to be removed from the original lists.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
I see now it is clearer as it should be treated totally different from how you explained it. For this and to draft the code correctly I need you to attach also a sample of say .xlsm or .csv or .xls as need to see the layout of the file and where the address are like what column etc...
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
will do and get that to you in a couple hours once I get to my work office where the files are.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok no sweat
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
Here's an update.

I just need the .csv files accessed and updated. And, they all have the email address in Col C. Some have other columns populated, but Col C is the column that needs updating in the .csv files.

Do you still need an example file?
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
yes I prefer so you can remove confidential info as need to see the format of email and see test to see if they match with  the file you posted and also see if there are multiple worksheets in the same workbook.

You don't need access to the other types of files (excel and macros) ?
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
ok, here you go.

Yes, I will need to access a HUGE .xlsm file sometime but don't want to do that now. Too much work to scrub.

scrubbed-005.csv
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok fine when  we find a duplicate email what do we do with the entire row we delete it ? or simply delete the email and keep the row ?
Always these file to check have 1 worksheet or it happens that some of them could have more than 1 worksheet ?
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
I guess delete the row would be best. All the .csv files have one worksheet.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok fine sorry for the delay in replying you I was caught in other urgent matters will tackle your asap.
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
sounds good, thanks.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Lets agree on a principle as this is how I will design the macro.

You will get a workbook called Unsuscribe.xlsm that will contain the macro and 3 sheets:

1) First sheet is Main on which there are buttons and possible trace of execution

2) Second sheet Audit in this sheet I will put all the rows that we delete (I feel we need to keep a trace of all the data that will be deleted from file as a principle I hate to delete data without a possible check of it or a way to go back to it.)

3) Third sheet is your current unsubs-4 is actually the list of unsubscribed addresses. Each and every time you have new list you will add them to this one or if you have a complete new list you will replace this one by the new data.

Do you agree on this principle ?
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
This sounds good. So the 3rd sheet is a sheet that could be modified or added to and the macro would run again, against the .csv files?
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Exactly

So shall I move based on above ?
Also need to know if you have Excel 2010 or else what version you are using ?
gowflow
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 

Author Comment

by:mabehr
Comment Utility
yes, please go ahead. I have Excel 2010
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok fine. Just a last one you should make sure 'NOT TO HAVE' the unsubscribed file or your 'unsubs-4.csv' if it is always called like this sitting in the folder that you need to process or else you can have a very bad surprise as it will remove from it all the addresses !!!

do we agree on this ?

I called the sheets:
Main
Audit
Emails

and in Emails I copied all address that you had in unsubs-4.csv to it with the date column of course.

also you mentioned we process only .csv right ?
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
agreed and yes to all.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Ooops !!!

I just noticed that in your file scrubbed-005.csv then emails are in Col C is it the case in all your csv files or the emails could be scattered in different columns ? Can you pls urgently advise as I am about to finish it and due to this need to review the whole issue.

gowflow
0
 

Author Comment

by:mabehr
Comment Utility
all the emails are in Col C for all the .csv files.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
no problem I had taken care of this I changed the logic. Pls try it and here is how it works

This file Unsuscribe.xlsm has the macro and 3 sheets
Main
Audit
Emails

In Main you will find 2 buttons 1 to select the folder to process and once done you can proceed to the second one that will take all the emails in sheet emails and go thru the directory selected and look for all .csv files and remove any email row found that matches the list and keep a trace in the sheet Audit.

The file you had attached before scrubbed-005.csv was of not big help as it did not contain any of the unsubscribed emails. I had to manually insert a couple to test that the macro is fine.

Pls check it your end and to be safe I suggest you copy your existing folder into an other one and run the macro on the copy to make sure all is fine and you don't run into havoc deleting rows that you don't want.

here is for easy reference the main function code. for sure there is more code than the below and it is all included in the attach workbook.

Function UnsuscribeEmails() As String
On Error GoTo ErrHandler

Dim WB As Workbook
Dim WS As Worksheet
Dim WSEmails As Worksheet
Dim WSAudit As Worksheet
Dim WSMain As Worksheet
Dim MaxRow As Long, MaxRowA As Long, MaxRowE As Long, MaxRowM As Long, I As Long
Dim lUns As Long
Dim cCell As Range
Dim sFile As String, sFullName As String
        
'---> Set Variables
Set WSEmails = Sheets("Emails")
MaxRowE = WSEmails.Range("A" & WSEmails.Rows.Count).End(xlUp).Row
Set WSAudit = Sheets("Audit")
MaxRowA = WSAudit.Range("A" & WSAudit.Rows.Count).End(xlUp).Row
If MaxRowA = 1 Then MaxRowA = MaxRowA + 1
Set WSMain = ActiveSheet

sFile = Dir(gstDestinationFolder & "*.csv")

If sFile = "" Then
    MsgBox "Warning !!! There are no files to process in this folder !", vbCritical, "Unsuscribe Emails"
    UnsuscribeEmails = "Folder empty, no files to process"
    Exit Function
Else
    
    '---> Clean Previous Trace
    WSMain.Range("B14:I" & WSMain.Rows.Count).ClearContents
    MaxRowM = 14
    
    Do
        
        '---> Get full name
        sFullName = gstDestinationFolder & sFile
        
        '---> Update Trace
        WSMain.Cells(MaxRowM, "B") = sFile
        
        '---> Disable Events
        With Application
            .EnableEvents = False
            .DisplayAlerts = False
            .ScreenUpdating = False
        End With
        
        '---> Open workbook and affect variables
        Set WB = Workbooks.Open(sFullName)
        Set WS = WB.ActiveSheet
        MaxRow = WS.UsedRange.Rows.Count
        WSMain.Activate
        
        '---> enable Trace
        With Application
             .ScreenUpdating = True
        End With
        
        For I = 2 To MaxRowE
            Set cCell = WS.UsedRange.Find(what:=WSEmails.Cells(I, "A"), LookIn:=xlValues, lookat:=xlWhole, MatchCase:=False)
            If Not cCell Is Nothing Then
                '---> Register the record found in Audit
                WSAudit.Cells(MaxRowA, "A") = Now
                WSAudit.Cells(MaxRowA, "B") = sFile
                WS.Range(WS.Range("A" & cCell.Row), WS.Cells(cCell.Row, WS.Columns.Count).End(xlToLeft)).Copy WSAudit.Cells(MaxRowA, "C")
                MaxRowA = MaxRowA + 1
                lUns = lUns + 1
                
                '---> Update Trace Unsuscribed count
                WSMain.Cells(MaxRowM, "D") = lUns
                
                '---> Delete the record
                cCell.EntireRow.Delete
            End If
            
            '---> Update Trace rec count
            WSMain.Cells(MaxRowM, "C") = I
            DoEvents
        Next I
        
        '---> Find Next file
        sFile = Dir
        
        '---> Update Trace Status
        WSMain.Cells(MaxRowM, "E") = "Done"
            
        '---> Save workbook
        WB.Close savechanges:=True
        
        '---> reset Variables
        Set WS = Nothing
        Set WB = Nothing
        lUns = 0
        MaxRowM = MaxRowM + 1
        
        '---> Enable Events
        With Application
            .EnableEvents = True
            .DisplayAlerts = True
            .ScreenUpdating = True
        End With
        
    Loop Until sFile = ""
End If

'---> fix Layout
WSAudit.UsedRange.EntireColumn.AutoFit

'---> Set Flag to complete successful and exit
UnsuscribeEmails = ""
Exit Function

ErrHandler:
MsgBox (Error(Err))
UnsuscribeEmails = Error(Err)
On Error GoTo 0

End Function

Open in new window


Let me know your comments.
gowflow
Unsuscribe.xlsm
0
 

Author Comment

by:mabehr
Comment Utility
Thanks gowflow and now I see I neglected one important point and that is this:

Under the folder 'email_list' there are 20 or so subfolders  and those are the folders that actually contain all the .csv files and those subfolders are saved as the names of people who own the lists. For example, there is a folder named Slake Roeppke so the file structure would be:

C:\Users\mbehr.WEALTHVEST\Dropbox\WealthVest\clearslide\email_list\Slake Roeppke

and that folder contains 10 - 11 .csv files. And so it goes for each subfolder. Each subfolder has 10 - 11 .csv files.

So... I could just copy and paste all the .csv files under one folder and select that one folder for the "Select a Folder to Process" button, but that will take some time because then I would have to copy and paste all the changed .csv files back to the correct subfolders.

Of course, since I neglected this one vital point (shame on me) I will do that, UNLESS it would be an easy fix for you to alter the code so that it will search under each subfolder and of course save the .csv files back to its folder.

Let me know.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
mmm yes major point here. Not a solution for you to copy paste nor to go and select a subfolder .. as I like to do things the good way !!!

So let me recap you want to process not only files under the folder selected but also all subfolders and their files ...

Correct ?
gowflow
0
 
LVL 29

Accepted Solution

by:
gowflow earned 500 total points
Comment Utility
I have to admit you are EXTREEMLY lucky !!!

I had developed couple month ago something that needed to loop for all specific files in folder and subfolder and it was not at all an easy task I had to look immensely on the net till I found some  help to get files in folders and subfolders by a recursive method.

Luckilly I found it and was able to incorporate it to your macro !!!

here is the heart of the new addon that basically when called stores in colfiles all files of a certain type in a starting folder and all its subfolders this is needed as the VBA function Dir does not loop in subdirectories. (This may sound a bit too much technical) but its okay as it would help lots of people running in same scenario.

Here is the code for that

Public Function RecursiveDir(colFiles As Collection, _
                              strFolder As String, _
                              strFileSpec As String, _
                              bIncludeSubfolders As Boolean)

Dim strTemp As String
Dim colFolders As New Collection
Dim vFolderName As Variant

'---> Add files in strFolder matching strFileSpec to colFiles
strFolder = TrailingSlash(strFolder)
strTemp = Dir(strFolder & strFileSpec)
Do While strTemp <> vbNullString
    colFiles.Add strFolder & strTemp
    strTemp = Dir
Loop

If bIncludeSubfolders Then
    '---> Fill colFolders with list of subdirectories of strFolder
    strTemp = Dir(strFolder, vbDirectory)
    Do While strTemp <> vbNullString
        If (strTemp <> ".") And (strTemp <> "..") Then
            If (GetAttr(strFolder & strTemp) And vbDirectory) <> 0 Then
                colFolders.Add strTemp
            End If
        End If
        strTemp = Dir
    Loop

    '---> Call RecursiveDir for each subfolder in colFolders
    For Each vFolderName In colFolders
        Call RecursiveDir(colFiles, strFolder & vFolderName, strFileSpec, True)
    Next vFolderName
End If

End Function


Public Function TrailingSlash(strFolder As String) As String
If Len(strFolder) > 0 Then
    If Right(strFolder, 1) = "\" Then
        TrailingSlash = strFolder
    Else
        TrailingSlash = strFolder & "\"
    End If
End If
End Function

Open in new window


and the way to call this function is simply like this

'---> Get the Recursive Files and folders
RecursiveDir colFiles, gstDestinationFolder, "*.csv", True

so by looping thru all the files in colFiles one has all the .csv in the folder selected and its subfolders.

Please try this version and let me know. You will notice I added now trace to directory beside the file.
gowflow
Unsuscribe-V01.xlsm
0
 

Author Comment

by:mabehr
Comment Utility
Got it and thank you! Am running it now. Very cool. It will take a bit until it's finished, and will report back, but so far it seems to be running.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
ok fine, mind you hv a Cadillac here !!! take good care :)
Let me know your comments.
gowflow
0
 

Author Closing Comment

by:mabehr
Comment Utility
The Cadillac purred right along and ran perfectly. Great work, gowflow. Fantastic job. Thank you very much. Saved me tons of time.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
I am glad it worked for you happy to have put a smile !!
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
You certainly did. Thanks again.
0
 
LVL 29

Expert Comment

by:gowflow
Comment Utility
Great. Pls feel free to ask for help on any other issue you can now message the Expert if you want by clicking on his name under his profile.
gowflow
0
 

Author Comment

by:mabehr
Comment Utility
ready for another gowflow?
Now I need to take all those 200 or so spreadsheets and create one spreadsheet from them all.

Let me know and I'll post the question.
0

Featured Post

Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

INDEX and MATCH can be used to great effect to replace HLOOKUP and VLOOKUP as it does not have the limitation of needing the data to be sorted so that the reference value is in the first column or row. It also has the ability to perform a bi-directi…
This tutorial explains how to create a series of drop-down lists that are dependent upon prior selections to guide (“force”) the user to make the correct selection and reduce data errors within Microsoft Excel. Excel 2010 was used for this tutorial;…
The viewer will learn how to simulate a series of sales calls dependent on a single skill level and learn how to simulate a series of sales calls dependent on two skill levels. Simulating Independent Sales Calls: Enter .75 into cell C2 – “skill leve…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now