Expand code to remove duplicate email addresses such that the email addresses themselves are only in the first column

Here is a screencast of how I experience the code currently http://screencast.com/t/1zjHke8W .

Here is the code block itself, fully colorized  http://www.codeshare.io/VLR1a.

What I was looking to do previously is ask for selection of an Outlook folder to extract email addresses from, and then have all those email addresses extract automatically whether they are found on to line, the cc line, or the body of the individual messages.  

We were also looking to extract anything else that might be associated with an email address, and if it is on the to line or the cc line.

From what I can see, those requirements have been met.  

We now need to have it remove any email addresses that might be duplicated such that the addresses all appear, nice and clean, in column A and the remaining information, separated by commas in the next column. It should remove special marks such as < > that frequently accompany such information.

There should not be any duplicate email addresses once the code has run.

Thank you very much for your help with this.

'USE - To extract email addresses from Microsoft Outlook folders including the to and cc fields as well as the message body.

'AUTHOR(S) - EE Kimputer, MacroShadow, David Johnson CD (IT-Expert-Pro MVP)
'DATE - May 9, 2015
'EE Zones - Outlook, Email, VBScript

'FROM WITHIN OUTLOOK:
'TO OPEN CODE EDITOR - Press ALT+F11 to open the code editor, and paste the code into the "ThisOutlookSession" module.
'To CALL SCRIPT - Press ALT+F8 to select the macro, then follow the prompts to select the folder(s) that you want to extract email addresses from.

'IN PROGRESS - Remove duplicates and place names in second column

Sub pickfolder()

Dim NS As Outlook.NameSpace
    Dim pickedfolder As Object
    Dim counter As Long, i As Long
    Dim arr, Item

    Set NS = Application.GetNamespace("MAPI")
    Set pickedfolder = NS.pickfolder
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.Add
    Set xlSheet = xlBook.Worksheets("Sheet1")

    xlApp.Visible = True

    counter = 1
    For Each Item In pickedfolder.Items
        If Item.SenderEmailType = "SMTP" Then
            xlSheet.Cells(counter, 1).Value = """" & Item.SenderName & """ <" & Item.SenderEmailAddress & ">"
            If Item.CC <> "" Then
                xlSheet.Cells(counter, 2).Value = Item.CC
            End If
            If Item.Body <> "" Then
                arr = ExtractEmailAddresses(Item.Body)
                For i = LBound(arr) To UBound(arr)
                    xlSheet.Cells(counter, 2).Offset(, i + 1).Value = arr(i)
                Next
            End If
            counter = counter + 1
        End If
    Next

End Sub

Public Function ExtractEmailAddresses(strBody As String) As Variant

    Dim objMatch As Object
    Dim arrMatches()
    Dim lngCount As Long

    arrMatches = Array()

    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        .Global = True
        .Pattern = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        For Each objMatch In .Execute(strBody)
            ReDim Preserve arrMatches(lngCount)
            arrMatches(lngCount) = objMatch.Value
            lngCount = lngCount + 1
        Next
    End With

    ExtractEmailAddresses = arrMatches

End Function

Open in new window

frugalmuleAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

MacroShadowCommented:
Just to clarify, column A should hold unique email addresses and column B the names <> removed?
Should column B hold any other information?
0
frugalmuleAuthor Commented:
Not unless that info is in the to or Cc lines
0
frugalmuleAuthor Commented:
I mistakenly left out mention of the from field as well, which ofcourse I should have included.  All email fields should be extracted from with duplicates removed
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

frugalmuleAuthor Commented:
I apologize, Yes in direct answer to your question.
0
MacroShadowCommented:
Try this:
Option Explicit

Sub pickfolder()

    Dim NS As Outlook.NameSpace
    Dim pickedfolder As Object
    Dim counter As Long, i As Long
    Dim arr, Item
    Dim xlApp As Object, xlBook As Object, xlSheet As Object

    Set NS = Application.GetNamespace("MAPI")
    Set pickedfolder = NS.pickfolder
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.Add
    Set xlSheet = xlBook.Worksheets("âéìéåï1")

    counter = 1
    For Each Item In pickedfolder.Items
        If Item.SenderEmailType = "SMTP" Then
            xlSheet.Cells(counter, 1).Value = Item.SenderEmailAddress
            xlSheet.Cells(counter, 2).Value = Item.SenderName
            If Item.CC <> "" Then
                counter = counter + 1
                xlSheet.Cells(counter, 1).Value = ExtractEmailAddresses(Item.CC)
                xlSheet.Cells(counter, 2).Value = Replace(Replace(Replace(Item.CC, xlSheet.Cells(counter, 1).Value, ""), "<", ""), ">", "")
            End If
            If Item.Body <> "" Then
                arr = ExtractEmailAddresses(Item.Body)
                For i = LBound(arr) To UBound(arr)
                    xlSheet.Cells(counter + i, 1).Offset(counter + i, 0).Value = arr(i)
                Next
            End If
        End If
        counter = counter + 1
    Next

    xlSheet.Cells.EntireRow.RemoveDuplicates Columns:=Array(1, 2), Header:=0
    xlApp.Visible = True

End Sub

Public Function ExtractEmailAddresses(strBody As String) As Variant

    Dim objMatch As Object
    Dim arrMatches()
    Dim lngCount As Long

    arrMatches = Array()

    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        .Global = True
        .Pattern = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        For Each objMatch In .Execute(strBody)
            ReDim Preserve arrMatches(lngCount)
            arrMatches(lngCount) = objMatch.Value
            lngCount = lngCount + 1
        Next
    End With

    ExtractEmailAddresses = arrMatches

End Function

Open in new window

0
frugalmuleAuthor Commented:
Thank you very much!

I got the error "subscript out of range" when I ran it.

http://screencast.com/t/rUycowye
0
MacroShadowCommented:
Sorry, here you go.
Option Explicit

Sub pickfolder()

    Dim NS As Outlook.NameSpace
    Dim pickedfolder As Object
    Dim counter As Long, i As Long
    Dim arr, Item
    Dim xlApp As Object, xlBook As Object, xlSheet As Object

    Set NS = Application.GetNamespace("MAPI")
    Set pickedfolder = NS.pickfolder
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.Add
    Set xlSheet = xlBook.Worksheets("Sheet1")

    counter = 1
    For Each Item In pickedfolder.Items
        If Item.SenderEmailType = "SMTP" Then
            xlSheet.Cells(counter, 1).Value = Item.SenderEmailAddress
            xlSheet.Cells(counter, 2).Value = Item.SenderName
            If Item.CC <> "" Then
                counter = counter + 1
                xlSheet.Cells(counter, 1).Value = ExtractEmailAddresses(Item.CC)
                xlSheet.Cells(counter, 2).Value = Replace(Replace(Replace(Item.CC, xlSheet.Cells(counter, 1).Value, ""), "<", ""), ">", "")
            End If
            If Item.Body <> "" Then
                arr = ExtractEmailAddresses(Item.Body)
                For i = LBound(arr) To UBound(arr)
                    xlSheet.Cells(counter + i, 1).Offset(counter + i, 0).Value = arr(i)
                Next
            End If
        End If
        counter = counter + 1
    Next

    xlSheet.Cells.EntireRow.RemoveDuplicates Columns:=Array(1, 2), Header:=0
    xlApp.Visible = True

End Sub

Public Function ExtractEmailAddresses(strBody As String) As Variant

    Dim objMatch As Object
    Dim arrMatches()
    Dim lngCount As Long

    arrMatches = Array()

    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        .Global = True
        .Pattern = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        For Each objMatch In .Execute(strBody)
            ReDim Preserve arrMatches(lngCount)
            arrMatches(lngCount) = objMatch.Value
            lngCount = lngCount + 1
        Next
    End With

    ExtractEmailAddresses = arrMatches

End Function

                                          

Open in new window

0
frugalmuleAuthor Commented:
What was wrong?
0
frugalmuleAuthor Commented:
It's not evident that it is running, but it does finish.  Once it finishes, the results are MUCH better but not quite home free.  Here is how I experienced the output so far http://screencast.com/t/T0B4cJv5
0
MacroShadowCommented:
Ok, We'll have to do some debugging to figure out which chunk of code is the problematic one.
Please remove lines 22-32. Is the outcome acceptable?
0
frugalmuleAuthor Commented:
Removing lines 22-32 gets rid of trash in the name, and only returns one name per email (PERFECT), but then only returns 50 or so emails as opposed to over 500.

The original goal was to extract email addresses from any email field (to, from, cc) or body, which would result in more emails.
0
MacroShadowCommented:
Ok, please remove lines 22-26. Is the outcome acceptable?
0
frugalmuleAuthor Commented:
Appears to be, produces more results, but duplicates aren't getting removed entirely.
0
MacroShadowCommented:
If the values in both columns don't match, technically it isn't considered a duplicate and therefore isn't removed.
0
frugalmuleAuthor Commented:
Ideally, it should remove rows with duplicate email addresses.

Please revise if possible.  Code comments would be most helpful.
0
MacroShadowCommented:
How about this:
Option Explicit

Sub pickfolder()

    Dim NS As Outlook.NameSpace
    Dim pickedfolder As Object
    Dim counter As Long, i As Long
    Dim arr, Item
    Dim xlApp As Object, xlBook As Object, xlSheet As Object

    Set NS = Application.GetNamespace("MAPI")
    Set pickedfolder = NS.pickfolder
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.Add
    Set xlSheet = xlBook.Worksheets(1)
    
    ' Variable to hold row number to write to
    counter = 1
    
    For Each Item In pickedfolder.Items
        If Item.SenderEmailType = "SMTP" Then
        
            ' Copy the SenderEmailAddress to the excel sheet (row = counter, column = 1)
            xlSheet.Cells(counter, 1).Value = Item.SenderEmailAddress
            ' Copy the SenderName to the excel sheet (row = counter, column = 2)
            xlSheet.Cells(counter, 2).Value = Item.SenderName
            
            ' If message has a carbon copy
            If Item.CC <> "" Then
                ' increment row counter by 1
                counter = counter + 1
                ' Copy email address to the excel sheet (row = counter, column = 1)
                xlSheet.Cells(counter, 1).Value = ExtractEmailAddresses(Item.CC)
                ' Copy the SenderName to the excel sheet (row = counter, column = 2)
                xlSheet.Cells(counter, 2).Value = Replace(Replace(Replace(Item.CC, xlSheet.Cells(counter, 1).Value, ""), "<", ""), ">", "")
            End If
            
'            ' If message body isn't empty
'            If Item.Body <> "" Then
'                ' Populate an array with all addresses in the message
'                arr = ExtractEmailAddresses(Item.Body)
'                ' Loop the array from first to last element
'                For i = LBound(arr) To UBound(arr)
'                    ' Copy email address to the excel sheet
'                    xlSheet.Cells(counter, 1).Offset(i, 0).Value = arr(i)
'                Next
'            End If
        End If
        ' increment row counter by 1
        counter = counter + 1
    Next

    ' RemoveDuplicates
    xlSheet.Range("A:B").RemoveDuplicates Columns:=1, Header:=0
    xlApp.Visible = True

End Sub

Public Function ExtractEmailAddresses(strBody As String) As Variant

    Dim objMatch As Object
    Dim arrMatches()
    Dim lngCount As Long

    arrMatches = Array()

    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        .Global = True
        .Pattern = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        For Each objMatch In .Execute(strBody)
            ReDim Preserve arrMatches(lngCount)
            arrMatches(lngCount) = objMatch.Value
            lngCount = lngCount + 1
        Next
    End With

    ExtractEmailAddresses = arrMatches

End Function

Open in new window

0
frugalmuleAuthor Commented:
It's only returning 50 or so emails.

The last result was over 400 but I don't know how many were unique.  Please advise.

Is it grabbing emails from the to line, the from line, the cc and bcc lines, and from the body?
0
MacroShadowCommented:
It's not grabbing email addresses from the body, notice the chunk of code that is commented out. Uncomment it to grab emails from the body too.

Are all 50 that are returned so far unique?
0
frugalmuleAuthor Commented:
Yes, the 50 are unique.  We should be grabbing email addresses from the body also, in addition to email fields (to, from, cc, and bcc).  Of'course you cannot get the name from emails found in the body and I understand that. I see comments in your code which I appreciate VERY much!  I don't see where any actions are commented out.

So still remaining is to grab all emails from email fields and body and to also grab the name or names used when it can, then de-duplicate.  That's it.
0
MacroShadowCommented:
Lines 38-47 are the body snatching block.
0
frugalmuleAuthor Commented:
I see said the blind man.  Thank you!!!


Q:  Is it grabbing from ALL email fields (to, from, cc and bcc)?

The only bug left appears to be that when running the body block that you had commented out, one of the names is showing up next to allot of email addresses where it shouldn't.  

Here is a video illustrating the issue http://screencast.com/t/l860jxU4Ei

Thanks for all your help.  We are sure lots closer!!
0
MacroShadowCommented:
A: the following code will grad email addresses from the following fields: To, From, Cc, Bcc, Body

How about this:
Option Explicit

Sub pickfolder()

    Dim NS As Outlook.NameSpace
    Dim pickedfolder As Object
    Dim counter As Long, i As Long
    Dim arr, Item
    Dim xlApp As Object, xlBook As Object, xlSheet As Object

    Set NS = Application.GetNamespace("MAPI")
    Set pickedfolder = NS.pickfolder
    Set xlApp = CreateObject("Excel.Application")
    Set xlBook = xlApp.Workbooks.Add
    Set xlSheet = xlBook.Worksheets(1)

    ' Variable to hold row number to write to
    counter = 1

    For Each Item In pickedfolder.Items
        If Item.SenderEmailType = "SMTP" Then
            ' From
            ' increment row counter by 1
            counter = counter + 1
            ' Copy the SenderEmailAddress to the excel sheet (row = counter, column = 1)
            xlSheet.Cells(counter, 1).Value = Item.SenderEmailAddress
            ' Copy the SenderName to the excel sheet (row = counter, column = 2)
            xlSheet.Cells(counter, 2).Value = Item.SenderName
        End If

        ' To
        If Item.recipients.Count > 0 Then
            ' increment row counter by 1
            counter = counter + 1
            ' Copy email address to the excel sheet (row = counter, column = 1)
            xlSheet.Cells(counter, 1).Value = Item.recipients.Item(1).AddressEntry.Address
            ' Copy the SenderName to the excel sheet (row = counter, column = 2)
            xlSheet.Cells(counter, 2).Value = Item.recipients.Item(1).AddressEntry.Name
        End If

        ' CC
        If Item.CC <> "" Then
            ' increment row counter by 1
            counter = counter + 1
            ' Copy email address to the excel sheet (row = counter, column = 1)
            xlSheet.Cells(counter, 1).Value = ExtractEmailAddresses(Item.CC)
            ' Copy the SenderName to the excel sheet (row = counter, column = 2)
            xlSheet.Cells(counter, 2).Value = Replace(Replace(Replace(Item.CC, xlSheet.Cells(counter, 1).Value, ""), "<", ""), ">", "")
        End If

        ' BCC
        If Item.BCC <> "" Then
            ' increment row counter by 1
            counter = counter + 1
            ' Copy email address to the excel sheet (row = counter, column = 1)
            xlSheet.Cells(counter, 1).Value = ExtractEmailAddresses(Item.BCC)
            ' Copy the SenderName to the excel sheet (row = counter, column = 2)
            xlSheet.Cells(counter, 2).Value = Replace(Replace(Replace(Item.BCC, xlSheet.Cells(counter, 1).Value, ""), "<", ""), ">", "")
        End If

        ' Body
        If Item.Body <> "" Then
            ' increment row counter by 1
            counter = counter + 1
            ' Populate an array with all addresses in the message
            arr = ExtractEmailAddresses(Item.Body)
            ' Loop the array from first to last element
            For i = LBound(arr) To UBound(arr)
                ' Copy email address to the excel sheet
                xlSheet.Cells(counter, 1).Offset(i, 0).Value = arr(i)
            Next
        End If

    Next

    ' RemoveDuplicates
    xlSheet.Range("A:B").RemoveDuplicates Columns:=1, Header:=0
    
    ' Make Excel visible
    xlApp.Visible = True

End Sub

Public Function ExtractEmailAddresses(strBody As String) As Variant

    Dim objMatch As Object
    Dim arrMatches()
    Dim lngCount As Long

    arrMatches = Array()

    With CreateObject("VBScript.RegExp")
        .MultiLine = True
        .IgnoreCase = True
        .Global = True
        .Pattern = "\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*"
        For Each objMatch In .Execute(strBody)
            ReDim Preserve arrMatches(lngCount)
            arrMatches(lngCount) = objMatch.Value
            lngCount = lngCount + 1
        Next
    End With

    ExtractEmailAddresses = arrMatches

End Function

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
frugalmuleAuthor Commented:
Excellent.  MUCH better results now with just under 500 unique emails.

The email column appears to be perfect.

Debug items include:

Removing the apostrophe (') marks that appear at the end of some of the names in column B.

A handful of emails (about 5 out of 492) include a few other names that should be included and 2 of them include an alarming number of names that can't be right.

I've sent you a private message also in hopes that you can take a look at that final bug with me.

It is good enough for me to run with as-is, but if there is anything we can do to fix the remaining bugs, I would greatly appreciate your effort.
0
frugalmuleAuthor Commented:
Outstanding sir, thank you!!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VB Script

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.