Solved

How to format data in an excel spreedsheet so it shows a unique list of data in the following format

Posted on 2016-10-25
12
68 Views
Last Modified: 2016-11-02
my goal: to export a list of email addresses from the sent items folder from my outlook 2010 and format the data into this format so i could import it into my firewall whitelist as a .txt file:

Allow, *@mxtoolbox.com, *
Allow, *@amazon.com, *
Allow, *@flip.com, *
Allow, *@demo.com, *
Allow, *@gmail.com, *
Allow, *@yahoo.com, *


I was able to export the email addresses from all the domains i have sent to (that were in my sent items folder in outlook 2010)

I did this by using this process:

Basically you can use the Outlook Import/Export function to achieve this. It's typically used to migrate the actual emails, but it suits the purpose of getting just the email addresses (or any other value for that matter).

Here is how to do it in Outlook (version 2007):

Click on 'File', 'Import and Export' to open the Wizard
Select 'Export to a File' [then click Next]
Select 'Microsoft Excel 97-2003' or a CSV file [click Next]
Select an Outlook folder [Next]
Click on Browse to select a location and then type the file name to export the information to (terminated by .xls) [Next]
Click on Map Custom Fields. This will bring up a list of all the available fields that are available in that folder.
Since we are only interested in Email address, click on Clear Map and then from the Left side click on From: (address) and drag that to the Right list
Click OK
Click Finish
Open the output file using Excel

but upon export via Outlook 2010 to a .xls file the output has given me a lot of extra data such as the following:

/o=yahoo/ou=first administrative group/cn=Recipients/cn=Joe
/o=yahoo/ou=first administrative group/cn=Recipients/cn=Mike
/o=yahoo/ou=first administrative group/cn=Recipients/cn=Bill
/o=yahoo/ou=first administrative group/cn=Recipients/cn=Ron
Helen@aol.com
Helen@aol.com
Helen@aol.com;Sarah@aol.com
Tim@mxtoolbox.com
Jen@amazon.com
Mark@flip.com;Joe@flip.com


As you could see there are many problems with this data:

1) All email addresses need to be removed from this data
2) the duplicate email addresses need to be removed from not only the entire column but also from the same cell.
3) out of the remaining data a list of unique domains names needs to be created
4) this unique list of domain names needs to be put in this format as a .txt file:


Allow, *@amazon.com, *
Allow, *@flip.com, *
Allow, *@aol.com, *
Allow, *@gmail.com, *
Allow, *@yahoo.com, *


Can anyone help me figure out how to do this in excel or do you know of any other third party software that will get the same?

Thank you in advance for your time!
0
Comment
Question by:IT_Field_Technician
  • 7
  • 3
  • 2
12 Comments
 
LVL 50

Expert Comment

by:Rgonzo1971
ID: 41859789
Hi,

pls try the code on the exported list in Excel it creates a new workbook that you can save as txt

Sub macro()
Dim Res() As Variant
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
Application.ScreenUpdating = False
Dim regExp As Object
Set regExp = CreateObject("vbscript.regexp")

With regExp
    .Pattern = "\@[^;]+"
    .Global = True
    For Each c In Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
        Set matches = .Execute(c)
        For Each Match In matches
            If Not dict.exists(Match.Value) Then
                dict.Add Match.Value, Match.Value
            End If
        Next
    Next
End With

Res = dict.Items()
Workbooks.Add
For Idx = 0 To UBound(Res)
    Range("A1").Offset(Idx) = "Allow, *" & Res(Idx) & ", *"
Next
Application.ScreenUpdating = True

End Sub

Open in new window

Regards
0
 
LVL 8

Expert Comment

by:Koen
ID: 41860112
besides the technical solution... whitelisting domains is almost never a good solution:
- you are continuously updating the list
- basic spammer and spoofers use hotmail.com or yahoo.com which you will probably have in your whitelist for business reasons anyway

A far better way is to check mail validity at your mailgate, i.e. check whether the targetted e-mail exists on your domain, if not bounce with (or without) notification, then pass the rest through a decent spam filter.
0
 

Author Comment

by:IT_Field_Technician
ID: 41860639
thank you Rgonzo1971 i will try this.

Koen - I am very interested in this - can you please elaborate - is this a feature in exchange or something? Does this approach have an official technical name?

thank you both!
0
Efficient way to get backups off site to Azure

This user guide provides instructions on how to deploy and configure both a StoneFly Scale Out NAS Enterprise Cloud Drive virtual machine and Veeam Cloud Connect in the Microsoft Azure Cloud.

 

Author Comment

by:IT_Field_Technician
ID: 41860707
Rgonzo1971 - how do i execute this script?

I believe I need to create a new module but I am not sure what to do after that?
0
 

Author Comment

by:IT_Field_Technician
ID: 41860759
OK Rgonzo1971 - I was able to press the Run icon in the VBA editor and the code WORKED perfectly!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Thanks you so much!

Is there a way to edit the code so it would automatically remove common email addresses?
0
 

Author Comment

by:IT_Field_Technician
ID: 41860822
Is there a way to edit the code so it would automatically remove common email addresses?

Like:

live.com
email.com
yahoo.com
gmail.com
paypal.com
ect...
0
 
LVL 50

Expert Comment

by:Rgonzo1971
ID: 41861609
then try
Sub macro()
Dim Res() As Variant
Dim dict As Object
Set dict = CreateObject("Scripting.Dictionary")
Application.ScreenUpdating = False
Dim regExp As Object
Set regExp = CreateObject("vbscript.regexp")

With regExp
    .Pattern = "(\@)([^;]+)"
    .Global = True
    For Each c In Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
        Set matches = .Execute(c)
        For Each Match In matches
            If Not dict.exists(Match.submatches(1)) Then
                dict.Add Match.submatches(1), Match.submatches(1)
            End If
        Next
    Next
End With

Res = dict.Items()
Workbooks.Add
Range("A1").Resize(UBound(Res) + 1) = WorksheetFunction.Transpose(Res)
Application.ScreenUpdating = True

End Sub

Open in new window

0
 
LVL 8

Expert Comment

by:Koen
ID: 41861792
yes, it is about handling your mail in different steps, where the first step is the 'Before Arrival' test.
This is a topic typically threated in discussions about backscatter (if you don't do anything here, your mail server will send NDR's to the (often innocent domains being abused by spammers).

Typically a Before Arrival test will drop (not bounce) any mail send from the outside to a non-existing e-mail address, this avoiding sending 'false NDR's' (downside is that the few real mistakes (typo's) are dropped also, but in my opinion it does not outweigh the advantages).

there are 3th party products out there (no advertising), that can do it for you (no need to change your Exchange environment), but there is also elements you can do on the Exchange itself (without extra investment).

I am however not a techie... so I cannot tell you exactly what and how... sorry

but google on 'avoiding backscatter' , 'recipient filtering' and you should find plenty of info.

in this question on this forum you can also find some explanations and links on recipient filtering
0
 

Author Comment

by:IT_Field_Technician
ID: 41862736
Rgonzo1971

That script did execute without any errors but the results were not what i was looking for.

It gave me results of a bunch of email addresses first along with the data that was un-needed after that.

The first VBA code seemed to work perfectly but i need VBA code that will at that point take the results in this format:

Allow, *@domain1.com, *
Allow, *@flip.com, *
Allow, *@domain2.com, *
Allow, *@gmail.com, *
Allow, *@yahoo.com, *
Allow, *@domain3.com, *
Allow, *@domain4.com, *
Allow, *@aol.com, *
Allow, *@domain5.com, *

and remove any lines/cells that have commonly spoofed email domains such as:

live.com
email.com
yahoo.com
gmail.com
paypal.com
ect...

(I will enter my own list if you provide1-3 examples domains in the code you provide)
but yet keep the same format of the results such as:

Allow, *@domain1.com, *
Allow, *@domain2.com, *
Allow, *@domain3.com, *
Allow, *@domain4.com, *
Allow, *@domain5.com, *

Thank you again for all your help!!
0
 

Author Comment

by:IT_Field_Technician
ID: 41862743
Understood Koen Thanks for the information!
0
 
LVL 50

Accepted Solution

by:
Rgonzo1971 earned 500 total points
ID: 41863419
Corrected code
Sub macro()
Dim Res() As Variant
Dim regExp As Object
Dim dict As Object
Dim dictEx As Object
Set dict = CreateObject("Scripting.Dictionary")
Set dictEx = CreateObject("Scripting.Dictionary")
Set regExp = CreateObject("vbscript.regexp")

strExceptionList = "live.com,email.com,yahoo.com,gmail.com,paypal.com"

Application.ScreenUpdating = False

For Each itm In Split(strExceptionList, ",")
    If Not dictEx.exists(itm) Then dictEx.Add itm, itm
Next
With regExp
    .Pattern = "(\@)([^;]+)"
    .Global = True
    For Each c In Range(Range("A1"), Range("A" & Rows.Count).End(xlUp))
        Set matches = .Execute(c)
        For Each Match In matches
            If Not dict.exists(Match.Value) And Not dictEx.exists(Match.submatches(1)) Then
                dict.Add Match.Value, Match.Value
            End If
        Next
    Next
End With

Res = dict.Items()
Workbooks.Add
For Idx = 0 To UBound(Res)
    Range("A1").Offset(Idx) = "Allow, *" & Res(Idx) & ", *"
Next
Application.ScreenUpdating = True

End Sub

Open in new window

0
 

Author Closing Comment

by:IT_Field_Technician
ID: 41871140
It worked thanks!
0

Featured Post

Networking for the Cloud Era

Join Microsoft and Riverbed for a discussion and demonstration of enhancements to SteelConnect:
-One-click orchestration and cloud connectivity in Azure environments
-Tight integration of SD-WAN and WAN optimization capabilities
-Scalability and resiliency equal to a data center

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Find out what you should include to make the best professional email signature for your organization.
Freeze panes is an option within all variants of Excel to enable parts of a sheet to remain stationary when the cursor is in another part of the sheet. This is a very useful feature which is overlooked or under used.
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…
Many of my clients call in with monstrous Gmail overloading issues with Outlook. A quick tip is to turn off the All Mail and Important folders from synching. Here is a quick video I made to show you how to turn off these and other folders in Gmail s…

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question