Solved

VBSCRIPT - EXTRACT EMAIL ADDRESS FROM TEXT FILE

Posted on 2011-03-14
20
2,553 Views
Last Modified: 2012-05-11
Hi there,

I need a VBScript (nothing else) that will extract the email addresses from a text file by searching for "@", then parsing the email address around it.  A text file may contain several email addresses, so all emails must be extracted.

Extracted emails will require to simply be displayed to the screen (the DOS window), not exported to a file.

The VSScript will be run by the batch file command line:
cscript ExtractEmail.vbs "TextFile.txt

Thanks for your help,
Rene
0
Comment
Question by:ReneGe
  • 8
  • 7
  • 3
  • +2
20 Comments
 
LVL 9

Assisted Solution

by:sshah254
sshah254 earned 20 total points
ID: 35133753
Can you upload a sample of the input text file?

Ss
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35133791
It will be complicated for me to upload one cause it contains confidential infos. However, email addresses may be anywhere in it, and a line may contain more than 256 characters.

For example:
 envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com
 vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv

Thanks and cheers,
Rene
0
 
LVL 22

Assisted Solution

by:Paka
Paka earned 20 total points
ID: 35134702
Do you have standard characters that are before and after the email address (such as space and > as in your example)?  Without standard delimiters, extracting email addresses from this type of file will be extremely difficult.  For example, the first email from your example could be email@domain.com or email@domain.com.sd (which could be an email address in Sudan).
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35134926
Paka,

Interessting question/comment.

For the sake of this thread, the moment an email caracter that is not "a-z,0-9,.,_,-" etc, should be it's dellimiter.

I hope that helps.

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35137069
Hi Rene,

The script below uses a regular expression to attempt to extract all legal email addresses from a string.

You can see that I have set the strTestString variable to the sample you gave earlier.  But there is a problem.  Currently the script returns the following valid format email addresses from your sample:

email@domain.com.sdvzsd
test@domain.com
anothermail@domain.com


As you can  see the first one, to a human, is clearly not a real email address (although it could be), but it is in a correct email address format.

It would help to refine the regular expression if you knew that all email addresses were to end in .com or .co.uk or a finite list.

Regards,
Daz.


Dim strPattern, arrAllMatches, strMatch, strMatches

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}(;([\w-\.]+)@\w{2,}(\.\w{2,}){1,5})*?"

strTestString = " envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com" & vbCrlf _
              & " vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv "

colAllMatches = fGetMatches(strPattern, strTestString)

For Each strMatch In colAllMatches
    strMatches = strMatches & strMatch & vbCrlf
Next

MsgBox strMatches,, "Email Addresses found:"

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.
    '# Convert Collection to Array
    For Each sMatch In colMatches
        temp = temp & sMatch & vbCrlf
    Next
    temp = Split(temp, vbCrlf)
    fGetMatches = temp

End Function

Open in new window

0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35137146
Slight change, the regular expression I copy/pasted above was to match a delimited list of email addresses from another of my scripts, so is unnecessarily complicated (although still works)

Updated script below, same as the previous one, but with changed line 3 (and no change in functionality).

Regards,
Daz


Dim strPattern, arrAllMatches, strMatch, strMatches

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = " envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com" & vbCrlf _
              & " vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv "

colAllMatches = fGetMatches(strPattern, strTestString)

For Each strMatch In colAllMatches
    strMatches = strMatches & strMatch & vbCrlf
Next

MsgBox strMatches,, "Email Addresses found:"

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.
    '# Convert Collection to Array
    For Each sMatch In colMatches
        temp = temp & sMatch & vbCrlf
    Next
    temp = Split(temp, vbCrlf)
    fGetMatches = temp

End Function

Open in new window

0
 
LVL 10

Author Comment

by:ReneGe
ID: 35137619
Daz_1234,

Nicely done!

Knowing that it will be called from a batch file with: cscript ExtractEmail.vbs "TextFile.txt"

Now, what seems to be left to be done is:
1-Insted og the "strTestString", it must read the text file called from the cscript (see up).
2-Have the output in the DOS window, not in a Windows box.

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35137847
Here you go, complete:

Regards,
Daz.


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, arrAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

arrAllMatches = fGetMatches(strPattern, strTestString)

If UBound(arrAllMatches) <> 0 Then
    WScript.Echo Join(arrAllMatches, vbCrlf)
Else
    WScript.Echo "-- None Found --"
End If

WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = Array("")
    Else
        '# Convert Collection to Array
        For Each sMatch In colMatches
            temp = temp & sMatch & vbCrlf
        Next
        temp = Split(temp, vbCrlf)
    End If

    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35137862
By the way, I added 'friendly' messages at lines 15 and 29.  Feel free to remove/comment those lines if not required.

If there are no matches it will return the string:
-- None Found --


Regards,
Daz.
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35138288
I rem'ed "WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf
"

Now in the output, I get:
test@domain.com

End of FindEmail.vbs


Can the extra CRLF and "End of FindEmail.vbs" be removed?

Thanks and cheers,
Rene
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 12

Accepted Solution

by:
Daz_1234 earned 440 total points
ID: 35138338
As requested:

Regards,
Daz.

Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, arrAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

arrAllMatches = fGetMatches(strPattern, strTestString)

If UBound(arrAllMatches) <> 0 Then
    WScript.Echo Join(arrAllMatches, vbCrlf)
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = Array("")
    Else
        '# Convert Collection to Array
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Split(temp, "¶")
    End If

    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 26

Assisted Solution

by:pony10us
pony10us earned 20 total points
ID: 35138350
Rene,

Remove line 29:  WScript.Echo vbCrlf & "End of " & WScript.ScriptName

as Daz stated should take care of that.

0
 
LVL 10

Author Comment

by:ReneGe
ID: 35138393
Thanks DAZ for your excellent work. You'r a hero !!

Thanks ss and Paka for your contribution.

Cheers,
Rene
0
 
LVL 26

Expert Comment

by:pony10us
ID: 35138768
Rene

I am glad you got this working. However I don't really think I deserved any of the points as I merely pointed out what Daz had already said.  I really didn't contribute anything to your solution.
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35138814
Your little contribution made me realized what he did without me looking for it. So yes, you deserve some points. Don't worry, to respect Daz efforts, I only gave you 20.

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35138843
As far as I'm concerned that is fair enough, since although the contribution was small, it was indeed a positive one in the spirit of helping out  ;o)

Daz.
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35138869
Well said Daz!

Rene

0
 
LVL 26

Expert Comment

by:pony10us
ID: 35138954
Well thank you both then and I will accept them graciously.    :)

I actually was fascinated by the topic title so stopped in to read it and just answered the question.  

I like what you did Daz.
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35141399
FYI,

After using the script, I found that in a certain situation, emails could not be extracted.

So I created a new thread.
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26888745.html

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35143025
Fixed script below.

Daz.


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetMatches(strPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Active Directory replication delay is the cause to many problems.  Here is a super easy script to force Active Directory replication to all sites with by using an elevated PowerShell command prompt, and a tool to verify your changes.
Whether you’re a college noob or a soon-to-be pro, these tips are sure to help you in your journey to becoming a programming ninja and stand out from the crowd.
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
The viewer will learn how to create and use a small PHP class to apply a watermark to an image. This video shows the viewer the setup for the PHP watermark as well as important coding language. Continue to Part 2 to learn the core code used in creat…

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now