VBSCRIPT - EXTRACT EMAIL ADDRESS FROM TEXT FILE

Hi there,

I need a VBScript (nothing else) that will extract the email addresses from a text file by searching for "@", then parsing the email address around it.  A text file may contain several email addresses, so all emails must be extracted.

Extracted emails will require to simply be displayed to the screen (the DOS window), not exported to a file.

The VSScript will be run by the batch file command line:
cscript ExtractEmail.vbs "TextFile.txt

Thanks for your help,
Rene
LVL 10
ReneGeAsked:
Who is Participating?
 
Daz_1234Connect With a Mentor Commented:
As requested:

Regards,
Daz.

Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, arrAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

arrAllMatches = fGetMatches(strPattern, strTestString)

If UBound(arrAllMatches) <> 0 Then
    WScript.Echo Join(arrAllMatches, vbCrlf)
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = Array("")
    Else
        '# Convert Collection to Array
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Split(temp, "¶")
    End If

    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
sshah254Connect With a Mentor Commented:
Can you upload a sample of the input text file?

Ss
0
 
ReneGeAuthor Commented:
It will be complicated for me to upload one cause it contains confidential infos. However, email addresses may be anywhere in it, and a line may contain more than 256 characters.

For example:
 envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com
 vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv

Thanks and cheers,
Rene
0
Keep up with what's happening at Experts Exchange!

Sign up to receive Decoded, a new monthly digest with product updates, feature release info, continuing education opportunities, and more.

 
PakaConnect With a Mentor Commented:
Do you have standard characters that are before and after the email address (such as space and > as in your example)?  Without standard delimiters, extracting email addresses from this type of file will be extremely difficult.  For example, the first email from your example could be email@domain.com or email@domain.com.sd (which could be an email address in Sudan).
0
 
ReneGeAuthor Commented:
Paka,

Interessting question/comment.

For the sake of this thread, the moment an email caracter that is not "a-z,0-9,.,_,-" etc, should be it's dellimiter.

I hope that helps.

Thanks and cheers,
Rene
0
 
Daz_1234Commented:
Hi Rene,

The script below uses a regular expression to attempt to extract all legal email addresses from a string.

You can see that I have set the strTestString variable to the sample you gave earlier.  But there is a problem.  Currently the script returns the following valid format email addresses from your sample:

email@domain.com.sdvzsd
test@domain.com
anothermail@domain.com


As you can  see the first one, to a human, is clearly not a real email address (although it could be), but it is in a correct email address format.

It would help to refine the regular expression if you knew that all email addresses were to end in .com or .co.uk or a finite list.

Regards,
Daz.


Dim strPattern, arrAllMatches, strMatch, strMatches

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}(;([\w-\.]+)@\w{2,}(\.\w{2,}){1,5})*?"

strTestString = " envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com" & vbCrlf _
              & " vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv "

colAllMatches = fGetMatches(strPattern, strTestString)

For Each strMatch In colAllMatches
    strMatches = strMatches & strMatch & vbCrlf
Next

MsgBox strMatches,, "Email Addresses found:"

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.
    '# Convert Collection to Array
    For Each sMatch In colMatches
        temp = temp & sMatch & vbCrlf
    Next
    temp = Split(temp, vbCrlf)
    fGetMatches = temp

End Function

Open in new window

0
 
Daz_1234Commented:
Slight change, the regular expression I copy/pasted above was to match a delimited list of email addresses from another of my scripts, so is unnecessarily complicated (although still works)

Updated script below, same as the previous one, but with changed line 3 (and no change in functionality).

Regards,
Daz


Dim strPattern, arrAllMatches, strMatch, strMatches

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = " envonfv ofnvfv fkdjfnv;djfn;dn email@domain.com.sdvzsd vj>¨^c zcln test@domain.com" & vbCrlf _
              & " vsifdvjzs0v8djzs0fivdj zsofvn>anothermail@domain.com >| xidisdvnsv "

colAllMatches = fGetMatches(strPattern, strTestString)

For Each strMatch In colAllMatches
    strMatches = strMatches & strMatch & vbCrlf
Next

MsgBox strMatches,, "Email Addresses found:"

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.
    '# Convert Collection to Array
    For Each sMatch In colMatches
        temp = temp & sMatch & vbCrlf
    Next
    temp = Split(temp, vbCrlf)
    fGetMatches = temp

End Function

Open in new window

0
 
ReneGeAuthor Commented:
Daz_1234,

Nicely done!

Knowing that it will be called from a batch file with: cscript ExtractEmail.vbs "TextFile.txt"

Now, what seems to be left to be done is:
1-Insted og the "strTestString", it must read the text file called from the cscript (see up).
2-Have the output in the DOS window, not in a Windows box.

Thanks and cheers,
Rene
0
 
Daz_1234Commented:
Here you go, complete:

Regards,
Daz.


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, arrAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

arrAllMatches = fGetMatches(strPattern, strTestString)

If UBound(arrAllMatches) <> 0 Then
    WScript.Echo Join(arrAllMatches, vbCrlf)
Else
    WScript.Echo "-- None Found --"
End If

WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = Array("")
    Else
        '# Convert Collection to Array
        For Each sMatch In colMatches
            temp = temp & sMatch & vbCrlf
        Next
        temp = Split(temp, vbCrlf)
    End If

    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
Daz_1234Commented:
By the way, I added 'friendly' messages at lines 15 and 29.  Feel free to remove/comment those lines if not required.

If there are no matches it will return the string:
-- None Found --


Regards,
Daz.
0
 
ReneGeAuthor Commented:
I rem'ed "WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf
"

Now in the output, I get:
test@domain.com

End of FindEmail.vbs


Can the extra CRLF and "End of FindEmail.vbs" be removed?

Thanks and cheers,
Rene
0
 
pony10usConnect With a Mentor Commented:
Rene,

Remove line 29:  WScript.Echo vbCrlf & "End of " & WScript.ScriptName

as Daz stated should take care of that.

0
 
ReneGeAuthor Commented:
Thanks DAZ for your excellent work. You'r a hero !!

Thanks ss and Paka for your contribution.

Cheers,
Rene
0
 
pony10usCommented:
Rene

I am glad you got this working. However I don't really think I deserved any of the points as I merely pointed out what Daz had already said.  I really didn't contribute anything to your solution.
0
 
ReneGeAuthor Commented:
Your little contribution made me realized what he did without me looking for it. So yes, you deserve some points. Don't worry, to respect Daz efforts, I only gave you 20.

Thanks and cheers,
Rene
0
 
Daz_1234Commented:
As far as I'm concerned that is fair enough, since although the contribution was small, it was indeed a positive one in the spirit of helping out  ;o)

Daz.
0
 
ReneGeAuthor Commented:
Well said Daz!

Rene

0
 
pony10usCommented:
Well thank you both then and I will accept them graciously.    :)

I actually was fascinated by the topic title so stopped in to read it and just answered the question.  

I like what you did Daz.
0
 
ReneGeAuthor Commented:
FYI,

After using the script, I found that in a certain situation, emails could not be extracted.

So I created a new thread.
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26888745.html

Thanks and cheers,
Rene
0
 
Daz_1234Commented:
Fixed script below.

Daz.


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@\w{2,}(\.\w{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetMatches(strPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.