Solved

VBSCRIPT: Extract telephone number from a text file

Posted on 2011-03-17
14
922 Views
Last Modified: 2012-05-11
Hi there,

I need to add the possibility to extract the phone number from a text file and output it in the standard format xxx-xxx-xxxx.

Also, if on the same line and on the left of the phone number, it finds the word fax, it would output it as "fax", as seen below.

source formats could be:
xxx-xxx-xxxx
(xxx) xxx-xxxx
(xxx)xxx-xxxx
xxx xxx xxxx
xxxxxxxxxx
xxx.xxx.xxxx

While parsing a file, the output should be:
email:email@domain.com
email:test@domain.com
phone:xxx-xxx-xxxx
fax:xxx-xxx-xxxx

By the way, this script was made by "Daz_1234" (big thanks to you)

Ref: http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26888745.html


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetMatches(strPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
Comment
Question by:ReneGe
  • 7
  • 4
  • 3
14 Comments
 
LVL 20

Expert Comment

by:Proculopsis
Comment Utility

Test here

  (?:\d{3}([-\. ]?)\d{3}\1\d{4})|\(\d{3}\)[ ]?\d{3}-\d{4}
0
 
LVL 12

Expert Comment

by:Daz_1234
Comment Utility
Hi Rene,

As we've seen from the last one, knowing the source format possibilities is critical.  And you've obviously done your homework with the ones you've listed.  So just a couple of clarifying questions:

1. Will phone numbers always only have 10 numbers:  Not less than 10 or more, but only 10?
2. How will the word fax be present before?  For example:

Fax xxx-xxx-xxxx  (a space)
Fax: xxx  (a colon then a space)
Fax:xxxx (just a colon)
Fax<any non alphanumeric characters including symbols and spaces>xxxx

... or all or the above?  Any reduction in the number of possibilities makes for more accurate matching.

Regards,
Daz.
0
 
LVL 20

Expert Comment

by:Proculopsis
Comment Utility

Test replace ($1$5-$3$6-$4$7) here for number reformatting:

  (?:(\d{3})([-\. ]?)(\d{3})\2(\d{4}))|\((\d{3})\)[ ]?(\d{3})-(\d{4})
0
 
LVL 20

Assisted Solution

by:Proculopsis
Proculopsis earned 50 total points
Comment Utility

Sorry, forgot fax - replace paramter "$1$2$6-$4$7-$5$8" :

  (fax:)?(?:(?:(\d{3})([-\. ]?)(\d{3})\3(\d{4}))|\((\d{3})\)[ ]?(\d{3})-(\d{4}))
0
 
LVL 10

Author Comment

by:ReneGe
Comment Utility
==> DAZ, sounds right!

Hey guys, thanks for the fast response.

Just to make sure that my question is clear.  The output sample was not the input file. But how the data would be outputted to the screen.

SOURCE PHONE FORMAT:
xxx-xxx-xxxx
(xxx) xxx-xxxx
(xxx)xxx-xxxx
xxx xxx xxxx
xxxxxxxxxx
xxx.xxx.xxxx

OUTPUT FORMAT:
email:email@domain.com
email:test@domain.com
phone:xxx-xxx-xxxx
fax:xxx-xxx-xxxx

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
Comment Utility
Hi Rene,

I used this test jumble of data:
<td width="150" valign="top">Moyen(s) de communication&nbsp;:</td>
<td>t?l?copieur&nbsp;: 905-555-1212<br />
courriel (courrier ?lectronique)&nbsp;: test.email@domain-name.com<br />
</td>

a.n@other.email.addr


fax:*&<>&^%£&*%^&£%£&%£&£() (())__  (123)456-7890

 (098) 765-4321 

</td>
</td>
</td>
</td>

Open in new window


... to test the script below and it seems to work!

Please test thoroughly :o)

Regards,
Daz.

Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strEmailPattern, strTelFaxPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strEmailPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"
strTelFaxPattern = "((fax)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetEmailMatches(strEmailPattern, strTestString)
If strAllMatches <> "" Then strAllMatches = strAllMatches & vbCrlf
strAllMatches = strAllMatches & fGetTelFaxMatches(strTelFaxPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName



Function fGetEmailMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            temp = temp & "email:" & oMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
   fGetEmailMatches = temp
End Function

Function fGetTelFaxMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp, sPrefix
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            sPrefix = "phone:"
            If LCase(Left(oMatch, 3)) <> "fax" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next
        For Each oMatch In colMatches
            sPrefix = "fax:"
            If LCase(Left(oMatch, 3)) = "fax" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next

        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)

    End If



    fGetTelFaxMatches = temp
End Function



Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 12

Expert Comment

by:Daz_1234
Comment Utility
... sorry I forgot to put the results.  The result using the previous script and the previous test data was:

email:test.email@domain-name.com
email:a.n@other.email.addr
phone:905-555-1212
phone:098-765-4321
fax:123-456-7890


Regards,
Daz.
0
Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

 
LVL 10

Author Comment

by:ReneGe
Comment Utility
Daz,

Seems to be working well. I will start a thorough test this PM.

Some times, "fax" is also written in french "telecopieur" or "télépocieur". Do you think you could add those 2 alternatives in the script? The output would still be standardized as "fax".

Thanks,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
Comment Utility
After revising, the french version of fax will require to be "copieur" since dealing with page codes may be tricky.

Thanks,
Rene
0
 
LVL 12

Accepted Solution

by:
Daz_1234 earned 450 total points
Comment Utility
The script below, when fed the following jumble of data:
<td width="150" valign="top">Moyen(s) de communication&nbsp;:</td>
<td>t?l?copieur&nbsp;: 905-555-1212<br />
courriel (courrier ?lectronique)&nbsp;: test.email@domain-name.com<br />
</td>

a.n@other.email.addr


fax:*&<>&^%£&*%^&£%£&%£&£() (())__  (123)456-7890

 (098) 765-4321 

</td>

telecopieur 768878-6789

télécopieur:@££ (828).737.8123

</td>
</td>
</td>

Open in new window


... produces this output:

email:test.email@domain-name.com
email:a.n@other.email.addr
phone:098-765-4321
fax:905-555-1212
fax:123-456-7890
fax:768-878-6789
fax:828-737-8123

Regards,
Daz.
Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strEmailPattern, strTelFaxPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strEmailPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"
strTelFaxPattern = "((copieur|fax)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll
strTestString = Replace(strTestString, "&nbsp", " ")

strAllMatches = fGetEmailMatches(strEmailPattern, strTestString)
If strAllMatches <> "" Then strAllMatches = strAllMatches & vbCrlf
strAllMatches = strAllMatches & fGetTelFaxMatches(strTelFaxPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName



Function fGetEmailMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            temp = temp & "email:" & oMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
   fGetEmailMatches = temp
End Function

Function fGetTelFaxMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp, sPrefix
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            sPrefix = "phone:"
            If LCase(Left(oMatch, 3)) <> "fax" And LCase(Left(oMatch, 7)) <> "copieur" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next
        For Each oMatch In colMatches
            sPrefix = "fax:"
            If LCase(Left(oMatch, 3)) = "fax" Or LCase(Left(oMatch, 7)) = "copieur" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next

        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)

    End If



    fGetTelFaxMatches = temp
End Function



Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 10

Author Closing Comment

by:ReneGe
Comment Utility
The script works like a charm.

Thanks Daz, you'r a hero!!

Thanks Proculopsis for your generous contribution.

Cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
Comment Utility
I thought you may be interested in this one.

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26896804.html

Thanks and again; cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
Comment Utility
Hi again,

I thought you may also be interested in this one.

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26897116.html

Thanks and again and again; cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
Comment Utility
Guys,  I need to convert the extracted emails to lower case.

So I created a new thread. Here it is:
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26898665.html

Thanks and cheers,
Rene
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

This is an explanation of a simple data model to help parse a JSON feed
Displaying an arrayList in a listView using the default adapter is rarely the best solution. To get full control of your display data, and to be able to refresh it after editing, requires the use of a custom adapter.
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now