Solved

VBSCRIPT: Extract telephone number from a text file

Posted on 2011-03-17
14
925 Views
Last Modified: 2012-05-11
Hi there,

I need to add the possibility to extract the phone number from a text file and output it in the standard format xxx-xxx-xxxx.

Also, if on the same line and on the left of the phone number, it finds the word fax, it would output it as "fax", as seen below.

source formats could be:
xxx-xxx-xxxx
(xxx) xxx-xxxx
(xxx)xxx-xxxx
xxx xxx xxxx
xxxxxxxxxx
xxx.xxx.xxxx

While parsing a file, the output should be:
email:email@domain.com
email:test@domain.com
phone:xxx-xxx-xxxx
fax:xxx-xxx-xxxx

By the way, this script was made by "Daz_1234" (big thanks to you)

Ref: http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26888745.html


Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetMatches(strPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName

Function fGetMatches(sPattern, sStr)
    Dim regEx, retVal, sMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each sMatch In colMatches
            temp = temp & sMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
    fGetMatches = temp
End Function

Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
Comment
Question by:ReneGe
  • 7
  • 4
  • 3
14 Comments
 
LVL 20

Expert Comment

by:Proculopsis
ID: 35154873

Test here

  (?:\d{3}([-\. ]?)\d{3}\1\d{4})|\(\d{3}\)[ ]?\d{3}-\d{4}
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35154894
Hi Rene,

As we've seen from the last one, knowing the source format possibilities is critical.  And you've obviously done your homework with the ones you've listed.  So just a couple of clarifying questions:

1. Will phone numbers always only have 10 numbers:  Not less than 10 or more, but only 10?
2. How will the word fax be present before?  For example:

Fax xxx-xxx-xxxx  (a space)
Fax: xxx  (a colon then a space)
Fax:xxxx (just a colon)
Fax<any non alphanumeric characters including symbols and spaces>xxxx

... or all or the above?  Any reduction in the number of possibilities makes for more accurate matching.

Regards,
Daz.
0
 
LVL 20

Expert Comment

by:Proculopsis
ID: 35154943

Test replace ($1$5-$3$6-$4$7) here for number reformatting:

  (?:(\d{3})([-\. ]?)(\d{3})\2(\d{4}))|\((\d{3})\)[ ]?(\d{3})-(\d{4})
0
 
LVL 20

Assisted Solution

by:Proculopsis
Proculopsis earned 50 total points
ID: 35155012

Sorry, forgot fax - replace paramter "$1$2$6-$4$7-$5$8" :

  (fax:)?(?:(?:(\d{3})([-\. ]?)(\d{3})\3(\d{4}))|\((\d{3})\)[ ]?(\d{3})-(\d{4}))
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35155568
==> DAZ, sounds right!

Hey guys, thanks for the fast response.

Just to make sure that my question is clear.  The output sample was not the input file. But how the data would be outputted to the screen.

SOURCE PHONE FORMAT:
xxx-xxx-xxxx
(xxx) xxx-xxxx
(xxx)xxx-xxxx
xxx xxx xxxx
xxxxxxxxxx
xxx.xxx.xxxx

OUTPUT FORMAT:
email:email@domain.com
email:test@domain.com
phone:xxx-xxx-xxxx
fax:xxx-xxx-xxxx

Thanks and cheers,
Rene
0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35157082
Hi Rene,

I used this test jumble of data:
<td width="150" valign="top">Moyen(s) de communication&nbsp;:</td>
<td>t?l?copieur&nbsp;: 905-555-1212<br />
courriel (courrier ?lectronique)&nbsp;: test.email@domain-name.com<br />
</td>

a.n@other.email.addr


fax:*&<>&^%£&*%^&£%£&%£&£() (())__  (123)456-7890

 (098) 765-4321 

</td>
</td>
</td>
</td>

Open in new window


... to test the script below and it seems to work!

Please test thoroughly :o)

Regards,
Daz.

Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strEmailPattern, strTelFaxPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strEmailPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"
strTelFaxPattern = "((fax)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll

strAllMatches = fGetEmailMatches(strEmailPattern, strTestString)
If strAllMatches <> "" Then strAllMatches = strAllMatches & vbCrlf
strAllMatches = strAllMatches & fGetTelFaxMatches(strTelFaxPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName



Function fGetEmailMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            temp = temp & "email:" & oMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
   fGetEmailMatches = temp
End Function

Function fGetTelFaxMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp, sPrefix
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            sPrefix = "phone:"
            If LCase(Left(oMatch, 3)) <> "fax" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next
        For Each oMatch In colMatches
            sPrefix = "fax:"
            If LCase(Left(oMatch, 3)) = "fax" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next

        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)

    End If



    fGetTelFaxMatches = temp
End Function



Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 12

Expert Comment

by:Daz_1234
ID: 35157110
... sorry I forgot to put the results.  The result using the previous script and the previous test data was:

email:test.email@domain-name.com
email:a.n@other.email.addr
phone:905-555-1212
phone:098-765-4321
fax:123-456-7890


Regards,
Daz.
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 
LVL 10

Author Comment

by:ReneGe
ID: 35158557
Daz,

Seems to be working well. I will start a thorough test this PM.

Some times, "fax" is also written in french "telecopieur" or "télépocieur". Do you think you could add those 2 alternatives in the script? The output would still be standardized as "fax".

Thanks,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35158801
After revising, the french version of fax will require to be "copieur" since dealing with page codes may be tricky.

Thanks,
Rene
0
 
LVL 12

Accepted Solution

by:
Daz_1234 earned 450 total points
ID: 35164339
The script below, when fed the following jumble of data:
<td width="150" valign="top">Moyen(s) de communication&nbsp;:</td>
<td>t?l?copieur&nbsp;: 905-555-1212<br />
courriel (courrier ?lectronique)&nbsp;: test.email@domain-name.com<br />
</td>

a.n@other.email.addr


fax:*&<>&^%£&*%^&£%£&%£&£() (())__  (123)456-7890

 (098) 765-4321 

</td>

telecopieur 768878-6789

télécopieur:@££ (828).737.8123

</td>
</td>
</td>

Open in new window


... produces this output:

email:test.email@domain-name.com
email:a.n@other.email.addr
phone:098-765-4321
fax:905-555-1212
fax:123-456-7890
fax:768-878-6789
fax:828-737-8123

Regards,
Daz.
Option Explicit

Dim objArgs, fso, strFile, strTestString
Dim strEmailPattern, strTelFaxPattern, strAllMatches

Set objArgs = WScript.Arguments

ErrCheck objArgs.Count < 1, 1, "No argument specified."

strFile = objArgs(0)

Set fso = CreateObject("Scripting.FileSystemObject")
ErrCheck Not fso.FileExists(strFile), 1, "File supplied as argument cannot be found: '" & strFile & "'"

'# WScript.Echo "Checking file contents for email addresses: '" & strFile & "'" & vbCrlf

strEmailPattern = "([\w-\.]+)@[\w-]{2,}(\.[\w-]{2,}){1,5}"
strTelFaxPattern = "((copieur|fax)?(?:[^A-Za-z0-9\n\r\t]*))?\(?(\d{3})\)?[- \.]?(\d{3})[- \.]?(\d{4})"

strTestString = fso.OpenTextFile(strFile, 1).ReadAll
strTestString = Replace(strTestString, "&nbsp", " ")

strAllMatches = fGetEmailMatches(strEmailPattern, strTestString)
If strAllMatches <> "" Then strAllMatches = strAllMatches & vbCrlf
strAllMatches = strAllMatches & fGetTelFaxMatches(strTelFaxPattern, strTestString)

If strAllMatches <> "" Then
    WScript.Echo strAllMatches
Else
    WScript.Echo "-- None Found --"
End If

'# WScript.Echo vbCrlf & "End of " & WScript.ScriptName



Function fGetEmailMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            temp = temp & "email:" & oMatch & "¶"
        Next
        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)
    End If
   fGetEmailMatches = temp
End Function

Function fGetTelFaxMatches(sPattern, sStr)
    Dim regEx, retVal, oMatch, colMatches, temp, sPrefix
    Set regEx = New RegExp     ' Create a regular expression.
    regEx.Pattern = sPattern   ' Set pattern.
    regEx.IgnoreCase = True   ' Set case insensitivity.
    regEx.Global = True        ' Set global applicability.

    Set colMatches = regEx.Execute(sStr)   ' Execute search.

    If colMatches.Count = 0 Then
        temp = ""
    Else
        For Each oMatch In colMatches
            sPrefix = "phone:"
            If LCase(Left(oMatch, 3)) <> "fax" And LCase(Left(oMatch, 7)) <> "copieur" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next
        For Each oMatch In colMatches
            sPrefix = "fax:"
            If LCase(Left(oMatch, 3)) = "fax" Or LCase(Left(oMatch, 7)) = "copieur" Then
                temp = temp & sPrefix & oMatch.SubMatches(2) & "-" & oMatch.SubMatches(3) & "-" & oMatch.SubMatches(4) & "¶"
            End If
        Next

        temp = Left(temp, Len(temp) - 1)
        temp = Replace(temp, "¶", vbCrlf)

    End If



    fGetTelFaxMatches = temp
End Function



Sub ErrCheck(blTest, iErrNum, sTxt)
    Dim sErrText
    If Not blTest Then Exit Sub
    sErrText = "Error: " & sTxt
    MsgBox sErrText, vbSystemModal + vbCritical, "Error in: " & WScript.ScriptName
    WScript.Quit iErrNum
End Sub

Open in new window

0
 
LVL 10

Author Closing Comment

by:ReneGe
ID: 35167356
The script works like a charm.

Thanks Daz, you'r a hero!!

Thanks Proculopsis for your generous contribution.

Cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35167727
I thought you may be interested in this one.

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26896804.html

Thanks and again; cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35168612
Hi again,

I thought you may also be interested in this one.

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26897116.html

Thanks and again and again; cheers,
Rene
0
 
LVL 10

Author Comment

by:ReneGe
ID: 35174017
Guys,  I need to convert the extracted emails to lower case.

So I created a new thread. Here it is:
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/VB_Script/Q_26898665.html

Thanks and cheers,
Rene
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article is meant to give a basic understanding of how to use R Sweave as a way to merge LaTeX and R code seamlessly into one presentable document.
A short article about problems I had with the new location API and permissions in Marshmallow
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

25 Experts available now in Live!

Get 1:1 Help Now