Solved

Searching  A Text File

Posted on 2002-05-29
8
165 Views
Last Modified: 2010-05-02
Hey Everyone,

What I am trying to do here is search a text file for certain keywords and values. On thing I need to search for is email addresses.  What is the best way to search through a text file scanning for these values?  Any ideas?

0
Comment
Question by:dsplice
8 Comments
 
LVL 22

Expert Comment

by:rspahitz
ID: 7042176
Open "myfile.txt" for binary as #1
strFileContents = input$(lof(1), #1)
close #1

' Search contents for e-mail address
iEMailPosit = 0
do
  iEMailPosit = instr(iEMailPosit+1, strFileContents, "@")
  if iEMailPosit =0 then
    exit do
  endif
  ' add extra code to determine start and end of e-mail address
  iEMailStart = instrrev(iEMailPosit, strFileContents, " ")
  iEMailEnd = instr(iEMailPosit+1, strFileContents, " ")
loop

' Note that the above logic will have to be expanded to accomodate other e-mail delimiters besides space characters.
0
 
LVL 18

Accepted Solution

by:
bobbit31 earned 50 total points
ID: 7042316
you could also use the microsoft script control to use javascript regular expressions:

ie:

Dim ff As Integer
Dim strLine As String
Dim scr As New ScriptControl
Dim funcRegExpr As String
Dim strFile As String

scr.Language = "javascript"

funcRegExpr = "function findExpression(str, pattern) {" & _
              "   var regEmailCheck = /[A-Za-z0-9\_\-]+\@[A-Za-z0-9\_\-]+.*\.\w{2,3}/g;" & _
              "   var res = regEmailCheck.exec(str);" & _
              "   return (res == null) ? '' : res;" & _
              "}"
scr.AddCode (funcRegExpr)

ff = FreeFile

Open "C:\my documents\test.txt" For Input As #ff

Do While Not EOF(ff)

    Line Input #ff, strLine
    strFile = strFile & strLine
   
Loop

Close (ff)

'' get all emails out
Dim strEmails As String
strEmails = scr.Eval("findExpression('" & strFile & "')")

If strEmails = "" Then
    MsgBox "No Emails Found"
Else
    MsgBox strEmails
End If

you might have to tweak the regular expression shown above... see the link below for some help w/ regular expressions:
http://www.marzie.com/devtools/misc/regexp.asp
0
 

Author Comment

by:dsplice
ID: 7042808
Thanks for the great comments...How would I go about capturing the entire email address?  I guess Im alittle unclear on the logic behind searching through the file.

0
 
LVL 22

Expert Comment

by:rspahitz
ID: 7042846
There's no easy answer because e-mail addresses are like postal addresses and are not necessarily in any common format.

Here are the restrictions as I understand them:

1) Must contain "@"
2) Must not contain any spaces or non-printable characters
3) "@" must be preceded by at least one valid character
4) "@" must be followed by at least one valid character
5) Somewhere following the "@" msut be a "." which will be followed by a domain category (com, edu, uk, fi, etc.)
6) Among the list of *possibly* invalid characters: @, *, ?, =, +, ", <, >, |, /, \.  Some of these may be valid, but not likely; other invalid characters probably exist.
7) Among the list of *probably* valid characters: A through Z, a through z, 0 through 9, -, _, .

Other than that, some servers may have additional limitations.

Based on this, your parsing routine must search for "@" symbols, then work backwards until it finds an invalid character, then work forward until it finds an invalid character.  The e-mail address is that which is located between the invalid characters.

Further clouding the issue is that carriage return/line feed combinations may get embedded in the e-mail address but are not part of the address.

Then, of course, there may be "@" symbols embedded within other contexts, such as "apples: 2@$0.29" or "my company is named Fan@ix."

And don't forget that when you extract all of these e-mails and start spamming people that your ISP can cancel your account and legal action could be taken against you.
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 18

Expert Comment

by:bobbit31
ID: 7042851
adjustment to my above code:

Dim ff As Integer
Dim strLine As String
Dim scr As New ScriptControl
Dim funcRegExpr As String
Dim strFile As String

scr.Language = "javascript"

funcRegExpr = "function findExpression(str, pattern) {" & _
              "   var regEmailCheck = /\w+[\w-\.]*\@\w+((-\w+)|(\w*))\.[a-z]{2,3}/;" & _
              "   var res = regEmailCheck.exec(str);" & _
              "   return (res == null) ? '' : res;" & _
              "}"
scr.AddCode (funcRegExpr)

ff = FreeFile

Open "C:\my documents\test.txt" For Input As #ff

Do While Not EOF(ff)

    Line Input #ff, strLine
   
    '' check for email addresses
    Dim strEmails As String
    strEmails = scr.Eval("findExpression('" & strLine & "')")
   
    If strEmails <> "" Then
        MsgBox strEmails
    End If
   
   
Loop

Close (ff)


See what happens when you run this (strEmails will be your email address if there was one found)
0
 
LVL 18

Expert Comment

by:bobbit31
ID: 7042856
also, you can go to: http://www.regexlib.com/Default.aspx and search for other helpful regular expressions
0
 
LVL 49

Expert Comment

by:DanRollins
ID: 7851251
Hi dsplice,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Accept bobbit31's comment(s) as an answer.

dsplice, if you think your question was not answered at all or if you need help, just post a new comment here; Community Support will help you.  DO NOT accept this comment as an answer.

EXPERTS: If you disagree with that recommendation, please post an explanatory comment.
==========
DanRollins -- EE database cleanup volunteer
0
 

Expert Comment

by:SpideyMod
ID: 7912850
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Article by: Martin
Here are a few simple, working, games that you can use as-is or as the basis for your own games. Tic-Tac-Toe This is one of the simplest of all games.   The game allows for a choice of who goes first and keeps track of the number of wins for…
Since upgrading to Office 2013 or higher installing the Smart Indenter addin will fail. This article will explain how to install it so it will work regardless of the Office version installed.
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now