Solved

Searching a Text File

Posted on 2002-05-29
Last Modified: 2010-05-02
Hey Everyone,

What I am trying to do here is search a text file for certain keywords and values. One thing I need to search for is email addresses. What is the best way to scan through a text file for these values? Any ideas?

Question by:dsplice

Expert Comment

by:rspahitz
ID: 7042176
Open "myfile.txt" For Binary As #1
strFileContents = Input$(LOF(1), #1)
Close #1

' Search contents for e-mail addresses
iEMailPosit = 0
Do
  iEMailPosit = InStr(iEMailPosit + 1, strFileContents, "@")
  If iEMailPosit = 0 Then
    Exit Do
  End If
  ' add extra code to determine start and end of e-mail address
  ' InStrRev takes the string first and the start position last
  iEMailStart = InStrRev(strFileContents, " ", iEMailPosit)
  iEMailEnd = InStr(iEMailPosit + 1, strFileContents, " ")
Loop

' Note that the above logic will have to be expanded to accommodate other e-mail delimiters besides space characters.

Accepted Solution

by:
bobbit31 earned 50 total points
ID: 7042316
You could also use the Microsoft Script Control to run JavaScript regular expressions, e.g.:

Dim ff As Integer
Dim strLine As String
Dim scr As New ScriptControl
Dim funcRegExpr As String
Dim strFile As String

scr.Language = "javascript"

' res[0] is the matched text; res itself is the whole match array
funcRegExpr = "function findExpression(str, pattern) {" & _
              "   var regEmailCheck = /[A-Za-z0-9\_\-]+\@[A-Za-z0-9\_\-]+.*\.\w{2,3}/g;" & _
              "   var res = regEmailCheck.exec(str);" & _
              "   return (res == null) ? '' : res[0];" & _
              "}"
scr.AddCode (funcRegExpr)

ff = FreeFile

Open "C:\my documents\test.txt" For Input As #ff

Do While Not EOF(ff)

    Line Input #ff, strLine
    ' Line Input drops the line break, so add a space to keep lines apart
    strFile = strFile & strLine & " "
   
Loop

Close (ff)

'' get all emails out
Dim strEmails As String
' Run passes the text as an argument, so quotes or backslashes in the file cannot break the script
strEmails = scr.Run("findExpression", strFile)

If strEmails = "" Then
    MsgBox "No Emails Found"
Else
    MsgBox strEmails
End If

You might have to tweak the regular expression shown above. See the link below for some help with regular expressions:
http://www.marzie.com/devtools/misc/regexp.asp
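
One thing worth knowing when tweaking: exec returns only one match per call, even with the g flag, so pulling every address out of a file needs a loop. Below is a minimal sketch of that loop. The name findAllExpressions and the simplified pattern are assumptions made for illustration, not part of the code above, and the pattern will miss some valid addresses. It sticks to old-style JavaScript so that it should be loadable through AddCode in the same way as findExpression.

```javascript
// Hypothetical helper (not from the original post): collect every match
// of a simplified e-mail pattern instead of only the first one.
function findAllExpressions(str) {
    // Illustrative pattern only: local part, "@", dotted domain, and a
    // 2-3 letter suffix as in the pattern used earlier in this thread.
    var regEmail = /[A-Za-z0-9_.\-]+@[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*\.[A-Za-z]{2,3}/g;
    var found = [];
    var res;
    // With the g flag, each exec() call resumes after the previous match
    // and returns null once nothing more is found.
    while ((res = regEmail.exec(str)) != null) {
        found[found.length] = res[0];
    }
    // Join into one string so the result is easy to hand back to VB.
    return found.join("\n");
}
```

From VB this could presumably be called the same way as the single-match function, with the returned string split on the line-feed character to get the individual addresses.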

Author Comment

by:dsplice
ID: 7042808
Thanks for the great comments. How would I go about capturing the entire email address? I guess I'm a little unclear on the logic behind searching through the file.


Expert Comment

by:rspahitz
ID: 7042846
There's no easy answer because e-mail addresses are like postal addresses and are not necessarily in any common format.

Here are the restrictions as I understand them:

1) Must contain "@"
2) Must not contain any spaces or non-printable characters
3) "@" must be preceded by at least one valid character
4) "@" must be followed by at least one valid character
5) Somewhere following the "@" must be a "." which will be followed by a domain category (com, edu, uk, fi, etc.)
6) Among the list of *possibly* invalid characters: @, *, ?, =, +, ", <, >, |, /, \.  Some of these may be valid, but not likely; other invalid characters probably exist.
7) Among the list of *probably* valid characters: A through Z, a through z, 0 through 9, -, _, .

Other than that, some servers may have additional limitations.

Based on this, your parsing routine must search for "@" symbols, then work backwards until it finds an invalid character, then work forward until it finds an invalid character.  The e-mail address is that which is located between the invalid characters.
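
The backward-and-forward scan described above can be sketched roughly as follows. This is only an illustration in JavaScript (the language already used with the script control in this thread). The names VALID_CHARS, isValidChar and extractCandidates are made up for the example, and the set of valid characters is simply item 7 of the list, which is itself a guess.

```javascript
// Characters treated as "probably valid" (item 7 in the list; an assumption).
var VALID_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.";

function isValidChar(ch) {
    return ch.length === 1 && VALID_CHARS.indexOf(ch) >= 0;
}

// Hypothetical helper: return the candidate address around each "@" in text.
function extractCandidates(text) {
    var found = [];
    var at = text.indexOf("@");
    while (at >= 0) {
        var start = at;
        // Work backwards until an invalid character or the start of the text.
        while (start > 0 && isValidChar(text.charAt(start - 1))) {
            start--;
        }
        var end = at + 1;
        // Work forwards until an invalid character or the end of the text.
        while (end < text.length && isValidChar(text.charAt(end))) {
            end++;
        }
        // Keep it only if a valid character sits on each side of the "@".
        if (start < at && end > at + 1) {
            found[found.length] = text.substring(start, end);
        }
        // Continue with the next "@" after the span just examined.
        at = text.indexOf("@", end);
    }
    return found;
}
```

Because "." counts as valid, a sentence-ending period directly after an address is kept as part of the candidate, so the results still need to be checked against the other restrictions.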

Further clouding the issue is that carriage return/line feed combinations may get embedded in the e-mail address but are not part of the address.

Then, of course, there may be "@" symbols embedded within other contexts, such as "apples: 2@$0.29" or "my company is named Fan@ix."
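
A rough way to weed out such false hits is to test each candidate against restrictions 1 through 5 before accepting it. The sketch below is in JavaScript (as used with the script control elsewhere in this thread). The name looksLikeEmail is hypothetical, and the exact rules it applies (a single "@", a letters-only suffix of two or more characters) are assumptions layered on the list above rather than an authoritative definition of a valid address.

```javascript
// Hypothetical check applying restrictions 1-5 to one candidate string.
function looksLikeEmail(candidate) {
    var at = candidate.indexOf("@");
    // Assumption: exactly one "@", with at least one character on each side.
    if (at <= 0 || at != candidate.lastIndexOf("@") || at == candidate.length - 1) {
        return false;
    }
    // No spaces or non-printable characters anywhere.
    if (/[\s\x00-\x1F\x7F]/.test(candidate)) {
        return false;
    }
    var domain = candidate.substring(at + 1);
    // The domain must be dot-separated labels ending in a "." followed by
    // a letters-only category of two or more characters (com, edu, uk, ...).
    return /^[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)*\.[A-Za-z]{2,}$/.test(domain);
}
```

With these rules, both "2@$0.29" and "Fan@ix." are rejected: the first has an invalid character right after the "@", and the second has nothing after its final ".".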

And don't forget that when you extract all of these e-mails and start spamming people that your ISP can cancel your account and legal action could be taken against you.

Expert Comment

by:bobbit31
ID: 7042851
Adjustment to my code above:

Dim ff As Integer
Dim strLine As String
Dim scr As New ScriptControl
Dim funcRegExpr As String
Dim strFile As String

scr.Language = "javascript"

' res[0] is the matched text; res itself also holds the captured groups
funcRegExpr = "function findExpression(str, pattern) {" & _
              "   var regEmailCheck = /\w+[\w-\.]*\@\w+((-\w+)|(\w*))\.[a-z]{2,3}/;" & _
              "   var res = regEmailCheck.exec(str);" & _
              "   return (res == null) ? '' : res[0];" & _
              "}"
scr.AddCode (funcRegExpr)

ff = FreeFile

Open "C:\my documents\test.txt" For Input As #ff

Do While Not EOF(ff)

    Line Input #ff, strLine
   
    '' check for email addresses
    Dim strEmails As String
    ' Run passes the line as an argument, so quotes or backslashes in it cannot break the script
    strEmails = scr.Run("findExpression", strLine)
   
    If strEmails <> "" Then
        MsgBox strEmails
    End If
   
   
Loop

Close (ff)


See what happens when you run this (strEmails will hold the email address if one was found on that line).

Expert Comment

by:bobbit31
ID: 7042856
Also, you can go to http://www.regexlib.com/Default.aspx and search for other helpful regular expressions.

Expert Comment

by:DanRollins
ID: 7851251
Hi dsplice,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Accept bobbit31's comment(s) as an answer.

dsplice, if you think your question was not answered at all or if you need help, just post a new comment here; Community Support will help you.  DO NOT accept this comment as an answer.

EXPERTS: If you disagree with that recommendation, please post an explanatory comment.
==========
DanRollins -- EE database cleanup volunteer

Expert Comment

by:SpideyMod
ID: 7912850
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange