Solved

Searching for specific text amongst large bodies of text.

Posted on 1998-11-12
24
319 Views
Last Modified: 2010-04-30
Using VB5, I have several fields in an underlying access database that contain large amounts of text.  Buried in amongst these multiple fields, and subsequent text are email addresses, usually more than one, often several.

What I need to do is have a function or sub that will cycle thru the records, searching the text and extract all the email addresses, then write the addresses into a field, one address only per field, into an mdb table.

This table will then be used to generate auto reply responses to my contacts.

Probably, the process would search thru the text looking for the @ symbol.  Then grab the text on either side.

Any ideas on how to do this.  I;m new to VB so need a comprehensive solution.

thanks
Wills
0
Comment
Question by:Wills030998
  • 11
  • 9
  • 3
  • +1
24 Comments
 

Author Comment

by:Wills030998
ID: 1444444
Edited text of question
0
 

Expert Comment

by:FeVeR
ID: 1444445
...here we go

for x = 1 to len(string)                 'cycle through string looking for @
if mid(x,1,string) = "@" then gosub getemailname 'mid(begin at, for how long, which string)
next x
end
getemailname:
for y = x to 1 step -1 'left side
if mid(x,1,string) = " " then leftpartofemail = mid(x-y,y,string)
next y
for z = x to len(string) 'right side
if mid(x,1,string) = " " then rightpartofemail = mid(x,y,string)
next z
emailadress = leftpartofemail + "@" + rightpartofemail

'** here you must enter the string emailadress into your database before it is cleared.

return 'to searching

there must be a space between emails, or else you must change the " " to whatever
alternatively, you could seach for ".com" but it might not be that

the string variable is the name of the field that you are searching through
leftpartofemail, rightpartofemail and emailadress are all strings


you might have to adapt my code a bit, but hopefully this will help you
good luck
fever
0
 

Expert Comment

by:FeVeR
ID: 1444446
you wont actually need to put in the ...+ "@" +...bit i've just seen that it will be part of rightpartofemail, and possibly leftpartofemail too   :)
you can tune it up
0
 
LVL 3

Expert Comment

by:vikiing
ID: 1444447
Fever:

To look for a certain sub-string into a string, it's much easier to use Instr() funtion than a For examining byte-by-byte.

x = Instr(DataRecord$, "@")
If x the gosub getmailname

That locates a "@" into DataRecord$, and, if found, X will hold its position into the string. If not found, Instr() returns zero.
0
 

Expert Comment

by:FeVeR
ID: 1444448
i knew it needed tuning up :)
0
 

Expert Comment

by:FeVeR
ID: 1444449
i knew it needed tuning up
0
 

Author Comment

by:Wills030998
ID: 1444450
As I said I'm new to VB and certainly not capable of tuning up your code!  Can you give me the complete code solution that I can copy and paste into my VB form, on cmdExtractEmail_click()

I also need the code that inserts the records (email addresses) into the table, say tblEmail, and the field fldEmailAddress.

thanks

0
 

Expert Comment

by:FeVeR
ID: 1444451
no problem
will have it ready at seven o'clock (my time - australia)
0
 

Author Comment

by:Wills030998
ID: 1444452
On re-reading my question and comments, I haven't explained things very well.  To get the complete solution, I need to give you more detail.  So here goes again..

The records to be searched will be located in database OutlookExport.mdb.  The table is called Email and the fields are called Subject and Body.  (This table is exported from Outlook98).

So when the button is clicked, all the records in table Email (both the Subject and Body fields) need to searched and the email addresses (of which there can be more than one in each field) need to be written to a different table, called tblEmail, and each found email address inserted into fldEmailAddress (one address per record).

Thanks for your help.  I realise I probably didn't explain it too well initially.

Wills

0
 

Author Comment

by:Wills030998
ID: 1444453
On re-reading my question and comments, I haven't explained things very well.  To get the complete solution, I need to give you more detail.  So here goes again..

The records to be searched will be located in database OutlookExport.mdb.  The table is called Email and the fields are called Subject and Body.  (This table is exported from Outlook98).

So when the button is clicked, all the records in table Email (both the Subject and Body fields) need to searched and the email addresses (of which there can be more than one in each field) need to be written to a different table, called tblEmail, and each found email address inserted into fldEmailAddress (one address per record).

Thanks for your help.  I realise I probably didn't explain it too well initially.

Wills

0
 

Expert Comment

by:FeVeR
ID: 1444454
Private Sub cmdExtractEmail_Click()
'WILLS - this code does exactly what you want it to - i actually made it in vb5
'as opposed to having a shot at doing it from memory (and very badly i might
'add) i am not quite clear on what you want to do regarding the table, but if
'you give me more detail i'm sure that i'll be able to do it for you.
'
'hope this helps... FeVeR

a$ = Text1.Text     'text1.text is your input string
                    'if you are loading from a file - use a rich textbox
                    'and set the filename

For x = 1 To Len(a$)
    If Mid(a$, x, 1) = "@" Then GoSub getemailname 'Mid(string, start[, length])
Next x
Exit Sub

'or you can use instr if you prefer

getemailname:
emailcheck = False 'guilty until proven innocent

For y = x To 1 Step -1 'this will search for the end of the left side
    If Mid(a$, y, 1) = " " Then 'space = end
        leftpartofemail = Mid(a$, y + 1, x - y)
        Exit For
    End If
Next y

For z = x To Len(a$) Step 1 'this will search for the end of the right side
    If Mid(a$, z, 1) = " " Then
        rightpartofemail = Mid(a$, x + 1, z - x - 1)
        Exit For
    End If
Next z

For w = 1 To Len(rightpartofemail) 'checks if the email obeys the format of ?@?.?
    If Mid(rightpartofemail, w, 1) = "." Then emailcheck = True
Next w
If emailcheck = False Then Return 'if it's not an email adress you don't want it

emailadress = leftpartofemail + rightpartofemail 'pretty straight forward

Text2.Text = Text2.Text + emailadress

Return 'to searching

End Sub
0
 

Author Comment

by:Wills030998
ID: 1444455
Did you notice my second comment.  may almost have crossed with your answer.
0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 

Author Comment

by:Wills030998
ID: 1444456
Hi FeVer.

Will it be possible to incorporate the requirements outlined in my last comment into your code.  Hope so...

thanks
Wills
0
 

Expert Comment

by:FeVeR
ID: 1444457
sorry about the delay
i was getting more ram
and they had problems
major problems
tell me how you setup your table
do you mean table or array?
0
 

Author Comment

by:Wills030998
ID: 1444458
Thought you must have gone walkabout - thanks for coming back.  I'm not using an array, but am using tables in access97 database.  A couple of comments earlier I give the field, table and database names.

hopefully that is all you need.  If you need something more specific, let me know.

cheers
0
 

Expert Comment

by:FeVeR
ID: 1444459
are you communicating with access then?
0
 

Expert Comment

by:FeVeR
ID: 1444460
i'm not sure if i will be able to help you then...
i don't even have access!
0
 

Author Comment

by:Wills030998
ID: 1444461
The tables mentioned above are used to store the data, are access mdb files.  Everything else is done from VB.  

The Access programme is not being used.
0
 

Expert Comment

by:lefmabo
ID: 1444462
Wills, the following is a lot of code but it works for me:

Private Sub ReadWriteStuff()
    Dim tmpTxt, strMsg, UtmpTxt As String
    Dim db As Database
    Dim qdIn As QueryDef
    Dim rsIn, rsOut As Recordset
    Dim RecCnt, idx, tidx, sidx, eidx, lidx, tlen As Long
    Dim DotFlag As Boolean
   
' open stuff and test for data

    Set db = Workspaces(0).OpenDatabase("x:\OutlookExport.mdb")
    Set qdIn = db.CreateQueryDef("")
    qdIn.SQL = "Select * from Email"
    Set rsIn = qdIn.OpenRecordset(dbOpenSnapshot)
    Set rsOut = db.OpenRecordset("tblEmail", dbOpenDynaset)

    If rsIn.BOF And rsIn.EOF Then
        GoSub CloseStuff
        strMsg = "No records found in Email table!"
        Exit Sub
    End If
   
' populate the recordset and get recordcount
    rsIn.MoveLast
    RecCnt = rsIn.RecordCount
   
    rsIn.MoveFirst
       
' loop through the records

    For idx = 1 To RecCnt
        tmpTxt = "" & rsIn.Fields("Subject")
        If Len(tmpTxt) > 0 Then
            GoSub GetEmailAddr
        End If
       
        tmpTxt = "" & rsIn.Fields("Body")
        If Len(tmpTxt) > 0 Then
            GoSub GetEmailAddr
        End If
       
        rsIn.MoveNext
    Next idx
   
    GoSub CloseStuff
    Exit Sub
   
GetEmailAddr:
' tack on extra spaces so we can find the beginning or
' end of email addresses without going out of bounds
' and make an uppercased copy

    tmpTxt = " " & tmpTxt & " "
    UtmpTxt = UCase(tmpTxt) ' this will be used later
   
' this loops through all of the text to find all occurrences
' of email addresses

    lidx = 2
GEALoop:
    tidx = InStr(lidx, tmpTxt, "@")
    If tidx < 1 Then GoTo GEALoopEnd

' get left side - pass all letters and numbers (do nothing),
' otherwise stop
    For sidx = tidx - 1 To 1 Step -1
        Select Case Mid(UtmpTxt, sidx, 1)
            Case "A" To "Z", "0" To "9"
            Case Else:  Exit For
        End Select
    Next sidx
    sidx = sidx + 1
   
' now get right side - here a period passes once
' first set flag to indicate we haven't yet hit it
    DotFlag = False

    For eidx = tidx + 1 To Len(tmpTxt) Step 1
        Select Case Mid(UtmpTxt, eidx, 1)
        ' first dot is ok - part of address - second
        ' would be end of address
            Case "."
                If DotFlag = False Then
                    DotFlag = True
                Else
                    Exit For
                End If
            Case "A" To "Z", "0" To "9"
            Case Else:  Exit For
        End Select
    Next eidx
   
    tlen = eidx - sidx
'now add the new address - grabbing it from the un-uppercased text
    rsOut.AddNew
    rsOut.Fields("fldEmailAddress") = Mid(tmpTxt, sidx, tlen)
    rsOut.Update
'set the looping index to get past this address in the text
    lidx = eidx
    GoTo GEALoop
GEALoopEnd:
    Return
   
CloseStuff:
        rsIn.Close
        Set rsIn = Nothing
        rsOut.Close
        Set rsOut = Nothing
        Set qdIn = Nothing
        db.Close
        Set db = Nothing
        Return
       
End Sub


   

0
 

Author Comment

by:Wills030998
ID: 1444463
lefmabo,

An impressive comment and very nearly perfect, but for one small hicup.

In New Zealand where I live email addresses can differ a bit from how they are used elsewhere.:

Normally the address will be  name@company.co.nz
but it could also be           firstname.lastname@company.co.nz

So the problem is that there can be more than one "period" on each side of the @ symbol.  When I tested your answer it appears to only look for one period either side of @.

Possibly we need to look for the space on either side of the @ symbol instead of the  period (.)

I have rejected FeVeR's answer - he appears to have abandoned this question, and you can answer it now.

Thanks very much for your help, and we are very nearly there!

Wills

0
 

Accepted Solution

by:
lefmabo earned 250 total points
ID: 1444464
Then forget the dotflag. Test all periods and accept all that aren't followed by a space. Use this:

' get left side - pass all letters and numbers AND PERIODS(do 'nothing), otherwise stop
    For sidx = tidx - 1 To 1 Step -1
        Select Case Mid(UtmpTxt, sidx, 1)
            Case "A" To "Z", "0" To "9" , "."
            Case Else:  Exit For
        End Select
    Next sidx
    sidx = sidx + 1
     
' now get right side - here a period passes IF THERE'S NO SPACE
' AFTER IT

    For eidx = tidx + 1 To Len(tmpTxt) Step 1
        Select Case Mid(UtmpTxt, eidx, 1)
            Case "."
                IF MID(UTMPTXT,EIDX+1,1) + " " THEN
                    Exit For
                End If
            Case "A" To "Z", "0" To "9"
            Case Else:  Exit For
        End Select
    Next eidx


0
 

Author Comment

by:Wills030998
ID: 1444465
Thanks for your additional code.  However I get a run-time error '13', Type mismatch on the line marked with >>>>>.


For eidx = tidx + 1 To Len(tmpTxt) Step 1
        Select Case Mid(UtmpTxt, eidx, 1)
            Case "."
>>>>>>             If Mid(UtmpTxt, eidx + 1, 1) + " " Then
                    Exit For
                End If
            Case "A" To "Z", "0" To "9"
            Case Else:  Exit For
        End Select
    Next eidx

I'm not sure what that line does (this code is well beyond my level of understanding), however when I rem out that line and the following two lines, the code works as wanted.  ie all the email addresses are extracted correctly and written to the table accordingly.

What does that If statement do?  How do we correct the type mismatch problem.

One final thing.  is it possible to avoid duplicate email addresses from being written to the table.  The table is always empty at the start of this process, so I'm looking at avoiding the duplicates from being written in the first place.

Not really a problem as they can be removed later, but any possibilities there.  If it's a hastle don't worry about it, the code above is the more important thing to look at.

thanks

0
 

Expert Comment

by:lefmabo
ID: 1444466
Sorry, Wills, that line has a typo in it. The + should be =. That line tests to see if a period is followed by a space which means that is a period at the end of a sentence (in case your email addresses are embedded in a paragraph for instance). In that case, the period indicates you've reached the end of the address. If it's not followed by a space,it's part of the address.

To only collect unique email addresses, in Access make fldEmailAddress the primary key in tblEmail or index it, but indicate that no duplicates are allowed. Trying to write a duplicate address would generate a duplicate key error or duplicate value error (I'm not quite sure what exactly), so include the statement "On Error Resume Next" somewhere in that module. Or look up what the error is and trap it - look up the use of Err.Number in help. But if you're not worried about other errors occuring - use resume next.
0
 

Author Comment

by:Wills030998
ID: 1444467
Perfect!

Thanks for your help
Wills

0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
Introduction In a recent article (http://www.experts-exchange.com/A_7811-A-Better-Concatenate-Function.html) for the Excel community, I showed an improved version of the Excel Concatenate() function.  While writing that article I realized that no o…
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now