?
Solved

How to trap text without unique identifier

Posted on 2003-03-21
16
Medium Priority
?
233 Views
Last Modified: 2010-05-01
I asked a question similiar to this in another area and have had only one response so will repost here. Using VBA in Access 2002, I need to parse data from the body of received emails. There is no unique identifier for the string. Here is an example:

------------------------------
Jane Doe's Address
------------------------------

Jane Doe
123 Frenchtown Rd
P.O. Box 1156
Gainsville, FL
21903-0012
USA

Thank you for stopping by,


I need to grab the address information from this block and add it to the appropriate fields in my DB. fldAddress1, fldAddress2, fldCity... etc. When I receive these emails, this block of data does not always begin or end on same line.

I have no idea how difficult this is but can find nothing written about it so will assume it is not an easy task.

Really need your help. TIA

0
Comment
Question by:colonelmustard
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 4
  • 3
  • +2
16 Comments
 
LVL 11

Expert Comment

by:rdrunner
ID: 8180770
Hmmm....

this is "kinda hard"

What happens if there are only 1 address line ?

'----
Jane Doe
123 Frenchtown Rd
Gainsville, FL
21903-0012
USA

'----

What i could think of is trying to get a hook on the address by extracting the ZIP-code and work your way from there ...

Will a country allways be there?

Will that text allways be at the END of the body?

If its allways the last part of the body you could use a small collection and limit it to say the last 6 lines read..

Another thought would be to look for the "," and split from there so you have Town/state....

I need to know more about what the user can/MUST enter in order to do some real processing....

If i know more about whats a MUST then you could try to tackle it by using regular expressions and generate a matchsstring that will break those lines up as needed...

Hope this helps so far ...
0
 

Author Comment

by:colonelmustard
ID: 8180861
Thanks for responding rdrunner. Yes the Country is always there....not always USA though. These lines always start on lines 32-35 and end at no more than line 40. More often then not there is only one address line. What you see on that little sniplet is the maximum information the user can enter on the web form. And everything listed between the:

------------------------------
Jane Doe's Address
------------------------------

and the:


Thank you for stopping by,


I need.


Hope this helps
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8180883
show an example for the MIMIMUM dataentry please ...

0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 11

Expert Comment

by:rdrunner
ID: 8180934
i THINK i could cut it appart with regular expressions ... Lemme try to get my pattern streight ...

0
 

Author Comment

by:colonelmustard
ID: 8180938
My comment above was incorrect: The minimum data in this block would be:

Jane Doe
123 Frenchtown Rd
Gainsville, FL
21903
USA

the maximum would be:

Jane Doe
c/o America Corp
123 Frenchtown Rd
Gainsville, FL
21903-0012
USA

Is this more clear?
0
 
LVL 11

Expert Comment

by:rdrunner
ID: 8181294
Ill take another look on Sunday if it isnt solved till then ... I think i know whats needed, but i dont have the right version of IE installed ;)

I wanted to go for a match by the double occurence by the name and work from there on ...



0
 
LVL 5

Expert Comment

by:JohnMcCann
ID: 8181321
What you are trying to do is extremly difficult if not impossible.  My reasoning for this is quite simple.

a) There is no consistent start or end to the text.
b) The text will have no identifyable data.

If you attempted to implemnt some sort of searching algorythm it would be to complex and very unlikely to sucseed.  Your database would end up with invalidate data all over the place.

Thee only solution I would recommend to this problem, would be to use some form of data capture to capture the addresses eg. a standard from that is filled in and emailed with the address on formatted in a specific way.  Or an online form with a backend cgi script.


Best of luck

John McCann
0
 
LVL 5

Expert Comment

by:JohnMcCann
ID: 8181343
Just read the comment

What you see on that little sniplet is the maximum information the user can enter on the web form.

So you already have a web form creating the emails?
0
 

Author Comment

by:colonelmustard
ID: 8181551
JohnMcCann,
This email that I am trying to process is a payment notification from an online service. There are other fields that I have already captured because they are actually named on the email and easily extracted. This is the only instance of a shipping address...which is nuts. I have no control over these notifications except to accept them in there current form. But I need to get the addresses into the DB for shipping purposes. The only consistantcy is that the data always is the word "Thank" just after the empty line following the address block is the only instance of that word in the email. I was hoping that using that I could capture the 8 lines of text preceding it. Excluding lines that are carriage returns.

Thanks for responding
0
 

Author Comment

by:colonelmustard
ID: 8181576
EDITED----sorry for last post!

JohnMcCann,
This email that I am trying to process is a payment notification from an online service. There are other fields that I have already captured because they are actually named on the email and easily extracted. This is the only instance of a shipping address...which is nuts. I have no control over these notifications except to accept them in there current form. But I need to get the addresses into the DB for shipping purposes. The only consistantcy is that the data always preceeds the word "Thank", which is just after the empty line following the address block. It is the only instance of that word in the email. I was hoping that using that I could capture the 8 lines of text preceding it. Excluding lines that are carriage returns.

Thanks for responding

0
 
LVL 1

Accepted Solution

by:
cyberknet earned 2000 total points
ID: 8182968
Just a thought:

dim strStart as string, strEnd as string
dim lngStart as long, lngEnd as long
dim strAddress as string, strAddressLines() as string
dim strResult(5) as string

' match the start with a line of dashes and two returns.
strStart = "------------------------------" & vbCrLf & vbCrLf
' match the end with two returns and thank you...
strEnd = vbCrLf & vbCrLf & "Thank you for stopping by,"

' find the start character
lngStart = instr(1, strEmail, strStart)
lngStart = lngStart + len(strStart)
' find the end character
lngEnd = instr(lngStart, strEmail, strEnd)

if lngStart > 0 and lngEnd > 0 then
    strAddress = mid(StrEmail, lngStart, (lngEnd - lngStart))
    strAddressLines = split(strAddress, vbCrLf)
   
    if ubound(strAddressLines) = 5 then
        strResult(0) = strAddressLines(0)
        strResult(1) = strAddressLines(1)
        strResult(2) = strAddressLines(2)
        strResult(3) = strAddressLines(3)
        strResult(4) = strAddressLines(4)
        strResult(5) = strAddressLines(5)
    elseif ubound(strAddressLines) = 4 then
        strResult(0) = strAddressLines(0)
        strResult(1) = strAddressLines(1)
        strResult(3) = strAddressLines(2)
        strResult(4) = strAddressLines(3)
        strResult(5) = strAddressLines(4)
    end if
end if

----- end code
strResult(0) = "Jane Doe"
strResult(1) = 123 Frenchtown Rd (Address 1)
strResult(2) = P.O. Box 1156 (Address 2, if available)
strResult(3) = Gainsville, FL (CIty, State)
strResult(4) = 21903-0012 (zip code)
strResult(5) = USA (country)

this is pretty rough... but you should get the idea.
                 
0
 

Author Comment

by:colonelmustard
ID: 8183029
Thanks for responding...in the above example you make reference to a string named "strEmail"...can you tell me where that value is coming from? TIA
0
 

Author Comment

by:colonelmustard
ID: 8183030
Thanks for responding...in the above example you make reference to a string named "strEmail"...can you tell me where that value is coming from? TIA
0
 
LVL 1

Expert Comment

by:cyberknet
ID: 8183543
strEmail is the text in the email. You would need to populate it.
0
 
LVL 1

Expert Comment

by:cyberknet
ID: 8183590
also be aware that if there is no text between those two lines, or if those two lines do not appear in strEmail (after you populate strEmail with the text of the email) then there will be no address populated, just an array with 6 indexes (0 to 5) of blank strings.
0
 
LVL 3

Expert Comment

by:y2ksw
ID: 8196273
This is a typical problem solved best by Artificial Neural Networks ([A]NN). You have an arbitrary number of lines and contents, and the order may or might not be the same. Components may miss or my be complete, might be incorrect, or spaced differently, right and wrong mixed upper and lower case, interpunctuation, and so stuff, and all that recognition will be far too much to solve it easily with standard little code and efford.

Effectively, all main post offices have a NN address recognizer (sort of intelligent OCR), which distributes the mail to the respective mail boxes, trained with just about 6000 different ways to write a generic post address.

The types of NN you should look for is the Hopfield or the Bidirectional Associated Memory (BAM) algorithm. Both have pro's and cont's, but it's a nice and quiet simple way to solve all eventuallities and possibilities a human brain may invent for writing down his/hers address. Note that some of the algo's are capable to elaborate even legastenic problems (casual or regular letter and word inversions).

For easy source code (C++) have a look at:

http://www.geocities.com/CapeCanaveral/1624/
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
This article describes how to use a set of graphical playing cards to create a Draw Poker game in Excel or VB6.
Show developers how to use a criteria form to limit the data that appears on an Access report. It is a common requirement that users can specify the criteria for a report at runtime. The easiest way to accomplish this is using a criteria form that a…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…
Suggested Courses
Course of the Month9 days, 15 hours left to enroll

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question