sara_bellum
asked on
python retrieval of email message body/text fails
I'm using python file globbing to read through a series of email messages and store each part of the message in a database table. So far I can retrieve each part of the header, but I can't isolate the body of the message, which should be simple except that it's not.
FILESPEC = "/path-to-eml-files/*.eml"
files = glob(FILESPEC)
for f in files:
    gg = open(f)
    text = gg.read()
    head = message_from_string(text)
    message_id = head['Message-ID']
    # etc for all portions of the email header...
But how can the body of the email be retrieved independently of the header for storage?
I tried readlines() but that too fails (output is chaotic so I probably have flow control issues here):
for f in files:
    jj = open(f, 'r')
    text = jj.readlines()
    for i, line in enumerate(text):
        if i >= 8: # body starts on line 8 (current eml format)
            #print(line)
            body = ''.join(line)
            body_text = message_from_string(body)
            print('start body', body_text)
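For what it's worth, the chaotic output seems to follow from the loop itself: ''.join(line) is applied to a single line, so body is overwritten on every pass and each line gets parsed as its own message. A minimal sketch of slicing the list once instead, assuming (as in the snippet above) that the body starts at index 8. The helper name body_from_lines and the stand-in data are made up for illustration:

```python
def body_from_lines(lines, start=8):
    """Join every line from index `start` onward into one string.

    `start=8` mirrors the hard-coded offset in the question; it is an
    assumption about this particular eml layout, not a general rule.
    """
    return ''.join(lines[start:])


# Stand-in for jj.readlines(): 7 header lines, a blank separator, then the body.
sample_lines = ['Header-%d: x\n' % i for i in range(7)] + ['\n', 'first\n', 'second\n']
body = body_from_lines(sample_lines)
print(body)
```

This still depends on a fixed line offset, so it breaks as soon as a message has more or fewer header lines.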
Can you attach some of your typical .eml files? (You can create a dummy one. I just want to see how complex the eml file is.)
My initial guess is that you should not parse the content of the file yourself at all. That should be done by a parser, probably one from the email module. You probably only need to open the file and pass the file object to the parser. (I do not have first-hand experience with the subject, but I dare to try if you attach the eml ;)
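A minimal sketch of that idea, assuming the standard library's email.message_from_file() is the parser in question. An io.StringIO object stands in for the opened .eml file, and the message content is made up:

```python
import io
from email import message_from_file

# In the real script this would be `open(path)`; a StringIO stands in here
# so the sketch runs without any .eml file on disk.
fake_file = io.StringIO(
    "Subject: test\n"
    "Message-ID: <1@example.invalid>\n"
    "\n"
    "Hello body\n"
)

msg = message_from_file(fake_file)  # the parser reads the file object directly
print(msg['Subject'])               # headers are available by name
print(msg.get_payload())            # everything after the first blank line
```

The parser finds the header/body boundary itself, so no line counting is needed.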
ASKER
Thanks for writing, pepr! For this drill I looked at my inbox and formatted some samples, making sure that all of them use the same format. There's no guarantee that another set of emails would have the same format, of course, but extracting header data has proven much simpler than capturing the body text, which has no field name and can run to any number of lines. I'd also like to strip unnecessary empty lines from the message body. So here's an eml:
Subject: Re: [WinEdt] Email-Mode
From: Roger Mudd <roger.mudd@gmx.net>
Reply-To: <winedt+list@wsg.net>
Date: Fri, 18 Nov 2011 12:35:19 +0200
To: WinEdt Mailing List <winedt+list@wsg.net>
Message-ID: <200805051111.33339999>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv: 1.0.0) Gecko/20020530

On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:
The world will end on this date: 2002-02-02 18:12:00
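Stripping the unnecessary empty lines mentioned above could look roughly like this once the body text is in hand. This is only a sketch: strip_blank_lines is a hypothetical helper name, and it drops every empty or whitespace-only line, which may be more aggressive than wanted if paragraph breaks matter:

```python
def strip_blank_lines(body):
    """Return `body` without empty or whitespace-only lines."""
    kept = [line for line in body.splitlines() if line.strip()]
    return '\n'.join(kept)


# Body text modelled on the sample eml, with extra blank lines added.
raw_body = (
    "\n"
    "On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:\n"
    "\n\n"
    "The world will end on this date: 2002-02-02 18:12:00\n"
)
clean_body = strip_blank_lines(raw_body)
print(clean_body)
```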
ASKER CERTIFIED SOLUTION
ASKER
Thanks pepr, you made my day!! This is genius: if I'd thought to research the email module I would have found the get_payload() function myself! But I didn't think of it. It's too easy to miss important points when trying to learn many things at once.
Now that you've answered the question I should just close it, but will wait until tomorrow in case I think of something I failed to understand. Thanks again!
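The accepted answer is not reproduced in this thread, so the following is only a sketch of how get_payload() is commonly used, not pepr's actual solution. The helper name extract_body is hypothetical, and returning the first text/plain part of a multipart message is an assumption about what "the body" should mean:

```python
from email import message_from_string


def extract_body(raw):
    """Return the text body of a raw message string (hypothetical helper)."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # Take the first text/plain part; HTML parts and attachments are skipped.
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                return part.get_payload()
        return ''
    # Single-part message: the payload is the body text itself.
    return msg.get_payload()


raw = (
    "Subject: Re: [WinEdt] Email-Mode\n"
    "Message-ID: <200805051111.33339999>\n"
    "\n"
    "The world will end on this date: 2002-02-02 18:12:00\n"
)
body = extract_body(raw)
print(body)
```

Note that get_payload() called without decode=True returns the text as stored in the message, so base64 or quoted-printable bodies would come back still encoded.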
ASKER
Thanks!!
ASKER
import pymysql
from glob import glob
from email import message_from_string
from datetime import datetime, timedelta
from email.utils import parsedate_tz, mktime_tz
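The rest of the script is not shown here. The parsedate_tz and mktime_tz imports suggest that the Date header gets converted to a datetime before storage, so here is a minimal sketch of that step under that assumption. The helper name header_date_to_utc is made up, and the timezone import and the choice to normalise to UTC are additions that are not in the original imports:

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz, mktime_tz


def header_date_to_utc(date_header):
    """Convert an RFC 2822 Date header to an aware UTC datetime.

    Returns None when the header cannot be parsed.
    """
    parsed = parsedate_tz(date_header)  # 10-tuple including the UTC offset
    if parsed is None:
        return None
    # mktime_tz applies the offset and returns seconds since the epoch (UTC).
    return datetime.fromtimestamp(mktime_tz(parsed), tz=timezone.utc)


when = header_date_to_utc("Fri, 18 Nov 2011 12:35:19 +0200")
print(when.isoformat())  # 2011-11-18T10:35:19+00:00
```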