python retrieval of email message body/text fails

I'm using Python file globbing to read through a series of email messages and store each part of each message in a database table. So far I can retrieve every header field, but I can't isolate the body of the message, which should be simple but isn't.
   
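from glob import glob
from email import message_from_string

FILESPEC = "/path-to-eml-files/*.eml"
files = glob(FILESPEC)              # list of matching .eml paths
for f in files:
    with open(f) as gg:
        text = gg.read()
    head = message_from_string(text)
    message_id = head['Message-ID']
    # etc. for all portions of the email header...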
But how can the body of the email be retrieved separately from the header, for storage?

I tried readlines(), but that fails too (the output is chaotic, so I probably have flow-control issues here):

for f in files:
    jj = open(f, 'r')
    text = jj.readlines()
    for i, line in enumerate(text):
        if i >= 8: # body starts on line 8 (current eml format)
            #print(line)
            body = ''.join(line)
    body_text = message_from_string(body)
    print('start body', body_text)
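Two things go wrong in the loop above: `body = ''.join(line)` rebinds `body` on every iteration (joining a single string just gives back that string), so only the last line survives; and the fixed offset of 8 breaks as soon as a message has a different number of header lines. Per RFC 5322 the header section ends at the first empty line. A minimal sketch of a manual split on that rule (`split_headers_body` is a hypothetical helper name, and it does not handle MIME or encodings; the email module's parser is the more robust choice):

```python
def split_headers_body(text):
    """Split raw message text into (header_block, body) at the first empty line.

    If there is no empty line, everything is treated as headers and body is ''.
    """
    normalized = text.replace('\r\n', '\n')             # tolerate CRLF line endings
    header_block, _sep, body = normalized.partition('\n\n')
    return header_block, body


raw = "Subject: Hi\nFrom: a@example.com\n\nLine one\n\nLine two\n"
headers, body = split_headers_body(raw)
print(repr(body))   # 'Line one\n\nLine two\n'
```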
Asked by sara_bellum

sara_bellum (Author) commented:
FYI, it may help if I post the modules I'm importing:
import pymysql
from glob import glob
from email import message_from_string
from datetime import datetime, timedelta
from email.utils import parsedate_tz, mktime_tz
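Since pymysql, parsedate_tz and mktime_tz are among the imports, here is a minimal sketch of how they might fit together: parse each message, normalise the Date header to UTC, and insert a row. To keep it runnable without a MySQL server, it uses the standard-library sqlite3 module as a stand-in (an assumption, not the original setup); with pymysql you would run the same statement through a cursor with %s placeholders instead of ? and then call conn.commit(). The table name and columns are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_tz, mktime_tz

# Sample message text; in the real loop this comes from reading each .eml file.
SAMPLE = (
    "Subject: Re: [WinEdt] Email-Mode\n"
    "From: Roger Mudd <roger.mudd@gmx.net>\n"
    "Date: Fri, 18 Nov 2011 12:35:19 +0200\n"
    "Message-ID: <200805051111.33339999>\n"
    "\n"
    "On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:\n"
    "\n"
    "The world will end on this date: 2002-02-02 18:12:00\n"
)


def header_date_to_utc(value):
    """Convert an RFC 2822 Date header to an aware UTC datetime (None if unparsable)."""
    parsed = parsedate_tz(value) if value else None   # 10-tuple; last item is the UTC offset
    if parsed is None:
        return None
    timestamp = mktime_tz(parsed)                     # seconds since the epoch, in UTC
    return datetime.fromtimestamp(timestamp, timezone.utc)


def store_message(conn, text):
    """Parse one message and insert its Message-ID, Subject, Date and body."""
    msg = message_from_string(text)
    sent = header_date_to_utc(msg['Date'])
    conn.execute(
        # Hypothetical table; with pymysql use %s placeholders instead of ?.
        "INSERT INTO messages (message_id, subject, sent_utc, body) VALUES (?, ?, ?, ?)",
        (msg['Message-ID'], msg['Subject'],
         sent.isoformat() if sent else None,
         msg.get_payload()),                          # a str for non-multipart messages
    )


conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE messages (message_id TEXT, subject TEXT, sent_utc TEXT, body TEXT)")
store_message(conn, SAMPLE)
conn.commit()
row = conn.execute("SELECT message_id, sent_utc FROM messages").fetchone()
print(row)  # ('<200805051111.33339999>', '2011-11-18T10:35:19+00:00')
```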
pepr commented:
Can you attach a typical .eml file? (A dummy one is fine; I just want to see how complex the files are.)

My initial guess is that you should not parse the content of the file yourself at all. That job should be done by a parser, probably one from the email module: you may only need to open the file and pass the file object to the parser. (I have no first-hand experience with this, but I'm willing to try if you attach the .eml ;)
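The standard library does accept an open file object directly, via email.message_from_binary_file(). A minimal sketch, assuming Python 3.6+, where passing `policy=email.policy.default` yields an EmailMessage with the get_body()/get_content() helpers (`read_body` is a hypothetical name; the glob pattern is the placeholder from the question). Opening the files in binary mode lets the parser deal with charsets itself.

```python
import email
from email import policy
from glob import glob

FILESPEC = "/path-to-eml-files/*.eml"   # placeholder path from the question


def read_body(fp):
    """Parse a message from a binary file object; return (Message-ID, plain-text body)."""
    msg = email.message_from_binary_file(fp, policy=policy.default)
    part = msg.get_body(preferencelist=('plain',))   # None if there is no text/plain part
    body = part.get_content() if part is not None else ''
    return msg['Message-ID'], body


for path in glob(FILESPEC):
    with open(path, 'rb') as fp:
        message_id, body = read_body(fp)
    print(message_id, body[:60])
```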
sara_bellum (Author) commented:
Thanks for writing, pepr! For this exercise I took messages from my inbox and formatted some samples, making sure they all use the same format. Of course, there's no guarantee that another set of emails would share that format, but extracting header data has proven much simpler than capturing the body text, which has no label and can span any number of lines. I'd also like to strip unnecessary empty lines from the message body. So here's an .eml file:

Subject: Re: [WinEdt] Email-Mode
From: Roger Mudd <roger.mudd@gmx.net>
Reply-To: <winedt+list@wsg.net>
Date: Fri, 18 Nov 2011 12:35:19 +0200
To: WinEdt Mailing List <winedt+list@wsg.net>
Message-ID: <200805051111.33339999>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv: 1.0.0) Gecko/20020530

On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:

The world will end on this date: 2002-02-02 18:12:00

pepr commented:
Try the following script:

a.py
import email

fname = 'data.elm'

# Open the message file in text mode and pass the file object to the parser;
# an object of the email.message.Message class is returned.
with open(fname) as f:
    m = email.message_from_file(f)

print(type(m))

# Extract some info.
print('\nIs it a multipart message?', m.is_multipart())

# The Message object acts like a dictionary of header items.  Let's see what
# keys and values are stored inside.  (The original order is preserved because
# Message keeps its headers internally as a list of (name, value) pairs.)
print('\nHeaders -----------------------')
for k, v in m.items():
    print('{0}: {1}'.format(k, v))
    
print('The subject:', m['subject'])  # notice the case insensitivity
print('The subject:', m['SuBjEcT'])  # notice the case insensitivity

print('\nThe body ------------------------')
s = m.get_payload()                  # a string if not multipart (a list of parts otherwise)
print(s)

print('\nYou can process the string the way you need ----')
# Split to the list of lines and trim the trailing spaces.
lines = [e.rstrip() for e in s.split('\n')]  # the list comprehension used
print(lines)       # representation of the list is printed

# Remove the empty lines and prepend the '> ', join to a single
# multiline string again.
s2 = '\n'.join('> ' + e for e in lines if e)
print('\n--------------------------------------')
print(s2)




Here is your data sample.  I have added some "empty" lines and trailing spaces intentionally.  They are removed during processing.

data.txt

I can see the following output (wrapped by the console):
c:\tmp\_Python\sara_bellum\Q_27520670>python3 a.py
<class 'email.message.Message'>

Is it a multipart message? False

Headers -----------------------
Subject: Re: [WinEdt] Email-Mode
From: Roger Mudd <roger.mudd@gmx.net>
Reply-To: <winedt+list@wsg.net>
Date: Fri, 18 Nov 2011 12:35:19 +0200
To: WinEdt Mailing List <winedt+list@wsg.net>
Message-ID: <200805051111.33339999>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv: 1.0.0) Gecko/20020530
The subject: Re: [WinEdt] Email-Mode
The subject: Re: [WinEdt] Email-Mode

The body ------------------------
On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:

The world will end on this date: 2002-02-02 18:12:00




You can process the string the way you need ----
['On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:', '', 'The world will end on this date: 2002-02-02 18:12:00', '', '', '']

--------------------------------------
> On 8/6/2002 7:35 AM, Robert W. Kuhn wrote:
> The world will end on this date: 2002-02-02 18:12:00

sara_bellum (Author) commented:
Thanks pepr, you made my day!! This is genius: had I thought to research the email module, I would have found the get_payload() method myself! But I didn't think of it; it's too easy to miss important points when trying to learn many things at once.

Now that you've answered the question I should just close it, but I'll wait until tomorrow in case I think of something I failed to understand. Thanks again!
sara_bellum (Author) commented:
Thanks!!