pulling html code out of the body of the message.

I am trying to script something or export all the messages in my mailbox to give me the body of the message unaltered by Notes.  If you look at document properties the body tag can contain HTML code but when opened Notes strips that out.  :-(  I am trying to take 5000 some email and put them into one text document and pull out any HTML tags for further processing and adding to our spam filter,  This is becoming a big pain in the rear.  

Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

First of all there is no straight forward method in R5. R6 I am not sure, but it has enhanced classes to handle mime type fields more elegantly.

So, to make it easy I would suggest you to use HTTP stream to obtain the content in html. How ?

Two possibilities, you can use java io stream to do that and second use GetDocumentByURL method to get html. For both these methods http task should be up and running.

Read this forum's question for more details on this technique.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Do you need just the bodies?  You canuse the MIME_PART type on Body (only available when you set Convertime=False).  Then, just recursively pul all the parts and spit them back out. I have some sample code...
This is hard coded for teh currently open NotesUiDocument, and hard-coded for c:\mime.txt.

It should not be difficult to adapt it to work with a NotesDocument cllection, and to name the otput file for each document based on the NoteID or UNID.

I would not dump all to a single file, since you will run into MIME parsing problems.

      Dim ws As New notesUiWorkspace
      Set doc = ws.currentDocument.document
      Dim s As New notesSession
      s.ConvertMime = False
      Dim body As notesItem
      Set body = doc.getFirstItem("Body")
      If body.type = MIME_PART Then
            Open "c:\mime.txt" For Output As #1
            Print #1 , "Mime-Version 1.0"
            Print #1 ,
            Print #1 , {This is a MIME-formatted message.  If you see this text it means that your
E-mail software does not support MIME-formatted messages.  Notes RTF MIME dump by Dovid Gross.}
            Print #1 ,
            Print #1 , "--" HeadersWIthBoundary ( body.getMimeEntity , Null , "" , doc.noteID) 'mime, headers-for-return, default boundary, filename
            DumpMIME body.getMimeEntity , "" , doc.universalID 'mime , level.level , boundary
            Close #1
            isComplexMIME = False
      End If
      s.ConvertMime = True
      Delete body
      If Not isComplexMIME Then
            Dim text As String
            Set body = doc.getFirstItem("Body")
            If body.type = Richtext Then
                  Dim rtf As notesRichTextItem
                  Set rtf = Body
                  text = rtf.getFormattedText(True,72)
                  text = body.text
            End If
            Dim pos As Integer , nl As String
            nl = Chr$(13) + Chr$(10)
            pos = Instr(text,nl)
            While pos
                  Dim two As String*2
                  two = Mid$(text,pos+2,2)
                  If two <> nl And two <> "==" Then
                        two = Mid$(text,pos-2,2)
                        If two <> "==" Then
                              Mid$(text,pos,2) = "  "
                        End If
                  End If
                  pos = Instr(pos+2,text,nl)
            Set doc = New notesDocument ( doc.parentDatabase )
            doc.form = "memo"
            doc.body = text
            ws.editDocument True , doc
      End If

Sub DumpMIME ( mime As notesMimeEntity , Byval level As String , boundary As String)
      Dim count As Integer
      If mime Is Nothing Then
            Print #1 ,
            Print #1 , ";(Empty MIME part - " & level & "x)"
            Print #1 ,
            Exit Sub
      End If
      If level <> "" Then
            isComplexMIME = True
      End If
      Do Until mime Is Nothing
            count = count + 1
            If (count > 1) Then
                  isComplexMIME = True
            End If
            Dim headers As String
            boundary = HeadersWithBoundary ( mime , headers , boundary , level & count)
            If (mime.contentType <> "text") Or (mime.contentSubType <> "plain") Then
                  isComplexMIME = True
            End If
            Print #1 ,
            Print #1 , "--" boundary
            Print #1 , ";Part " & level & count
            Print #1,  ";Content type:" & Chr(9) & mime.ContentType
            Print #1 , ";Content subtype:" & Chr(9) & mime.ContentSubtype
            Print #1 , ";Headers..."
            Print #1 ,  Headers
            Print #1 ,
            Print #1 , mime.ContentAsText
            Print #1 ,
            Print #1 , ";Children..."
            Print #1 ,
            DumpMIME mime.getFirstChildEntity , level & count & "." , boundary
            Set mime = mime.getNextSibling
      Print #1 , "--" boundary "--"
End Sub
Function HeadersWithBoundary ( mime As notesMimeEntity , headers As Variant , Byval boundary As String , id As String) As String
      doc.tempEvaluate = mime.Headers
      Dim newHeaders As Variant
      newHeaders = Evaluate({@ReplaceSubString(tempEvaluate; @Char(13); @Char(13) + @Char(10))} , doc )
      doc.removeItem "tempEvaluate"
      headers = newHeaders(0)
      Const boundMark = {Boundary="}
      Dim boundStart As Integer , boundEnd As Integer, boundSize
      boundStart = Instr(headers,boundMark)
      If boundStart Then
            boundStart = boundStart + Len(boundMark)
            boundEnd = Instr(boundStart , headers , {"})
            If boundEnd Then
                  boundSize = boundEnd - boundStart
                  boundary = Mid$(headers , boundStart , boundSize )
                  boundary = Mid$(headers , boundStart )
            End If
            boundary = boundary & "___" & doc.universalID
      End If
      Const typeMark = {Content-type: }
      boundStart = Instr(headers,typeMark)
      If boundStart Then
            boundStart = boundStart + Len(typeMark)
            boundEnd = Instr(boundStart,headers,"; ")
            boundSize = Instr(boundStart,headers," ")
            If boundEnd < 1 Or (boundSize < boundEnd And boundSize > 0) Then
                  boundEnd = boundSize - 1
                  Print "space as marker " id
            End If
            boundSize = Instr(boundStart,headers,Chr$(10))
            If boundEnd < 1 Or (boundSize < boundEnd And boundSize > 0) Then
                  boundEnd = boundSize -1
                  Print "lf as marker " id
            End If
            boundSize = Instr(boundStart,headers,Chr$(13))
            If boundEnd < 1 Or (boundSize < boundEnd And boundSize > 0) Then
                  boundEnd = boundSize - 1
                  Print "cr as marker " id
            End If
            If boundEnd = 0 Then
                  headers = headers & "; name=" & id & ".txt"
                  Print "no marker " id
                  headers = Left$(headers, boundEnd) & " ; name=" & id & ".txt; " & Mid$(headers, boundEnd + 1)
            End If
            Print "Unable to insert " id
      End If
      HeadersWithBoundary = boundary
End Function
Wow, now somebody writes code for free !
Nope, pulled that out of a test file I created to debug a MIME problem.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Lotus IBM

From novice to tech pro — start learning today.