Solved

Need someone to modify this VBA code to handle Internet multiline header fields

Posted on 2012-03-26
4
660 Views
Last Modified: 2012-04-04
I need the below code modified to handle Internet headers that are "folded" into multiple lines per
'           /* Rfc 2822 2.2 Header Fields
'               Header fields are lines composed of a field name, followed by a colon
'               (":"), followed by a field body, and terminated by CRLF.  A field
'               name MUST be composed of printable US-ASCII characters (i.e.,
'               characters that have values between 33 and 126, inclusive), except
'               colon.  A field body may be composed of any US-ASCII characters,
'               except for CR and LF.  However, a field body may contain CRLF when
'               used in header "folding" and  "unfolding" as described in section
'               2.2.3.  All field bodies MUST conform to the syntax described in
'               sections 3 and 4 of this standard.
'
'              Rfc 2822 2.2.3 (Multiline header fields)
'               The process of moving from this folded multiple-line representation
'               of a header field to its single line representation is called
'               "unfolding". Unfolding is accomplished by simply removing any CRLF
'               that is immediately followed by WSP.  Each header field should be
'               treated in its unfolded form for further syntactic and semantic
'               evaluation.
'
'               Example:
'                   Subject: aaaaa<CRLF>
'                   <TAB or SP>aaaaa<CRLF>
'

I have attached 2 sample headers (email addresses are not real); the "To:" header is the header of interest.

Basically, I need the code modified to "unfold" multiline header fields by removing CRLF when it is immediately followed by whitespace.  This needs to be done BEFORE parsing the words to decode.

Thanks for any help--
Sam

Option Explicit

Const C_SEP = " "
Const C_EQUAL = "="
Const C_QUEST = "?"
'

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH, strW, d
    For Each strH In arrh
        d = ""
        For Each strW In Split(strH, C_SEP)
            If d <> "" Then d = d & C_SEP
            d = d & mhdecode(strW)
        Next
        Debug.Print d
    Next
End Sub

' TODO:
' * "_" => space
' * first word always unencoded?
' * comments in parentheses for specific fields?

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST ' "=?utf-8?"
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                    Case Else
                        Message.BodyPart.ContentTransferEncoding = "us-ascii" ' ?
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.Charset = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.Charset = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window


Code provided by robert_schutt (Thanks again!)
Header1.txt
Header2.txt
0
Comment
Question by:SAbboushi
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 37769126
0
 

Author Comment

by:SAbboushi
ID: 37789446
I don't believe it is that simple...

Anyone else?  Again, looking for someone to modify the code ; )
0
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
ID: 37794901
Here's what I ended up doing. The reason for the double replace with Regular Expressions instead of just a normal Replace is that I think it's important to treat these cases differently:

1) need to take out spacing between the encoded words:

Subject: =?UTF-8?Q?Well=2C_let=27s_try_a_subject_with_s?=
	=?UTF-8?Q?=C3=96me_str=C4=83nge_characters_and_a_bit?=
	=?UTF-8?Q?_longer_so_it_may_span_more_than?=
	=?UTF-8?Q?_1_line_=E2=98=BA_and_how_about_graphics?=
	=?UTF-8?Q?=3F_=C2=B6?=

Open in new window


2) need to retain some form of spacing (I chose 1 space):

Received: from [70.101.42.90] (70-101-42-90.dsl2-plymouth.roc.ny.frontiernet.net [70.101.42.90])
	by relay04.roch.ny.frontiernet.net (Postfix) with ESMTP id E300ECCDF2;
	Thu, 11 Dec 2008 02:47:00 +0000 (UTC)

Open in new window


The code:

Option Explicit

Const C_TEMP_PATH = "C:\temp\"

Const C_EQUAL = "="
Const C_QUEST = "?"

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH
    For Each strH In arrh
        Debug.Print DecodeHeaders(strH)
    Next
    DecodeFileWithHeaders "Header1.txt"
    DecodeFileWithHeaders "Header2.txt"
    'DecodeFileWithHeaders "Header3.txt"
End Sub

Function DecodeHeaders(ByVal strH As String) As String
    Dim res As String, ps As Integer
    res = ""
    ps = 1
    Dim re As New RegExp
    re.Global = True
    re.MultiLine = False
    re.IgnoreCase = True
    re.Pattern = "=" & vbCrLf & "[ \t]+="
    strH = re.Replace(strH, "==")
    re.Pattern = vbCrLf & "[ \t]+"
    strH = re.Replace(strH, " ")
    re.Pattern = "=\?[a-z0-9-]+\?[BQ]\?.+?\?="
    Dim m As Match
    For Each m In re.Execute(strH)
        res = res & Mid(strH, ps, m.FirstIndex + 1 - ps) & mhdecode(m.Value)
        ps = m.FirstIndex + m.Length + 1
    Next
    res = res & Mid(strH, ps)
    DecodeHeaders = res
End Function

Sub DecodeFileWithHeaders(fn As String)
    Debug.Print "--- file: " & fn
    Dim oFSO As Object, oFile As Object, c As String
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(C_TEMP_PATH & fn, 1, False)
    c = oFile.ReadAll()
    oFile.Close
    Set oFile = Nothing
    Set oFSO = Nothing
    Debug.Print DecodeHeaders(c)
End Sub

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                        s = Replace(s, "_", " ")
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.CharSet = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.CharSet = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window

0
 

Author Closing Comment

by:SAbboushi
ID: 37796306
Absolutely Awesome!
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Finding original email is quite difficult due to their duplicates. From this article, you will come to know why multiple duplicates of same emails appear and how to delete duplicate emails from Outlook securely and instantly while vital emails remai…
When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
Get people started with the process of using Access VBA to control Outlook using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Microsoft Outlook. Using automation, an Access applic…
Many of my clients call in with monstrous Gmail overloading issues with Outlook. A quick tip is to turn off the All Mail and Important folders from synching. Here is a quick video I made to show you how to turn off these and other folders in Gmail s…

752 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question