Solved

Need someone to modify this VBA code to handle Internet multiline header fields

Posted on 2012-03-26
4
629 Views
Last Modified: 2012-04-04
I need the below code modified to handle Internet headers that are "folded" into multiple lines per
'           /* Rfc 2822 2.2 Header Fields
'               Header fields are lines composed of a field name, followed by a colon
'               (":"), followed by a field body, and terminated by CRLF.  A field
'               name MUST be composed of printable US-ASCII characters (i.e.,
'               characters that have values between 33 and 126, inclusive), except
'               colon.  A field body may be composed of any US-ASCII characters,
'               except for CR and LF.  However, a field body may contain CRLF when
'               used in header "folding" and  "unfolding" as described in section
'               2.2.3.  All field bodies MUST conform to the syntax described in
'               sections 3 and 4 of this standard.
'
'              Rfc 2822 2.2.3 (Multiline header fields)
'               The process of moving from this folded multiple-line representation
'               of a header field to its single line representation is called
'               "unfolding". Unfolding is accomplished by simply removing any CRLF
'               that is immediately followed by WSP.  Each header field should be
'               treated in its unfolded form for further syntactic and semantic
'               evaluation.
'
'               Example:
'                   Subject: aaaaa<CRLF>
'                   <TAB or SP>aaaaa<CRLF>
'

I have attached 2 sample headers (email addresses are not real); the "To:" header is the header of interest.

Basically, I need the code modified to "unfold" multiline header fields by removing CRLF when it is immediately followed by whitespace.  This needs to be done BEFORE parsing the words to decode.

Thanks for any help--
Sam

Option Explicit

Const C_SEP = " "
Const C_EQUAL = "="
Const C_QUEST = "?"
'

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH, strW, d
    For Each strH In arrh
        d = ""
        For Each strW In Split(strH, C_SEP)
            If d <> "" Then d = d & C_SEP
            d = d & mhdecode(strW)
        Next
        Debug.Print d
    Next
End Sub

' TODO:
' * "_" => space
' * first word always unencoded?
' * comments in parentheses for specific fields?

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST ' "=?utf-8?"
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                    Case Else
                        Message.BodyPart.ContentTransferEncoding = "us-ascii" ' ?
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.Charset = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.Charset = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window


Code provided by robert_schutt (Thanks again!)
Header1.txt
Header2.txt
0
Comment
Question by:SAbboushi
  • 2
4 Comments
 
LVL 83

Expert Comment

by:CodeCruiser
Comment Utility
0
 

Author Comment

by:SAbboushi
Comment Utility
I don't believe it is that simple...

Anyone else?  Again, looking for someone to modify the code ; )
0
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
Comment Utility
Here's what I ended up doing. The reason for the double replace with Regular Expressions instead of just a normal Replace is that I think it's important to treat these cases differently:

1) need to take out spacing between the encoded words:

Subject: =?UTF-8?Q?Well=2C_let=27s_try_a_subject_with_s?=
	=?UTF-8?Q?=C3=96me_str=C4=83nge_characters_and_a_bit?=
	=?UTF-8?Q?_longer_so_it_may_span_more_than?=
	=?UTF-8?Q?_1_line_=E2=98=BA_and_how_about_graphics?=
	=?UTF-8?Q?=3F_=C2=B6?=

Open in new window


2) need to retain some form of spacing (I chose 1 space):

Received: from [70.101.42.90] (70-101-42-90.dsl2-plymouth.roc.ny.frontiernet.net [70.101.42.90])
	by relay04.roch.ny.frontiernet.net (Postfix) with ESMTP id E300ECCDF2;
	Thu, 11 Dec 2008 02:47:00 +0000 (UTC)

Open in new window


The code:

Option Explicit

Const C_TEMP_PATH = "C:\temp\"

Const C_EQUAL = "="
Const C_QUEST = "?"

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH
    For Each strH In arrh
        Debug.Print DecodeHeaders(strH)
    Next
    DecodeFileWithHeaders "Header1.txt"
    DecodeFileWithHeaders "Header2.txt"
    'DecodeFileWithHeaders "Header3.txt"
End Sub

Function DecodeHeaders(ByVal strH As String) As String
    Dim res As String, ps As Integer
    res = ""
    ps = 1
    Dim re As New RegExp
    re.Global = True
    re.MultiLine = False
    re.IgnoreCase = True
    re.Pattern = "=" & vbCrLf & "[ \t]+="
    strH = re.Replace(strH, "==")
    re.Pattern = vbCrLf & "[ \t]+"
    strH = re.Replace(strH, " ")
    re.Pattern = "=\?[a-z0-9-]+\?[BQ]\?.+?\?="
    Dim m As Match
    For Each m In re.Execute(strH)
        res = res & Mid(strH, ps, m.FirstIndex + 1 - ps) & mhdecode(m.Value)
        ps = m.FirstIndex + m.Length + 1
    Next
    res = res & Mid(strH, ps)
    DecodeHeaders = res
End Function

Sub DecodeFileWithHeaders(fn As String)
    Debug.Print "--- file: " & fn
    Dim oFSO As Object, oFile As Object, c As String
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(C_TEMP_PATH & fn, 1, False)
    c = oFile.ReadAll()
    oFile.Close
    Set oFile = Nothing
    Set oFSO = Nothing
    Debug.Print DecodeHeaders(c)
End Sub

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                        s = Replace(s, "_", " ")
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.CharSet = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.CharSet = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window

0
 

Author Closing Comment

by:SAbboushi
Comment Utility
Absolutely Awesome!
0

Featured Post

Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
Encryption Decryption in Oracle 12 58
Outlook Archive File 3 34
outlook, calendar 3 33
outlook 2013 8 27
Get an idea of what you should include in an email disclaimer with these Top 5 email disclaimer tips.
This process describes the steps required to Import and Export data from and to .pst files using Exchange 2010. We can use these steps to export data from a user to a .pst file, import data back to the same or a different user, or even import data t…
This Experts Exchange video Micro Tutorial shows how to tell Microsoft Office that a word is NOT spelled correctly. Microsoft Office has a built-in, main dictionary that is shared by Office apps, including Excel, Outlook, PowerPoint, and Word. When …
To add imagery to an HTML email signature, you have two options available to you. You can either add a logo/image by embedding it directly into the signature or hosting it externally and linking to it. The vast majority of email clients display l…

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now