Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

Need someone to modify this VBA code to handle Internet multiline header fields

Posted on 2012-03-26
4
652 Views
Last Modified: 2012-04-04
I need the below code modified to handle Internet headers that are "folded" into multiple lines per
'           /* Rfc 2822 2.2 Header Fields
'               Header fields are lines composed of a field name, followed by a colon
'               (":"), followed by a field body, and terminated by CRLF.  A field
'               name MUST be composed of printable US-ASCII characters (i.e.,
'               characters that have values between 33 and 126, inclusive), except
'               colon.  A field body may be composed of any US-ASCII characters,
'               except for CR and LF.  However, a field body may contain CRLF when
'               used in header "folding" and  "unfolding" as described in section
'               2.2.3.  All field bodies MUST conform to the syntax described in
'               sections 3 and 4 of this standard.
'
'              Rfc 2822 2.2.3 (Multiline header fields)
'               The process of moving from this folded multiple-line representation
'               of a header field to its single line representation is called
'               "unfolding". Unfolding is accomplished by simply removing any CRLF
'               that is immediately followed by WSP.  Each header field should be
'               treated in its unfolded form for further syntactic and semantic
'               evaluation.
'
'               Example:
'                   Subject: aaaaa<CRLF>
'                   <TAB or SP>aaaaa<CRLF>
'

I have attached 2 sample headers (email addresses are not real); the "To:" header is the header of interest.

Basically, I need the code modified to "unfold" multiline header fields by removing CRLF when it is immediately followed by whitespace.  This needs to be done BEFORE parsing the words to decode.

Thanks for any help--
Sam

Option Explicit

Const C_SEP = " "
Const C_EQUAL = "="
Const C_QUEST = "?"
'

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH, strW, d
    For Each strH In arrh
        d = ""
        For Each strW In Split(strH, C_SEP)
            If d <> "" Then d = d & C_SEP
            d = d & mhdecode(strW)
        Next
        Debug.Print d
    Next
End Sub

' TODO:
' * "_" => space
' * first word always unencoded?
' * comments in parentheses for specific fields?

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST ' "=?utf-8?"
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                    Case Else
                        Message.BodyPart.ContentTransferEncoding = "us-ascii" ' ?
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.Charset = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.Charset = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window


Code provided by robert_schutt (Thanks again!)
Header1.txt
Header2.txt
0
Comment
Question by:SAbboushi
  • 2
4 Comments
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 37769126
0
 

Author Comment

by:SAbboushi
ID: 37789446
I don't believe it is that simple...

Anyone else?  Again, looking for someone to modify the code ; )
0
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 500 total points
ID: 37794901
Here's what I ended up doing. The reason for the double replace with Regular Expressions instead of just a normal Replace is that I think it's important to treat these cases differently:

1) need to take out spacing between the encoded words:

Subject: =?UTF-8?Q?Well=2C_let=27s_try_a_subject_with_s?=
	=?UTF-8?Q?=C3=96me_str=C4=83nge_characters_and_a_bit?=
	=?UTF-8?Q?_longer_so_it_may_span_more_than?=
	=?UTF-8?Q?_1_line_=E2=98=BA_and_how_about_graphics?=
	=?UTF-8?Q?=3F_=C2=B6?=

Open in new window


2) need to retain some form of spacing (I chose 1 space):

Received: from [70.101.42.90] (70-101-42-90.dsl2-plymouth.roc.ny.frontiernet.net [70.101.42.90])
	by relay04.roch.ny.frontiernet.net (Postfix) with ESMTP id E300ECCDF2;
	Thu, 11 Dec 2008 02:47:00 +0000 (UTC)

Open in new window


The code:

Option Explicit

Const C_TEMP_PATH = "C:\temp\"

Const C_EQUAL = "="
Const C_QUEST = "?"

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH
    For Each strH In arrh
        Debug.Print DecodeHeaders(strH)
    Next
    DecodeFileWithHeaders "Header1.txt"
    DecodeFileWithHeaders "Header2.txt"
    'DecodeFileWithHeaders "Header3.txt"
End Sub

Function DecodeHeaders(ByVal strH As String) As String
    Dim res As String, ps As Integer
    res = ""
    ps = 1
    Dim re As New RegExp
    re.Global = True
    re.MultiLine = False
    re.IgnoreCase = True
    re.Pattern = "=" & vbCrLf & "[ \t]+="
    strH = re.Replace(strH, "==")
    re.Pattern = vbCrLf & "[ \t]+"
    strH = re.Replace(strH, " ")
    re.Pattern = "=\?[a-z0-9-]+\?[BQ]\?.+?\?="
    Dim m As Match
    For Each m In re.Execute(strH)
        res = res & Mid(strH, ps, m.FirstIndex + 1 - ps) & mhdecode(m.Value)
        ps = m.FirstIndex + m.Length + 1
    Next
    res = res & Mid(strH, ps)
    DecodeHeaders = res
End Function

Sub DecodeFileWithHeaders(fn As String)
    Debug.Print "--- file: " & fn
    Dim oFSO As Object, oFile As Object, c As String
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(C_TEMP_PATH & fn, 1, False)
    c = oFile.ReadAll()
    oFile.Close
    Set oFile = Nothing
    Set oFSO = Nothing
    Debug.Print DecodeHeaders(c)
End Sub

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                        s = Replace(s, "_", " ")
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.CharSet = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.CharSet = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window

0
 

Author Closing Comment

by:SAbboushi
ID: 37796306
Absolutely Awesome!
0

Featured Post

NFR key for Veeam Backup for Microsoft Office 365

Veeam is happy to provide a free NFR license (for 1 year, up to 10 users). This license allows for the non‑production use of Veeam Backup for Microsoft Office 365 in your home lab without any feature limitations.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article lists the top 5 free OST to PST Converter Tools. These tools save a lot of time for users when they want to convert OST to PST after their exchange server is no longer available or some other critical issue with exchange server or impor…
When you see single cell contains number and text, and you have to get any date out of it seems like cracking our heads.
To add imagery to an HTML email signature, you have two options available to you. You can either add a logo/image by embedding it directly into the signature or hosting it externally and linking to it. The vast majority of email clients display l…
CodeTwo Sync for iCloud (http://www.codetwo.com/sync-for-icloud?sts=6554) automatically synchronizes your Outlook 2016, 2013, 2010 or 2007 folders with iCloud folders available via iCloud Control Panel. This lets you automatically sync them with…

828 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question