Need someone to modify this VBA code to handle Internet multiline header fields

I need the below code modified to handle Internet headers that are "folded" into multiple lines per
'           /* Rfc 2822 2.2 Header Fields
'               Header fields are lines composed of a field name, followed by a colon
'               (":"), followed by a field body, and terminated by CRLF.  A field
'               name MUST be composed of printable US-ASCII characters (i.e.,
'               characters that have values between 33 and 126, inclusive), except
'               colon.  A field body may be composed of any US-ASCII characters,
'               except for CR and LF.  However, a field body may contain CRLF when
'               used in header "folding" and  "unfolding" as described in section
'               2.2.3.  All field bodies MUST conform to the syntax described in
'               sections 3 and 4 of this standard.
'
'              Rfc 2822 2.2.3 (Multiline header fields)
'               The process of moving from this folded multiple-line representation
'               of a header field to its single line representation is called
'               "unfolding". Unfolding is accomplished by simply removing any CRLF
'               that is immediately followed by WSP.  Each header field should be
'               treated in its unfolded form for further syntactic and semantic
'               evaluation.
'
'               Example:
'                   Subject: aaaaa<CRLF>
'                   <TAB or SP>aaaaa<CRLF>
'

I have attached 2 sample headers (email addresses are not real); the "To:" header is the header of interest.

Basically, I need the code modified to "unfold" multiline header fields by removing CRLF when it is immediately followed by whitespace.  This needs to be done BEFORE parsing the words to decode.

Thanks for any help--
Sam

Option Explicit

Const C_SEP = " "
Const C_EQUAL = "="
Const C_QUEST = "?"
'

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH, strW, d
    For Each strH In arrh
        d = ""
        For Each strW In Split(strH, C_SEP)
            If d <> "" Then d = d & C_SEP
            d = d & mhdecode(strW)
        Next
        Debug.Print d
    Next
End Sub

' TODO:
' * "_" => space
' * first word always unencoded?
' * comments in parentheses for specific fields?

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST ' "=?utf-8?"
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                    Case Else
                        Message.BodyPart.ContentTransferEncoding = "us-ascii" ' ?
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.Charset = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.Charset = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window


Code provided by robert_schutt (Thanks again!)
Header1.txt
Header2.txt
SAbboushiAsked:
Who is Participating?
 
Robert SchuttConnect With a Mentor Software EngineerCommented:
Here's what I ended up doing. The reason for the double replace with Regular Expressions instead of just a normal Replace is that I think it's important to treat these cases differently:

1) need to take out spacing between the encoded words:

Subject: =?UTF-8?Q?Well=2C_let=27s_try_a_subject_with_s?=
	=?UTF-8?Q?=C3=96me_str=C4=83nge_characters_and_a_bit?=
	=?UTF-8?Q?_longer_so_it_may_span_more_than?=
	=?UTF-8?Q?_1_line_=E2=98=BA_and_how_about_graphics?=
	=?UTF-8?Q?=3F_=C2=B6?=

Open in new window


2) need to retain some form of spacing (I chose 1 space):

Received: from [70.101.42.90] (70-101-42-90.dsl2-plymouth.roc.ny.frontiernet.net [70.101.42.90])
	by relay04.roch.ny.frontiernet.net (Postfix) with ESMTP id E300ECCDF2;
	Thu, 11 Dec 2008 02:47:00 +0000 (UTC)

Open in new window


The code:

Option Explicit

Const C_TEMP_PATH = "C:\temp\"

Const C_EQUAL = "="
Const C_QUEST = "?"

Sub TestAllDecode()
    Dim arrh: arrh = Array( _
        "Test: =?utf-8?b?SWhyZSBCZXN0ZWxsdW5nIC0gVmVyc2FuZGJlc3TDpHRpZ3VuZyAtIDExMDU4OTEyNDY=?=", _
        "Subject: =?utf-8?Q?Your=20Weekly=20HAR=205=2DMinute=20REALTOR=C2=AE=3A=20Posts=20for=20the=20week=20of=2003=2F06=2F2012?=", _
        "To: =?utf-8?Q?Fred?= <HAR@Equity.com>" _
    )
    Dim strH
    For Each strH In arrh
        Debug.Print DecodeHeaders(strH)
    Next
    DecodeFileWithHeaders "Header1.txt"
    DecodeFileWithHeaders "Header2.txt"
    'DecodeFileWithHeaders "Header3.txt"
End Sub

Function DecodeHeaders(ByVal strH As String) As String
    Dim res As String, ps As Integer
    res = ""
    ps = 1
    Dim re As New RegExp
    re.Global = True
    re.MultiLine = False
    re.IgnoreCase = True
    re.Pattern = "=" & vbCrLf & "[ \t]+="
    strH = re.Replace(strH, "==")
    re.Pattern = vbCrLf & "[ \t]+"
    strH = re.Replace(strH, " ")
    re.Pattern = "=\?[a-z0-9-]+\?[BQ]\?.+?\?="
    Dim m As Match
    For Each m In re.Execute(strH)
        res = res & Mid(strH, ps, m.FirstIndex + 1 - ps) & mhdecode(m.Value)
        ps = m.FirstIndex + m.Length + 1
    Next
    res = res & Mid(strH, ps)
    DecodeHeaders = res
End Function

Sub DecodeFileWithHeaders(fn As String)
    Debug.Print "--- file: " & fn
    Dim oFSO As Object, oFile As Object, c As String
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Set oFile = oFSO.OpenTextFile(C_TEMP_PATH & fn, 1, False)
    c = oFile.ReadAll()
    oFile.Close
    Set oFile = Nothing
    Set oFSO = Nothing
    Debug.Print DecodeHeaders(c)
End Sub

Function mhdecode(ByVal s As String) As String
    Dim hdr As String: hdr = C_EQUAL & C_QUEST
    Dim ftr As String: ftr = C_QUEST & C_EQUAL
    Dim lenS As Integer: lenS = Len(s)
    If Left(s, Len(hdr)) = hdr And Right(s, Len(ftr)) = ftr Then
        Dim p1 As Integer: p1 = InStr(Len(hdr) + 1, s, C_QUEST)
        If p1 > Len(hdr) + 1 And p1 < lenS - 2 Then
            Dim p2 As Integer: p2 = InStr(p1 + 1, s, C_QUEST)
            If p2 > p1 + 1 And p2 < lenS - 2 Then
                Dim encCharSet As String: encCharSet = Mid(s, Len(hdr) + 1, p1 - Len(hdr) - 1)
                Dim encType As String: encType = Mid(s, p1 + 1, p2 - p1 - 1)
                Dim Message: Set Message = CreateObject("CDO.Message")
                Select Case UCase(encType)
                    Case "Q"
                        Message.BodyPart.ContentTransferEncoding = "quoted-printable"
                        s = Replace(s, "_", " ")
                    Case "B"
                        Message.BodyPart.ContentTransferEncoding = "base64"
                End Select
                Dim Stream 'As ADODB.Stream
                Set Stream = Message.BodyPart.GetEncodedContentStream
                'Set charset To base windows charset
                Stream.CharSet = "windows-1250"
                Stream.WriteText Mid(s, p2 + 1, lenS - p2 - Len(ftr))
                Stream.Flush
                Stream.Close
                Set Stream = Nothing
                Set Stream = Message.BodyPart.GetDecodedContentStream
                Stream.CharSet = encCharSet
                mhdecode = Stream.ReadText
                Stream.Close
                Set Stream = Nothing
            End If
        End If
    Else
        mhdecode = s
    End If
End Function

Open in new window

0
 
CodeCruiserCommented:
0
 
SAbboushiAuthor Commented:
I don't believe it is that simple...

Anyone else?  Again, looking for someone to modify the code ; )
0
 
SAbboushiAuthor Commented:
Absolutely Awesome!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.