Marthaj (United States of America)

PHP and extracting a file from a forwarded email

Asked by: dogsareit
Hello Folks. Been a while. I am spinning my wheels trying to extract a file that is attached to an email that was forwarded. I need to extract the file, decode it, and write it out to a file. The user forwards an email that has a PDF file attached to it, so the forwarded email itself appears as an attachment. That I can extract, but I cannot extract the file attached to it. I have even tried to parse the forwarded email. I can display the forwarded email and its attached PDF, but I must not be starting/ending in the right place when I parse it, because base64_decode will not decode it.

I know this issue has had to occur before and that there must be a solution.
My coding works fine extracting attachments as long as they are not attached to a forwarded email.

Any help/guidance would be appreciated. Thank you in advance.
Here is my code to extract attachments.
Below that is how I access the array after extracting the attachments.
function RtnExtractAttachments($inbox, $email_number)
{
    //echo '<BR>AT RtnExtractAttachments';

    /* get mail structure */
    $structure = imap_fetchstructure($inbox, $email_number);

    $attachments = array();

    /* if any attachments found... */
    if (isset($structure->parts) && count($structure->parts))
    {
        for ($i = 0; $i < count($structure->parts); $i++)
        {
            $attachments[$i] = array(
                'is_attachment' => false,
                'filename' => '',
                'name' => '',
                'attachment' => '');

            if ($structure->parts[$i]->ifdparameters)
            {
                foreach ($structure->parts[$i]->dparameters as $object)
                {
                    if (strtolower($object->attribute) == 'filename')
                    {
                        $attachments[$i]['is_attachment'] = true;
                        $attachments[$i]['filename'] = $object->value;
                    }
                }
            }

            if ($structure->parts[$i]->ifparameters)
            {
                foreach ($structure->parts[$i]->parameters as $object)
                {
                    if (strtolower($object->attribute) == 'name')
                    {
                        $attachments[$i]['is_attachment'] = true;
                        $attachments[$i]['name'] = $object->value;
                    }
                }
            }

            if ($attachments[$i]['is_attachment'])
            {
                $attachments[$i]['attachment'] = imap_fetchbody($inbox, $email_number, $i+1);
                /* 3 = BASE64 encoding */
                if ($structure->parts[$i]->encoding == 3)
                {
                    $attachments[$i]['attachment'] = base64_decode($attachments[$i]['attachment']);
                }
                /* 4 = QUOTED-PRINTABLE encoding */
                elseif ($structure->parts[$i]->encoding == 4)
                {
                    $attachments[$i]['attachment'] = quoted_printable_decode($attachments[$i]['attachment']);
                }
            }
        }
    }

    // GET RID OF EMPTY NULL VALUES IN THE ARRAY
    $attachments = array_map('array_filter', $attachments);
    $attachments = array_filter($attachments);

    // RETURN ARRAY THAT CONTAINS ALL THE ATTACHMENTS
    return $attachments;
}


And the code I use to rewrite the file:
foreach ($attachments as $attachment)
{
    if (!empty($attachment['is_attachment']))
    {
        // array_filter() in RtnExtractAttachments() drops empty values, so check that the keys exist
        $filename = isset($attachment['name']) ? $attachment['name'] : '';
        if (empty($filename) && isset($attachment['filename']))
        {
            $filename = $attachment['filename'];
        }

        if (!preg_match('/pdf/', $filename))
        {
            echo '<BR><BR>ERROR ** NOT A PDF FILE - SKIPPING - NOT PROCESSING - SUBJECT LINE: ' . $subject . ' FILENAME: ' . $filename;
            $_SESSION['NbrOfInvalidAttachments'] = ($_SESSION['NbrOfInvalidAttachments'] + 1);
            $strLogMsg = 'ERROR - NOT PDF FILE' . ' FILENAME: ' . strval($filename) . '  SUBJECT LINE CONTAINS:  ' . $subject . PHP_EOL;
            RtnWriteLogMsg($strLogMsg);
            continue;
        }
        else
        {
            echo '<BR><BR>IS A VALID PDF FILE - CONTINUING PROCESSING:  ' . $filename;
            // CREATE NEW SEQ NBR
            $strSeqNbr = ($strSeqNbr + 1);
            $strSeq = 'SEQNBR_' . strval($strSeqNbr);

            // CREATE NEW FILENAME FOR UPLOADING
            $wrkFilename = 'PDF_' . $strSeq . '.pdf';
            $wrkFilename = str_replace(' ', '_', $wrkFilename);
            echo '<BR><BR>NEW PDF FILENAME: ' . $wrkFilename;

            $wrkNewFileName = './workpdf/' . trim($wrkFilename);
            file_put_contents($wrkNewFileName, $attachment['attachment']) or die('FPUT CONTENTS FAILED');
        }
    }
}   // END OF foreach($attachments as $attachment)






ASKER CERTIFIED SOLUTION
David Favor
This solution is only available to members.
Marthaj (ASKER)

Thank you so much for responding. Can you recommend one? The client uses PHP 5.6.
They will NOT allow the use of Composer to install any packages, so that's an additional yeeech.
My coding works fine extracting attachments as long as it is not attached to a forwarded email.

The issue to solve here seems to be getting the email attachment from a forwarded message. A quick Google search suggests this is the code you are using: https://gist.github.com/AikChun/8305789

On the line if(isset($structure->parts) && count($structure->parts)), perhaps do a print_r or var_dump of $structure->parts so you can see the difference between an original email that has an attachment and a forwarded one. If you can run that inside the loop to view the data and then post it back here, we may be able to help figure it out, or you may get the aha moment yourself.
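
A minimal sketch of that kind of dump, assuming the object returned by imap_fetchstructure() is passed in. The function name outlineStructure() is hypothetical; it reads only the type, subtype, encoding and parts fields of each part, and it recurses into nested parts so that a forwarded message shows up as an indented subtree:

```php
<?php
// Hypothetical helper: returns one line per MIME part, indented by nesting depth.
function outlineStructure($part, $depth = 0)
{
    // Primary body types 0-6 as numbered in the PHP imap_fetchstructure() manual.
    $types = array('TEXT', 'MULTIPART', 'MESSAGE', 'APPLICATION', 'AUDIO', 'IMAGE', 'VIDEO');
    $type = (isset($part->type) && isset($types[$part->type])) ? $types[$part->type] : 'OTHER';
    $subtype = !empty($part->ifsubtype) ? $part->subtype : '?';
    $encoding = isset($part->encoding) ? $part->encoding : '?';

    $out = str_repeat('  ', $depth) . $type . '/' . $subtype . ' encoding=' . $encoding . "\n";

    // Multipart and message/rfc822 parts carry their children in ->parts.
    if (!empty($part->parts)) {
        foreach ($part->parts as $child) {
            $out .= outlineStructure($child, $depth + 1);
        }
    }
    return $out;
}
```

Something like echo '<pre>' . outlineStructure(imap_fetchstructure($inbox, $email_number)) . '</pre>'; should then print the whole tree, which is easier to compare between two emails than the raw print_r output.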


Marthaj (ASKER)

Thank you, Scott, for responding. Upon examining the var_dump results, I see some interesting items, but I am not sure what to make of them.
The first var_dump is from the forwarded email that has a file attached. Its second part has a subtype of 'RFC822', no disposition value, and the array size is 2. I think the RFC822 subtype is the key, but I am not sure where to take it after that. From my research I think the file can be extracted, but I don't fully understand the RFC822 business.
The second var_dump, from an email that had 2 files attached (and extracted just fine), shows a subtype of 'PDF' and a disposition of 'attachment', and the array size is 3.
Below is the printout of both emails:
VAR_DUMP FROM FORWARDED EMAIL THAT CONTAINS A FILE ATTACHMENT 

array (size=2)
  0 => 
    object(stdClass)[10]
      public 'type' => int 0
      public 'encoding' => int 1
      public 'ifsubtype' => int 1
      public 'subtype' => string 'PLAIN' (length=5)
      public 'ifdescription' => int 0
      public 'ifid' => int 0
      public 'lines' => int 266
      public 'bytes' => int 11414
      public 'ifdisposition' => int 1
      public 'disposition' => string 'inline' (length=6)
      public 'ifdparameters' => int 0
      public 'ifparameters' => int 1
      public 'parameters' => 
        array (size=3)
          0 => 
            object(stdClass)[11]
              ...
          1 => 
            object(stdClass)[12]
              ...
          2 => 
            object(stdClass)[13]
              ...
  1 => 
    object(stdClass)[14]
      public 'type' => int 2
      public 'encoding' => int 0
      public 'ifsubtype' => int 1
      public 'subtype' => string 'RFC822' (length=6)
      public 'ifdescription' => int 0
      public 'ifid' => int 0
      public 'lines' => int 8444
      public 'bytes' => int 632962
      public 'ifdisposition' => int 0
      public 'ifdparameters' => int 0
      public 'ifparameters' => int 1
      public 'parameters' => 
        array (size=1)
          0 => 
            object(stdClass)[15]
              ...
      public 'parts' => 
        array (size=1)
          0 => 
            object(stdClass)[16]
              ...


==========================================
VAR_DUMP FROM EMAIL THAT HAD 2 FILE ATTACHMENTS

array (size=3)
  0 => 
    object(stdClass)[10]
      public 'type' => int 1
      public 'encoding' => int 0
      public 'ifsubtype' => int 1
      public 'subtype' => string 'ALTERNATIVE' (length=11)
      public 'ifdescription' => int 0
      public 'ifid' => int 0
      public 'ifdisposition' => int 0
      public 'ifdparameters' => int 0
      public 'ifparameters' => int 1
      public 'parameters' => 
        array (size=1)
          0 => 
            object(stdClass)[11]
              ...
      public 'parts' => 
        array (size=2)
          0 => 
            object(stdClass)[12]
              ...
          1 => 
            object(stdClass)[14]
              ...
  1 => 
    object(stdClass)[16]
      public 'type' => int 3
      public 'encoding' => int 3
      public 'ifsubtype' => int 1
      public 'subtype' => string 'PDF' (length=3)
      public 'ifdescription' => int 0
      public 'ifid' => int 1
      public 'id' => string '<f_kg5g5g221>' (length=13)
      public 'bytes' => int 144866
      public 'ifdisposition' => int 1
      public 'disposition' => string 'attachment' (length=10)
      public 'ifdparameters' => int 1
      public 'dparameters' => 
        array (size=1)
          0 => 
            object(stdClass)[17]
              ...
      public 'ifparameters' => int 1
      public 'parameters' => 
        array (size=1)
          0 => 
            object(stdClass)[18]
              ...
  2 => 
    object(stdClass)[19]
      public 'type' => int 3
      public 'encoding' => int 3
      public 'ifsubtype' => int 1
      public 'subtype' => string 'PDF' (length=3)
      public 'ifdescription' => int 0
      public 'ifid' => int 1
      public 'id' => string '<f_kg5g5g0p0>' (length=13)
      public 'bytes' => int 281940
      public 'ifdisposition' => int 1
      public 'disposition' => string 'attachment' (length=10)
      public 'ifdparameters' => int 1
      public 'dparameters' => 
        array (size=1)
          0 => 
            object(stdClass)[20]
              ...
      public 'ifparameters' => int 1
      public 'parameters' => 
        array (size=1)
          0 => 
            object(stdClass)[21]
              ...

 




I don't really understand it either without looking at https://www.w3.org/Protocols/rfc822/ or https://www.php.net/manual/en/function.imap-fetchstructure.php

Just guessing, but my next step would be to look at what is in the other arrays for clues. That will probably point you in the right direction.

If you do find another option that works and uses Composer: I never upload Composer itself to the server, just the vendor folder, which autoloads everything.

Marthaj (ASKER)

Thank you for responding. So, what else would you suggest? It has been suggested that I use a MIME parser, but do they make it possible to extract the attachment from the forwarded email? All confusing... sigh.
Marthaj (ASKER)

I decided to post the results of the print_r. I also attached it as a file (FROM PRINT_R.txt), as it looks ugly below.
FORWARDED EMAIL THAT HAS AN ATTACHMENT (EMAIL FORWARDED WITH FILE ATTACHED)
FROM print_r($structure->parts);

Array (
  [0] => stdClass Object(
    [type] => 0 [encoding] => 1 [ifsubtype] => 1 [subtype] => PLAIN [ifdescription] => 0 [ifid] => 0
    [lines] => 243 [bytes] => 10343 [ifdisposition] => 1 [disposition] => inline
    [ifdparameters] => 0 [ifparameters] => 1
    [parameters] => Array(
      [0] => stdClass Object([attribute] => charset [value] => utf-8)
      [1] => stdClass Object([attribute] => format [value] => flowed)
      [2] => stdClass Object([attribute] => DelSp [value] => Yes)
    )
  )
  [1] => stdClass Object(
    [type] => 2 [encoding] => 0 [ifsubtype] => 1 [subtype] => RFC822 [ifdescription] => 0 [ifid] => 0
    [lines] => 8829 [bytes] => 666137 [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
    [parameters] => Array(
      [0] => stdClass Object([attribute] => name [value] => Forwarded Message)
    )
    [parts] => Array(
      [0] => stdClass Object(
        [type] => 1 [encoding] => 0 [ifsubtype] => 1 [subtype] => MIXED [ifdescription] => 0 [ifid] => 0
        [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
        [parameters] => Array(
          [0] => stdClass Object([attribute] => boundary [value] => =====================_442203==_)
        )
        [parts] => Array(
          [0] => stdClass Object(
            [type] => 1 [encoding] => 0 [ifsubtype] => 1 [subtype] => ALTERNATIVE [ifdescription] => 0 [ifid] => 0
            [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
            [parameters] => Array(
              [0] => stdClass Object([attribute] => boundary [value] => =====================_442203==.ALT)
            )
            [parts] => Array(
              [0] => stdClass Object(
                [type] => 0 [encoding] => 4 [ifsubtype] => 1 [subtype] => PLAIN [ifdescription] => 0 [ifid] => 0
                [lines] => 285 [bytes] => 10433 [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
                [parameters] => Array(
                  [0] => stdClass Object([attribute] => charset [value] => iso-8859-1)
                  [1] => stdClass Object([attribute] => format [value] => flowed)
                )
              )
              [1] => stdClass Object(
                [type] => 0 [encoding] => 4 [ifsubtype] => 1 [subtype] => HTML [ifdescription] => 0 [ifid] => 0
                [lines] => 296 [bytes] => 14668 [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
                [parameters] => Array(
                  [0] => stdClass Object([attribute] => charset [value] => iso-8859-1)
                )
              )
            )
          )
          [1] => stdClass Object(
            [type] => 3 [encoding] => 3 [ifsubtype] => 1 [subtype] => PDF [ifdescription] => 0 [ifid] => 0
            [bytes] => 307224 [ifdisposition] => 1 [disposition] => attachment
            [ifdparameters] => 1
            [dparameters] => Array(
              [0] => stdClass Object([attribute] => filename [value] => Guest_XXX_41505963.pdf)
            )
            [ifparameters] => 1
            [parameters] => Array(
              [0] => stdClass Object([attribute] => name [value] => Guest_XXX_41505963.pdf)
            )
          )
          [2] => stdClass Object(
            [type] => 3 [encoding] => 3 [ifsubtype] => 1 [subtype] => PDF [ifdescription] => 0 [ifid] => 0
            [bytes] => 331568 [ifdisposition] => 1 [disposition] => attachment
            [ifdparameters] => 1
            [dparameters] => Array(
              [0] => stdClass Object([attribute] => filename [value] => Agent_XXX_41505963.pdf)
            )
            [ifparameters] => 1
            [parameters] => Array(
              [0] => stdClass Object([attribute] => name [value] => Agent_XXX_41505963.pdf)
            )
          )
        )
      )
    )
  )
)
==========================================

EMAIL WITH TWO ATTACHMENTS - NOT FORWARDED
FROM print_r($structure->parts);

Array (
  [0] => stdClass Object(
    [type] => 1 [encoding] => 0 [ifsubtype] => 1 [subtype] => ALTERNATIVE [ifdescription] => 0 [ifid] => 0
    [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
    [parameters] => Array(
      [0] => stdClass Object([attribute] => boundary [value] => 000000000000e9876f05b1696e42)
    )
    [parts] => Array(
      [0] => stdClass Object(
        [type] => 0 [encoding] => 0 [ifsubtype] => 1 [subtype] => PLAIN [ifdescription] => 0 [ifid] => 0
        [lines] => 1 [bytes] => 2 [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
        [parameters] => Array(
          [0] => stdClass Object([attribute] => charset [value] => UTF-8)
        )
      )
      [1] => stdClass Object(
        [type] => 0 [encoding] => 0 [ifsubtype] => 1 [subtype] => HTML [ifdescription] => 0 [ifid] => 0
        [lines] => 1 [bytes] => 123 [ifdisposition] => 0 [ifdparameters] => 0 [ifparameters] => 1
        [parameters] => Array(
          [0] => stdClass Object([attribute] => charset [value] => UTF-8)
        )
      )
    )
  )
  [1] => stdClass Object(
    [type] => 3 [encoding] => 3 [ifsubtype] => 1 [subtype] => PDF [ifdescription] => 0 [ifid] => 1 [id] =>
    [bytes] => 144866 [ifdisposition] => 1 [disposition] => attachment
    [ifdparameters] => 1
    [dparameters] => Array(
      [0] => stdClass Object([attribute] => filename [value] => test_28Jul2020@110353@AM_A7.pdf)
    )
    [ifparameters] => 1
    [parameters] => Array(
      [0] => stdClass Object([attribute] => name [value] => test_28Jul2020@110353@AM_A7.pdf)
    )
  )
  [2] => stdClass Object(
    [type] => 3 [encoding] => 3 [ifsubtype] => 1 [subtype] => PDF [ifdescription] => 0 [ifid] => 1 [id] =>
    [bytes] => 281940 [ifdisposition] => 1 [disposition] => attachment
    [ifdparameters] => 1
    [dparameters] => Array(
      [0] => stdClass Object([attribute] => filename [value] => test_28Jul2020@110342@AM_A4.pdf)
    )
    [ifparameters] => 1
    [parameters] => Array(
      [0] => stdClass Object([attribute] => name [value] => test_28Jul2020@110342@AM_A4.pdf)
    )
  )
)
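
Reading the forwarded dump, the two PDFs sit inside part [1] (the RFC822 part), under its MIXED child, so a loop over the top-level parts alone never reaches them. A rough sketch of a recursive walk follows. The function names partFilename() and collectNamedParts() are made up for illustration, and the section numbering follows one reading of the IMAP rule (RFC 3501) that a multipart body directly inside a message/rfc822 part shares that part's number, which would put the PDFs at sections 2.2 and 2.3 here. Treat that numbering as an assumption to verify against a real mailbox:

```php
<?php
// Hypothetical helper: return the filename/name parameter of a part, or '' if none.
function partFilename($part)
{
    $lists = array();
    if (!empty($part->ifdparameters)) { $lists[] = $part->dparameters; }
    if (!empty($part->ifparameters))  { $lists[] = $part->parameters; }
    foreach ($lists as $params) {
        foreach ($params as $param) {
            $attr = strtolower($param->attribute);
            if ($attr === 'filename' || $attr === 'name') {
                return $param->value;
            }
        }
    }
    return '';
}

// Hypothetical helper: walk the whole structure and list every named part
// together with the section string to hand to imap_fetchbody().
function collectNamedParts($part, $section = '')
{
    $found = array();
    $hasChildren = !empty($part->parts);
    if ($section === '' && !$hasChildren) {
        $section = '1'; // a single-part message has only section 1
    }

    $name = partFilename($part);
    if ($name !== '' && $section !== '') {
        $found[] = array(
            'section'  => $section,
            'filename' => $name,
            'subtype'  => isset($part->subtype) ? strtoupper($part->subtype) : '',
            'encoding' => isset($part->encoding) ? $part->encoding : 0,
        );
    }

    if ($hasChildren) {
        // type 2 = MESSAGE; only message/rfc822 wraps another full message
        $isMessage = (isset($part->type) && $part->type == 2
            && isset($part->subtype) && strtoupper($part->subtype) === 'RFC822');
        foreach ($part->parts as $i => $child) {
            if ($isMessage && isset($child->type) && $child->type == 1) {
                // Assumption: the multipart body of the enclosed message
                // keeps the message part's own section number.
                $childSection = $section;
            } else {
                $childSection = ($section === '' ? '' : $section . '.') . ($i + 1);
            }
            $found = array_merge($found, collectNamedParts($child, $childSection));
        }
    }
    return $found;
}
```

Each returned section string could then be passed to imap_fetchbody($inbox, $email_number, $section) and decoded according to the part's encoding. The RFC822 part itself is also returned, because its name parameter is 'Forwarded Message', so filter on subtype 'PDF' if only the PDFs are wanted.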



Yes... MIME Parsing is complex, which is why I use PERL + MIME::Parser for this, as I've been using PERL so long... my brain is...

PERL-i-fied...

PERL snippet...

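    use MIME::Parser;

    my $parser = MIME::Parser->new;

    # keep decoded bodies in memory instead of writing them to disk
    $parser->output_to_core(1);
    # decode MIME-encoded header fields while parsing
    $parser->decode_headers(1);

    # parse the raw message from standard input
    my $msg = $parser->parse(\*STDIN);

    # header lines (read from the entity's internal fields) and body
    my $head = $msg->{mail_inet_head}->{mail_hdr_list};
    my $body = $msg->body;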



This will give you access to headers + body in memory.

You'll add your other code after you have the Parser Tree data structures to access.

To dump the MIME skeleton and then walk the parts, single level only (not nested MIME parts)...

$msg->dump_skeleton();
my @parts = $msg->parts();

foreach my $part (@parts) {
    # dump_part() and $stash are not shown here; they are not part of MIME::Parser
    dump_part($stash,$part);
}



See if this helps.

Likely there's some PHP equivalent library + MIME::Parser is a serious workhorse, so might be hard to find an equivalent.
Random Aside: About MIME::Parser.

One project I run is a Realtime SPF Patching service.

This service requires parsing 1000s of DMARC report emails per minute, all of which have many random types of attachments.

The way I handle this is in steps...

1) Create a MIME parser.

2) Create a temporary directory.

3) Point the MIME::Parser object at this directory.

4) Run a parse operation.

5) At this point, you'll have a directory full of nightmarish cruft.

Amidst this cruft, each attachment will... extrude... into its own file...

6) Be sure to set the MIME::Parser flag to ignore errors, as a significant percentage of all MIME messages are mangled (incorrect MIME part formatting).

MIME::Parser is very smart + recovers so well from errors, I've never seen a crash in my code.

7) Once you have your directory of attachment files, now you have to run the file command or a libmagic binding in your language to determine if the attachment actually matches the MIME header + has been extracted correctly.

8) After libmagic says you have a winner, then you can process the attachment file.

9) If any libmagic errors occur, I forward these files into a staging directory, so I can write code to handle the new MIME part breakage.

So... per your comment above... MIME parsing... is complex... which is why I suggested you use a library...
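
Steps 7 and 8 have a rough PHP counterpart. A minimal sketch, assuming the attachment has already been extracted and decoded into a string: looksLikePdf() and detectMimeType() are hypothetical names. The first checks for the %PDF- signature at the very start of the data, and the second uses PHP's fileinfo extension (a libmagic binding) when it is available:

```php
<?php
// Hypothetical helper: true when the data starts with the PDF signature.
function looksLikePdf($bytes)
{
    return strncmp($bytes, '%PDF-', 5) === 0;
}

// Hypothetical helper: MIME type as guessed by fileinfo, or null if
// the extension is not loaded or no guess could be made.
function detectMimeType($bytes)
{
    if (!class_exists('finfo')) {
        return null;
    }
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime = $finfo->buffer($bytes);
    return $mime === false ? null : $mime;
}
```

Checking the content rather than the filename also catches the case where the data was written out while still base64-encoded, since such a file would not start with %PDF-.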
Marthaj (ASKER)

Thank you for responding. Yes, I think I need a package for this problem. And that is exactly what I decided to do tonight.
I downloaded php-mime-mail-parser 3.0.4 (since my client is still on PHP 5.6) from this location:
https://github.com/php-mime-mail-parser/php-mime-mail-parser/releases/tag/3.0.4

It's a zip file. And I am trying to locate the php_mailparse.dll that goes with php-mime-mail-parser 3.0.4.

I went to this link: http://pecl.php.net/package/mailparse
but it stated this: 'Dependencies for older releases can be found on the release overview page.'
and I sure as heck cannot find where on the release overview page they are supposed to be.
So I am spinning again. Any help appreciated. This is just getting really ugly tonight!



SOLUTION
This solution is only available to members.
Marthaj (ASKER)

Thank you for this. You pointed out exactly what I was wondering about: if it's stuffed into an array, I should be able to extract it, right? I am unsure how to step down to it, but I am going to try. I also looked at a MIME extractor.
I don't know which will win out. And thank you for the link.
Everyone who has responded has been very generous with their knowledge.
SOLUTION
This solution is only available to members.
Marthaj (ASKER)

Thank you for responding. Yep, it's a drill-down, and I understand where it's headed. But after pulling the attachment out, it doesn't decode properly,
and it got pretty crazy. What I have decided to do is try php-mime-mail-parser.
Here I go again!
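
One possible reason for "doesn't decode properly" (an assumption, since the failing code is not shown) is that the fetched section is the whole forwarded message, headers and boundaries included, rather than the PDF part itself. A small sketch of a decoder keyed on the encoding values from imap_fetchstructure (3 = BASE64, 4 = QUOTED-PRINTABLE) follows. decodePartBody() is a hypothetical name, and it decodes base64 in strict mode so that input which is not base64 is reported as false instead of being silently turned into garbage:

```php
<?php
// Hypothetical helper: decode a fetched body according to the part's
// encoding number. Returns false when base64 input is not valid base64.
function decodePartBody($raw, $encoding)
{
    switch ((int) $encoding) {
        case 3: // BASE64
            // Strip the line breaks the body is wrapped with, then decode strictly.
            $compact = preg_replace('/\s+/', '', $raw);
            return base64_decode($compact, true);
        case 4: // QUOTED-PRINTABLE
            return quoted_printable_decode($raw);
        default: // 7BIT, 8BIT, BINARY, OTHER: return unchanged
            return $raw;
    }
}
```

With the section numbers from the structure, a call might look like $bytes = decodePartBody(imap_fetchbody($inbox, $email_number, '2.2'), 3); a result of false would suggest the wrong section was fetched.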

Marthaj (ASKER)

Thank you both for helping. It has been appreciated very much. I wish I could select both of your answers as my solution.
Marthaj (ASKER)

Scott, David - I managed to pull the files by stepping through the structure. David, thank you for your ideas - helpful.
And Scott, you were a good part of it, with the stepping-through-the-structure work. And some more research.
Thank you both again.