Marthaj
asked on
PHP and extracting a file from a forwarded email
Asked by: dogsareit
Hello Folks. It's been a while. I am spinning my wheels trying to extract a file that is attached to an email that was forwarded. I need to extract the file, which will then be written out to a file (decoded). The user forwards an email that has a PDF file attached to it, so the 'forwarded email' appears as an attachment. That I can extract, but I cannot extract the file attached to it. I have even tried to parse the forwarded email. I can display the forwarded email and its attached PDF, but I must not be starting/ending in the right area to parse it, because it will not base64_decode.
I know this issue must have occurred before and that there must be a solution.
My code works fine extracting attachments as long as they are not attached to a forwarded email.
Any help/guidance would be appreciated. Thank you in advance.
My code to extract attachments is below.
And below that is how I access the array etc. after extracting the attachments.
function RtnExtractAttachments($inbox, $email_number)
{
    //echo '<BR>AT RtnExtractAttachments';
    /* get mail structure */
    $structure = imap_fetchstructure($inbox, $email_number);
    $attachments = array();
    /* if any attachments found... */
    if(isset($structure->parts) && count($structure->parts))
    {
        for($i = 0; $i < count($structure->parts); $i++)
        {
            $attachments[$i] = array(
                'is_attachment' => false,
                'filename' => '',
                'name' => '',
                'attachment' => '');
            if($structure->parts[$i]->ifdparameters)
            {
                foreach($structure->parts[$i]->dparameters as $object)
                {
                    if(strtolower($object->attribute) == 'filename')
                    {
                        $attachments[$i]['is_attachment'] = true;
                        $attachments[$i]['filename'] = $object->value;
                    } // end of if(strtolower
                } // end of foreach($structure->parts[$i]->dparameters as $object)
            } // end of if($structure->parts[$i]->ifdparameters)
            if($structure->parts[$i]->ifparameters)
            {
                foreach($structure->parts[$i]->parameters as $object)
                {
                    if(strtolower($object->attribute) == 'name')
                    {
                        $attachments[$i]['is_attachment'] = true;
                        $attachments[$i]['name'] = $object->value;
                    } // end of if(strtolower
                } // end of foreach($structure->parts[$i]->parameters as $object)
            } // end of if($structure->parts[$i]->ifparameters)
            if($attachments[$i]['is_attachment'])
            {
                $attachments[$i]['attachment'] = imap_fetchbody($inbox, $email_number, $i+1);
                /* 3 = BASE64 encoding */
                if($structure->parts[$i]->encoding == 3)
                {
                    $attachments[$i]['attachment'] = base64_decode($attachments[$i]['attachment']);
                }
                /* 4 = QUOTED-PRINTABLE encoding */
                elseif($structure->parts[$i]->encoding == 4)
                {
                    $attachments[$i]['attachment'] = quoted_printable_decode($attachments[$i]['attachment']);
                } // end of elseif($structure->parts
            } // end of if($attachments[$i]['is_attachment'])
        } // end of for($i = 0
    } // end of if(isset($structure->parts)
    // GET RID OF EMPTY NULL VALUES IN THE ARRAY
    $attachments = array_map('array_filter', $attachments);
    $attachments = array_filter( $attachments );
    // RETURN ARRAY THAT CONTAINS ALL THE ATTACHMENTS
    return $attachments;
}
And the code I use to rewrite the file:
foreach($attachments as $attachment)
{
    if($attachment['is_attachment'] == 1)
    {
        // array_filter() above drops empty keys, so check before reading them
        $filename = isset($attachment['name']) ? $attachment['name'] : '';
        if(empty($filename) && isset($attachment['filename'])) $filename = $attachment['filename'];
        if (!preg_match('/pdf/', $filename))
        {
            echo '<BR><BR>ERROR ** NOT A PDF FILE - SKIPPING - NOT PROCESSING - SUBJECT LINE: ' . $subject . ' FILENAME: ' . $filename;
            $_SESSION['NbrOfInvalidAttachments'] = ($_SESSION['NbrOfInvalidAttachments'] + 1);
            $strLogMsg = 'ERROR - NOT PDF FILE' . ' FILENAME: ' . strval($filename) . ' SUBJECT LINE CONTAINS: ' . $subject . PHP_EOL;
            RtnWriteLogMsg($strLogMsg);
            continue;
        }else{
            echo '<BR><BR>IS A VALID PDF FILE - CONTINUING PROCESSING: ' . $filename;
            // CREATE NEW SEQ NBR
            $strSeqNbr = ($strSeqNbr + 1);
            $strSeq = 'SEQNBR_' . strval($strSeqNbr);
            // CREATE NEW FILENAME FOR UPLOADING
            $wrkFilename = 'PDF_' . $strSeq . '.pdf';
            $wrkFilename = str_replace ( ' ' , '_' , $wrkFilename);
            echo '<BR><BR>NEW PDF FILENAME: ' . $wrkFilename;
            $wrkNewFileName = './workpdf/' . $wrkFilename;
            file_put_contents($wrkNewFileName, $attachment['attachment']) or die('FPUT CONTENTS FAILED');
        }
    }
    // END OF foreach($attachments as $attachment)
}
ASKER CERTIFIED SOLUTION
"My coding works fine extracting attachments as long as it is not attached to a forwarded email."
The issue to solve here seems to be getting the email attachment from a forwarded message. From a quick Google search, this seems to be what you are using: https://gist.github.com/AikChun/8305789
On the line if(isset($structure->parts) && count($structure->parts)), perhaps do a print_r or var_dump of $structure->parts so you can see the difference between an original email that has an attachment and a forwarded one. If you can run that code inside the loop to view that data and then post back here, we may be able to help figure it out, or you may get the "aha".
ASKER
Thank you Scott for responding. Upon examining the var_dump results, I see some interesting items, but I am not sure what to make of them.
The first var_dump is from the forwarded email that has a file attached. It shows the subtype as the string 'RFC822', no disposition value, and an array size of 2. I think the RFC822 subtype is a key. But I am not sure where to take it after that. I know it can be extracted, or at least I think so from my research, but I don't quite understand the RFC822 business. I do and I don't, but I think it is a key to resolving this.
The second var_dump, from an email that had 2 files attached to it (and extracted just fine), has a subtype of string 'PDF', a disposition of string 'attachment', and an array size of 3.
Below is the printout of both emails:
VAR_DUMP FROM FORWARDED EMAIL THAT CONTAINS A FILE ATTACHMENT
array (size=2)
0 =>
object(stdClass)[10]
public 'type' => int 0
public 'encoding' => int 1
public 'ifsubtype' => int 1
public 'subtype' => string 'PLAIN' (length=5)
public 'ifdescription' => int 0
public 'ifid' => int 0
public 'lines' => int 266
public 'bytes' => int 11414
public 'ifdisposition' => int 1
public 'disposition' => string 'inline' (length=6)
public 'ifdparameters' => int 0
public 'ifparameters' => int 1
public 'parameters' =>
array (size=3)
0 =>
object(stdClass)[11]
...
1 =>
object(stdClass)[12]
...
2 =>
object(stdClass)[13]
...
1 =>
object(stdClass)[14]
public 'type' => int 2
public 'encoding' => int 0
public 'ifsubtype' => int 1
public 'subtype' => string 'RFC822' (length=6)
public 'ifdescription' => int 0
public 'ifid' => int 0
public 'lines' => int 8444
public 'bytes' => int 632962
public 'ifdisposition' => int 0
public 'ifdparameters' => int 0
public 'ifparameters' => int 1
public 'parameters' =>
array (size=1)
0 =>
object(stdClass)[15]
...
public 'parts' =>
array (size=1)
0 =>
object(stdClass)[16]
...
==========================================
VAR_DUMP FROM EMAIL THAT HAD 2 FILE ATTACHMENTS
array (size=3)
0 =>
object(stdClass)[10]
public 'type' => int 1
public 'encoding' => int 0
public 'ifsubtype' => int 1
public 'subtype' => string 'ALTERNATIVE' (length=11)
public 'ifdescription' => int 0
public 'ifid' => int 0
public 'ifdisposition' => int 0
public 'ifdparameters' => int 0
public 'ifparameters' => int 1
public 'parameters' =>
array (size=1)
0 =>
object(stdClass)[11]
...
public 'parts' =>
array (size=2)
0 =>
object(stdClass)[12]
...
1 =>
object(stdClass)[14]
...
1 =>
object(stdClass)[16]
public 'type' => int 3
public 'encoding' => int 3
public 'ifsubtype' => int 1
public 'subtype' => string 'PDF' (length=3)
public 'ifdescription' => int 0
public 'ifid' => int 1
public 'id' => string '<f_kg5g5g221>' (length=13)
public 'bytes' => int 144866
public 'ifdisposition' => int 1
public 'disposition' => string 'attachment' (length=10)
public 'ifdparameters' => int 1
public 'dparameters' =>
array (size=1)
0 =>
object(stdClass)[17]
...
public 'ifparameters' => int 1
public 'parameters' =>
array (size=1)
0 =>
object(stdClass)[18]
...
2 =>
object(stdClass)[19]
public 'type' => int 3
public 'encoding' => int 3
public 'ifsubtype' => int 1
public 'subtype' => string 'PDF' (length=3)
public 'ifdescription' => int 0
public 'ifid' => int 1
public 'id' => string '<f_kg5g5g0p0>' (length=13)
public 'bytes' => int 281940
public 'ifdisposition' => int 1
public 'disposition' => string 'attachment' (length=10)
public 'ifdparameters' => int 1
public 'dparameters' =>
array (size=1)
0 =>
object(stdClass)[20]
...
public 'ifparameters' => int 1
public 'parameters' =>
array (size=1)
0 =>
object(stdClass)[21]
...
I don't really understand it either without looking at https://www.w3.org/Protocols/rfc822/ or https://www.php.net/manual/en/function.imap-fetchstructure.php
Just guessing, my next step would be looking at what is in the other arrays for clues. That will probably lead you to the right direction.
If you do find another option that works and uses composer, I never upload composer to the server, just the vendor folder that autoloads everything.
ASKER
Thank you for responding. So, what else would you suggest? It has been suggested that I use a MIME parser, but does one make it possible to extract the attachment from the forwarded email? All confusing...sigh.
ASKER
I decided to post the results of the print_r. I also attached it as a file, as it looks ugly below: FROM PRINT_R.txt
FORWARDED EMAIL THAT HAS AN ATTACHMENT (EMAIL FORWARDED WITH FILE ATTACHED)
FROM print_r($structure->parts);
Array ([0] => stdClass Object([type] => 0[encoding] => 1[ifsubtype] => 1[subtype] => PLAIN[ifdescription] => 0[ifid] => 0[lines] => 243[bytes] => 10343[ifdisposition] => 1[disposition] => inline[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => charset[value] => utf-8)[1] => stdClass Object([attribute] => format[value] => flowed)[2] => stdClass Object([attribute] => DelSp[value] => Yes)))[1] => stdClass Object([type] => 2[encoding] => 0[ifsubtype] => 1[subtype] => RFC822[ifdescription] => 0[ifid] => 0[lines] => 8829[bytes] => 666137[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => name[value] => Forwarded Message))[parts] => Array([0] => stdClass Object([type] => 1[encoding] => 0[ifsubtype] => 1[subtype] => MIXED[ifdescription] => 0[ifid] => 0[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => boundary[value] => =====================_442203==_))[parts] => Array([0] => stdClass Object([type] => 1[encoding] => 0[ifsubtype] => 1[subtype] => ALTERNATIVE[ifdescription] => 0[ifid] => 0[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => boundary[value] => =====================_442203==.ALT))[parts] => Array([0] => stdClass Object([type] => 0[encoding] => 4[ifsubtype] => 1[subtype] => PLAIN[ifdescription] => 0[ifid] => 0[lines] => 285[bytes] => 10433[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => charset[value] => iso-8859-1)[1] => stdClass Object([attribute] => format[value] => flowed)))[1] => stdClass Object([type] => 0[encoding] => 4[ifsubtype] => 1[subtype] => HTML[ifdescription] => 0[ifid] => 0[lines] => 296[bytes] => 14668[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => 
charset[value] => iso-8859-1)))))[1] => stdClass Object([type] => 3[encoding] => 3[ifsubtype] => 1[subtype] => PDF[ifdescription] => 0[ifid] => 0[bytes] => 307224[ifdisposition] => 1[disposition] => attachment[ifdparameters] => 1[dparameters] => Array([0] => stdClass Object([attribute] => filename[value] => Guest_XXX_41505963.pdf))[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => name[value] => Guest_XXX_41505963.pdf)))[2] => stdClass Object([type] => 3[encoding] => 3[ifsubtype] => 1[subtype] => PDF[ifdescription] => 0[ifid] => 0[bytes] => 331568[ifdisposition] => 1[disposition] => attachment[ifdparameters] => 1[dparameters] => Array([0] => stdClass Object([attribute] => filename[value] => Agent_XXX_41505963.pdf))[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => name[value] => Agent_XXX_41505963.pdf))))))) )
==========================================
EMAIL WITH TWO ATTACHMENTS - NOT FORWARDED
FROM print_r($structure->parts);
Array ([0] => stdClass Object([type] => 1[encoding] => 0[ifsubtype] => 1[subtype] => ALTERNATIVE[ifdescription] => 0[ifid] => 0[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => boundary[value] => 000000000000e9876f05b1696e42))[parts] => Array([0] => stdClass Object([type] => 0[encoding] => 0[ifsubtype] => 1[subtype] => PLAIN[ifdescription] => 0[ifid] => 0[lines] => 1[bytes] => 2[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => charset[value] => UTF-8)))[1] => stdClass Object([type] => 0[encoding] => 0[ifsubtype] => 1[subtype] => HTML[ifdescription] => 0[ifid] => 0[lines] => 1[bytes] => 123[ifdisposition] => 0[ifdparameters] => 0[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => charset[value] => UTF-8)))))[1] => stdClass Object([type] => 3[encoding] => 3[ifsubtype] => 1[subtype] => PDF[ifdescription] => 0[ifid] => 1[id] => [bytes] => 144866[ifdisposition] => 1[disposition] => attachment[ifdparameters] => 1[dparameters] => Array([0] => stdClass Object([attribute] => filename[value] => test_28Jul2020@110353@AM_A7.pdf))[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => name[value] => test_28Jul2020@110353@AM_A7.pdf)))[2] => stdClass Object([type] => 3[encoding] => 3[ifsubtype] => 1[subtype] => PDF[ifdescription] => 0[ifid] => 1[id] => [bytes] => 281940[ifdisposition] => 1[disposition] => attachment[ifdparameters] => 1[dparameters] => Array([0] => stdClass Object([attribute] => filename[value] => test_28Jul2020@110342@AM_A4.pdf))[ifparameters] => 1[parameters] => Array([0] => stdClass Object([attribute] => name[value] => test_28Jul2020@110342@AM_A4.pdf))) )
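For what it's worth, the print_r output above hints at why a top-level loop misses the PDFs in the forwarded case: they sit under parts[1] (type 2, subtype RFC822) -> parts[0] (MIXED) -> parts[1] and parts[2]. Below is a minimal sketch of a recursive walk that turns that nesting into IMAP section numbers. The function names partFilename and collectAttachmentSections are made up for illustration, and the numbering rule (a multipart body directly inside a forwarded message gets no number of its own) is my reading of RFC 3501, so treat it as an assumption to verify against your mailbox. It sticks to PHP 5.6 syntax.

```php
<?php
// Sketch only: these helper names are hypothetical, not part of the imap extension.

// Return the 'filename' or 'name' parameter of a part, or '' if it has neither.
function partFilename($part)
{
    if (!empty($part->ifdparameters)) {
        foreach ($part->dparameters as $p) {
            if (strtolower($p->attribute) == 'filename') {
                return $p->value;
            }
        }
    }
    if (!empty($part->ifparameters)) {
        foreach ($part->parameters as $p) {
            if (strtolower($p->attribute) == 'name') {
                return $p->value;
            }
        }
    }
    return '';
}

// Walk an array of sibling parts and map IMAP section number => filename.
function collectAttachmentSections(array $parts, $prefix = '')
{
    $found = array();
    foreach ($parts as $i => $part) {
        $section = $prefix . ($i + 1);
        $isMessage = ($part->type == 2); // 2 = message, e.g. message/rfc822
        if (!$isMessage) {
            // The forwarded message itself is skipped; we descend into it instead.
            $name = partFilename($part);
            if ($name !== '') {
                $found[$section] = $name;
            }
        }
        if (empty($part->parts)) {
            continue;
        }
        $children = $part->parts;
        // Assumption (RFC 3501): a multipart body directly inside a forwarded
        // message has no number of its own, so its children become 2.1, 2.2, ...
        if ($isMessage && $children[0]->type == 1 && !empty($children[0]->parts)) {
            $children = $children[0]->parts;
        }
        $found += collectAttachmentSections($children, $section . '.');
    }
    return $found;
}
```

Called as collectAttachmentSections($structure->parts), this should report the two PDFs in the forwarded dump at sections '2.2' and '2.3'. Each section string can then be passed as the third argument of imap_fetchbody($inbox, $email_number, $section) in place of $i+1.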
Yes... MIME Parsing is complex, which is why I use PERL + MIME::Parser for this, as I've been using PERL so long... my brain is...
PERL-i-fied...
PERL snippet...
my $parser = new MIME::Parser;
$parser->output_to_core(1);
$parser->decode_headers(1);
my $msg = $parser->parse(\*STDIN);
my $head = $msg->{mail_inet_head}->{mail_hdr_list};
my $body = $msg->body;
This will give you access to headers + body in memory.
You'll add your other code after you have the Parser Tree data structures to access.
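To dump the MIME parts at the top level only (nested MIME parts are not followed)...
$msg->dump_skeleton();    # called on the parsed entity; assuming $msg from above was intended
my @parts = $msg->parts();    # the original had $obj here; assuming that meant $msg
foreach my $part (@parts) {
    dump_part($stash,$part);    # dump_part() and $stash are my own helper and state, not shown
}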
See if this helps.
Likely there's some PHP equivalent library + MIME::Parser is a serious workhorse, so might be hard to find an equivalent.
Random Aside: About MIME::Parser.
One project I run is a Realtime SPF Patching service.
This service requires parsing 1000s/minute DMARC report email, all of which have many random types of attachments.
The way I handle this is in steps...
1) Create a MIME parser.
2) Create a temporary directory.
3) Point the MIME::Parser object at this directory.
4) Run a parse operation.
5) At this point, you'll have a directory full of nightmarish cruft.
Amidst this cruft, each attachment will... extrude... into its own file...
6) Be sure to set the MIME::Parser flag to ignore errors, as a significant percentage of all MIME messages are mangled (incorrect MIME part formatting).
MIME::Parser is very smart + recovers so well from errors, I've never seen a crash in my code.
7) Once you have your directory of attachment files, now you have to run the file command or a libmagic binding in your language to determine if the attachment actually matches the MIME header + has been extracted correctly.
8) After libmagic says you have a winner, then you can process the attachment file.
9) If any libmagic errors occur, I forward these files into a staging directory, so I can write code to handle the new MIME part breakage.
So... per your comment above... MIME parsing... is complex... which is why I suggested you use a library...
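On the PHP side, steps 7 and 8 could be approximated without shelling out to the file command. PHP's fileinfo extension is the libmagic binding, but for a PDF-only workflow an even lighter check is to look at the first bytes of the decoded data. This is just a sketch, and looksLikePdf is a made-up name:

```php
<?php
// Hypothetical helper: true if the decoded bytes start with the PDF signature.
// This only checks the header; it does not prove the file is a valid PDF.
function looksLikePdf($data)
{
    return strncmp($data, '%PDF-', 5) === 0;
}
```

Unlike a test on the filename, this would catch the case where the wrong MIME part was pulled out or the base64 decode did not work, because the result would not begin with %PDF-.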
ASKER
Thank you for responding. Yes, I think I need a package for this problem. And that is exactly what I decided to do tonight.
I downloaded php-mime-mail-parser 3.0.4 (since my client is still on PHP 5.6) from this location:
https://github.com/php-mime-mail-parser/php-mime-mail-parser/releases/tag/3.0.4
It's a zip file. And I am trying to locate the php_mailparse.dll that goes with php-mime-mail-parser 3.0.4.
I went to this link: http://pecl.php.net/package/mailparse
but it stated this: 'Dependencies for older releases can be found on the release overview page.'
and I sure as heck cannot find where on the release overview page it is supposed to be located.
So I am spinning again. Any help appreciated. This is just getting really ugly tonight!
SOLUTION
ASKER
Thank you for this. You pointed out exactly what I was wondering about: if it's stuffed into an array, I should be able to extract it. Right? I am unsure how to step down to it, but I am going to try. I also looked at a MIME extractor etc.
I don't know which will win out. And thank you for the link.
Everyone that has responded has been very generous with their knowledge.
SOLUTION
ASKER
Thank you for responding. Yep, it's a drill down. And I understand where it's headed. But after pulling it out, it doesn't decode properly.
And it got pretty crazy. What I have decided to do is try php-mime-mail-parser.
Here I go again!
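One possible reason for the "doesn't decode properly" symptom, offered as a guess rather than a diagnosis: in the earlier dump the RFC822 part itself reports encoding 0, so if the bytes being decoded are the whole forwarded message, they contain headers and MIME boundaries around the base64 text and are not base64 as a unit. A small sketch of a decode step that uses the encoding value of the nested PDF part and fails loudly on non-base64 input (decodePartBody is a hypothetical name):

```php
<?php
// Hypothetical helper: decode a raw part body according to the 'encoding'
// value that imap_fetchstructure() reports for that same part.
// Returns false if the data is supposed to be base64 but is not.
function decodePartBody($raw, $encoding)
{
    switch ((int) $encoding) {
        case 3: // BASE64
            // Remove line breaks first, then decode in strict mode so that
            // stray headers or boundary lines produce false instead of garbage.
            $clean = preg_replace('/\s+/', '', $raw);
            return base64_decode($clean, true);
        case 4: // QUOTED-PRINTABLE
            return quoted_printable_decode($raw);
        default: // 0 = 7BIT, 1 = 8BIT, 2 = BINARY: nothing to decode
            return $raw;
    }
}
```

If this returns false for the nested PDF, that would suggest the fetched section is not the PDF part itself, and the section number passed to imap_fetchbody is worth re-checking.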
ASKER
Thank you both for helping. It has been appreciated very much. I wish I could select both of your answers as my solution.
ASKER
Scott, David - I managed to pull the files by stepping through the structure etc. David, thank you for your ideas - helpful.
And Scott, you were a good part of it - stepping thru the structure work. And some more research.
Thank you both again.
ASKER
They will NOT allow the use of Composer to install any packages, so that's an additional yeeech.