If the text is readable, do something

Non-technical explanation:
If the characters have no spaces between them, it is not readable text and may be an image.

The problem gets more difficult because the same body may mix encoded text and image data.


      GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••
......


Note: the data doesn't always begin with GIF89a.

If the ratio of spaces to characters is more than 1 space per 20 characters, do something.
Asked by: rgb192

gr8gonzo (Consultant) commented:
The getimagesize() function checks to see if the entire file is an image. If the entire file is an email and the image is somewhere INSIDE the email, then that will not work. You need to extract the data first (you can use the mime parser class for that), THEN save the extracted data to a file, and THEN use getimagesize() on that file to test the data.

You cannot rely on the presence of special characters to tell you whether something is readable or not. It sounds good in your head, but if you try to do that, you will end up getting images that have comments in them, or you'll miss text that is encoded differently.

That said, a MIME-encoded email comes in different parts and each part is labeled with a content-type and separated by a boundary. The email basically looks like:

<email headers>
Content-Type: multipart/mixed; boundary="abc123"

--abc123
Content-Type: text/plain

<text body of an email>

--abc123
Content-Type: text/html

<html body of an email>

--abc123
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="abc.gif"

<base64 encoded GIF image>

--abc123--


Now, that's a very generic look at it, but you can see how it is structured and how each different part has its own content type. The MIME parser class should be able to extract each part and tell you the value of "Content-Type" and give you the data for that part.

If you simply check the value of Content-Type to see if it is text/plain or text/html, that should tell you that it is a readable kind of content (although you probably do not want to touch anything that is an HTML file and also an attachment, since those can carry viruses).

So you may want to try focusing on using the MIME parser class to pull the parts and simply look at the content-type for things that indicate that they are text and not attachments. Don't worry about trying to analyze binary data.
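
As a rough sketch of the Content-Type check described above, assuming the parser hands you each part's headers as an array of name => value pairs. That array shape and the function name isReadablePart() are assumptions for illustration, not the API of any particular MIME parser class, and the syntax needs PHP 7 or later:

```php
<?php
// Hypothetical helper: decide whether one parsed MIME part is readable text.
// $headers is assumed to be an array of header name => value for that part;
// adapt the lookups to whatever shape your parser actually returns.
function isReadablePart(array $headers): bool
{
    // Header names are case-insensitive, so normalise the keys first.
    $headers = array_change_key_case($headers, CASE_LOWER);

    // A part without a Content-Type header defaults to text/plain.
    $type        = strtolower($headers['content-type'] ?? 'text/plain');
    $disposition = strtolower(trim($headers['content-disposition'] ?? ''));

    // Drop parameters such as "; charset=utf-8".
    $type = trim(explode(';', $type)[0]);

    // Attachments are skipped even when they claim to be text.
    if (strpos($disposition, 'attachment') === 0) {
        return false;
    }

    return $type === 'text/plain' || $type === 'text/html';
}

// Example input: two parts shaped like the ones in the sample email.
$parts = [
    ['Content-Type' => 'text/plain; charset=utf-8'],
    ['Content-Type' => 'image/gif', 'Content-Disposition' => 'attachment; filename="abc.gif"'],
];
foreach ($parts as $part) {
    echo isReadablePart($part) ? "readable\n" : "skip\n";
}
```

The decision is made from the headers alone, so the binary body of an image part never has to be inspected.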

Also keep in mind that there are thousands of email clients and scripts out there, and not all of them pay attention to the rules. Someone can send you a broken email, and the MIME parser class may not be able to parse it. That is fine - at some point you simply need to accept that the email is broken and you skip it. Sometimes spammers intentionally send broken emails so that viruses can be unleashed when you try to open the email to see what is wrong.

gr8gonzo (Consultant) commented:
Don't try to use spaces and ratios to determine this - it will not be a reliable detection.

There's something called "MIME magic" that examines the first few bytes of a file and tries to detect what kind of file it is (most binary file formats begin with a recognizable "signature"). PHP has a mime_content_type() function that does this for you.
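
When the data is only available as a string rather than a file, the same signature idea can be sketched by hand. This is not what mime_content_type() does internally, just a simplified comparison of the leading bytes against three well-known image signatures. sniffImageType() is a hypothetical name, and the check only works if the string starts exactly at the first byte of the image data (no leading whitespace or text):

```php
<?php
// Hand-rolled signature check on an in-memory string (hypothetical helper).
// Returns a MIME type for a few common image formats, or null if none match.
function sniffImageType(string $data): ?string
{
    $signatures = [
        'image/gif'  => ["GIF87a", "GIF89a"],
        'image/png'  => ["\x89PNG\r\n\x1a\n"],
        'image/jpeg' => ["\xFF\xD8\xFF"],
    ];

    foreach ($signatures as $mime => $magics) {
        foreach ($magics as $magic) {
            // Compare only the leading bytes of the data with the signature.
            if (strncmp($data, $magic, strlen($magic)) === 0) {
                return $mime;
            }
        }
    }

    return null;
}

// Example: the first bytes of a GIF, followed by a plain-text string.
var_dump(sniffImageType("GIF89a\x40\x00\x84\x00"));
var_dump(sniffImageType("Hello, this is the body of an email"));
```

A null result only means that none of the listed signatures matched; it does not prove that the data is readable text.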

If it's always either an image or text, you could also use the getimagesize() function:

if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}

rgb192 (Author) commented:
Note: there is no file. The input comes from a var_dump() of an array that holds a broken-up email, and there may be text attached to an image.


I could not save the following data as-is:

  GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••‰


because my code editor forced me to transliterate it:


<?php
$filename='
GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰
';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}



Warning: getimagesize( GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰ ) [function.getimagesize]: failed to open stream: No error in  on line 5

gr8gonzo (Consultant) commented:
You would need to save it to a temporary file to do this.

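// tempnam() requires a directory and a filename prefix
$tempfile = tempnam(sys_get_temp_dir(), 'img');
// write the raw (already decoded) data to the temporary file
file_put_contents($tempfile,"GIF89...");
if( ($size = getimagesize($tempfile)) !== false )
{
  // the data is an image
}
else
{
  // the data is not an image
}
// remove the temporary file when done
unlink($tempfile);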
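
If the PHP version is 5.4 or later, getimagesizefromstring() accepts the raw bytes directly, so the temporary file can be skipped. A minimal sketch, where isImageData() is a hypothetical wrapper name and the sample bytes are a hand-built 1x1 GIF used only for illustration (the type hints need PHP 7):

```php
<?php
// Hypothetical wrapper: test raw bytes held in a string, no temporary file.
function isImageData(string $data): bool
{
    if ($data === '') {
        return false;
    }

    // @ hides any notice PHP may raise for data it cannot read as an image.
    return @getimagesizefromstring($data) !== false;
}

// Illustrative sample: a 1x1 GIF assembled from its hex bytes
// (header, screen descriptor, colour table, image descriptor, data, trailer).
$gif = hex2bin('474946383961' . '01000100' . '800000' . '000000ffffff'
    . '2c000000000100010000' . '0202440100' . '3b');

echo isImageData($gif) ? "image\n" : "not an image\n";
echo isImageData('just some text') ? "image\n" : "not an image\n";
```

The data must already be decoded: a base64-encoded part would need base64_decode() first, and as with the file version this only recognises a string that is an image from its first byte.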

rgb192 (Author) commented:
$filename='C:/Mail-Backup/2013/1/24/655902028-22789.eml';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
  echo 'image';
}
else
{
  // $filename is not an image
  echo 'not';
}



The output is "not" (not an image), but the .eml file has an image in it.

I am using this MIME parser class:

http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html

(Please don't comment too much on the MIME parser class, which sometimes does not function properly for me because I am only a beginner.)

I am not looking for an image. I am looking for the case where a body is not readable, for example when it consists of many special characters because it contains an image.

So I would like to look at a small block of text and say: this text is not readable, do something else.


For example:

if (looksReadable($text)) {   // looksReadable() is a placeholder for the check being asked about
    echo $text;
} else {
    // do something else
}

rgb192 (Author) commented:
Thanks for the detailed explanation about file types.

Now I will look at email files more closely (for viruses).

Question has a verified solution.
