Solved

if readable text, do something

Posted on 2013-02-05
Last Modified: 2013-02-06
Non-technical explanation:
If the characters don't contain any spaces, it is not readable text and may be an image.

The problem gets more difficult because the content may be a mix of encoded text and image data.


      GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••‰
......


Note: it doesn't always begin with GIF89a.

If the ratio of spaces to characters is more than 1 space per 20 characters, do something.
Question by:rgb192
LVL 34

Expert Comment

by:gr8gonzo
ID: 38856494
Don't try to use spaces and ratios to determine this - it will not be a reliable detection.

There's something called "MIME magic" that examines the first few bytes of a file and tries to detect what kind of file it is (most binary file formats start with a "signature"). PHP has a mime_content_type() function that does this for you, given a filename.
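As a rough illustration of what a signature check does, here is a minimal sketch that compares the first bytes of a string against a few well-known image signatures. The function name detectImageSignature() is made up for this example, and the list covers only GIF, PNG and JPEG; real MIME magic databases know many more formats.

```php
<?php
// Hypothetical helper (not part of PHP): guess an image type from leading bytes.
function detectImageSignature(string $data): ?string
{
    // Well-known magic numbers; this short list is an assumption for the sketch.
    $signatures = [
        'image/gif'  => ["GIF87a", "GIF89a"],
        'image/png'  => ["\x89PNG\r\n\x1a\n"],
        'image/jpeg' => ["\xFF\xD8\xFF"],
    ];

    foreach ($signatures as $mime => $prefixes) {
        foreach ($prefixes as $prefix) {
            // Compare only as many bytes as the signature is long.
            if (strncmp($data, $prefix, strlen($prefix)) === 0) {
                return $mime;
            }
        }
    }

    return null; // no known signature found
}

var_dump(detectImageSignature("GIF89a..."));
var_dump(detectImageSignature("Hello, this is text"));
```

A match only says what the data claims to be at its start; it does not prove the rest of the data is a valid image.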

If it's always either an image or text, you could also use the getimagesize() function:

if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}

Author Comment

by:rgb192
ID: 38856644
Note: there is no file.
The input comes from a var_dump of an array
holding a broken-up email,
and there may be text attached to an image.


I could not save

  GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••‰



because my code editor forced me to transliterate it:


<?php
$filename='
GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰
';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}






Warning: getimagesize( GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰ ) [function.getimagesize]: failed to open stream: No error in  on line 5
LVL 34

Expert Comment

by:gr8gonzo
ID: 38857313
You would need to save it to a temporary file to do this.

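$tempfile = tempnam(sys_get_temp_dir(), 'img'); // tempnam() requires a directory and a filename prefix
file_put_contents($tempfile,"GIF89..."); // write the raw data extracted from the email
if( ($size = getimagesize($tempfile)) !== false )
{
  // the data is an image
}
else
{
  // the data is not an image
}
unlink($tempfile); // remove the temporary file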

Author Comment

by:rgb192
ID: 38857806
$filename='C:/Mail-Backup/2013/1/24/655902028-22789.eml';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
  echo 'image';
}
else
{
  // $filename is not an image
  echo 'not';
}




The output is "not" (not an image),

but the .eml file has an image in it.

I am using

http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html

(Please don't comment too much on the MIME parser class, which sometimes does not work properly for me because I am only a beginner.)

And I am not looking for an image.
I am looking for when a body is not readable,
and when a body is mostly special characters because there is an image.

So I would like to look at a small block of text and say: this text is not readable, do something else.


for example

if (isReadable($text)) { // isReadable() is the hypothetical check I am looking for
  echo $text;
} else {
  // do something else
}
LVL 34

Accepted Solution

by:gr8gonzo (earned 500 total points)
ID: 38859548
The getimagesize() function checks to see if the entire file is an image. If the entire file is an email and the image is somewhere INSIDE the email, then that will not work. You need to extract the data first (you can use the mime parser class for that), THEN save the extracted data to a file, and THEN use getimagesize() on that file to test the data.
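If writing a temporary file is inconvenient, PHP 5.4 and later also provide getimagesizefromstring(), which runs the same check on data that is already in memory. A minimal sketch, assuming the MIME parser has already handed you the decoded body of one part as a string; the helper name isImageData() and the tiny sample GIF are made up for this example:

```php
<?php
// Hypothetical helper: true when the raw bytes look like an image PHP recognises.
function isImageData(string $data): bool
{
    // @ hides the notice PHP may raise for input too short to contain a header.
    return @getimagesizefromstring($data) !== false;
}

// A 1x1 GIF, base64-encoded the way an attachment would be inside an email.
$gif = base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');

echo isImageData($gif) ? 'image' : 'not', "\n";
echo isImageData('This is just the text body of a message.') ? 'image' : 'not', "\n";
```

This still only works on one extracted part at a time; passing the whole .eml content would fail for the same reason getimagesize() fails on the .eml file.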

You cannot rely on the presence of special characters to tell you whether something is readable or not. It sounds good in your head, but if you try to do that, you will end up getting images that have comments in them, or you'll miss text that is encoded differently.

That said, a MIME-encoded email comes in different parts and each part is labeled with a content-type and separated by a boundary. The email basically looks like:

<email headers>
Content-Type: multipart/mixed; boundary="abc123"

--abc123
Content-Type: text/plain

<text body of an email>

--abc123
Content-Type: text/html

<html body of an email>

--abc123
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="abc.gif"

<base64 encoded GIF image>

--abc123--




Now, that's a very generic look at it, but you can see how it is structured and how each different part has its own content type. The MIME parser class should be able to extract each part and tell you the value of "Content-Type" and give you the data for that part.

If you simply check the value of Content-Type to see if it is text/plain or text/html, that should tell you that it is a readable kind of content (although you probably do not want to touch anything that is an HTML file and also an attachment, since those can carry viruses).

So you may want to try focusing on using the MIME parser class to pull the parts and simply look at the content-type for things that indicate that they are text and not attachments. Don't worry about trying to analyze binary data.
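To make that concrete, here is a minimal sketch of the filtering step. It assumes the parser output has already been converted into a plain array of parts with content_type, disposition and body keys; that array layout and the function name readableBodies() are assumptions made for this example, not the actual output format of the MIME parser class.

```php
<?php
// Hypothetical helper: keep only the parts that are labeled as inline text.
function readableBodies(array $parts): array
{
    $readable = [];

    foreach ($parts as $part) {
        // Drop parameters such as "; charset=utf-8" and normalise the case.
        $type        = strtolower(trim(explode(';', $part['content_type'] ?? '')[0]));
        $disposition = strtolower(trim($part['disposition'] ?? 'inline'));

        $isText       = ($type === 'text/plain' || $type === 'text/html');
        $isAttachment = (strpos($disposition, 'attachment') === 0);

        // Text attachments are skipped on purpose, as advised above.
        if ($isText && !$isAttachment) {
            $readable[] = $part['body'];
        }
    }

    return $readable;
}

// Assumed, simplified stand-in for what a MIME parser might hand back.
$parts = [
    ['content_type' => 'text/plain', 'disposition' => 'inline', 'body' => 'Hello there'],
    ['content_type' => 'text/html', 'disposition' => 'attachment', 'body' => '<b>skip me</b>'],
    ['content_type' => 'image/gif', 'disposition' => 'attachment', 'body' => 'GIF89a...'],
];

foreach (readableBodies($parts) as $body) {
    echo $body, "\n";
}
```

Parts that are skipped here (images, attachments) never need to be inspected byte by byte.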

Also keep in mind that there are thousands of email clients and scripts out there, and not all of them pay attention to the rules. Someone can send you a broken email, and the MIME parser class may not be able to parse it. That is fine - at some point you simply need to accept that the email is broken and you skip it. Sometimes spammers intentionally send broken emails so that viruses can be unleashed when you try to open the email to see what is wrong.

Author Closing Comment

by:rgb192
ID: 38860347
Thanks for the detailed explanation about file types.

Now I will look at email files more closely (for viruses).