Solved

if readable text, do something

Posted on 2013-02-05
Last Modified: 2013-02-06
Non-technical explanation:
if the characters don't have spaces between them, it is not readable text and may be an image

The problem gets more difficult because the character encoding of both the text and the image may be involved.


      GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••
......


note: it doesn't always begin with GIF89a

if the space-to-character ratio is more than 1 space per 20 characters, do something
Question by:rgb192
LVL 35

Expert Comment

by:gr8gonzo
ID: 38856494
Don't try to use spaces and ratios to determine this - it will not be a reliable detection.

There's something called "MIME Magic" that will examine the first few bytes of a file and try to detect what kind of file it is (every file format has a "signature"). PHP has a mime_content_type() function for doing this for you.
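
As a rough illustration of the signature idea, the sketch below compares the first bytes of a string against three well-known image signatures. The function name sniff_image_type() is hypothetical and only GIF, PNG and JPEG are covered; PHP's fileinfo extension (the finfo class and its buffer() method) applies a much larger signature database and is probably the better choice in practice.

```php
<?php
// Hypothetical helper: guess an image type from the leading bytes of a string.
// Only three common signatures are checked; anything else returns null.
function sniff_image_type($data)
{
    $signatures = [
        'image/gif'  => ["GIF87a", "GIF89a"],
        'image/png'  => ["\x89PNG\r\n\x1a\n"],
        'image/jpeg' => ["\xFF\xD8\xFF"],
    ];
    foreach ($signatures as $mime => $magics) {
        foreach ($magics as $magic) {
            // Compare only as many bytes as the signature is long.
            if (strncmp($data, $magic, strlen($magic)) === 0) {
                return $mime;
            }
        }
    }
    return null; // no known image signature at the start of the data
}
```

Because it works on a string, this kind of check does not need the data to be saved to disk first. Data with leading whitespace or other bytes before the signature will not match.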

If it's always either an image or text, you could also use the getimagesize() function:

if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}

Author Comment

by:rgb192
ID: 38856644
note: there is no file
the input is from a var_dump of an array
that holds a broken-up email
and there may be text attached to an image


I could not save

  GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ªž‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâä鐒–ïòøíðöŸ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØܝž ðñóÀÃÈ !<=>xyzrsthij‚ƒØÙÚÔÕÖijjÂÃ訦œœšááàŽ"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ›‘vìݵÔÇ£‰v–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.tS‘ƒ^fðٝïØœìÕšëÔšÛƏØÍÖÁŒÔ¿‹³¢u¯€[R<SK7×ÏðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛʝëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðéאŒ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž›››———•••‰



because my code editor forced me to transliterate


<?php
$filename='
GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰
';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
}
else
{
  // $filename is not an image
}






Warning: getimagesize( GIF89a@„÷ÿÿÿJJMIIKÑÑÔ‘‘“ííïïïðëëì××ØÅÅÆ©©ª??ž??‚ÑÒؽ¾Ãxy}stxƒ„ˆ‚ƒ‡³´¸ðñõíîò×ÙàðòùbcfZ[^ÛÝãØÚàÖØÞÔÖÜíïõstw??“¶·º©ª­ïðóëìïÛÜßØÙÜÖ×ÚÔÕØÏÑÖÅÇÌðò÷ïñöëíòâäé?’–ïòøíðö?Ÿ£QRTÔÖÚÌÎÒjkmcdfïñõÚÜàØÚÞÖØÜ?ž ðñóÀÃÈ !<=>xyzrsthij?‚ƒØÙÚÔÕÖijjÂÃ訦œœšááà??Ž"![YRÕÓÌÌÉ¿h^BbY?©™nÔÁŽ<7)ðܦsjQªŸ?›‘vìݵÔÇ£?‰v?–‚ðåǬ¥’xti¾¸§ÕÑÅÓÏÄxvpðïìIA.?tS‘ƒ^??fðÙ?ïØœìÕšëÔšÛÆ?ØÃ?ÖÁŒÔ¿‹³¢u¯€[R<SK7×Ã?ðÚ¡íןãΘë՞ǵ†bYBÙÆ”ìØ£ÛÈ—ÕÔØÇšðݬíÚªÛÊ?ëØ©bZGÕÅ›ìÛ¯ðß³ëÚ¯ÚÊ¢×É¥<8.ðá»ÚÍ«ÕÊ®âØÀÖͶîäË×ϼ„€vðé×?Œ‚ìæ×´°¦#<5%60"73*îêáÛØÑQNHJIGðíç¾½»ÛÚØØ×ÕÖÕÓÔÓÑsrq‚?€ÁÀ¿µ´³^]]üüüúúúøøøöööõõõòòòñññïïïíííëëëèèèäääâââÛÛÛØØØÖÖÖÔÔÔÒÒÒÑÑÑÏÏÏÌÌÌÈÈÈÆÆÆÅÅŶ¶¶³³³±±±®®®¦¦¦¥¥¥¢¢¢žžž???›››———•••???‰ ) [function.getimagesize]: failed to open stream: No error in  on line 5
LVL 35

Expert Comment

by:gr8gonzo
ID: 38857313
You would need to save it to a temporary file to do this.

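// tempnam() needs a directory and a filename prefix
$tempfile = tempnam(sys_get_temp_dir(), 'img');
// Write the raw data to disk so getimagesize() has a real file to read
file_put_contents($tempfile,"GIF89...");
if( ($size = getimagesize($tempfile)) !== false )
{
  // the data is an image
}
else
{
  // the data is not an image
}
unlink($tempfile); // remove the temporary file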
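
If the PHP version is 5.4 or newer, the temporary file can probably be skipped, because getimagesizefromstring() accepts the raw bytes directly. A minimal sketch, assuming the data is already in a string; the helper name is_image_data() and the 12-byte minimum are made up for illustration:

```php
<?php
// Hypothetical helper: true when the string is recognized by PHP as an image.
function is_image_data($data)
{
    if (!is_string($data) || strlen($data) < 12) {
        return false; // too short to be a real image
    }
    // getimagesizefromstring() behaves like getimagesize(), but reads a string.
    // The @ hides notices that PHP may raise for unreadable data.
    return @getimagesizefromstring($data) !== false;
}
```

This only answers "is this string an image PHP recognizes"; it says nothing about whether non-image data is readable text.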

Author Comment

by:rgb192
ID: 38857806
$filename='C:/Mail-Backup/2013/1/24/655902028-22789.eml';
if( ($size = getimagesize($filename)) !== false )
{
  // $filename is an image
  echo 'image';
}
else
{
  // $filename is not an image
  echo 'not';
}




output is
not image



but the .eml file has an image in it

I am using

http://www.phpclasses.org/package/3169-PHP-Decode-MIME-e-mail-messages.html

(please don't comment too much about the MIME parser class, which sometimes does not function properly because I am only a beginner)

and I am not looking for an image
I am looking for when a body is not readable
and when a body is mostly special characters because there is an image

so I would like to look at a small block of text and say: this text is not readable, do something else


for example

(if $text is readable){
echo text
}else{
do something else
}
LVL 35

Accepted Solution

by:
gr8gonzo earned 500 total points
ID: 38859548
The getimagesize() function checks to see if the entire file is an image. If the entire file is an email and the image is somewhere INSIDE the email, then that will not work. You need to extract the data first (you can use the mime parser class for that), THEN save the extracted data to a file, and THEN use getimagesize() on that file to test the data.

You cannot rely on the presence of special characters to tell you whether something is readable or not. It sounds good in your head, but if you try to do that, you will end up getting images that have comments in them, or you'll miss text that is encoded differently.

That said, a MIME-encoded email comes in different parts and each part is labeled with a content-type and separated by a boundary. The email basically looks like:

<email headers>
Content-Type: multipart/mixed; boundary="abc123"

--abc123
Content-Type: text/plain

<text body of an email>

--abc123
Content-Type: text/html

<html body of an email>

--abc123
Content-Type: image/gif
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="abc.gif"

<base64 encoded GIF image>

--abc123--




Now, that's a very generic look at it, but you can see how it is structured and how each different part has its own content type. The MIME parser class should be able to extract each part and tell you the value of "Content-Type" and give you the data for that part.

If you simply check the value of Content-Type to see if it is text/plain or text/html, that should tell you that it is a readable kind of content (although you probably do not want to touch anything that is an HTML file and also an attachment, since those can carry viruses).

So you may want to try focusing on using the MIME parser class to pull the parts and simply look at the content-type for things that indicate that they are text and not attachments. Don't worry about trying to analyze binary data.
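
A rough sketch of that filtering step follows. It assumes the parser output has already been flattened into an array of parts with 'content_type', 'disposition' and 'body' keys. That layout and the function name readable_bodies() are hypothetical; the MIME parser class will expose the same information under its own names.

```php
<?php
// Hypothetical part layout: each part is an array with 'content_type',
// 'disposition' and 'body' keys. A real parser will name these differently.
function readable_bodies(array $parts)
{
    $bodies = [];
    foreach ($parts as $part) {
        // A part without a Content-Type header defaults to text/plain.
        $header = isset($part['content_type']) ? $part['content_type'] : 'text/plain';
        // Drop parameters such as "; charset=utf-8" and normalize case.
        $type = strtolower(trim(explode(';', $header)[0]));
        $isText = ($type === 'text/plain' || $type === 'text/html');
        $isAttachment = isset($part['disposition'])
            && stripos(trim($part['disposition']), 'attachment') === 0;
        if ($isText && !$isAttachment) {
            $bodies[] = $part['body'];
        }
    }
    return $bodies;
}
```

Anything that is not returned here (images, other binary types, attached HTML files) is what the "do something else" branch would handle.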

Also keep in mind that there are thousands of email clients and scripts out there, and not all of them pay attention to the rules. Someone can send you a broken email, and the MIME parser class may not be able to parse it. That is fine - at some point you simply need to accept that the email is broken and you skip it. Sometimes spammers intentionally send broken emails so that viruses can be unleashed when you try to open the email to see what is wrong.

Author Closing Comment

by:rgb192
ID: 38860347
thanks for the detailed explanation about file types

now I will look at email files more closely (for viruses)