How to detect binary chars in a file using preg_match in PHP 5?

What pattern could I use to detect whether a file contains non-printable chars using preg_match (or ereg)?

The logic in the conditional below could be reversed depending upon the easiest pattern.

Thanks in advance.
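$fileBuffer = file_get_contents($filePath);
$pattern = '/pattern??/'; // placeholder: the pattern being asked about
// preg_match() returns 1 on a match, 0 on no match, and false only on error
$result = preg_match($pattern, $fileBuffer);
if (1 === $result)
    return "binary file";
else
    return "text file";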

DigitalDave1 asked:
 
Ray Paseur commented:
For a broader check, the pattern [\x00-\x1f] matches every ASCII control character below 0x20, including NUL.
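A quick illustration of that class with preg_match() (the sample strings are made up). Because it covers every byte below 0x20, it matches tab, newline and carriage return as well as NUL, so ordinary multi-line text is reported as a match too. DEL (0x7F) falls outside the range.

```php
<?php
// Sketch: the broad control-character class from the comment above.
// It matches NUL and every other byte below 0x20, which includes
// tab (\x09), newline (\x0A) and carriage return (\x0D).
$pattern = '/[\x00-\x1f]/';

$withNul   = "abc\x00def";          // contains a NUL byte
$plainLine = "Hello, world";        // no control characters at all
$twoLines  = "line one\nline two";  // ordinary text with a newline

// preg_match() returns 1 on a match, 0 on no match, false on error.
var_dump(preg_match($pattern, $withNul));    // int(1)
var_dump(preg_match($pattern, $plainLine));  // int(0)
var_dump(preg_match($pattern, $twoLines));   // int(1): the newline matches too
```

On its own, this pattern would therefore flag most text files as binary.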
 
LordOfPorts commented:
The is_binary function (http://us2.php.net/is_binary) might be of interest.
 
LordOfPorts commented:
My mistake, sorry: is_binary is only available starting with PHP 6.
 
DigitalDave1 (author) commented:
Yes, I saw is_binary(), but we are running PHP 5.x.
 
Ray Paseur commented:
I use a "clean_string()" function to remove not only binary characters but other unwanted characters as well. The code snippet below only tests for digits, but you can add the letters and special characters you want to allow to the regex.

So something like this...

$str = "12345";
if (!is_clean_numeric_string($str)) die("BAD NUMBER!");

HTH, ~Ray
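function is_clean_numeric_string($string) // Quick and dirty: is it numeric?
{
   // preg_replace() instead of ereg_replace(), which is deprecated as of PHP 5.3
   // Collapse runs of spaces to one space, then trim both ends
   $str = trim(preg_replace('/ +/', ' ', $string));
   // Replace every non-digit with "?"; any change means a non-digit was present
   $new = preg_replace('/[^0-9]/', '?', $str);

   if ($new !== $str || $str === '')
   {
      return FALSE;
   }
   // Return TRUE rather than the string itself, so "0" is not read as failure
   return TRUE;
}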
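If the goal is only "digits and nothing else", the same check could be written as a single anchored preg_match(). This is a sketch, not Ray's code, and the name is_digits_only() is hypothetical:

```php
<?php
// Hypothetical single-regex version of the digits-only check.
function is_digits_only($string)
{
    // Trim surrounding whitespace first, as the original does
    $str = trim($string);
    // ^ and \z anchor the whole string; [0-9]+ needs at least one digit,
    // so "" is rejected and "0" is accepted
    return preg_match('/^[0-9]+\z/', $str) === 1;
}

var_dump(is_digits_only("12345"));   // bool(true)
var_dump(is_digits_only(" 0 "));     // bool(true)
var_dump(is_digits_only("12 345"));  // bool(false)
var_dump(is_digits_only("12a45"));   // bool(false)
```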

LordOfPorts commented:
Try using is_string (http://us2.php.net/manual/en/function.is-string.php) on $fileBuffer:
$fileBuffer = file_get_contents($filePath);     
 
$result = is_string($fileBuffer);
 
if (false === $result)
    return "binary file";
else
    return "text file";
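One caveat worth checking: PHP strings are plain byte sequences, so is_string() returns true for any successful file_get_contents() result, whatever bytes it contains. A small sketch with a made-up temporary file:

```php
<?php
// Sketch: is_string() cannot tell text from binary data.
// The temporary file and its contents are made up for this demonstration.
$filePath = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($filePath, "\x00\x01\x02\xFF binary bytes");

$fileBuffer = file_get_contents($filePath);
var_dump(is_string($fileBuffer));   // bool(true), even though the data is binary

// is_string() is false only when file_get_contents() itself fails and
// returns false (here, a file that does not exist).
var_dump(is_string(@file_get_contents($filePath . '.missing')));  // bool(false)

unlink($filePath);
```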

DigitalDave1 (author) commented:
I worked out a preg_match pattern that tests for the non-printing characters while excluding \t, \n, \r and the other whitespace controls (\x09 to \x0D):

$pattern = '/[\x00-\x08\x0E-\x1F\x7F]/';

Thanks for the clues that led to this idea.
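Putting that pattern into the original conditional might look like the sketch below. The helper names looks_binary() and describe_file() are hypothetical; a buffer counts as binary if it contains at least one control byte other than tab, LF, VT, FF or CR, or DEL (0x7F).

```php
<?php
// Hypothetical helper: true if $buffer contains a control byte other than
// tab, LF, VT, FF or CR. The class covers 0x00-0x08, 0x0E-0x1F and DEL (0x7F).
function looks_binary($buffer)
{
    $result = preg_match('/[\x00-\x08\x0E-\x1F\x7F]/', $buffer);
    if ($result === false) {
        // false means preg_match() itself failed; it does not mean "no match"
        trigger_error('preg_match() failed', E_USER_WARNING);
        return false;
    }
    // 1 means at least one such byte was found, 0 means none
    return $result === 1;
}

// Hypothetical wrapper returning the question's "binary file"/"text file" strings
function describe_file($filePath)
{
    $fileBuffer = file_get_contents($filePath);
    if ($fileBuffer === false) {
        return "unreadable file";
    }
    return looks_binary($fileBuffer) ? "binary file" : "text file";
}

var_dump(looks_binary("first line\r\n\tindented line\n"));  // bool(false)
var_dump(looks_binary("GIF89a\x01\x00\x01\x00"));            // bool(true)
```

Bytes 0x80 and above are not flagged, so UTF-8 or Latin-1 text passes, while UTF-16 text (which contains NUL bytes) would be reported as binary.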


Ray Paseur commented:
Thanks for the points -- it's a good question! ~Ray