Solved

How to detect binary chars in a file using preg_match in PHP 5?

Posted on 2009-04-10
Last Modified: 2012-05-06
What pattern could I use to detect whether a file contains non-printable chars using preg_match (or ereg)?

The logic in the conditional below could be reversed depending upon the easiest pattern.

Thanks in advance.
$fileBuffer = file_get_contents($filePath);	
$pattern = '/pattern??/';
$result = preg_match($pattern, $fileBuffer);
if (false === $result)
    return "binary file";
else
    return "text file";

Question by:DigitalDave1
8 Comments
 
LVL 19

Expert Comment

by:LordOfPorts
ID: 24120236
The is_binary function (http://us2.php.net/is_binary) might be of interest.
 
LVL 19

Expert Comment

by:LordOfPorts
ID: 24120246
My mistake, sorry, is_binary is available starting with PHP 6.
 

Author Comment

by:DigitalDave1
ID: 24120249
Yes I saw is_binary(). But we are running PHP 5.x.

LVL 110

Assisted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 24120253
I use a "clean_string()" function to remove not only binary characters, but also unwanted characters.  The code snippet just tests for numbers, but you can add all the alpha and special characters to the REGEX.

So something like this...

$str = "12345";
if (!is_clean_numeric_string($str)) die("BAD NUMBER!");

HTH, ~Ray
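function is_clean_numeric_string($string) // Q-N-D IS IT NUMERIC?
{
   // preg_replace is used here because ereg_replace is deprecated as of PHP 5.3
   // Collapse runs of spaces, then trim leading and trailing whitespace
   $str = trim(preg_replace('/ +/', ' ', $string));
   // Replace every non-digit with "?" so any change reveals a bad character
   $new = preg_replace('/[^0-9]/', '?', $str);

   if ($new !== $str)
   {
      return FALSE;
   } else {
      // Return TRUE rather than the string, so that "0" is not treated as false
      return TRUE;
   }
}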

 
LVL 19

Expert Comment

by:LordOfPorts
ID: 24120255
Try using is_string http://us2.php.net/manual/en/function.is-string.php on $fileBuffer:
$fileBuffer = file_get_contents($filePath);     
 
$result = is_string($fileBuffer);
 
if (false === $result)
    return "binary file";
else
    return "text file";

 
LVL 110

Accepted Solution

by:Ray Paseur
Ray Paseur earned 500 total points
ID: 24120266
For a broader view, the pattern [\x00-\x1f] matches all ASCII control characters, including NUL. Note that this range also covers tab, line feed, and carriage return.
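
A quick sketch of how that pattern behaves with preg_match. The sample strings are made up for illustration and are not from the thread. Because the range \x00-\x1f includes \n, ordinary multi-line text matches as well as a string containing NUL.

```php
<?php
// Illustrative only: the sample strings below are assumptions.
$pattern = '/[\x00-\x1f]/';

$samples = array(
    'plain'     => 'hello world',          // no control characters
    'with_nul'  => "abc\x00def",           // contains a NUL byte
    'multiline' => "line one\nline two",   // contains a line feed
);

$results = array();
foreach ($samples as $name => $text) {
    // preg_match returns 1 on a match, 0 on no match, false on error
    $results[$name] = preg_match($pattern, $text);
}
```

Only the 'plain' sample comes back as 0, which is why a narrower character class is needed if line breaks and tabs should count as text.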
 

Author Comment

by:DigitalDave1
ID: 24126414
I worked out a preg_match pattern that tests for non-printing characters while excluding \n, \r, \t, etc.:

$pattern = '/[\x00-\x08\x0E-\x1F\x7F]/';

Thanks for the clues that led to this idea.
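
For anyone wiring this into the check from the question, here is a minimal sketch. The function names looks_binary() and classify_file() are hypothetical, not from the thread. One thing to watch is that preg_match returns 1 for a match, 0 for no match, and false only on error, so testing `false === $result` alone would not separate binary from text.

```php
<?php
// Hypothetical helper: the name looks_binary() is an assumption.
function looks_binary($buffer)
{
    // Control characters other than \x09-\x0D (tab, LF, VT, FF, CR), plus DEL
    $pattern = '/[\x00-\x08\x0E-\x1F\x7F]/';

    $result = preg_match($pattern, $buffer);
    if ($result === false) {
        // false means the regex engine failed, not "no match"
        throw new RuntimeException('preg_match failed');
    }
    return $result === 1;
}

// Hypothetical wrapper mirroring the conditional in the question.
function classify_file($filePath)
{
    $buffer = file_get_contents($filePath);
    if ($buffer === false) {
        throw new RuntimeException("Cannot read $filePath");
    }
    return looks_binary($buffer) ? "binary file" : "text file";
}
```

Bytes of 0x80 and above are not matched by this pattern, so UTF-8 or Latin-1 text is still classified as text under this sketch.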


 
LVL 110

Expert Comment

by:Ray Paseur
ID: 24126507
Thanks for the points -- it's a good question! ~Ray
Question has a verified solution.
