Ray Paseur
Eliminate unprintable characters from mixed character/binary string

I have a PHP script that processes a long data string.  The string is read into my script via file_get_contents() from a Word, PowerPoint, Excel, or flat text file, among others.  Some of the string is legible clear text; the rest is, for my purposes, just noise: control characters like CR and LF, and application-dependent binary values.

I want an efficient way to eliminate all the characters EXCEPT those listed below, by translating every unlisted character into a blank.

Characters I want to KEEP include a-z A-Z 0-9 &!#$%@_ as well as the dot (period) and dash (minus).  Everything else should be converted into a blank.

I'm not worried about multibyte or UTF-16 or anything like that - the ASCII character set, considered as if one byte at a time, will do just fine.
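
For reference, one way the requirement above could be expressed is a single preg_replace() call with a negated character class. This is only a minimal sketch, not a benchmarked solution; the function name keep_listed_regex() is made up for illustration. The pattern has no /u modifier, so it is assumed to treat the input one byte at a time, which matches the ASCII-only requirement.

```php
<?php
// Hypothetical helper name (not part of any library).
// Replaces every byte that is NOT in the keep list with one blank.
function keep_listed_regex(string $data): string
{
    // Keep list: a-z A-Z 0-9 & ! # $ % @ _ . -
    // The dash is escaped so it is read as a literal, not a range.
    $result = preg_replace('/[^a-zA-Z0-9&!#$%@_.\-]/', ' ', $data);

    // preg_replace() returns null on error; fall back to an empty string.
    return $result ?? '';
}

echo keep_listed_regex('Hello, Bill?<A>3%($$)'), "\n";
```

Each unlisted byte becomes exactly one blank, so the result always has the same length as the input.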

Sample input and output are in the code snippet.  Anybody got a high-performance solution to help translate this stuff?

Many thanks, ~Ray
Input:  Hello, Bill?<A>3%($$)
Output: Hello  Bill  A 3% $$
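
A regex-free alternative that fits the "one byte at a time" framing is a translation table fed to the three-argument form of strtr(). The sketch below is an assumption about how that might look, with hypothetical function names; whether it is faster than a regular expression on large files would need to be measured on real data.

```php
<?php
// Hypothetical helper: returns every byte value (0-255) that is NOT in the keep list.
function bytes_to_blank(): string
{
    $keep = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789&!#$%@_.-';
    $from = '';
    for ($i = 0; $i < 256; $i++) {
        $c = chr($i);
        if (strpos($keep, $c) === false) {
            $from .= $c;
        }
    }
    return $from;
}

// Hypothetical helper: maps each unlisted byte to a blank via strtr().
function blank_unlisted_strtr(string $data): string
{
    static $from = null;   // build the table once, reuse on later calls
    if ($from === null) {
        $from = bytes_to_blank();
    }
    // strtr() with two equal-length strings translates byte for byte.
    return strtr($data, $from, str_repeat(' ', strlen($from)));
}

echo blank_unlisted_strtr('Hello, Bill?<A>3%($$)'), "\n";
```

On the sample input this yields the sample output plus one trailing blank, because the closing parenthesis is also translated.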

PHP, Regular Expressions

Last Comment

8/22/2022 - Mon