August 29, 2008 11:50pm pdt
 

PHP PCRE Regex or str_replace for Hex Characters "0D-0A 0C 0D 0A"

Tags: hex, 0d, 0a, php
I have hundreds of TXT files, in ASCII format, that contain a lot of page breaks that I want to remove or replace.
    When I look at one of these page breaks in Windows Notepad, it just looks like a little vertical rectangle on its own line.  As best I can tell from a hex viewer, the hex characters for this page break are:
     0D-0A 0C 0D 0A

I know how to do PCRE regex for a single hex character like
     $string = preg_replace('/\x0A/', '\n', $string);
but I would prefer to do a full-replace of the whole page-break code (for example, to replace the whole thing with an empty string).
    I don't know how to handle a hyphenated code e.g. "0D-0A" or a whole string of codes like 0D-0A 0C 0D 0A .

    And because str_replace is faster than preg_replace, I hope there's a way to do it with str_replace.
    Please advise.

Question Stats
Zone: Programming
Question Asked By: FrankTech
Solution Provided By: ozo
Participating Experts: 3
Solution Grade: A
Views: 156
 
Accepted Solution by ozo
Rank: Wizard


 
 
Author Comment by FrankTech


 
 
Open Discussion
 
Comment by b0lsc0tt
FrankTech,

If the suggestion above doesn't work or isn't enough, could you upload a small sample of the text with those characters?  You can zip the file and upload the zip file to www.ee-stuff.com, a site just for EE members.  Which hex viewer did you use?

Most likely you will need to use a PHP function that will support an expression.  I don't believe str_replace will.

Let me know if you have any questions or need more information.

b0lsc0tt
 
 
Comment by b0lsc0tt
Oops, should've refreshed.  I left my browser too long. :)  Good job Ozo!

bol
 
 
Comment by FrankTech
b0lsc0tt,
   Thanks. I discovered that str_replace _does_ work for single hex characters if I use double quotes, like this:
        $text = str_replace("\x0A", "", $text);
OR with sequences of single characters, like this:
        $text = str_replace("\x0A\x0C\x0D", "", $text);

but I couldn't get it to work with the hyphenated code \x0A-\x0D.  I wish it could, because that would be faster than preg_replace.
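
A minimal sketch of why the hyphenated form finds nothing with str_replace: in a double-quoted string the hyphen is just a literal '-' character, so the needle is three bytes (LF, '-', CR) rather than a range. The sample text below is hypothetical, built from the byte sequence quoted in the question.

```php
<?php
// Each \xHH escape in a double-quoted string is one byte; '-' stays a literal hyphen.
$needle = "\x0A-\x0D";
$length = strlen($needle);   // 3 bytes: LF, '-', CR
$bytes  = bin2hex($needle);  // hex dump of the needle

// Hypothetical sample text: CRLF, form feed, CRLF between two pages.
$text = "page one\x0D\x0A\x0C\x0D\x0Apage two";

// str_replace looks for the exact 3-byte sequence, which never occurs here.
$result  = str_replace($needle, "", $text);
$changed = ($result !== $text);
```

Here $changed ends up false, because the text contains no LF immediately followed by a hyphen and a CR.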
 
 
Comment by b0lsc0tt
Yes, I didn't think of that possibility.  That still isn't really an expression, but it is very helpful to remember.  In a string that uses double quotes, PHP will understand a number of "special characters."  As you found out, \xHH is used for characters given by a hex reference.  Since the hexadecimal reference only allows 1 or 2 hex digits, the hyphenated one would have a problem.  I am actually still curious as to what that character is, if it really is a single character.  Oh, well. :)

If you are curious there is info on the other characters at http://us.php.net/manual/en/language.types.string.php#language.types.string.syntax.double .
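
As a small illustration of the double-quote point (variable names are only for the example): the same escape written in single quotes is not interpreted by PHP, although PCRE interprets \xHH on its own inside a pattern.

```php
<?php
// Double quotes: PHP converts the escape to a single form feed byte (0x0C).
$double = "\x0C";
// Single quotes: PHP leaves the four characters backslash, x, 0, C untouched.
$single = '\x0C';

$doubleLen = strlen($double);
$singleLen = strlen($single);

// PCRE parses \x0C itself, so a single-quoted pattern still matches the form feed byte.
$matched = preg_match('/\x0C/', $double);
```

This is why the single-quoted preg_replace pattern in the question works, while str_replace needs the double-quoted form.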

bol
 
 
Comment by ddrudik
FrankTech, note that in a regex the construct "[\x0A-\x0D]" is a character range, the same as either "[\x0A\x0B\x0C\x0D]" or "(\x0A|\x0B|\x0C|\x0D)".  Normally a page break in a text file would be represented as the sequence "\x0C\x0D\x0A".  The "\x0D\x0A" you saw prior to the page break was likely a CRLF sequence ending the previous line.
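
A rough sketch of both points, assuming the page break really is form feed + CRLF as described above; the sample string is hypothetical, not taken from FrankTech's files.

```php
<?php
// Hypothetical sample: two pages separated by CRLF, form feed, CRLF.
$text = "page one\x0D\x0A\x0C\x0D\x0Apage two";

// Remove the page break (form feed + CRLF) as a plain byte string; no regex needed.
$viaStr = str_replace("\x0C\x0D\x0A", "", $text);

// The same removal with PCRE, for comparison.
$viaPcre = preg_replace('/\x0C\x0D\x0A/', '', $text);

// A class range matches any ONE byte from 0x0A through 0x0D,
// so it also strips the CRLF that ends "page one".
$classRange = preg_replace('/[\x0A-\x0D]/', '', $text);
$classList  = preg_replace('/[\x0A\x0B\x0C\x0D]/', '', $text);
```

With this input, $viaStr and $viaPcre keep the line ending after "page one", while both character-class versions remove every CR, LF and form feed in the text.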

 
 