Solved

preg_match bug?

Posted on 2011-03-08
Last Modified: 2012-05-11
I have been trying to figure out how to replace some text within a large string using preg_replace; however, I can't get it working.
<?php
ini_set("memory_limit",'1024M'); 
ini_set("max_execution_time",'5000');
ini_set("display_errors",true);
ini_set("pcre.backtrack_limit", 100000);
ini_set("pcre.recursion_limit", 100000);
error_reporting(E_ALL);
$sString = file_get_contents("text.txt");
// Collapse runs of three or more literal (double-escaped) \r\n sequences into one
$sString = preg_replace('~(\\\r\\\n){3,}~', '\r\n', $sString);
var_export($sString);
?>


I'm running it on an HP ProLiant ML350 G5 server with 5 GB of memory and a 2.33 GHz Xeon, under openSUSE 11.1 x64.

PHP version:  5.2.13
preg.zip
Question by:Ludwig Diehl
19 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 35075291
Don't know if this will work, but some notes.

Use double quotes so that PHP translates the \r and \n escape sequences in the replacement string.  (I'm guessing you want real newlines, but I may be wrong.)  Single quotes do not translate them; the backslashes stay as literal characters.
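The difference between the two quoting styles can be checked directly. A minimal sketch; the variable names are illustrative and not from the original post:

```php
<?php
// Single quotes: backslash-r and backslash-n stay as four literal characters.
$literal = '\r\n';
// Double quotes: the escapes become a real carriage return and line feed.
$real = "\r\n";

var_dump(strlen($literal));              // int(4)
var_dump(strlen($real));                 // int(2)
var_dump($real === chr(13) . chr(10));   // bool(true)
```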

The first argument to preg_replace can also be a pattern (as opposed to a string which is what you were passing).  I think this will work better.  I'm guessing you want to find runs of at least three \r\n in the string and replace with a true newline?

$sString    =preg_replace(/(\\r\\n){3,}/,"\r\n",$sString);


0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35077812
Could you do us a favor, please.  Post some of the original data, and post some of the desired output.  Like what was in "text.txt" and what did you expect to find in $sString after the processing completed.  Armed with that and a good explanation of your rules for data transformation we will probably be able to show you the code that will achieve your objective.
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35077873
Also, is it a one off job? If so, using Perl could be a backup option. I've run Perl regex substitution scripts over files gigabytes in size with far more complicated patterns, and the performance and reliability was excellent.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35077949
@TerryAtOpus:  Just between you and me, we might want to use the PHP function nl2br() or the strip_tags() function or some combination of replacement of PHP_EOL with NULL.  It's easier when we see the input and are told about the desired output, eh!
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35077991
Ray, I agree. And generally it's better to avoid trying to learn a new tool if possible, but it's also reassuring to know that there's a viable backup option - I've seen people spend days on problems that didn't really need to be solved as they could be worked around easily.
0
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35084639
Just try this simple example:

ini_set("memory_limit",'3024M'); 
ini_set("max_execution_time",'5000');
ini_set( 'pcre.backtrack_limit', 10000000);
ini_set( 'pcre.recursion_limit', 10000000);
$nMultiplier=6371;
$sString=str_repeat('\r\n',$nMultiplier).'This is the first string I want to get'.str_repeat('\r\n',$nMultiplier).'This is another string I want'.str_repeat('\r\n',$nMultiplier).'This is the last string';
echo 'Length: '.strlen($sString).'<br/>'.$sString,'<hr/>';
$sString    =preg_replace('~(\\\r\\\n){3,}~',"\r\n",$sString);               
echo $sString,'<br/>';


Now, if you increment $nMultiplier by 1, you'll notice that it doesn't work. I have tried using ereg_replace and it does work, but I want to use preg_replace.
The thing is that data was wrongly stored in the database double-escaped, so it contains the literal text "\r\n", and in some cases several thousand of these sequences were stored in a row. I want to replace them wherever the pattern matches.
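One way to narrow this down is to check whether preg_replace() returns NULL and what preg_last_error() reports, since a hit on pcre.backtrack_limit or pcre.recursion_limit makes the call fail with NULL rather than return a partial result. A minimal sketch, assuming a small repeat count so it runs anywhere; checked_preg_replace is a hypothetical helper, not part of PHP or of the code posted above:

```php
<?php
// Hypothetical helper: run preg_replace() and surface PCRE failures
// instead of silently continuing with NULL.
function checked_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
    }
    return $result;
}

// Same shape as the test data, but with small runs of the literal text \r\n.
$sString = str_repeat('\r\n', 5) . 'first' . str_repeat('\r\n', 2) . 'second';
$sString = checked_preg_replace('~(\\\r\\\n){3,}~', "\r\n", $sString);

// The run of five becomes one real CRLF; the run of two is left alone.
var_dump($sString === "\r\nfirst" . '\r\n\r\n' . 'second'); // bool(true)
```

A check like this only helps when PCRE stops at one of its limits. It cannot catch a crash of the PHP process itself.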
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 35084748
I post this again - I believe you should be using a pattern, not a string, as the first argument to preg_replace. Start with something simple like this, which should replace all literal \r\n with newlines.  If this works, we can deal with collapsing the long runs.

 $sString    =preg_replace(/\\r\\n/,"\n",$sString); 


0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35085860
@ludwigDiehl:  Let me try this again.

"Could you do us a favor, please.  Post some of the original data, and post some of the desired output.  Like what was in "text.txt" and what did you expect to find in $sString after the processing completed.  Armed with that and a good explanation of your rules for data transformation we will probably be able to show you the code that will achieve your objective."

It might be as simple as something like this little function.  Example on the web here:
http://www.laprbass.com/RAY_temp_ludwigdiehl.php
<?php // RAY_temp_ludwigdiehl.php
error_reporting(E_ALL);


// SHOW HOW TO GET BACK TO ONE NORMAL \r\n SEQUENCE


// SOME TEST DATA
$str = "This is a test string with \r\n\r\n\r\n\r\n\r\r\r\r\n\n\n too many strange CR/LF sequences";

// SHOW THE TEST DATA PREFORMATTED SO WE CAN SEE THE SEQUENCES
echo "<pre>";
echo $str;
echo PHP_EOL;
echo PHP_EOL;


// FIX THE STRING
$new = fix_str($str);
echo $new;
echo PHP_EOL;
echo PHP_EOL;



// A FUNCTION TO RESTORE SANITY TO TOO MANY CR/LF SEQUENCES
function fix_str($s)
{
    // WILL STOP IF THE STRING GOES EMPTY
    while ($s)
    {
        // WILL STOP IF THE STRING IS SANE
        $retry = FALSE;

        // REMOVE DOUBLED EOL CHARACTERS
        if (strpos($s, "\n\n") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\n\n", "\n", $s);
        }

        // REMOVE DOUBLED CR CHARACTERS
        if (strpos($s, "\r\r") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\r\r", "\r", $s);
        }

        // REMOVE DOUBLED WINDOWS CR/EOL CHARACTER SETS
        if (strpos($s, "\r\n\r\n") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\r\n\r\n", "\r\n", $s);
        }

        // SHOULD WE STOP OR TRY MORE REDUCTIONS
        if (!$retry) break;
    }

    return $s;
}


0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35085873
PS: you might also want to trim($s) before the "return $s;" statement (line 60 of the script).
0
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35086528
sjklein42: the given example does not work at all. It expects a string.

Ray_Paseur: Thanks for the proposal, but I want to use preg_match and also know why this is happening.
As I said before, I tried using ereg_replace and it works perfectly, so there is no need to use an alternate method. Thanks anyway for your example.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35086689
OK, so are you not going to post the test data and the desired results set?  It would be so much easier for you to get a timely answer if you posted that.

If you want a solution, choose the code at ID:35085860.  It is a fully tested solution to the data-related problems.  It works.

And if it's just your personal learning exercise, best of luck learning to use preg_match.  Since you have a solution it's appropriate to close this question now.

Over and out, ~Ray
0
 
LVL 16

Expert Comment

by:sjklein42
ID: 35086773
Sorry.  Once more with feeling.  Made it a string and a pattern!

 $sString    = preg_replace("/\\r\\n/", "\n", $sString);  


0
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35087545
Ray_Paseur: post ID: 35084639 reproduces the test data exactly. I don't want to post the whole text as it is a 65K-character string. In my first post I didn't ask for an alternate solution. I said "I can't get it working".
Again, as I said before, I have a solution using ereg or something like what you posted; however, with this question I'm trying to figure out what is happening. I've been using preg_match for more than 5 years and have never had a problem like this, so that's the point. Thanks either way.

sjklein42: it doesn't work ;)
0
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 35087698
Your example code worked fine when I tried it, even when I added 1 to $nMultiplier. Adding 1000, however, caused a seg fault when running from the Linux command line, but not when run through Apache (it still worked).

This fixed it from the command line:
$sString    = preg_replace("/\\\\r\\\\n/", "\r\n", $sString);
$sString    = preg_replace("/(\r\n){3,1000}/", "\r\n", $sString);
$sString    = preg_replace("/(\r\n){3,}/", "\r\n", $sString);
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35087713
The {3,} repeat seems to be the cause of the problem. Giving it a limit of 1000 repetitions obviously prevented the problem, but it's hardly an elegant solution.
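One variation that keeps the unbounded repeat is to make the group non-capturing and the quantifier possessive, so PCRE has nothing to capture and never backtracks into a run. This is a sketch only: nothing in this thread confirms that it avoids the crash on PHP 5.2, and collapse_crlf_runs is a hypothetical name. The PHP manual also warns that a very high pcre.recursion_limit can exhaust the process stack and crash PHP, so leaving that setting lower, so that preg_replace fails with NULL instead of crashing, may be worth trying as well.

```php
<?php
// Hypothetical helper: collapse runs of three or more real CRLF pairs into one.
function collapse_crlf_runs($s)
{
    // (?: ... ) does not capture; {3,}+ is possessive, so the engine
    // never gives back part of a run once it has been matched.
    $result = preg_replace("/(?:\r\n){3,}+/", "\r\n", $s);
    if ($result === null) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

// A run of 50 collapses to one CRLF; the run of two is left untouched.
$in = "a" . str_repeat("\r\n", 50) . "b\r\n\r\nc";
var_dump(collapse_crlf_runs($in) === "a\r\nb\r\n\r\nc"); // bool(true)
```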
0
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35099471
Yes, your example does indeed work; however, that is not a documented limitation of the preg functions as far as I know. Moreover, isn't PCRE (preg) supposed to perform better than POSIX (ereg)?
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35099547
I'd guess yes, and I know that the ereg functions are now deprecated, but I don't have much knowledge on the limits of PCRE. (I think your original question has at least been answered!)
0
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35100716
you are right my friend!
0
 
LVL 6

Author Closing Comment

by:Ludwig Diehl
ID: 35130778
The solution is not exactly what I was looking for; however, it still uses the preg functions.
0
