Solved

preg_match bug?

Posted on 2011-03-08
Last Modified: 2012-05-11
I have been trying to figure out how to replace some text within a large string using preg_replace, but I can't get it working.
<?php
ini_set("memory_limit", '1024M');
ini_set("max_execution_time", '5000');
ini_set("display_errors", true);
ini_set("pcre.backtrack_limit", 100000);
ini_set("pcre.recursion_limit", 100000);
error_reporting(E_ALL);

$sString = file_get_contents("text.txt");
// Replace each run of three or more literal (double-escaped) \r\n sequences
// with a single literal \r\n
$sString = preg_replace('~(\\\r\\\n){3,}~', '\r\n', $sString);
var_export($sString);
?>

I'm running it on an HP ProLiant ML350 G5 server with 5 GB of memory and a 2.33 GHz Xeon, under Linux openSUSE 11.1 x64.

PHP version:  5.2.13
preg.zip
Question by:Ludwig Diehl
19 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 35075291
Don't know if this will work, but some notes.

Use double quotes so that PHP translates the \r and \n escape sequences in the replacement string.  (I'm guessing you want real newlines, but I may be wrong.)  Single-quoted strings do not translate these escapes.

The first argument to preg_replace can also be a pattern (as opposed to a string which is what you were passing).  I think this will work better.  I'm guessing you want to find runs of at least three \r\n in the string and replace with a true newline?

$sString    =preg_replace(/(\\r\\n){3,}/,"\r\n",$sString);
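
To make the quoting point concrete, here is a minimal sketch of how PHP treats the same escape sequences in single- and double-quoted strings. The variable names are illustrative and not taken from the question. Note also that PHP has no bare regex literal: preg_replace takes the pattern as a string that contains its own delimiters, so the slashes have to sit inside quotes.

```php
<?php
// Illustrative variable names; not part of the original question.
$single = '\r\n'; // four characters: backslash, r, backslash, n
$double = "\r\n"; // two characters: carriage return (13), line feed (10)

echo strlen($single), "\n";
echo strlen($double), "\n";

// The pattern is an ordinary string that includes its delimiters.
// Inside single quotes, \\\\ yields two backslashes, which the regex
// engine then reads as one literal backslash.
$escaped   = 'a' . $single . 'b';
$unescaped = preg_replace('/\\\\r\\\\n/', $double, $escaped);

echo strlen($escaped), ' -> ', strlen($unescaped), "\n";
```

Counting backslashes at both levels (PHP string first, regex second) is usually the quickest way to check whether a pattern targets the literal text or the real control characters.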

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35077812
Could you do us a favor, please.  Post some of the original data, and post some of the desired output.  Like what was in "text.txt" and what did you expect to find in $sString after the processing completed.  Armed with that and a good explanation of your rules for data transformation we will probably be able to show you the code that will achieve your objective.
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35077873
Also, is it a one-off job? If so, using Perl could be a backup option. I've run Perl regex substitution scripts over files gigabytes in size with far more complicated patterns, and the performance and reliability were excellent.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35077949
@TerryAtOpus:  Just between you and me, we might want to use the PHP function nl2br() or the strip_tags() function or some combination of replacement of PHP_EOL with NULL.  It's easier when we see the input and are told about the desired output, eh!
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35077991
Ray, I agree. And generally it's better to avoid trying to learn a new tool if possible, but it's also reassuring to know that there's a viable backup option - I've seen people spend days on problems that didn't really need to be solved as they could be worked around easily.
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35084639
Just try this simple example:

ini_set("memory_limit",'3024M'); 
ini_set("max_execution_time",'5000');
ini_set( 'pcre.backtrack_limit', 10000000);
ini_set( 'pcre.recursion_limit', 10000000);
$nMultiplier=6371;
$sString=str_repeat('\r\n',$nMultiplier).'This is the first string I want to get'.str_repeat('\r\n',$nMultiplier).'This is another string I want'.str_repeat('\r\n',$nMultiplier).'This is the last string';
echo 'Length: '.strlen($sString).'<br/>'.$sString,'<hr/>';
$sString    =preg_replace('~(\\\r\\\n){3,}~',"\r\n",$sString);               
echo $sString,'<br/>';

Now, if you increment $nMultiplier by 1, you'll notice that it doesn't work. I have tried using ereg_replace and it does work, but I want to use preg_replace.
The thing is that the data was wrongly stored in the database double-escaped, as the literal text "\r\n", and in some cases several thousand such sequences were stored in a row. I want to replace them wherever the pattern matches.
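
When preg_replace gives up because of an internal PCRE limit, it returns NULL instead of the modified string, and preg_last_error() (available since PHP 5.2) reports the reason. A minimal sketch of that check follows. preg_error_name() is a hypothetical helper written for this illustration, and the short test string is an assumption standing in for the real data. A hard crash such as a segmentation fault cannot be detected this way.

```php
<?php
// Hypothetical helper: map a preg_last_error() code to its constant name.
function preg_error_name($code)
{
    $names = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    return isset($names[$code]) ? $names[$code] : 'unknown error ' . $code;
}

// Assumed sample data: ten literal \r\n sequences followed by some text.
$sString = str_repeat('\r\n', 10) . 'text';
$result  = preg_replace('~(\\\r\\\n){3,}~', "\r\n", $sString);

if ($result === null) {
    // The replacement failed; report why instead of printing an empty string.
    echo 'preg_replace failed: ', preg_error_name(preg_last_error()), "\n";
} else {
    echo 'Length after replace: ', strlen($result), "\n";
}
```

Running the same check with the larger $nMultiplier values would show whether the failure is a reported limit error or something the engine cannot report.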
 
LVL 16

Expert Comment

by:sjklein42
ID: 35084748
I post this again - I believe you should be using a pattern, not a string, as the first argument to preg_replace. Start with something simple like this, which should replace all literal \r\n with newlines.  If this works, we can deal with collapsing the long runs.

 $sString    =preg_replace(/\\r\\n/,"\n",$sString); 

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35085860
@ludwigDiehl:  Let me try this again.

"Could you do us a favor, please.  Post some of the original data, and post some of the desired output.  Like what was in "text.txt" and what did you expect to find in $sString after the processing completed.  Armed with that and a good explanation of your rules for data transformation we will probably be able to show you the code that will achieve your objective."

It might be as simple as something like this little function.  Example on the web here:
http://www.laprbass.com/RAY_temp_ludwigdiehl.php
<?php // RAY_temp_ludwigdiehl.php
error_reporting(E_ALL);


// SHOW HOW TO GET BACK TO ONE NORMAL \r\n SEQUENCE


// SOME TEST DATA
$str = "This is a test string with \r\n\r\n\r\n\r\n\r\r\r\r\n\n\n too many strange CR/LF sequences";

// SHOW THE TEST DATA PREFORMATTED SO WE CAN SEE THE SEQUENCES
echo "<pre>";
echo $str;
echo PHP_EOL;
echo PHP_EOL;


// FIX THE STRING
$new = fix_str($str);
echo $new;
echo PHP_EOL;
echo PHP_EOL;



// A FUNCTION TO RESTORE SANITY TO TOO MANY CR/LF SEQUENCES
function fix_str($s)
{
    // WILL STOP IF THE STRING GOES EMPTY
    while ($s)
    {
        // WILL STOP IF THE STRING IS SANE
        $retry = FALSE;

        // REMOVE DOUBLED EOL CHARACTERS
        if (strpos($s, "\n\n") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\n\n", "\n", $s);
        }

        // REMOVE DOUBLED CR CHARACTERS
        if (strpos($s, "\r\r") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\r\r", "\r", $s);
        }

        // REMOVE DOUBLED WINDOWS CR/EOL CHARACTER SETS
        if (strpos($s, "\r\n\r\n") !== FALSE)
        {
            $retry = TRUE;
            $s = str_replace("\r\n\r\n", "\r\n", $s);
        }

        // SHOULD WE STOP OR TRY MORE REDUCTIONS
        if (!$retry) break;
    }

    return $s;
}

 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35085873
PS: you might also want to trim($s) before line 60 of the listing (the final return $s; statement).
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35086528
sjklein42: the given example does not work at all. preg_replace expects a string.

Ray_Paseur: Thanks for the proposal, but I want to use preg_match and also to know why this is happening.
As I said before, I tried using ereg_replace and it works perfectly, so there is no need for an alternative method. Thanks anyway for your example.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 35086689
OK, so are you not going to post the test data and the desired results set?  It would be so much easier for you to get a timely answer if you posted that.

If you want a solution, choose the code at ID:35085860.  It is a fully tested solution to the data-related problems.  It works.

And if it's just your personal learning exercise, best of luck learning to use preg_match.  Since you have a solution it's appropriate to close this question now.

Over and out, ~Ray
 
LVL 16

Expert Comment

by:sjklein42
ID: 35086773
Sorry.  Once more with feeling.  Made it a string and a pattern!

 $sString    = preg_replace("/\\r\\n/", "\n", $sString);  

 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35087545
Ray_Paseur: Post ID: 35084639 reproduces the test data exactly. I don't want to post the whole text, as it is a 65K-character string. In my first post I didn't ask for an alternative solution; I said "I can't get it working".
Again, as I said before, I have a solution using ereg or something like what you posted; with this question, however, I'm trying to figure out what is happening. I've been using preg_match for more than 5 years and have never had a problem like this, so that's the point. Thanks either way.

sjklein42: it doesn't work ;)
 
LVL 35

Accepted Solution

by:Terry Woods
ID: 35087698
Your example code worked fine for me, even when I added 1 to $nMultiplier. Adding 1000, however, caused a segmentation fault when run from the Linux command line, but not when run through Apache (where it still worked).

This fixed it from the command line:
$sString    = preg_replace("/\\\\r\\\\n/", "\r\n", $sString);
$sString    = preg_replace("/(\r\n){3,1000}/", "\r\n", $sString);
$sString    = preg_replace("/(\r\n){3,}/", "\r\n", $sString);
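
As a sketch of what these three calls do, the snippet below applies them to the test data from ID: 35084639 with $nMultiplier raised by 1000. Whether it runs without crashing depends on the PHP and PCRE build in use; the comments assume that it does.

```php
<?php
$nMultiplier = 6371 + 1000;
$sString = str_repeat('\r\n', $nMultiplier) . 'This is the first string I want to get'
         . str_repeat('\r\n', $nMultiplier) . 'This is another string I want'
         . str_repeat('\r\n', $nMultiplier) . 'This is the last string';
echo 'Original length: ', strlen($sString), "\n";

// Step 1: turn each literal backslash-r-backslash-n into a real CR/LF pair.
$sString = preg_replace("/\\\\r\\\\n/", "\r\n", $sString);
echo 'After step 1: ', strlen($sString), "\n";

// Step 2: collapse chunks of 3 to 1000 pairs, so no single match is huge.
$sString = preg_replace("/(\r\n){3,1000}/", "\r\n", $sString);
echo 'After step 2: ', strlen($sString), "\n";

// Step 3: the remaining runs are short, so the unbounded repeat is now cheap.
$sString = preg_replace("/(\r\n){3,}/", "\r\n", $sString);
echo 'After step 3: ', strlen($sString), "\n";
```

The lengths printed after each step make it easy to see which call does most of the work.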
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35087713
The {3,} repeat seems to be the cause of the problem. Giving it a limit of 1000 repetitions obviously prevented the problem, but it's hardly an elegant solution.
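
One possible variation, offered only as a sketch: keep preg_replace but drop the unbounded {3,} altogether by applying a bounded quantifier repeatedly until the string stops changing. collapse_crlf_runs() is a hypothetical helper, and it assumes the literal sequences have already been converted to real CR/LF pairs. Each pass shrinks a chunk to exactly three pairs rather than one, so leftovers from a long run merge again on the next pass. This matters because a single {3,1000} pass followed by {3,} leaves two pairs behind for a run of exactly 1001.

```php
<?php
// Hypothetical helper: collapse every run of three or more CR/LF pairs
// to a single pair, using only a bounded quantifier.
function collapse_crlf_runs($s, $max = 1000)
{
    // The bound must exceed 3, otherwise a pass could make no progress.
    $max     = max(4, (int) $max);
    $pattern = '/(?:\r\n){3,' . $max . '}/';

    do {
        $before = $s;
        // Shrink each chunk of 3..$max pairs to exactly three pairs.
        $s = preg_replace($pattern, "\r\n\r\n\r\n", $s);
        if ($s === null) {
            return null; // PCRE reported an error; see preg_last_error()
        }
    } while ($s !== $before);

    // Every run that started with three or more pairs is now exactly three.
    return str_replace("\r\n\r\n\r\n", "\r\n", $s);
}

$sample = str_repeat("\r\n", 1001) . 'x' . "\r\n\r\n" . 'y' . str_repeat("\r\n", 3);
$fixed  = collapse_crlf_runs($sample);
echo strlen($sample), ' -> ', strlen($fixed), "\n";
```

Runs of one or two pairs are left untouched, which matches the intent of the original {3,} pattern.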
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35099471
Yes, your example does indeed work; however, as far as I know that is not a documented limitation of the preg functions. Moreover, isn't PCRE (preg) supposed to perform better than POSIX (ereg)?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 35099547
I'd guess yes, and I know that the ereg functions are now deprecated, but I don't have much knowledge on the limits of PCRE. (I think your original question has at least been answered!)
 
LVL 6

Author Comment

by:Ludwig Diehl
ID: 35100716
you are right my friend!
 
LVL 6

Author Closing Comment

by:Ludwig Diehl
ID: 35130778
The solution is not exactly what I was looking for; however, it still uses preg_match.