seanhess asked:

PHP PCRE regular expression: maximum number of characters? (regex, preg_match)

PHP's preg_match seems to be failing when the subject has too many characters in it.  

This is my expression:
   preg_match('/(.*<div id="storeArea">\s*)(.*)(\s*<\/div>\s*<!--POST-BODY-START-->.*)/si', $subject, $regs)
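A side note on reading the result: preg_match() returns 1 on a match, 0 when there is no match, and false when PCRE gives up partway (on PHP 5.2 and later, preg_last_error() says why). A plain `if (preg_match(...))` check treats 0 and false alike, so an aborted match looks exactly like "no match". Here is a minimal sketch that runs the expression above and tells the three cases apart; the helper name `classify_store_area_match` is made up for illustration.

```php
<?php
// Hypothetical helper: runs the expression from the question and reports
// whether it matched, did not match, or was aborted by PCRE.
function classify_store_area_match(string $subject): string
{
    $pattern = '/(.*<div id="storeArea">\s*)(.*)(\s*<\/div>\s*<!--POST-BODY-START-->.*)/si';
    $result = preg_match($pattern, $subject, $regs);

    if ($result === 1) {
        return 'match';
    }
    if ($result === 0) {
        return 'no match';
    }
    // $result === false: the match was abandoned, not merely unsuccessful.
    switch (preg_last_error()) {
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'error: pcre.backtrack_limit exhausted';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'error: pcre.recursion_limit exhausted';
        default:
            return 'error: code ' . preg_last_error();
    }
}

echo classify_store_area_match("<div id=\"storeArea\">\n<div>aaa</div>\n</div>\n<!--POST-BODY-START-->"), "\n";
```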

And here is the subject:
<html>
<body>
<div id="storeArea">
<div>aaa ... </div>
<div>aaa ... </div>
<div>aaa ... </div>
</div>
<!--POST-BODY-START-->
<!--POST-BODY-END-->
      </body>
</html>

It matches that fine, but if the "aaa ..." content is made HUGELY long, it stops matching. I tested it with 80,000 a's on each line: with only one line (div tag) of a's it still matched, but with three lines of a's it did not.
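One explanation that fits "one div works, three don't": there isn't a fixed cap on subject length as such, but PHP (5.2 and later) caps how much backtracking PCRE may do in one call via the pcre.backtrack_limit ini setting, and pcre.recursion_limit caps recursion depth. The leading `(.*` first swallows the whole subject and then gives it back one character at a time until `<div id="storeArea">` fits, so the work grows with the length of the subject. The sketch below rebuilds a subject like the one above with three divs of 80,000 a's and runs the expression under a small and a large limit. The limit values and the helper `run_with_limit` are made up for the demo, and JIT is switched off (an assumption of this sketch) so the interpreter's limit is what gets exercised. Where the limit is hit, the call returns false with PREG_BACKTRACK_LIMIT_ERROR rather than 0.

```php
<?php
// Demo assumption: disable JIT so the interpreter enforces pcre.backtrack_limit.
ini_set('pcre.jit', '0');

$pattern = '/(.*<div id="storeArea">\s*)(.*)(\s*<\/div>\s*<!--POST-BODY-START-->.*)/si';

// Same shape as the subject in the question, with 80,000 a's per inner div.
$line = '<div>' . str_repeat('a', 80000) . "</div>\n";
$subject = "<html>\n<body>\n<div id=\"storeArea\">\n"
    . str_repeat($line, 3)
    . "</div>\n<!--POST-BODY-START-->\n<!--POST-BODY-END-->\n</body>\n</html>\n";

// Hypothetical helper: run the pattern under a given backtrack limit and
// return [result, preg_last_error() code, captured groups].
function run_with_limit(string $pattern, string $subject, int $limit): array
{
    ini_set('pcre.backtrack_limit', (string) $limit);
    $result = preg_match($pattern, $subject, $regs);
    return [$result, preg_last_error(), $regs];
}

[$low, $lowError] = run_with_limit($pattern, $subject, 10000);
[$high, $highError, $regs] = run_with_limit($pattern, $subject, 10000000);

printf("limit 10000:    %s (error code %d)\n", var_export($low, true), $lowError);
printf("limit 10000000: %s (error code %d)\n", var_export($high, true), $highError);
```

Raising pcre.backtrack_limit (in php.ini or with ini_set()) is a quick workaround, but it only moves the ceiling; an expression that backtracks less scales better.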

Is there a character limit to preg_match?  Why would it behave like this?  Can I fix the regular expression?
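For the last question, one possible regex-side fix (a sketch; the accepted answer's text isn't shown on this page) is to drop the `(.*` at the front and the `.*)` at the end. Those groups only capture the text before and after the storeArea contents, and that text can be rebuilt from match offsets with PREG_OFFSET_CAPTURE and substr(). Without the leading `.*`, PCRE only starts real work once it reaches the `<div id="storeArea">` marker, and the remaining greedy `(.*)` only backs off from the end of the document to the closing `</div>`. The helper name `split_store_area` and its return shape are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns [before, storeArea contents, after], mirroring
// the three groups of the original expression, or null if there is no match.
// The leading and trailing ".*" groups are dropped; the surrounding text is
// rebuilt from the match offsets instead.
function split_store_area(string $subject): ?array
{
    $pattern = '/<div id="storeArea">\s*(.*)\s*<\/div>\s*<!--POST-BODY-START-->/si';
    $result = preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE);
    if ($result !== 1) {
        return null; // 0 = no match; false = PCRE error (see preg_last_error())
    }
    [$inner, $innerStart] = $m[1];
    $innerEnd = $innerStart + strlen($inner);

    return [
        substr($subject, 0, $innerStart), // like group 1: up to the contents
        $inner,                           // like group 2: the contents
        substr($subject, $innerEnd),      // like group 3: everything after
    ];
}
```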

Thanks!

We're sending data to a PHP script, which is supposed to match a regular expression against it.
ASKER CERTIFIED SOLUTION
Steve Bink
This solution is only available to members.
seanhess (ASKER):

Hmm... I still don't understand why it worked in Perl but not in PHP, but I split the string first and then ran the regex only on the second half.

This seems to work ... I hope we don't run into problems with the second half!
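The actual split code isn't shown, so here is a hedged reconstruction of the "split, then regex the second half" workaround. It assumes the split happens on the `<div id="storeArea">` marker with explode() (limit 2), then runs an anchored expression over the second half only, so there is no leading `.*` backing up through the first half. The helper name `split_then_match` is hypothetical. One difference from the original: explode() is case-sensitive, while the original expression used the /i flag.

```php
<?php
// Hypothetical reconstruction of the "split first, then regex" workaround.
// Assumes the split marker is the storeArea opening tag.
function split_then_match(string $subject): ?array
{
    $marker = '<div id="storeArea">';
    $halves = explode($marker, $subject, 2);
    if (count($halves) !== 2) {
        return null; // marker not found (explode() is case-sensitive)
    }
    [$head, $tail] = $halves;

    // Only the second half is searched, anchored at its start.
    $pattern = '/^(\s*)(.*)(\s*<\/div>\s*<!--POST-BODY-START-->.*)$/si';
    if (preg_match($pattern, $tail, $regs) !== 1) {
        return null; // no match, or a PCRE error (check preg_last_error())
    }
    // Rebuild the three pieces the original expression captured.
    return [$head . $marker . $regs[1], $regs[2], $regs[3]];
}
```

Because the result has the same shape as the original $regs[1] through $regs[3], code downstream of the match shouldn't need to change.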
Well, PCRE stands for Perl COMPATIBLE Regular Expressions, so I'm guessing it is missing a few features/functions found in the original. I've only taken a brief look at the actual perlre docs, and they look much more painful overall. Perhaps you can write the routine in Perl and call it from PHP?
That's all right. I did get it working; once I knew the size was the problem, it wasn't hard to fix.

> "Perhaps you can write the routine in Perl and call it from PHP?"
No way... that would be way more work than it's worth. I'll stick with the split; it only added one line.
Don't fix what ain't broke, right?  :)  Good luck to you, and thanks for the points!