PHP

garyhoffmann

I need help deciphering some regular expressions
I've never been an expert in regular expressions. Below, I'm pasting in several preg_replace commands that are in a PHP script. I'm hoping that someone here who knows regular expressions like the back of their hand can tell me what these are doing faster than I could possibly decipher them on my own.

      $string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu', "$1>", $string);

      $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2nojavascript...', $string);
echo "<br>String is now {$string}<br>";
     
      $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2novbscript...', $string);
echo "<br>String is now {$string}<br>";
     
      $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*-moz-binding[\x00-\x20]*:#Uu', '$1=$2nomozbinding...', $string);
      $string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*data[\x00-\x20]*:#Uu', '$1=$2nodata...', $string);

      $string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])style[^>]*>#iUu', "$1>", $string);

      $string = preg_replace('#</*\w+:\w[^>]*>#i', "", $string);

Thank you in advance for any assistance.

Gary.
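
One way to see what the first two patterns do is to run them on small inputs. The sketch below uses the patterns exactly as posted; the sample strings are hypothetical and were not part of the original script. The first pattern appears to cut a tag off at the first attribute whose name starts with "on" or "xmlns" (everything from there to the closing > is dropped), and the second rewrites a javascript: value so it no longer works as a script URL.

```php
<?php
// Hypothetical sample input, not taken from the original script.
$sample = '<a href="x" onclick="evil()">link</a>';

// Pattern 1 as posted: keep the tag up to the character before "on..." or
// "xmlns...", then close the tag. Modifiers: i = case-insensitive,
// U = quantifiers are ungreedy by default, u = treat input as UTF-8.
$stripped = preg_replace(
    '#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu',
    "$1>",
    $sample
);
echo $stripped, PHP_EOL;    // <a href="x" >link</a>

// Hypothetical sample input for the second pattern.
$link = '<a href="javascript:alert(1)">link</a>';

// Pattern 2 as posted: match name = optional quote + "javascript:" (with
// optional whitespace/control characters between the letters) and replace
// the scheme with inert text.
$neutralised = preg_replace(
    '#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu',
    '$1=$2nojavascript...',
    $link
);
echo $neutralised, PHP_EOL; // <a href="nojavascript...alert(1)">link</a>
```

Note that in the first case any attributes that follow the on... attribute inside the same tag are removed as well.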

SOLUTION
Dan Craciun 🇷🇴

Ray Paseur 🇺🇸

Link to purchase RegexBuddy here:
http://www.regexbuddy.com/tutorial.html

This is helpful, too:
http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/pdf/

FWIW, if you're originating the regular expressions, writing them like this with comments on separate lines helps with the understanding.

// FIND ANY WORD WITH A CHARACTER REPEATED 3 OR MORE TIMES IN A ROW
$rgx
= '#'          // REGEX DELIMITER
. '(\w)'       // GROUP OF ANY WORD CHARACTER
. '\1'         // BACKREFERENCE TO GROUP 1
. '{2,}'       // REPEATED TWO OR MORE TIMES
. '#'          // REGEX DELIMITER
;
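
As a usage sketch for the commented pattern above (the word list is hypothetical), preg_match() returns 1 when the pattern finds three or more consecutive copies of the same word character and 0 otherwise. The pattern is repeated here in compact form so the snippet runs on its own.

```php
<?php
// Same pattern as above, written on one line.
$rgx = '#(\w)\1{2,}#';

// Hypothetical test words.
$words = ['bookkeeper', 'aaargh', 'Mississippi', 'zzz'];

$hits = [];
foreach ($words as $word) {
    // preg_match() returns 1 on a match, 0 on no match.
    $hits[$word] = preg_match($rgx, $word);
}

print_r($hits);
// bookkeeper  => 0 (only doubled letters)
// aaargh      => 1 ("aaa")
// Mississippi => 0 (four s characters, but never three in a row)
// zzz         => 1
```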

ASKER CERTIFIED SOLUTION
kaufmed 🇺🇸

garyhoffmann

ASKER

@Dan Craciun - RegexBuddy does seem like it would be very helpful. It appears to have a "PHP Mode", so I'm hoping it handles the kind of things @kaufmed pointed out.

@kaufmed - without your help, I was feeling that I was even more confused - thank you!

kaufmed 🇺🇸

For what it's worth, these patterns look to be doing some sort of XML/HTML parsing. Generally speaking, regex isn't the right tool for this. You'd typically use a library that is set up specifically for handling XML/HTML.

Glad to help  = )
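
To illustrate the library-based approach mentioned above, here is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). The function name stripRiskyAttributes and the sample markup are hypothetical, and this is not a complete sanitiser: it only removes on* and style attributes and attributes whose value starts with javascript: or vbscript:. A dedicated library such as HTML Purifier would normally be a safer choice for user-submitted HTML.

```php
<?php
// Hypothetical helper, not from the original script.
function stripRiskyAttributes(string $html): string
{
    // Collect parser warnings quietly; real-world HTML is rarely well formed.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>' . $html . '</body></html>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the attribute nodes into an array before removing any of them.
    foreach (iterator_to_array($xpath->query('//@*')) as $attr) {
        $name  = strtolower($attr->nodeName);
        // Drop whitespace/control characters before checking the scheme.
        $value = strtolower(preg_replace('/[\x00-\x20]+/', '', $attr->nodeValue));

        $isHandler   = strpos($name, 'on') === 0;   // onclick, onload, ...
        $isStyle     = $name === 'style';
        $isScriptUrl = strpos($value, 'javascript:') === 0
                    || strpos($value, 'vbscript:') === 0;

        if ($isHandler || $isStyle || $isScriptUrl) {
            $attr->ownerElement->removeAttribute($attr->nodeName);
        }
    }

    // Serialise only the children of <body>, i.e. the original fragment.
    $out  = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Hypothetical input.
$clean = stripRiskyAttributes(
    '<p onclick="evil()" style="color:red">Hi <a href="javascript:alert(1)">x</a></p>'
);
echo $clean, PHP_EOL;
```

Unlike the regex version, this leaves the other attributes in a tag untouched, because each attribute is removed individually.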

@kaufmed - they are. They are trying to strip potentially dangerous items out of user-submitted forms, but the problem is that they were stripping almost anything entered (into a WYSIWYG editor) and returning blank strings most of the time.

Ray Paseur 🇺🇸

Ahh -- XML parsing?  Maybe you can post a new question with some examples of the data you want to redact.  We can help with that, and there is no REGEX involved!

The first expression is exactly the one I have been having problems with, because its syntax is wrong, causing error '4' as reported by preg_last_error(), and this is the reason the O.P. found many null returns.
Where it's going wrong I don't know, as there is no decoding utility that shows where a regex is wrong.
Would anyone hazard a guess at what's wrong in expression 1? It applies to the others as well; they all come up with an error from preg_last_error().
RegexBuddy is decoding the modifiers as part of the expression, so I'm thinking the expression itself has erroneous syntax.
I'd like to know what the # delimiters mean, as there are various others including /, but I can't find any reference elsewhere.
Alistair
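
A possible reading of the error above: in PHP, code 4 from preg_last_error() is the constant PREG_BAD_UTF8_ERROR. It is raised when a pattern uses the u modifier and the subject string is not valid UTF-8, not by a syntax error in the pattern, and preg_replace() returns NULL in that case. The sketch below reproduces this with a hypothetical input; that the real data was Latin-1 text from a WYSIWYG editor is only an assumption. It also shows that # and / are interchangeable delimiters: PHP accepts any non-alphanumeric, non-backslash, non-whitespace character, and # is convenient for HTML because / then needs no escaping.

```php
<?php
// First pattern from the question, unchanged.
$pattern = '#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu';

// Hypothetical input: "\xE9" is a lone Latin-1 byte, which is invalid UTF-8.
$latin1 = "<p onclick=\"x()\">caf\xE9</p>";

$result    = preg_replace($pattern, '$1>', $latin1);
$errorCode = preg_last_error();

var_dump($result);                             // NULL
var_dump($errorCode === PREG_BAD_UTF8_ERROR);  // bool(true)

// Converting to UTF-8 first (assuming the source really is ISO-8859-1)
// lets the same pattern run without error.
$utf8  = iconv('ISO-8859-1', 'UTF-8', $latin1);
$fixed = preg_replace($pattern, '$1>', $utf8);
echo $fixed, PHP_EOL;                          // <p >café</p>

// Delimiters only mark where the pattern starts and ends.
$withHash  = preg_match('#a/b#', 'a/b');       // "/" needs no escaping here
$withSlash = preg_match('/a\/b/', 'a/b');      // "/" must be escaped here
var_dump($withHash === $withSlash);            // bool(true)
```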

kaufmed 🇺🇸

@Alistair George

I would suggest opening a new thread  = )