bogmar

remove attribute tag

Hi

I am trying to strip all attributes from the "P" and "DIV" tags except align=*.

Here is the code:
<?php

/// THIS WORKS
$data = '===<p er=gg a="b" align=center x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";

/// THIS WORKS
$data = '===<p er=gg a="b" align="center" x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";

/// THIS DOES NOT WORK
$data = '===<p er=gg a="b" x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";


?>
</body>
</html>
Umesh

Try this:

<?php

/// THIS WORKS
$data = '===<p er=gg a="b" align=center x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";

/// THIS WORKS
$data = '===<p er=gg a="b" align="center" x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";

/// THIS DID NOT WORK BEFORE
$data = '===<p er=gg a="b" x="y" er=gg>===';
$data = preg_replace("/<(p|div)[^>]*( align=[^\s>]+)*[^>]*>/i", "<$1$2>", $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";


?>
Earlier the pattern assumed that the align attribute would always be present in the <p> or <div> tag. I have appended * to ( align=[^\s>]+) so that the group is optional.


Hope this Helps!
bogmar

ASKER

Thanks but you cheated a little bit. The idea is to make it work for any $data variable.
This is very difficult to do with a single regex, because if you use
/<(p|div)[^>]*( align=[^\s>]+)?[^>]*>/i
for all data, the first [^>]* is greedy and consumes every character up to the closing >, so the optional align group never gets a chance to match. It should stop when it finds the keyword align. Such things can be done with regular expressions and conditions, but I haven't found out exactly how to use them.
An easy way would be to split this up into two replacements: the one you are doing at the moment, and a second one, applied first, for p and div tags that do not contain align, like:
/<(p|div)[^>]*>/i
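For what it's worth, the "stop at align" idea can probably be expressed in one pattern by guarding every consumed character with a negative lookahead, so the leading part can never run past " align=". This is only a sketch: the function name strip_attrs_keep_align is made up for illustration, and it assumes attribute values never contain ">" or the text " align=".

```php
<?php
// Hypothetical helper (not from the thread): keep only align on <p>/<div> tags.
function strip_attrs_keep_align($html)
{
    $pattern = '/<(p|div)\b'                           // tag name only, so <pre> is left alone
        . '(?:(?!\salign=)[^>])*'                      // skip attributes, but stop before " align="
        . '(\salign=(?:"[^"]*"|\'[^\']*\'|[^\s>]+))?'  // optional align, quoted or bare value
        . '[^>]*>/i';                                  // discard whatever follows
    return preg_replace($pattern, '<$1$2>', $html);
}

echo strip_attrs_keep_align('===<p er=gg a="b" align=center x="y" er=gg>==='), "\n";
echo strip_attrs_keep_align('===<p er=gg a="b" x="y" er=gg>==='), "\n";
```

With the two sample strings from the question this should print ===<p align=center>=== and ===<p>===. The lookahead is what stops the greedy part from swallowing the align attribute.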
bogmar

ASKER

let me know if you find a solution.
I will accept any working solution that will work for ANY $data variable.
bogmar

ASKER

I am sorry hernst42 but I don’t think that I understood correctly.
Can you please provide the code?
Untested:

$data = preg_replace("/<(p|div)[^>]*?( align=(['\"])\w+\\3)[^>]*>/i", "<$1$2>", $data);
oops, I tested it and it doesn't work, lemme keep working on it.
I thought two replacements in one preg_replace call, like:

$regex = array("/<(p|div)([^>]*)>/i", "/<(p|div)([^>]*)?( align=[^\s>]+)[^>]*>/iu");
$replace = array("<$1>", "<$1$2>");

$data = preg_replace($regex, $replace, $data );
echo "<textarea cols=80 rows=5>$data</textarea>\n";

could work, but it will also remove the align :-( So it seems to me that you either need conditions in the regex or have to go a completely different way to remove all attributes except align.
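One possible form of such a "condition" is a negative lookahead in the first pass, so that it only touches tags that have no align attribute at all. The following is a rough sketch under that assumption, not a verified fix for every input; it again assumes attribute values contain neither ">" nor " align=".

```php
<?php
// Sketch only: two passes, with a negative lookahead acting as the condition.
$patterns = array(
    // Pass 1: tags with no align attribute before ">" lose all their attributes.
    '/<(p|div)\b(?![^>]*\salign=)[^>]*>/i',
    // Pass 2: the remaining tags keep only align (quoted or bare value).
    '/<(p|div)\b[^>]*?(\salign=(?:"[^"]*"|\'[^\']*\'|[^\s>]+))[^>]*>/i',
);
$replacements = array('<$1>', '<$1$2>');

$data = '<p er=gg a="b" align="center" x="y"> <div id=x class="y">';
$data = preg_replace($patterns, $replacements, $data);
echo $data, "\n";
```

preg_replace applies the patterns in array order, so the first pass leaves the tag with align untouched and the second pass reduces it to its align attribute.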
ASKER CERTIFIED SOLUTION
Umesh

This solution is only available to members.
OK, doing it with a single regex is nearly impossible. The following code works even when the attributes are mixed up in any order.

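// The /e modifier was removed in PHP 7, so this uses preg_replace_callback() instead.
function sanitize($id, $args) {
    // Keep the align attribute if the tag has one; drop everything else.
    // \b avoids picking up attributes such as valign.
    if (preg_match('/\b(align=[^\s>]*)/i', $args, $m)) {
        return "<$id " . $m[1] . ">";
    }
    return "<$id>";
}

$data = '===<p er=gg a="b" align=center x="y" er=gg>===i ===<p er=gg a="b" x="y" er=gg>===';
// \b after the tag name keeps tags such as <pre> from being treated as <p>.
$data = preg_replace_callback('/<(p|div)\b([^>]*)>/i', function ($m) {
    return sanitize($m[1], $m[2]);
}, $data);
echo "<textarea cols=80 rows=5>$data</textarea>\n";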