I'm trying to write a regex that removes inline styles from HTML. For example, <div style="color: red"> becomes <div>.
I also want it to match whatever messed-up HTML it receives: whitespace around the = sign, single or double quotes, and any capitalization of the attribute name (STYLE, style, sTyLe, and so on). It needs to be foolproof.
Here's what I've got:
The i takes care of the case, so we can start with /style/i
Then, I need to make sure we match whitespace between style and the = sign. So I add \s*.
Then I match the = sign.
Then I match more whitespace with \s*
Then I need to match the first quote, so I've used ('|") to match single or double quotes.
Then, we match anything, for which I use .*
Then we match the closing quote with another ('|").
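Putting those steps together, here is a minimal sketch of the pattern as described. It is written in Python purely for illustration (an assumption, since no language is named above), with re.IGNORECASE standing in for the i flag. The names PATTERN, STYLE_ATTR and samples are hypothetical.

```python
import re

# Each piece corresponds to one step from the list above.
PATTERN = (
    r"style"        # the attribute name; case is handled by the flag
    r"\s*"          # optional whitespace before the equals sign
    r"="            # the equals sign
    r"\s*"          # optional whitespace after the equals sign
    r"""('|")"""    # opening quote, single or double
    r".*"           # "anything"
    r"""('|")"""    # closing quote, single or double
)
STYLE_ATTR = re.compile(PATTERN, re.IGNORECASE)

# Illustrative inputs covering the variations mentioned above.
samples = [
    '<div style="color: red">',
    "<div STYLE = 'color: red'>",
    '<div sTyLe= "color: red">',
]
for s in samples:
    print(STYLE_ATTR.search(s).group(0))
```

On these single-attribute tags the pattern finds the whole attribute in each case, which is the behaviour the steps are aiming for.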
OK, so the problem here is the "anything" part. When this runs, it doesn't stop at the attribute's closing quote. It also matches
style="color: red">Hey guys! what'
So, it can grab HTML that it's not supposed to.
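The over-match can be reproduced with a short sketch. The Python syntax and the sample input are assumptions: the text after the tag is made up so that it contains an apostrophe, as in the match shown above.

```python
import re

# Same pattern as described in the steps, compiled case-insensitively.
STYLE_ATTR = re.compile(r"""style\s*=\s*('|").*('|")""", re.IGNORECASE)

# Hypothetical input: the text after the tag contains an apostrophe.
html = """<div style="color: red">Hey guys! what's up</div>"""

match = STYLE_ATTR.search(html)
print(match.group(0))
# prints: style="color: red">Hey guys! what'
```

The .* runs as far as it can and then backs up to the last quote character on the line, which here is the apostrophe in "what's".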
But I can't just say "stop at the first quote", because valid inline styles can contain quotes. Like:
style="background: url('mygraphic.jpg') top left no-repeat;"
This one has a ' inside the value, so a pattern that stops at the first quote would end the match there, in the middle of the attribute.
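To make that failure concrete, here is a sketch of the "stop at the first quote" variant, again in Python as an assumption. The [^'"]* piece and the names FIRST_QUOTE and tag are illustrative, not part of the pattern above.

```python
import re

# Variant that refuses to cross any quote character inside the value.
FIRST_QUOTE = re.compile(r"""style\s*=\s*('|")[^'"]*('|")""", re.IGNORECASE)

# The url('...') example wrapped in a hypothetical tag.
tag = """<div style="background: url('mygraphic.jpg') top left no-repeat;">"""

found = FIRST_QUOTE.search(tag)
print(found.group(0))
# prints: style="background: url('
```

The match ends at the single quote that opens the url() argument, so only part of the attribute would be removed.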