CAS-IT
asked on
RegEx to match style="something" ??
Hi guys,
I'm trying to make a regex that removes in-line styles from HTML. I.e. <div style="color: red"> becomes <div>.
I also want to make sure that this matches any kind of messed-up HTML it receives, like whitespace in the middle of the thing, single quotes or double quotes, uppercase STYLE or lowercase, or something like sTyLe or whatnot. It needs to be fool-proof.
Here's what I've got:
/style\s*=\s*('|").*('|")/gi
The i takes care of the case, so we can start with /style/i
Then, I need to make sure we match whitespace between style and the = sign. So I add \s*.
Then I match the = sign.
Then I match more whitespace with \s*
Then I need to match the first quote, so I've used ('|") to match single or double quotes.
Then, we match anything, for which I use .*
Then we match the closing quote with another ('|").
Ok. So the problem here is the "anything" part. When this runs, it not only matches
style="color: red"
But it also matches
style="color: red">Hey guys! what'
So, it can grab HTML that it's not supposed to.
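For anyone who wants to reproduce the over-match, here is a minimal sketch in JavaScript (assumed from the /.../gi syntax; the sample HTML string is made up for illustration). The greedy .* runs to the last quote character on the line, whichever kind it is:

```javascript
// The pattern from the question, unchanged apart from the flags being attached.
const greedyPattern = /style\s*=\s*('|").*('|")/gi;

// Hypothetical input: an inline style followed by text containing an apostrophe.
const sampleHtml = `<div style="color: red">Hey guys! what's up</div>`;

// .* is greedy, so it backtracks only as far as the LAST ' or " it can find.
const greedyMatch = sampleHtml.match(greedyPattern)[0];

console.log(greedyMatch); // style="color: red">Hey guys! what'
```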
But I can't just say stop at the first quote, because you can have valid inline styles that contain quotes. Like:
style="background: url('mygraphic.jpg') top left no-repeat;"
This one has a ' inside the value, so stopping at the first quote of either kind would end the match too early.
I'm stuck!!
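One common way around this, sketched here as an assumption and not necessarily what the accepted answers used, is to capture the opening quote and require the same character to close the value with a backreference (\1), while making the "anything" part lazy. The function name stripInlineStyles is hypothetical:

```javascript
// Hypothetical helper; a sketch, not the accepted solution from this thread.
function stripInlineStyles(html) {
  // \s+       whitespace before the attribute, removed along with it
  // style     attribute name; the i flag covers STYLE, sTyLe, etc.
  // \s*=\s*   optional whitespace around the equals sign
  // (["'])    opening quote, captured as group 1
  // [\s\S]*?  as few characters as possible, newlines included
  // \1        the same quote character that opened the value
  return html.replace(/\s+style\s*=\s*(["'])[\s\S]*?\1/gi, "");
}

const outSimple = stripInlineStyles(`<div style="color: red">`);
const outMixed = stripInlineStyles(`<div sTyLe = 'color: red'>Hey guys! what's up</div>`);
const outNested = stripInlineStyles(
  `<div style="background: url('mygraphic.jpg') top left no-repeat;">`
);

console.log(outSimple); // <div>
console.log(outMixed);  // <div>Hey guys! what's up</div>
console.log(outNested); // <div>
```

Because the closing quote must be the same character as the opening one, a ' inside a double-quoted value no longer ends the match. Note that this still works on raw text, so it would also remove something that looks like style="..." in page content outside a tag.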
ASKER
Accepted my own post because I ended up using a combination of the two RegExs presented.
The problem with using dot-star (or even dot-star-question) is that if you don't have a closing quotation mark, then your pattern is going to consume up to the next available quotation mark. If your HTML is properly structured, then this should be inconsequential; if it's not, then you are going to have problems in your replacement.
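A small sketch of that failure mode, using a lazy backreference pattern and a made-up snippet with a missing closing quote. The second pattern is one possible guard, not a fix from this thread: it refuses to cross a > character, on the assumption that style values never contain one, so an unclosed attribute is left alone instead of swallowing the following markup:

```javascript
// Lazy pattern: stops at the next matching quote, wherever that is.
const lazyPattern = /\s+style\s*=\s*(["'])[\s\S]*?\1/gi;

// Guarded variant (assumption: no ">" ever appears inside a style value).
const guardedPattern = /\s+style\s*=\s*(["'])[^>]*?\1/gi;

// Hypothetical broken HTML: the style attribute never gets its closing quote.
const brokenHtml = '<p style="color: red>Hello</p><a href="x.html">link</a>';

// The lazy pattern runs on to the opening quote of href and eats everything between.
const lazyResult = brokenHtml.replace(lazyPattern, "");

// The guarded pattern finds no closing quote before ">" and so matches nothing.
const guardedResult = brokenHtml.replace(guardedPattern, "");

console.log(lazyResult);    // <px.html">link</a>
console.log(guardedResult); // <p style="color: red>Hello</p><a href="x.html">link</a>
```

The trade-off is that the guarded version leaves the malformed attribute in place, so badly broken input still needs separate handling.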
ASKER
You're right.
Hm...
I dunno. I just need to trust it. If I run into that problem I'll open another question!