
RegEx to match style="something" ??

CAS-IT asked:
Hi guys,

I'm trying to write a regex that removes inline styles from HTML. For example, <div style="color: red"> becomes <div>.

I also want to make sure that this matches any kind of messed-up HTML it receives, like whitespace in the middle of the thing, single quotes or double quotes, uppercase STYLE or lowercase, or something like sTyLe or whatnot. It needs to be fool-proof.

Here's what I've got:

/style\s*=\s*('|").*('|")/gi

The i takes care of the case, so we can start with /style/i

Then, I need to make sure we match whitespace between style and the = sign. So I add \s*.

Then I match the = sign.

Then I match more whitespace with \s*

Then I need to match the first quote, so I've used ('|") to match single or double quotes.

Then, we match anything, for which I use .*

Then we match the closing quote with another ('|").

Ok. So the problem here is the "anything" part. When this runs, it not only matches

style="color: red"

But it also matches

style="color: red">Hey guys! what'

So, it can grab HTML that it's not supposed to.
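To see the over-match in action, here is a minimal sketch. The sample input is hypothetical, picked so that a stray apostrophe follows the tag:

```javascript
// Hypothetical sample input with an apostrophe in the text after the tag.
const html = `<div style="color: red">Hey guys! what's up</div>`;

// The pattern from the question. The .* is greedy, so it runs to the end of
// the line and then backs up to the LAST quote character it can find.
const greedy = /style\s*=\s*('|").*('|")/gi;

const matched = html.match(greedy);
console.log(matched[0]); // style="color: red">Hey guys! what'
```

The greedy .* takes as much as it can, which is why the match ends at the apostrophe in "what's" rather than at the closing double quote.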

But I can't just say "stop at the first quote", because you can have valid inline styles that contain quotes. Like:

style="background: url('mygraphic.jpg') top left no-repeat;"

This one has a ' in it, so a pattern that stops at the first quote of either kind would end the match there, in the middle of the style.
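A quick sketch of that failure, assuming "stop at the first quote" is written as the character class [^'"]* (my guess at what such a pattern would look like, not something from the question). The sample tag is modeled on the example above:

```javascript
// Sample tag modeled on the background example above.
const tag = `<div style="background: url('mygraphic.jpg') top left no-repeat;">`;

// Hypothetical "stop at the first quote" pattern: [^'"]* cannot cross ' or ".
const stopAtFirstQuote = /style\s*=\s*('|")[^'"]*('|")/gi;

const cut = tag.match(stopAtFirstQuote)[0];
console.log(cut); // style="background: url('
```

The match ends at the ' inside url(...), so only part of the attribute would be removed.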

I'm stuck!!


Commented:
Try this:

\s*style[^\<]+(?=>)

Within RegexBuddy it seems to work fine :-)
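For anyone who wants to try this pattern outside RegexBuddy, here is a small JavaScript sketch. Both inputs are hypothetical, and the second one is there only to show what happens when other attributes follow the style:

```javascript
// The suggested pattern, written as a JavaScript regex literal.
const pattern = /\s*style[^\<]+(?=>)/gi;

// Hypothetical inputs: style alone, and style followed by another attribute.
const styleOnly = `<div style="color: red">Hey</div>`;
const styleAndClass = `<div style="color: red" class="x">Hey</div>`;

console.log(styleOnly.replace(pattern, ""));     // <div>Hey</div>
console.log(styleAndClass.replace(pattern, "")); // <div>Hey</div>
```

Since [^\<]+ is greedy and (?=>) only checks that a > comes next, the match runs from style up to the last > before the next <. In the second input that takes the class attribute along with the style.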

Cheers
Commented:
str.replace("(div [^>]*)>" , ">$1")

Commented:
This appears to be working fine


str.replace("(div) ([^>]*)>","$1>$2")
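A sketch of how this behaves, assuming it is meant for JavaScript's String.prototype.replace (the post doesn't say which language it targets). The sample input and variable names are hypothetical:

```javascript
// Hypothetical sample input.
const html = `<div style="color: red">Hi</div>`;

// In JavaScript a string pattern is matched literally, so nothing changes.
const literal = html.replace("(div) ([^>]*)>", "$1>$2");

// The same pattern as a regex literal moves the attribute text after the tag.
const asRegex = html.replace(/(div) ([^>]*)>/, "$1>$2");

// Leaving $2 out of the replacement drops every attribute on the div.
const dropped = html.replace(/(div) ([^>]*)>/, "$1>");

console.log(literal); // <div style="color: red">Hi</div>
console.log(asRegex); // <div>style="color: red"Hi</div>
console.log(dropped); // <div>Hi</div>
```

Note that the last variant removes all attributes of the div, not just style.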


Commented:
Got it!

I was working through these. marqusG's kind of worked. farzan, yours didn't work, sorry :(

What tipped me off, though, was the ? in marqusG's post. I looked that up, and it says it forces minimal (lazy) matching, which is EXACTLY what I needed. My pattern worked; it was just matching too much. All I needed was that ? after the .* to force it to match as little as possible. So, here is the final product:

theString.replace(/\s*style\s*=\s*('|").*?('|")/gi,"");

The RegEx is just:

/\s*style\s*=\s*('|").*?('|")/gi

Thanks guys!
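A quick check of the final pattern against the two kinds of input from the question. The sameQuote variant is an assumption that was not proposed in this thread: it uses the backreference \1 so the closing quote has to be the same character as the opening one. Sample strings and variable names are hypothetical.

```javascript
// The final pattern from the post above.
const lazy = /\s*style\s*=\s*('|").*?('|")/gi;

// Variant (an assumption, not from the thread): \1 must repeat whichever
// quote character the first group captured.
const sameQuote = /\s*style\s*=\s*(['"]).*?\1/gi;

const simple = `<div style="color: red">Hey guys! what's up</div>`;
const nested = `<div style="background: url('mygraphic.jpg') top left no-repeat;">`;

console.log(simple.replace(lazy, ""));      // <div>Hey guys! what's up</div>
console.log(nested.replace(lazy, ""));      // <divmygraphic.jpg') top left no-repeat;">
console.log(nested.replace(sameQuote, "")); // <div>
```

The lazy pattern handles the simple case, but on the url('...') style it still stops at the first quote of either kind. Tying the closing quote to the opening one avoids that, at least for well-formed attributes on a single line.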

Author

Commented:
Accepted my own post because I ended up using a combination of the two regexes presented.

Commented:
The problem with using dot-star (or even dot-star-question) is that if you don't have a closing quotation mark, then your pattern is going to consume up to the next available quotation mark. If your HTML is properly structured, then this should be inconsequential; if it's not, then you are going to have problems in your replacement.
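A minimal sketch of that failure mode, using the lazy pattern from the accepted post and a hypothetical input whose style attribute is missing its closing quote:

```javascript
// Hypothetical malformed input: no closing quote after "red".
const broken = `<p style="color: red>Hello</p><p class="note">Bye</p>`;

// The lazy pattern from the accepted post.
const lazy = /\s*style\s*=\s*('|").*?('|")/gi;

console.log(broken.replace(lazy, "")); // <pnote">Bye</p>
```

The match runs on to the opening quote of the next attribute, so the replacement deletes the text "Hello" and part of the following tag.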

Author

Commented:
You're right.

Hm...

I dunno. I just need to trust it. If I run into that problem I'll open another question!
