Regex to match certain HTML attributes
Posted on 2006-11-27
Regexes are sometimes quite challenging. I've been banging my head on this table for hours now and want to stop. Please help me get rid of this headache!
I need to remove all style and class attributes in an HTML file whilst leaving all other attributes untouched. I just need the regex for this. I've written a generic filter that applies a regex and a replacement string, but I just can't seem to get this one to work (I'm failing to get the regex to skip over other attributes that sit between the tag name and the style=...).
Given the following HTML (which came from pasting from the truly awful MS Werd - I really couldn't invent this rubbish if I tried!):
<H1 style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah blah<SPAN style="mso-spacerun: yes"> </SPAN></font></H1>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt; TEXT-ALIGN: justify"><?xml:namespace prefix = st1 ns = "urn:schemas-microsoft-com:office:smarttags" /><st1:PlaceName w:st="on"><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">blah blah</SPAN></st1:PlaceName><SPAN style="FONT-SIZE: 10pt; COLOR: #ff9900; FONT-FAMILY: 'Century Gothic'">
I need just the regex and the replacement strings. Requirements:
- remove (match) style and class attributes
- work whether or not the attribute value is quoted - note that 'Century Gothic' is wrapped in single quotes *inside* a double-quoted style value
- assume the quotes around the attribute value itself are "double" (or missing)
- allow the attributes to be in *any* order in the tag
- leave all other attributes and tags in situ
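To make the requirements above concrete, here is a rough Python sketch of one way they might be met. It is not the generic filter mentioned earlier, and it uses two regexes plus a callback rather than a single regex/replacement pair: one pattern finds each opening tag, the other walks that tag's attributes one at a time and drops those named style or class. The names TAG, ATTR, UNWANTED and strip_attrs are made up for this illustration. It assumes no attribute value contains a literal < or >, and it also tolerates single-quoted values, which goes slightly beyond what was asked.

```python
import re

# An opening tag: "<", a letter, then everything up to the next ">".
# Closing tags ("</...>") and "<?...>" declarations are deliberately skipped.
TAG = re.compile(r"<[A-Za-z][^<>]*>")

# One attribute inside a tag: leading whitespace, a name, and an optional
# value that is double-quoted, single-quoted, or unquoted.
ATTR = re.compile(
    r"""\s+([^\s=<>/"']+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?"""
)

# Attribute names to remove (compared case-insensitively).
UNWANTED = {"style", "class"}


def strip_attrs(html):
    """Return html with style/class attributes removed from opening tags."""

    def clean_tag(tag_match):
        # Rebuild the tag, keeping every attribute except the unwanted ones.
        return ATTR.sub(
            lambda m: "" if m.group(1).lower() in UNWANTED else m.group(0),
            tag_match.group(0),
        )

    return TAG.sub(clean_tag, html)


if __name__ == "__main__":
    sample = '<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT color=#000000>blah</font></P>'
    print(strip_attrs(sample))
```

Because each attribute is consumed whole, name and value together, text such as class=foo appearing inside another attribute's quoted value is left alone, and the order of attributes within the tag does not matter.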
I've other regexes that clean the rest of the vomit - at least ten of them!
For a bonus, if anyone has the name of the idiot who created the Werd HTML engine... I'd just love to write to his/her mother and tell her how her child is messing with people's heads :-)