mcainc

asked on

Regular Expression to remove JavaScript / CSS from HTML source

So far I have a regex that I use to strip the HTML tags from a page; however, this doesn't work correctly with CSS and JavaScript...

I'm looking for a regular expression to remove scripts (JavaScript, etc.) and styles from the HTML source I have in a local string variable.

Examples of what I need to remove:

[style type="text/css"] blah [/style]
[style] blah [/style]
[script language="JavaScript"] blah [/script]
[script type="text/javascript"] blah [/script]

Is this possible with a regexp?
mcainc

ASKER

I'm using VB.NET, by the way (in case there is a different method for doing this).
ASKER CERTIFIED SOLUTION
ozo
This solution is only available to members.
mcainc

ASKER

I didn't know I could post < > on here, so I just used [ ] instead...
mcainc

ASKER

Hmm... can you clean this up a bit with <> tags?
"<style.*?</style>"
"<script.*?</script>"
mcainc

ASKER

Hmm, this doesn't seem to work.

Here is the function returning a string:

    Public Function RemoveStyleBlocks(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<style.*?</style>", "")
    End Function

I have a function that works for removing HTML tags, for your reference. Perhaps something else is required in your script/style regex?

    Public Function RemoveHTMLTags(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<[^>]*>", "")
    End Function
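If strSource spans multiple lines, pass RegexOptions.Singleline so that "." also matches newline characters (by default it does not, so a style block that wraps across lines never matches):
Regex.Replace(strSource, "<style.*?</style>", "", RegexOptions.Singleline)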
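For reference, here is a minimal sketch that applies the same Singleline idea to both script and style blocks in one pass. The function name RemoveScriptAndStyleBlocks and the sample HTML are made up for illustration, and RegexOptions.IgnoreCase is an extra assumption on my part (it was not in the thread) so that upper-case tags such as <SCRIPT> are caught too.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlCleanup
    ' Hypothetical helper name; not taken from the thread.
    Public Function RemoveScriptAndStyleBlocks(ByVal strSource As String) As String
        ' Singleline lets "." match newlines; IgnoreCase (an assumption) also catches <SCRIPT> / <STYLE>.
        Dim regexOpts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase
        ' \b stops "<style" from matching longer tag names; \1 pairs each opening tag with its own closing tag.
        Return Regex.Replace(strSource, "<(script|style)\b.*?</\1\s*>", "", regexOpts)
    End Function

    Sub Main()
        ' Made-up sample input with a multi-line style block and an upper-case script block.
        Dim html As String = "<p>Hello</p>" & vbCrLf &
            "<style type=""text/css"">" & vbCrLf & "p { color: red; }" & vbCrLf & "</style>" & vbCrLf &
            "<SCRIPT language=""JavaScript"">alert('hi');</SCRIPT>" & vbCrLf &
            "<p>World</p>"
        Console.WriteLine(RemoveScriptAndStyleBlocks(html))
    End Sub
End Module
```

If you also strip tags with something like RemoveHTMLTags, it probably makes sense to call this first; otherwise the tag regex removes the <script> and <style> tags themselves but leaves their contents behind in the text.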
mcainc

ASKER

Ah great, that appears to work perfectly... thank you!