mcainc
asked on
Regular Expression to remove JavaScript / CSS from HTML source
So far I have a regex that I use to strip the HTML tags from a page; however, this doesn't work correctly with CSS and JavaScript...
I'm looking for a regular expression to remove scripts (JavaScript, etc.) and styles from the HTML source I have in a local string variable.
Examples of what I need to remove:
[style type="text/css"] blah [/style]
[style] blah [/style]
[script language="JavaScript"] blah [/script]
[script type="text/javascript"] blah [/script]
Is this possible with a regexp?
ASKER CERTIFIED SOLUTION
(This solution is only available to Experts Exchange members.)
ASKER
I didn't know I could post < > on here, so I just used [ ] instead...
ASKER
Hmm... can you clean this up a bit with <> tags?
"<style.*?</style>"
"<script.*?</script>"
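The "?" after ".*" in these patterns makes the match lazy, which matters as soon as a page contains more than one block. Below is a minimal VB.NET sketch, assuming a made-up two-block input; the module and function names (LazyVsGreedy, StripGreedy, StripLazy) are hypothetical and only used for illustration. It compares a greedy pattern with the lazy one given above:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyVsGreedy
    ' Greedy: .* runs on to the LAST </style>, so the text between two blocks is lost too.
    Function StripGreedy(ByVal html As String) As String
        Return Regex.Replace(html, "<style.*</style>", "")
    End Function

    ' Lazy: .*? stops at the FIRST </style> after each <style, so each block is removed on its own.
    Function StripLazy(ByVal html As String) As String
        Return Regex.Replace(html, "<style.*?</style>", "")
    End Function

    Sub Main()
        ' Hypothetical input: two style blocks with visible text between them.
        Dim html As String = "<style>a{}</style>KEEP ME<style>b{}</style>"
        Console.WriteLine("Greedy: [" & StripGreedy(html) & "]") ' prints "Greedy: []"
        Console.WriteLine("Lazy:   [" & StripLazy(html) & "]")   ' prints "Lazy:   [KEEP ME]"
    End Sub
End Module
```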
ASKER
Hmm, this doesn't seem to work.
Here is the function, which returns a string:
Public Function RemoveStyleBlocks(ByVal strSource As String) As String
Return Regex.Replace(strSource, "<style.*?</style>", "")
End Function
For reference, I have a function that works for removing HTML tags. Perhaps something else is required in your script/style regex?
Public Function RemoveHTMLTags(ByVal strSource As String) As String
Return Regex.Replace(strSource, "<[^>]*>", "")
End Function
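RemoveHTMLTags deletes only the tags themselves, which is why it leaves JavaScript and CSS behind: the body of a script or style block is plain text between two tags. The order of the two passes therefore matters. Here is a minimal VB.NET sketch on a made-up single-line fragment; RemoveScriptBlocks and the module name are hypothetical helpers, not code from the thread:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module OrderMatters
    ' Same pattern as RemoveHTMLTags in the thread: removes each <...> tag, keeps the text between tags.
    Function RemoveHTMLTags(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<[^>]*>", "")
    End Function

    ' Hypothetical helper: removes whole <script>...</script> blocks, body included.
    Function RemoveScriptBlocks(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<script.*?</script>", "")
    End Function

    Sub Main()
        Dim html As String = "<p>Hello</p><script type=""text/javascript"">var x = 1;</script>"

        ' Tags first: the <script> markers are already gone, so the block pattern finds nothing
        ' and the script body is left behind as if it were page text.
        Console.WriteLine(RemoveScriptBlocks(RemoveHTMLTags(html))) ' prints "Hellovar x = 1;"

        ' Blocks first: the script body is removed along with its tags.
        Console.WriteLine(RemoveHTMLTags(RemoveScriptBlocks(html))) ' prints "Hello"
    End Sub
End Module
```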
If strSource spans multiple lines, add RegexOptions.Singleline so that "." also matches newline characters:
Regex.Replace(strSource, "<style.*?</style>", "", RegexOptions.Singleline)
Regex.Replace(strSource,"<
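Putting the pieces of the thread together, a minimal VB.NET sketch of the full cleanup might look like the following. It assumes a made-up multi-line input. RemoveScriptAndStyleBlocks is a hypothetical name, and RegexOptions.IgnoreCase is an assumption beyond what the thread uses (added so that upper-case tags such as <STYLE> are also caught):

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripScriptAndStyle
    ' Singleline: "." also matches newlines, so blocks spanning several lines are found.
    ' IgnoreCase: an assumption beyond the thread, so <SCRIPT> or <Style> are caught too.
    Private ReadOnly Opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

    ' Hypothetical name: removes whole script and style blocks, including their bodies.
    Public Function RemoveScriptAndStyleBlocks(ByVal strSource As String) As String
        Dim result As String = Regex.Replace(strSource, "<script.*?</script>", "", Opts)
        Return Regex.Replace(result, "<style.*?</style>", "", Opts)
    End Function

    ' Same pattern as RemoveHTMLTags in the thread; run it only after the blocks are gone.
    Public Function RemoveHTMLTags(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<[^>]*>", "")
    End Function

    Sub Main()
        ' Hypothetical multi-line input mixing upper- and lower-case tag names.
        Dim html As String = "<STYLE type=""text/css"">" & vbCrLf &
            "  body { color: red; }" & vbCrLf &
            "</STYLE>" & vbCrLf &
            "<p>Hello</p>" & vbCrLf &
            "<script language=""JavaScript"">" & vbCrLf &
            "  alert('hi');" & vbCrLf &
            "</script>"

        Dim plain As String = RemoveHTMLTags(RemoveScriptAndStyleBlocks(html))
        Console.WriteLine(plain.Trim()) ' prints "Hello"
    End Sub
End Module
```

Running the script and style passes in either order gives the same result; what matters is that both run before the generic tag stripper.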
ASKER
Ah, great, that appears to work perfectly... thank you!