?
Solved

Regular Expression to remove JavaScript / CSS from HTML source

Posted on 2006-10-29
8
Medium Priority
?
411 Views
Last Modified: 2013-11-19
So far I have a regex that I use to strip the HTML tags from a page however this doesnt work correctly with CSS and JavaScript...

Im looking for a regular expression to remove script (javascript, etc) and styles from the html source i have in a local string variable

examples of what i need to remove:

[style type="text/css"] blah [/style]
[style] blah [/style]
[script language="JavaScript"] blah [/script]
[script type="text/javascript"] blah [/script]

is this possible w/ regexp?
0
Comment
Question by:mcainc
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 

Author Comment

by:mcainc
ID: 17832127
i'm using vb.net by the way (that is if there is a different method for doing this)
0
 
LVL 84

Accepted Solution

by:
ozo earned 2000 total points
ID: 17832137
"\\[style.*?\\]/style\\]"
"\\[script.*?\\[/script\\]"
but are you sure that your tags use [] and not <>?
0
 

Author Comment

by:mcainc
ID: 17832175
i didn't know i could post < > on here so i just used [ ] instead...
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:mcainc
ID: 17832182
hmm.. can you clean this up a bit with <> tags
0
 
LVL 84

Expert Comment

by:ozo
ID: 17832199
"<style.*?</style>"
"<script.*?</script>
0
 

Author Comment

by:mcainc
ID: 17832235
hmm, this doesn't seem to work:

here is the function returning a string

    Public Function RemoveStyleBlocks(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<style.*?</style>", "")
    End Function

i have a function that works for removing html tags for your reference, perhaps something else is required in your script/style regex?

    Public Function RemoveHTMLTags(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<[^>]*>", "")
    End Function
0
 
LVL 84

Expert Comment

by:ozo
ID: 17832263
if strSource spans multiple lines
Regex.Replace(strSource,"<style.*?</style>", "",RegexOptions.Singleline)
0
 

Author Comment

by:mcainc
ID: 17832270
ah great, that appears to work perfectly... thank you!
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Ready to improve network connectivity? Watch this webinar to learn how SD-WANs and a one-click instant connect tool can boost provisions, deployment, and management of your cloud connection.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

JavaScript has plenty of pieces of code people often just copy/paste from somewhere but never quite fully understand. Self-Executing functions are just one good example that I'll try to demystify here.
The SignAloud Glove is capable of translating American Sign Language signs into text and audio.
The viewer will learn the benefit of using external CSS files and the relationship between class and ID selectors. Create your external css file by saving it as style.css then set up your style tags: (CODE) Reference the nav tag and set your prop…
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
Suggested Courses

765 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question