Solved

Regular Expression to remove JavaScript / CSS from HTML source

Posted on 2006-10-29
Last Modified: 2013-11-19
So far I have a regex that I use to strip the HTML tags from a page; however, this doesn't work correctly with CSS and JavaScript...

I'm looking for a regular expression to remove scripts (JavaScript, etc.) and styles from the HTML source I have in a local string variable.

Examples of what I need to remove:

[style type="text/css"] blah [/style]
[style] blah [/style]
[script language="JavaScript"] blah [/script]
[script type="text/javascript"] blah [/script]

Is this possible with a regexp?
Question by:mcainc
 

Author Comment

by:mcainc
ID: 17832127
I'm using VB.NET, by the way (in case there is a different method for doing this).
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 17832137
"\\[style.*?\\[/style\\]"
"\\[script.*?\\[/script\\]"
But are you sure that your tags use [] and not <>?
 

Author Comment

by:mcainc
ID: 17832175
I didn't know I could post < > on here, so I just used [ ] instead...

 

Author Comment

by:mcainc
ID: 17832182
Hmm... can you clean this up a bit with <> tags?
 
LVL 84

Expert Comment

by:ozo
ID: 17832199
"<style.*?</style>"
"<script.*?</script>"
 

Author Comment

by:mcainc
ID: 17832235
Hmm, this doesn't seem to work.

Here is the function, which returns a string:

    Public Function RemoveStyleBlocks(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<style.*?</style>", "")
    End Function

For your reference, I have a function that works for removing HTML tags. Perhaps something else is required in your script/style regex?

    Public Function RemoveHTMLTags(ByVal strSource As String) As String
        Return Regex.Replace(strSource, "<[^>]*>", "")
    End Function
 
LVL 84

Expert Comment

by:ozo
ID: 17832263
If strSource spans multiple lines, pass RegexOptions.Singleline so that . also matches newline characters:
Regex.Replace(strSource, "<style.*?</style>", "", RegexOptions.Singleline)
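For reference, the pieces from this thread can be combined into one helper. This is only a minimal sketch: the module name HtmlCleaner and the function name RemoveScriptAndStyleBlocks are made up for illustration, and the RegexOptions.IgnoreCase flag, the \b word boundary, and the \s* before the closing > are additions not discussed above (they are assumed to be useful for tags written as <STYLE> or </script >).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HtmlCleaner
    ' Hypothetical helper (name not from the thread): removes <script> and
    ' <style> blocks together with everything between the tags.
    Public Function RemoveScriptAndStyleBlocks(ByVal strSource As String) As String
        ' Singleline lets . match newlines; IgnoreCase is an added assumption
        ' so that <SCRIPT> and <Style> are matched too.
        Dim opts As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

        ' \b stops "<style" from matching a longer tag name;
        ' \s* tolerates whitespace before the closing >.
        Dim result As String = Regex.Replace(strSource, "<script\b.*?</script\s*>", "", opts)
        Return Regex.Replace(result, "<style\b.*?</style\s*>", "", opts)
    End Function

    Sub Main()
        ' Sample input with a style block that spans several lines.
        Dim html As String = "<p>Hi</p>" & vbCrLf &
                             "<STYLE type=""text/css"">" & vbCrLf &
                             "p { color: red; }" & vbCrLf &
                             "</STYLE>"
        Console.WriteLine(RemoveScriptAndStyleBlocks(html))
    End Sub
End Module
```

If the remaining tags are also stripped with a pattern such as "<[^>]*>", the script/style removal should probably run first; otherwise the tags disappear but the CSS and JavaScript text between them is left behind.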
 

Author Comment

by:mcainc
ID: 17832270
Ah great, that appears to work perfectly... thank you!