Solved

How to remove Script tags from HTML

Posted on 2004-10-12
Last Modified: 2008-01-09
I want to remove all <script> tags, together with their contents, from an HTML document.
I tried replacing each one with a blank space, but the following exception occurs:
java.util.regex.PatternSyntaxException: Illegal repetition near index 86

Can anyone help?
Here is the code.


  private String removeTagsWithContents(String tagName, String data)
  {
    String cleanData = "";
    boolean hasMoreTags;
   
    if(data.indexOf(tagName) > 0)
      hasMoreTags = true;
    else
      hasMoreTags = false;
     
    while(hasMoreTags)
    {
      System.out.println(data);
      String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
      System.out.println(strFound);
      strFound = "";
      System.out.println(data.indexOf(strFound));
      data = data.replaceAll(strFound, " ");
      System.out.println(data);

      if(data.indexOf(tagName) > 0)
        hasMoreTags = true;
      else
        hasMoreTags = false;    
    }

    return cleanData;
  }
Question by:Naeemg
4 Comments
 
LVL 37

Accepted Solution

by:
zzynx earned 200 total points
ID: 12295360
So you want every occurrence of "<script>blah blah blah </script>" to be removed. Right?

System.out.println("abc <script>sf fgk,#@qsdf qdfg</script> def".replaceAll( "<script>([\\W\\w\\s])*</script>", "") );

So:

 private String removeTagsWithContents(String tagName, String data) {
     String regExp = "<" + tagName + ">([\\W\\w\\s])*</" + tagName + ">";
     return data.replaceAll(regExp, "");
 }
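
A caveat on the pattern above, offered as a sketch rather than a definitive fix: it only matches a bare "<script>" with no attributes, and the greedy "*" runs from the first opening tag to the last closing tag, so any text between two script blocks is removed as well. A grouped repetition like "([\\W\\w\\s])*" may also, as far as I know, overflow the stack on very long inputs. The variant below assumes you want attributes allowed, case ignored, and each element matched separately. The class name ScriptTagStripper is made up for illustration, and a regex is still not a real HTML parser.

```java
import java.util.regex.Pattern;

class ScriptTagStripper {

    // Removes every <tagName ...>...</tagName> element, including its contents.
    static String removeTagsWithContents(String tagName, String data) {
        // Quote the name so it is always treated as literal text, never as regex syntax.
        String name = Pattern.quote(tagName);
        // \b      : the tag name must end here ("script" does not match "scripts")
        // [^>]*   : optional attributes inside the opening tag
        // .*?     : reluctant match, so each element is removed on its own
        // DOTALL  : lets '.' match line breaks inside the script body
        Pattern p = Pattern.compile(
                "<" + name + "\\b[^>]*>.*?</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "abc <script type=\"text/javascript\">var a = {x: 1};</script> def "
                + "<SCRIPT>\nalert(1);\n</SCRIPT> ghi";
        System.out.println(removeTagsWithContents("script", html));
    }
}
```

With the sample input in main, the text between the two script blocks (" def ") survives, which the greedy version would have swallowed.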
 
LVL 7

Assisted Solution

by:tomboshell
tomboshell earned 200 total points
ID: 12295377
Here you assign the discovered string to strFound >>    String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
   
Here the discovered string is set to an empty string. This is the beginning of the problem! >>      strFound = "";
   
Here is the problem itself. You are telling it to replace every empty string with a blank space >>      data = data.replaceAll(strFound, " ");

You should either not set the discovered string to an empty string, or do the removal before that. It is best not to reassign strFound at all.
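
One more point that may explain the reported exception: replaceAll treats its first argument as a regular expression, so passing it the raw script text fails as soon as that text contains a character such as "{". That is presumably where "Illegal repetition" came from. The original method also returns cleanData, which is never assigned, and its "indexOf(tagName) > 0" test misses a tag at position 0. Below is a minimal sketch of the same loop done purely with indexOf, so no regex is involved. The class name LiteralTagRemover is hypothetical, and matching is case-sensitive, as in the original code.

```java
class LiteralTagRemover {

    // Removes each "<tagName ... </tagName>" span by position, without any regex.
    static String removeTagsWithContents(String tagName, String data) {
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        StringBuilder clean = new StringBuilder();
        int pos = 0; // start of the text not yet copied
        while (true) {
            int start = data.indexOf(open, pos);
            if (start < 0) {
                break; // no more opening tags
            }
            int end = data.indexOf(close, start);
            if (end < 0) {
                break; // unclosed tag: leave the remainder unchanged
            }
            clean.append(data, pos, start); // keep the text before the tag
            pos = end + close.length();     // skip the tag and its contents
        }
        clean.append(data, pos, data.length()); // keep whatever is left
        return clean.toString();
    }

    public static void main(String[] args) {
        String html = "a<script>var o = {x: 1};</script>b<script src=\"x.js\"></script>c";
        System.out.println(removeTagsWithContents("script", html));
    }
}
```

If you prefer to keep the substring approach, String.replace(CharSequence, CharSequence) does a literal replacement and avoids the regex problem too.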