How to remove Script tags from HTML

Posted on 2004-10-12
Last Modified: 2008-01-09
I want to remove all <script> tags, together with their contents, from an HTML document.
I tried replacing each one with a blank space, but the following exception occurs:
java.util.regex.PatternSyntaxException: Illegal repetition near index 86

Can anyone help?
Here is the code:


  private String removeTagsWithContents(String tagName, String data)
  {
    String cleanData = "";
    boolean hasMoreTags;
   
    if(data.indexOf(tagName) > 0)
      hasMoreTags = true;
    else
      hasMoreTags = false;
     
    while(hasMoreTags)
    {
      System.out.println(data);
      String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
      System.out.println(strFound);
      strFound = "";
      System.out.println(data.indexOf(strFound));
      data = data.replaceAll(strFound, " ");
      System.out.println(data);

      if(data.indexOf(tagName) > 0)
        hasMoreTags = true;
      else
        hasMoreTags = false;    
    }

    return cleanData;
  }
Question by: Naeemg

Accepted Solution

by: zzynx (ID: 12295360)
So you want every occurrence of "<script>blah blah blah </script>" to be removed. Right?

System.out.println("abc <script>sf fgk,#@qsdf qdfg</script> def".replaceAll( "<script>([\\W\\w\\s])*</script>", "") );

So:

 private String removeTagsWithContents(String tagName, String data) {
     String regExp = "<" + tagName + ">([\\W\\w\\s])*</" + tagName + ">";
     return data.replaceAll(regExp, "");
 }
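
Note that the expression above is greedy: ([\\W\\w\\s])* runs on to the last </script> in the input, so with two script blocks it also deletes the ordinary text between them. It also only matches a bare, lower-case <script> with no attributes. A minimal sketch of a variant, assuming plain java.util.regex is enough for the input at hand: a reluctant .*? stops at the first closing tag, DOTALL lets . cross line breaks, CASE_INSENSITIVE catches <SCRIPT>, and [^>]* allows attributes such as type="text/javascript". The class name ScriptStripper is hypothetical.

```java
import java.util.regex.Pattern;

public class ScriptStripper {

    // Hypothetical helper: removes every <tagName ...>...</tagName> block.
    // Pattern.quote keeps any regex metacharacters in tagName literal.
    // "\\b" stops "<script" from also matching e.g. "<scripts".
    // ".*?" is reluctant, so each match ends at the FIRST closing tag.
    static String removeTagsWithContents(String tagName, String data) {
        String name = Pattern.quote(tagName);
        Pattern p = Pattern.compile(
                "<" + name + "\\b[^>]*>.*?</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "a<script type=\"text/javascript\">if (x) { y(); }</script>"
                + " b <SCRIPT>\nz();\n</SCRIPT>c";
        System.out.println(removeTagsWithContents("script", html)); // prints "a b c"
    }
}
```

With the greedy version, "x<script>1</script>y<script>2</script>z" would come out as "xz"; the reluctant version gives "xyz". A regex is fine for simple cleanup, but for messy or untrusted HTML a real HTML parser is the safer tool.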

Assisted Solution

by: tomboshell (ID: 12295377)
Here you assign the discovered string to strFound:

    String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));

Here the discovered string is set to an empty string. This is where the problem begins:

    strFound = "";

And here is the problem itself. You are telling replaceAll to replace an empty string with a blank space (an empty pattern matches at every position, so a space is inserted at every position in the string):

    data = data.replaceAll(strFound, " ");

You should either not set the discovered string to an empty string, or do the removal before that. Best is not to reassign it at all.
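
Even with strFound left intact, replaceAll treats its first argument as a regular expression, so characters common in JavaScript, such as {, (, or *, are read as regex syntax. A { that does not start a valid repetition like {2} is the most likely source of "Illegal repetition". Wrapping the text in Pattern.quote(strFound), or calling String.replace, which matches literally, avoids that. Another option is to drop replaceAll entirely. The following is a minimal sketch of the original indexOf loop with that change and two further fixes: the search also finds a tag at index 0 (the original test was "> 0"), and the cleaned text is actually returned (the original always returned the empty cleanData). It assumes tags are written exactly as "<tagName" ... "</tagName>" in lower case; the class name TagRemover is hypothetical.

```java
public class TagRemover {

    // Hypothetical helper: replaces each <tagName ...>...</tagName> block with one blank,
    // using only indexOf, so no text is ever interpreted as a regex.
    static String removeTagsWithContents(String tagName, String data) {
        String open = "<" + tagName;
        String close = "</" + tagName + ">";
        StringBuilder clean = new StringBuilder();
        int pos = 0;                                       // start of text not yet copied
        while (true) {
            int start = data.indexOf(open, pos);
            if (start < 0) break;                          // no more opening tags
            int end = data.indexOf(close, start);
            if (end < 0) break;                            // unclosed tag: keep the rest as-is
            clean.append(data, pos, start).append(' ');    // text before the tag, plus one blank
            pos = end + close.length();                    // skip past the closing tag
        }
        clean.append(data.substring(pos));                 // copy whatever is left
        return clean.toString();
    }

    public static void main(String[] args) {
        // The "{" in the script body would break replaceAll, but not indexOf.
        System.out.println(removeTagsWithContents("script",
                "<script>var a = {b: 1};</script>x<script>f();</script>y")); // prints " x y"
    }
}
```

Building the result in a StringBuilder also avoids rescanning the whole document on every pass, which the original while loop did with each replaceAll call.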