Solved

How to remove Script tags from HTML

Posted on 2004-10-12
Last Modified: 2008-01-09
I want to remove all <script> tags, together with their contents, from an HTML document.
I tried replacing each one with a blank space, but the following exception occurs:
java.util.regex.PatternSyntaxException: Illegal repetition near index 86

Can anyone help?
Here is the code:


  private String removeTagsWithContents(String tagName, String data)
  {
    String cleanData = "";
    boolean hasMoreTags;
   
    if(data.indexOf(tagName) > 0)
      hasMoreTags = true;
    else
      hasMoreTags = false;
     
    while(hasMoreTags)
    {
      System.out.println(data);
      String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
      System.out.println(strFound);
      strFound = "";
      System.out.println(data.indexOf(strFound));
      data = data.replaceAll(strFound, " ");
      System.out.println(data);

      if(data.indexOf(tagName) > 0)
        hasMoreTags = true;
      else
        hasMoreTags = false;    
    }

    return cleanData;
  }
Question by:Naeemg

Accepted Solution

by:
zzynx earned 50 total points
ID: 12295360
So you want every occurrence of "<script>blah blah blah </script>" to be removed. Right?

System.out.println("abc <script>sf fgk,#@qsdf qdfg</script> def".replaceAll( "<script>([\\W\\w\\s])*</script>", "") );

So:

 private String removeTagsWithContents(String tagName, String data) {
     String regExp = "<" + tagName + ">([\\W\\w\\s])*</" + tagName + ">";
     return data.replaceAll(regExp, "");
 }
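
A note on the pattern above: "*" is greedy, so if the document contains two script blocks, everything between the first <script> and the last </script> is removed, including the markup in between. The pattern also misses tags with attributes such as <script type="text/javascript">. The following is only a sketch of one way around both, assuming that tag names should match case-insensitively and that attribute values never contain ">". The class name TagStripper is made up for illustration and is not from this thread. For arbitrary real-world HTML a proper parser is generally more robust than a regular expression.

```java
import java.util.regex.Pattern;

// Hypothetical helper class (name invented for this example).
final class TagStripper {

    static String removeTagsWithContents(String tagName, String data) {
        // Quote the tag name so any regex metacharacters in it are taken literally.
        String name = Pattern.quote(tagName);

        // "<tag", a word boundary (so "script" does not match "scripted"),
        // optional attributes up to the first ">", then a reluctant ".*?" body
        // that stops at the nearest closing tag instead of the last one.
        Pattern p = Pattern.compile(
                "<" + name + "\\b[^>]*>.*?</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL); // DOTALL: "." also matches newlines

        return p.matcher(data).replaceAll("");
    }
}
```

Called as TagStripper.removeTagsWithContents("script", html), this removes each script block separately and leaves the text between them in place.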

Assisted Solution

by:tomboshell
tomboshell earned 50 total points
ID: 12295377
Here you assign the discovered string to strFound:
    String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));

Here the discovered string is reset to an empty string, which is where the problem begins:
    strFound = "";

And here is the problem itself. You are telling it to replace an empty string with a blank space:
    data = data.replaceAll(strFound, " ");

You should either not reset the discovered string to an empty string, or do the removal before resetting it. Best is not to reassign it at all.
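
One more thing to watch if you keep the indexOf approach: String.replaceAll treats its first argument as a regular expression, not as literal text. A script body usually contains characters such as "{" that are regex metacharacters, which is most likely where the "Illegal repetition" exception comes from once strFound holds the real script text. Below is a rough sketch that avoids regular expressions entirely by copying the text outside the tags into a StringBuilder. The class name LiteralTagRemover is invented for this example, and the matching is deliberately simple: it is case-sensitive, and "<script" would also match a tag like "<scripted".

```java
// Hypothetical helper class (name invented for this example).
final class LiteralTagRemover {

    static String removeTagsWithContents(String tagName, String data) {
        String open = "<" + tagName;          // also matches tags with attributes
        String close = "</" + tagName + ">";
        StringBuilder clean = new StringBuilder();
        int pos = 0;                          // start of the text not yet copied

        while (true) {
            int start = data.indexOf(open, pos);
            if (start < 0) {
                break;                        // no more opening tags
            }
            int end = data.indexOf(close, start);
            if (end < 0) {
                break;                        // unterminated tag: keep the rest unchanged
            }
            clean.append(data, pos, start);   // copy the text before the tag
            pos = end + close.length();       // skip the tag and its contents
        }

        clean.append(data, pos, data.length()); // copy whatever follows the last tag
        return clean.toString();
    }
}
```

Unlike the code in the question, this returns the cleaned text rather than an empty cleanData, and it also handles a tag that starts at index 0.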