Solved

How to remove Script tags from HTML

Posted on 2004-10-12
4
577 Views
Last Modified: 2008-01-09
I want to remove all <script> tags with thier contents from within an HTML document.
I tried to replace it with blank space but following exception occurrs.
java.util.regex.PatternSyntaxException: Illegal repetition near index 86

can any one help ?
here is the code.


  private String removeTagsWithContents(String tagName, String data)
  {
    String cleanData = "";
    boolean hasMoreTags;
   
    if(data.indexOf(tagName) > 0)
      hasMoreTags = true;
    else
      hasMoreTags = false;
     
    while(hasMoreTags)
    {
      System.out.println(data);
      String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
      System.out.println(strFound);
      strFound = "";
      System.out.println(data.indexOf(strFound));
      data = data.replaceAll(strFound, " ");
      System.out.println(data);

      if(data.indexOf(tagName) > 0)
        hasMoreTags = true;
      else
        hasMoreTags = false;    
    }

    return cleanData;
  }
0
Comment
Question by:Naeemg
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
4 Comments
 
LVL 37

Accepted Solution

by:
zzynx earned 50 total points
ID: 12295360
So you want every occurrence of "<script>blah blah blah </script>" to be removed. Right?

System.out.println("abc <script>sf fgk,#@qsdf qdfg</script> def".replaceAll( "<script>([\\W\\w\\s])*</script>", "") );

So:

 private String removeTagsWithContents(String tagName, String data) {
     String regExp = "<" + tagName + ">([\\W\\w\\s])*</" + tagName + ">";
     return data.replaceAll(regExp, "");
 }
0
 
LVL 7

Assisted Solution

by:tomboshell
tomboshell earned 50 total points
ID: 12295377
Here you assign the discovered string to strFound >>    String strFound = data.substring(data.indexOf("<" + tagName ), data.indexOf("</" + tagName + ">") + (3 + tagName.length()));
   
Here the discovered string is set to an empty string. The beginnining of the problem!>>      strFound = "";
   
Here is the problem. You are saying to replace an empty string with a blank space      data = data.replaceAll(strFound, " ");

You should either not set the discovered string to an empty string, or do the removal before that.  Best is to not reassign
0

Featured Post

Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

INTRODUCTION Working with files is a moderately common task in Java.  For most projects hard coding the file names, using parameters in configuration files, or using command-line arguments is sufficient.   However, when your application has vi…
Introduction This article is the second of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers the basic installation and configuration of the test automation tools used by…
Viewers learn about the “for” loop and how it works in Java. By comparing it to the while loop learned before, viewers can make the transition easily. You will learn about the formatting of the for loop as we write a program that prints even numbers…
Viewers learn how to read error messages and identify possible mistakes that could cause hours of frustration. Coding is as much about debugging your code as it is about writing it. Define Error Message: Line Numbers: Type of Error: Break Down…
Suggested Courses

617 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question