Question about an array out of bounds exception in Java

Asked by errang
Topics: Programming Languages-Other, Programming, Java
Hey,

I have a question about an array out of bounds exception in Java. I wrote an HTML parser in Java that takes in a text/HTML file and gets rid of the harmful scripts/JavaScript.

This is the input file:

<body>

<p> hi </p>

<script> this should not be here </script>

<p> bye </p>

</body>

This is what I get as output:

num = 13
0
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
1
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
2
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
3
{\colortbl;\red255\green255\blue255;}
4
\margl1440\margr1440\vieww9000\viewh8400\viewkind0
5
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
6

7
\f0\fs24 \cf0 <body>\
8
\
9
<p> hi </p>\
10
\
11
<script> this should not be here </script>\
12
\
13
<p> bye </p>\
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 13
      at html_parser.readFile(html_parser.java:58)
      at html_parser.main(html_parser.java:114)


My question is this: I'm counting the number of lines in the file and allocating a String array with that many elements, but for some reason that's not enough.

I don't know what's wrong with the program.

Appreciate any help on this.


import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Stack;

public class html_parser {

	public void readFile (String filename){
		try {
			BufferedReader in = new BufferedReader(new FileReader(filename));
			String line = in.readLine();
			line = line.trim();
			String line1;
			Boolean bool = false;
			String[] words = null;
			int num = 0;
			int i = 0;
			
			//Get the number of lines in the file.
			while(in.read() != -1){
				num++;
				line1 = in.readLine();
			}
			
			System.out.println("num = " + num);
			
			//num+num will fix this.
			words = new String[num];
			
			//Need to reset the file input after going through it to get the 
			//number of lines.
			in.close();
			in = new BufferedReader(new FileReader(filename));
			BufferedWriter out = new BufferedWriter(new FileWriter("/Users/sudhee1/Documents/stest.rtf"));
			
			while (line != null){
				//System.out.println("i got in here");
				
				System.out.println(i);
					System.out.println(line);
					
					if(line.contains("<script")){
						line = symbol_sanitizer(line);
						words[i] = line;
					}else if(line.contains("=\"")){
						line = symbol_sanitizer(line);
						words[i] = line;
					}else if(line.contains("alert(")){
						line = keyword_sanitizer(line);
						words[i] = line;
					}else if(line.contains("document")){
						line = keyword_sanitizer(line);
						words[i] = line;
					}else{
						words[i] = line;
					}
					
					i++;
					line = in.readLine();
				}
			
				//num+num will fix this.
				for(int x = 0; x < num; x ++){
					System.out.println("words = " + words[x]);
				//	System.out.println(x);
				}
				//System.out.println();
				out.write("\n");
				//System.out.println("Line: " + line);
				line = in.readLine();
				
				if (line != null)
					line = line.trim();
			
			in.close();
			out.close();
		} catch (FileNotFoundException e1) {
			e1.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public String symbol_sanitizer(String input){
		//System.out.println("I got: " + input);
		
		input = input.replaceAll("&", "&amp;");
		input = input.replaceAll("<", "&lt;");
		input = input.replaceAll(">", "&gt;");
		input = input.replaceAll("\"", "&quot;");
		input = input.replaceAll("'", "&#x27;");
		input = input.replaceAll("/", "&#x2F;");
	
		return input;
	}
	
	public String keyword_sanitizer(String input){
		
		input = input.replaceAll("alert(", "alert&#40;");
		input = input.replaceAll("(document.", "document&#46;");
		
		return input;
	}
	
	public void print(String input){
		System.out.println(input);
	}
	
	public static void main(String[] args) throws IOException {
		html_parser ss = new html_parser();
		ss.readFile("/Users/sudhee1/Documents/test.rtf");
		
		//System.out.println("=\"");
	
	}
}
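
Reading the posted listing, two things appear to combine to produce the exception (this is my reading of the code, not a confirmed diagnosis). First, the counting loop calls in.read() before each in.readLine(). On a blank line, read() swallows the line terminator and the following readLine() then consumes the whole next line, and the first line, which was read before the loop, is never counted. So num comes out smaller than the real line count. Second, the processing loop starts with the line left over from the first reader and then reads every line again from the reopened reader, which would explain why the first line is printed twice and why one more entry is stored than the file has lines.

A minimal sketch of one way around both issues is to collect the lines into a List in a single pass, so no count is needed up front. The class and method names below are hypothetical, and countLikeQuestion only mirrors the question's counting loop for comparison.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original program.
class LineReaderSketch {

    // Reads every line in one pass; the list grows as needed,
    // so there is no array size to get wrong.
    static List<String> readAllLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Mirrors the counting loop from the question, for comparison only.
    static int countLikeQuestion(Reader source) {
        int num = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine();                 // first line is consumed but never counted
            while (in.read() != -1) {      // read() eats one character of the next line
                num++;
                in.readLine();             // after a blank line this skips a full line
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return num;
    }

    public static void main(String[] args) {
        // Sample text with blank lines between tags, similar to the file in the question.
        String text = "<body>\n\n<p> hi </p>\n\n<p> bye </p>\n\n</body>\n";
        System.out.println("counted like the question: "
                + countLikeQuestion(new StringReader(text)));   // prints 3
        System.out.println("actual lines: "
                + readAllLines(new StringReader(text)).size()); // prints 7
    }
}
```

In readFile, the same idea would mean replacing the words array with the list (or sizing the array from the list afterwards) and starting the processing loop from the reopened reader only, not from the line left over from the first pass.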
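
A separate issue that the sample input does not trigger yet: keyword_sanitizer passes "alert(" and "(document." to String.replaceAll, which treats its first argument as a regular expression. An unescaped "(" opens a group that is never closed, so those calls would throw a PatternSyntaxException as soon as a line containing "alert(" or "document" reaches them. A small sketch using String.replace, which matches literal text, is shown here. The class and method names are hypothetical, and I am assuming the intended literal for the second replacement is "document." without the leading parenthesis.

```java
// Hypothetical helper class, not part of the original program.
class KeywordSanitizerSketch {

    // String.replace treats both arguments as literal text, not as a regex,
    // so the parenthesis and the dot need no escaping.
    static String sanitizeKeywords(String input) {
        return input.replace("alert(", "alert&#40;")
                    .replace("document.", "document&#46;"); // assumed intended literal
    }

    public static void main(String[] args) {
        // prints alert&#40;document&#46;cookie)
        System.out.println(sanitizeKeywords("alert(document.cookie)"));
    }
}
```

The replaceAll calls in symbol_sanitizer happen to work because none of those patterns contain regex metacharacters, but replace would express the intent more directly there as well.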