Question about an array out of bounds exception in Java

errang
Hey,

I have a question about an array out of bounds exception in Java. I wrote an HTML parser in Java that takes a text/HTML file and strips out the harmful script/JavaScript content.

This is the file.

<body>

<p> hi </p>

<script> this should not be here </script>

<p> bye </p>

</body>

This is what I get as output:

num = 13
0
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
1
{\rtf1\ansi\ansicpg1252\cocoartf1038\cocoasubrtf320
2
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
3
{\colortbl;\red255\green255\blue255;}
4
\margl1440\margr1440\vieww9000\viewh8400\viewkind0
5
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\ql\qnatural\pardirnatural
6

7
\f0\fs24 \cf0 <body>\
8
\
9
<p> hi </p>\
10
\
11
<script> this should not be here </script>\
12
\
13
<p> bye </p>\
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 13
      at html_parser.readFile(html_parser.java:58)
      at html_parser.main(html_parser.java:114)


My question: I count the number of lines in the file and allocate a String array with that many elements, but for some reason that's not enough.

I don't know what's wrong with the program.

Appreciate any help on this.


import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Stack;

public class html_parser {

	public void readFile (String filename){
		try {
			BufferedReader in = new BufferedReader(new FileReader(filename));
			String line = in.readLine();
			line = line.trim();
			String line1;
			Boolean bool = false;
			String[] words = null;
			int num = 0;
			int i = 0;
			
			//Get the number of lines in the file.
			while(in.read() != -1){
				num++;
				line1 = in.readLine();
			}
			
			System.out.println("num = " + num);
			
			//num+num will fix this.
			words = new String[num];
			
			//Need to reset the file input after going through it to get the 
			//number of lines.
			in.close();
			in = new BufferedReader(new FileReader(filename));
			BufferedWriter out = new BufferedWriter(new FileWriter("/Users/sudhee1/Documents/stest.rtf"));
			
			while (line != null){
				//System.out.println("i got in here");
				
				System.out.println(i);
					System.out.println(line);
					
					if(line.contains("<script")){
						line = symbol_sanitizer(line);
						words[i] = line;
					}else if(line.contains("=\"")){
						line = symbol_sanitizer(line);
						words[i] = line;
					}else if(line.contains("alert(")){
						line = keyword_sanitizer(line);
						words[i] = line;
					}else if(line.contains("document")){
						line = keyword_sanitizer(line);
						words[i] = line;
					}else{
						words[i] = line;
					}
					
					i++;
					line = in.readLine();
				}
			
				//num+num will fix this.
				for(int x = 0; x < num; x ++){
					System.out.println("words = " + words[x]);
				//	System.out.println(x);
				}
				//System.out.println();
				out.write("\n");
				//System.out.println("Line: " + line);
				line = in.readLine();
				
				if (line != null)
					line = line.trim();
			
			in.close();
			out.close();
		} catch (FileNotFoundException e1) {
			e1.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	public String symbol_sanitizer(String input){
		//System.out.println("I got: " + input);
		
		input = input.replaceAll("&", "&amp;");
		input = input.replaceAll("<", "&lt;");
		input = input.replaceAll(">", "&gt;");
		input = input.replaceAll("\"", "&quot;");
		input = input.replaceAll("'", "&#x27;");
		input = input.replaceAll("/", "&#x2F;");
	
		return input;
	}
	
	public String keyword_sanitizer(String input){
		
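		// replace() treats its first argument as literal text; replaceAll()
		// would parse "alert(" as a regex and throw PatternSyntaxException.
		input = input.replace("alert(", "alert&#40;");
		input = input.replace("document.", "document&#46;");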
		
		return input;
	}
	
	public void print(String input){
		System.out.println(input);
	}
	
	public static void main(String[] args) throws IOException {
		html_parser ss = new html_parser();
		ss.readFile("/Users/sudhee1/Documents/test.rtf");
		
		//System.out.println("=\"");
	
	}
}


Author

Commented:
Oh, I'm also using a Mac, by the way, in case anyone is wondering why the file path looks different.
Java Developer
Top Expert 2010
Commented:
                 while(in.read() != -1){
                        num++;
                        line1 = in.readLine();
                  }

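That's not going to count the number of lines. Each in.read() call consumes a character from the stream, so the newline of a blank line can be swallowed and that blank line gets counted together with the line after it. The line you read with readLine() before the loop is never counted at all. Count with readLine() on its own:

while (in.readLine()!=null) num++;

It's better, though, to use a List instead of an array; that way you don't need to know how many lines there are.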
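To illustrate the counting fix, here is a minimal sketch. It assumes the reader is freshly opened (nothing has been consumed from it yet); the class name LineCounter and the helper countLines are made up for this example and are not part of the posted program. It reads from a StringReader so it runs without a file; in the real program you would pass new FileReader(filename) instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LineCounter {

    // Hypothetical helper: counts lines by calling readLine() until it
    // returns null. No read() call, so no characters are skipped.
    public static int countLines(Reader source) throws IOException {
        int num = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            while (in.readLine() != null) {
                num++;
            }
        }
        return num;
    }

    public static void main(String[] args) throws IOException {
        // Three lines, the middle one blank; the blank line is counted too.
        String sample = "<body>\n\n<p> hi </p>\n";
        System.out.println("num = " + countLines(new StringReader(sample))); // prints "num = 3"
    }
}
```

One more thing to watch in the posted code: `line` still holds the first line from the first pass when the file is reopened, which looks like the reason the first line is printed at both index 0 and index 1 in the output. Even with a correct count, that extra entry would overflow the array by one unless the second pass starts with a fresh readLine().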
mccarl
IT Business Systems Analyst / Software Developer
Top Expert 2015
Commented:
Rather than using an array for "words", try using an ArrayList instead. This allows you to add() any number of lines as you go. It means you can get rid of the code that determines the number of lines in the file (which looks like where the problem is; it is very hard to follow what you have done there), and it also means you only have to read through the file once, not twice. Afterwards, you can convert the ArrayList to a normal array if you need to, or you can just use the ArrayList as is.
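A rough sketch of the ArrayList approach, under a few assumptions: the class name LineCollector and the helper readLines are invented for illustration, the sanitizing step is left out, and a StringReader stands in for the file so the example runs on its own. In the real program you would pass new FileReader(filename) and call the sanitizer methods before add().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineCollector {

    // Hypothetical helper: reads every line exactly once and appends it to a
    // growable list, so no line count is needed up front.
    public static List<String> readLines(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                words.add(line.trim());
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<body>\n\n<p> hi </p>\n<script> this should not be here </script>\n";
        List<String> words = readLines(new StringReader(sample));

        for (int i = 0; i < words.size(); i++) {
            System.out.println(i + ": " + words.get(i));
        }

        // Convert to a plain array only if something downstream needs one.
        String[] asArray = words.toArray(new String[0]);
        System.out.println("lines = " + asArray.length);
    }
}
```

Because the list grows on demand, there is no index to run past, and the separate counting pass and the reopen of the file both go away.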

Author

Commented:
Thanks.

I tried to accept the best solution, not close it... not sure why that button keeps disappearing.
