  • Status: Solved
  • Priority: Medium
  • Security: Public

What is the best way to implement a stopword list in Java

I am converting some code written in PHP to Java (JSP, actually). I am relatively new to Java/JSP, though I've done some conversion work between C++ and Java in the past.

The PHP code contains a large associative array whose keys are the words that are not allowed as data in the system; the data values associated with those keys are merely null values.

For example, numbers spelled out are not allowed and would be represented as:

      $numbers = array('one'=>'', 'two'=>'', 'three'=>'', 'four'=>'', 'five'=>'', 'six'=>'', 'seven'=>'', 'eight'=>'', 'nine'=>'', 'ten'=>'');

This allows you to do the simple check:

      $wordToCheck = $word[$x];
      if (isset($numbers[$wordToCheck])) { /* throw it out */ }

Question 1: I'm wondering what the most efficient way of dealing with this in Java would be.  Should I use a HashMap or is there some sort of other simple structure to use?  

Question 2: What is the most efficient way to initialize the structure?  There are currently thousands of values in the table.  It is rather large and cumbersome.

Question 3: Currently, this structure is stored in an external include file, as it is very large and shared by various applications.  What is the easiest way to implement this in Java?
kirin0
Asked:
1 Solution
 
rrz Commented:
>Question 3: Currently, this structure is stored in an external include file  
Could you show us part of that file?
 
rrz Commented:
>the data values associated with those keys are merely null values.  
>Question 1: I'm wondering what the most efficient way of dealing with this in Java would be.  Should I use a HashMap or is there some sort of other simple structure to use?    
Why not use a simple String array?  
String[] numbers = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"};
 
objects Commented:
Sounds like a Set (e.g. HashSet) would be the best fit.
Use a Scanner to read the file:
http://helpdesk.objects.com.au/java/using-scanner-to-read-words-from-text-file
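
A minimal sketch of that suggestion, assuming the words are separated by whitespace. The class and method names (StopWordSetSketch, readWords) are made up for illustration, and a String stands in for the file so the example runs on its own; `new Scanner(new File(...))` would be used for a real file.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class StopWordSetSketch {

    // Reads whitespace-separated words from any Scanner source into a Set.
    static Set<String> readWords(Scanner input) {
        Set<String> words = new HashSet<>();
        while (input.hasNext()) {
            words.add(input.next());
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumption: a String replaces the real file for this demo.
        Set<String> numbers = readWords(new Scanner("one two three four five"));

        // contains() plays the role of PHP's isset($numbers[$wordToCheck]).
        System.out.println(numbers.contains("five"));   // prints true
        System.out.println(numbers.contains("eleven")); // prints false
    }
}
```

The lookup is case-sensitive, like the PHP isset() check in the question.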
 
kirin0 Author Commented:
Hey rrz,

In answer to your first question, the include file just looks like the "$numbers" array that I included in the question.  It actually consists of many such arrays that are all combined into one, but the concept is the same.  

As for using the string array, how would I determine if a given value exists?  Without scanning the entire array over and over that is.  I need to be able to easily determine if "five" is in the numbers array.
 
for_yan Commented:
You can store them in an ArrayList.
There is a simple method,

contains(item), which returns boolean true if your ArrayList contains
the string in the argument. Very convenient.

If you don't care about the order you could probably use a TreeSet,
but an ArrayList will be good enough.
 
for_yan Commented:
ArrayList<String> al = new ArrayList<String>();

As you parse your file, you add each string to the ArrayList:

al.add("one");
al.add("two");
...

When you encounter a test word later, you just check:

String s = ...; // you get your test string

if (al.contains(s)) {
...
}

Of course, behind the scenes it still has to compare against each item in the list,
but it is very convenient to program
and works quickly even for rather long ArrayLists.
 
rrz Commented:
Either HashSet or ArrayList will work.
A Set doesn't allow duplicate elements though.  
 
objects Commented:
An ArrayList will be slow, since contains() has to scan the list.
Generally there aren't duplicates in stop words.
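
To illustrate the difference, here is a rough sketch of filtering input words against a stop list. ArrayList.contains() walks the list element by element, while HashSet.contains() does a hashed lookup, which matters with thousands of stop words and many checks. The class name, method name, and sample data are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class StopWordFilterSketch {

    // Returns the words that are NOT in the stop set, in their original order.
    static List<String> removeStopWords(List<String> words, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            // Hashed lookup: no scan over the whole stop list per word.
            if (!stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // An existing list can be converted to a set in one step.
        List<String> asList = Arrays.asList("one", "two", "three", "four", "five");
        Set<String> stopWords = new HashSet<>(asList);

        List<String> input = Arrays.asList("apple", "five", "pear", "two");
        System.out.println(removeStopWords(input, stopWords)); // prints [apple, pear]
    }
}
```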
 
rrz Commented:
This JSP demonstrates objects' suggestions. I have named your file "numbers.txt". Put it into your web app's root folder.
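<%@ page import="java.io.*,java.util.*" %>
<%
   // Load the PHP include file from the web app's root folder.
   File file = new File(application.getRealPath("/numbers.txt"));
   Set<String> numbers = new HashSet<String>();
   Scanner input = new Scanner(file);
   try {
      while (input.hasNext()) {
         String word = input.next();
         // Strip the PHP "array" keyword and every non-word character, leaving the bare key.
         word = word.replaceAll("array", "");
         word = word.replaceAll("[^\\w]", "");
         // Tokens such as "=" reduce to an empty string; skip them.
         if (!word.isEmpty()) {
            numbers.add(word);
         }
      }
   } finally {
      input.close();
   }
%>
The HashSet contains <%=numbers%>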
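
One caveat with stripping characters token by token: the variable name ($numbers) also ends up in the set, and any stop word that happens to contain "array" gets mangled. A possible alternative, sketched below, pulls out only the quoted keys with a regular expression. It assumes the keys are single-quoted and followed by =>, as in the sample from the question; the class and method names are hypothetical, and reading the file into a String is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class PhpArrayKeysSketch {

    // Matches a single-quoted key followed by "=>", e.g. 'one'=>
    private static final Pattern KEY = Pattern.compile("'([^']*)'\\s*=>");

    // Collects every quoted key found in the given PHP source text.
    static Set<String> extractKeys(String phpSource) {
        Set<String> keys = new LinkedHashSet<>();
        Matcher m = KEY.matcher(phpSource);
        while (m.find()) {
            keys.add(m.group(1));
        }
        return keys;
    }

    public static void main(String[] args) {
        String sample = "$numbers = array('one'=>'', 'two'=>'', 'three'=>'');";
        System.out.println(extractKeys(sample)); // prints [one, two, three]
    }
}
```

Keys containing escaped quotes would need a more careful pattern than this one.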

 
kirin0 Author Commented:
Rockin'! Thanks for the straightforward lead.  I was able to code it up in about 10 minutes.