Avatar of adura
adura

SEARCHING WORDS IN A DICTIONARY

I have a 'dictionary.dat' file and I want to search for words from it. This is a scrabble game, by the way. How do I go about this?
Do I have to load all the words from the dictionary into a Java data structure and then search through it? If so, how do I go about doing this?
Thank You.
Avatar of antonsigur
antonsigur

This sets "found" to true if "stringToFind" is found in the file:

    boolean found = false;                // Set to true once the word is found
    String stringToFind = "someWord";     // The word you are searching for
    File f = new File("dictionary.dat");  // The file named dictionary.dat
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(f);
    } catch (FileNotFoundException e) {
        // Error handling if the file is not found
    }
    if (fis != null) {  // Only read if the file could be opened
        BufferedInputStream bis = new BufferedInputStream(fis);
        DataInputStream dis = new DataInputStream(bis);
        String record = null;
        try {
            // Note: DataInputStream.readLine() is deprecated;
            // BufferedReader.readLine() is the usual replacement.
            while ((record = dis.readLine()) != null) {  // Read the whole file, line by line
                if (stringToFind.equalsIgnoreCase(record.trim()))
                    found = true;  // The record was found
            }
            dis.close();  // Close the stream
        } catch (IOException e) {
            // Error handling for read errors
        }
    }
You should add a "break;" in the if-statement, so you won't need to go through the whole file once the word has been found.

If you are reading the file often, you should read it all into a String array once and search that instead (it's faster), but whether that is practical also depends on the dictionary size.
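
A minimal sketch of the two suggestions above (stop at the first match, and read the file into a String array once), assuming one word per line. The class and method names are made up for illustration, and it uses BufferedReader rather than DataInputStream for reading lines. The main method reads from an in-memory string so it runs without the file; for the real dictionary you would pass new FileReader("dictionary.dat") instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative, not from the thread.
class DictionaryArray {

    // Reads one word per line into a String array. Blank lines are skipped.
    static String[] loadWords(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words.toArray(new String[0]);
    }

    // Linear search that returns at the first match (the "break" idea).
    static boolean contains(String[] words, String target) {
        for (String word : words) {
            if (word.equalsIgnoreCase(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for the dictionary file.
        String[] words = loadWords(new StringReader("apple\nbanana\ncherry\n"));
        System.out.println(contains(words, "Banana")); // true
        System.out.println(contains(words, "yu"));     // false
    }
}
```

The array is loaded once and can then be searched as many times as the game needs.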
Avatar of adura
adura

ASKER

So do I need a new class for this, or can it be done in my existing class?
Avatar of adura

ASKER

5,000 words?
You should at least put it in its own function, but a class would be more flexible for later implementation changes...

public class Search { .... }


----- The other class

Search search = new Search("dictionary.dat");

.....

if ( search.forWord("someWord") ) {
  //we found the word!!
}
5000 words is not that much, so you can read it into memory if you want. You could also sort the String array, which would give you a much faster search if you want to implement that (though it takes more code than what I showed you).
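
One way the Search class outlined above could look, as a sketch only: the thread gives just the constructor call and forWord, so everything inside the class here is an assumption. It keeps the words in a sorted, lower-cased String array and uses Arrays.binarySearch, which returns a non-negative index only when the word is present. The second constructor, which takes a List, is a hypothetical convenience for trying it out without a file.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch of the Search class; the internals are assumed, not from the thread.
class Search {
    private final String[] words; // sorted, lower-case

    // Loads the dictionary file (one word per line) once.
    Search(String fileName) throws IOException {
        this(readLines(fileName));
    }

    // Hypothetical convenience constructor for words already in memory.
    Search(List<String> source) {
        words = new String[source.size()];
        for (int i = 0; i < words.length; i++) {
            words[i] = source.get(i).trim().toLowerCase(Locale.ROOT);
        }
        Arrays.sort(words); // binary search needs sorted input
    }

    // True if the word is in the dictionary, ignoring case.
    boolean forWord(String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        return Arrays.binarySearch(words, key) >= 0;
    }

    private static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Search search = new Search(Arrays.asList("Pear", "apple", "Zebra"));
        System.out.println(search.forWord("APPLE")); // true
        System.out.println(search.forWord("yu"));    // false
    }
}
```

Sorting costs a little at load time, but each lookup afterwards needs only about log2(n) comparisons instead of a pass over the whole array.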
Avatar of adura

ASKER

Thank you. That was just an estimate of the number of words; I probably have tens of thousands.
Thanks a lot.
Is this working for you?
Avatar of adura

ASKER

i am trying it out at the moment.
Avatar of adura

ASKER

what's the filenotfound exception?
It is thrown if the file can't be found (for example, if you give a wrong filename).
Avatar of adura

ASKER

There's a deprecation warning that suggests using a BufferedReader, so I am about to try that out. Apart from that it looks okay, although it accepted a word that is not in the dictionary, "yu". LOL
Avatar of adura

ASKER

looking at the BufferedInputStream in your code, that should be okay though, but I'll try
Avatar of adura

ASKER

Thank you so so so so much, I used a BufferedReader and it is working perfectly, thank you so much, that's the last bit of my project due on Friday. You've saved me.
ASKER CERTIFIED SOLUTION
Avatar of antonsigur
antonsigur
Avatar of adura

ASKER

i did exactly this so thanks a lot
You're probably going to need to do this lookup more than once, so it makes sense to read all the data into memory just once. (Unless the dataset is huge or you're running on a very weedy computer you'll be OK.)

So use the file reading code given elsewhere here, and then just store the data in a HashMap. Internally this uses a hash table (dig out the computer science books for more), which computes a hash code from each key so that the key can be looked up very efficiently, without scanning every entry.

You may wish to convert the words to upper case first (using the stringVariable.toUpperCase() method). This will avoid case mismatch problems.

You can then use hashmapObject.containsKey("TEST") to find the word.
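
A minimal sketch of this approach. It uses a HashSet instead of the HashMap mentioned above, since only the keys are needed here (a HashSet is itself backed by a hash table). The class name WordSet and its methods are hypothetical, and the word list in main stands in for the lines read from dictionary.dat.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical wrapper around a HashSet of upper-cased words.
class WordSet {
    private final Set<String> words = new HashSet<>();

    // Stores every word in upper case so lookups ignore case.
    WordSet(Iterable<String> source) {
        for (String word : source) {
            words.add(normalize(word));
        }
    }

    // Hash lookup: no scan over the whole dictionary.
    boolean contains(String word) {
        return words.contains(normalize(word));
    }

    private static String normalize(String word) {
        return word.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Stand-in for the lines read from the dictionary file.
        List<String> lines = Arrays.asList("test", "Scrabble", "word");
        WordSet dictionary = new WordSet(lines);
        System.out.println(dictionary.contains("TEST"));  // true
        System.out.println(dictionary.contains("testx")); // false
    }
}
```

With a HashMap the lookup would be the containsKey call described above; the set version simply drops the unused values.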
Try this one:

This uses binary search to look for the word. It assumes that the file has one word per line.

import java.io.*;
import java.util.*;

public class WordSearch {

    public static void main(String[] args) throws IOException {

        List<String> words = loadFile("dictionary.dat"); // Load the words into a list

        Collections.sort(words); // binarySearch requires a sorted list

        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.println("Enter Word (q to quit) ");
            String option = stdin.readLine();

            // Stop at end of input or when the user types q
            if (option == null || option.equalsIgnoreCase("q")) break;

            // binarySearch returns a negative number (not always -1) when the
            // key is missing, so test for < 0. Comparing with == -1 would
            // report almost every missing word as found.
            if (Collections.binarySearch(words, option.trim().toLowerCase()) < 0)
                System.out.println("Word is not found in dictionary.");
            else
                System.out.println("Word Found");
        }
    }

    // Reads the file, one word per line, into a lower-cased list
    private static List<String> loadFile(String fileName) throws IOException {
        List<String> wordList = new ArrayList<>();
        try (BufferedReader bfreader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName)))) {
            String word;
            while ((word = bfreader.readLine()) != null) {
                wordList.add(word.trim().toLowerCase());
            }
        } // the reader is closed automatically here
        return wordList;
    }
}
adura:
This old question needs to be finalized -- accept an answer, split points, or get a refund.  For information on your options, please click here-> http:/help/closing.jsp#1 
EXPERTS:
Post your closing recommendations!  No comment means you don't care.
Avatar of girionis

No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

- Points to antonsigur

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

girionis
EE Cleanup Volunteer
I have picked up the code submitted by allahabad.
Although I'm probably trying to do something completely different, this topic seems very compatible with the solution that i'm seeking.

I'm taking a character stream, say ABB, and creating an array of all words beginning with it, then getting another character, "R", say, and seeing which words in my array contain ABBR, basically trying to get the largest word that fits.
(in this case i'd eventually end up with a choice of abbreviate, abbreviated, abbreviation, abbreviature)

Problem I seem to have with your code (using manually with user keyboard input) is that it always returns found! - even for words that don't exist...

"TEST" will return found.
"TESTX" will return found (it shouldn't)
"TESXT" will return found (it shouldn't)

Can anyone help me fix this or explain if this is just because the above code was developed for a different application?

Cheers,

Phil.
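
The behaviour described above, where nearly every word comes back as found, is what you would expect from a lookup that compares the result of binarySearch with -1. For a missing key, binarySearch returns -(insertion point) - 1, which equals -1 only when the key would sort before every word in the list. Testing for a negative result fixes the lookup, and the same insertion point is a convenient starting place for the prefix search described above, because in a sorted list all words sharing a prefix sit next to each other. The sketch below assumes lower-cased words; the class PrefixLookup and its methods are hypothetical names, and duplicates are removed with a TreeSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper: exact and prefix lookups over one sorted list.
class PrefixLookup {
    private final List<String> words; // sorted, lower-case, no duplicates

    PrefixLookup(List<String> source) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String word : source) {
            sorted.add(word.trim().toLowerCase(Locale.ROOT));
        }
        words = new ArrayList<>(sorted);
    }

    // A word is present only when binarySearch returns an index >= 0.
    boolean contains(String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        return Collections.binarySearch(words, key) >= 0;
    }

    // Returns every word that starts with the given prefix, in sorted order.
    List<String> withPrefix(String prefix) {
        String key = prefix.trim().toLowerCase(Locale.ROOT);
        int index = Collections.binarySearch(words, key);
        if (index < 0) {
            index = -index - 1; // decode the insertion point
        }
        List<String> matches = new ArrayList<>();
        while (index < words.size() && words.get(index).startsWith(key)) {
            matches.add(words.get(index));
            index++;
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixLookup lookup = new PrefixLookup(Arrays.asList(
                "test", "abbot", "abbreviate", "abbreviated", "zebra"));
        System.out.println(lookup.contains("TEST"));   // true
        System.out.println(lookup.contains("TESTX"));  // false
        System.out.println(lookup.contains("TESXT"));  // false
        System.out.println(lookup.withPrefix("ABBR")); // [abbreviate, abbreviated]
    }
}
```

To pick the largest word that fits, you could scan the list returned by withPrefix for its longest entry after each new character arrives.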