• Status: Solved
  • Priority: Medium
  • Security: Public

How to make a very big dictionary available to every program?

Hi,

I need to use Java to load a very big dictionary (tens of thousands of words) into some Java data type, most likely held in memory, so that all of our programs (some are web applications and the others are back-end programs) can easily access the dictionary.

One example: when users type keywords into an input box on a web page, our program can use the dictionary to check whether any keywords are misspelled and whether any of them are among the most frequently used keywords.

My current idea is to load the dictionary into a HashSet object. But how to load that object only once, and how to make it available to all the other programs, is a challenge for me. Please help.

Thanks
Asked by: wsyy
3 Solutions
 
mwochnickCommented:
Use the Java singleton pattern.
Here are two articles on the pattern with code. The second one covers common mistakes as well:
http://www.javabeginner.com/learn-java/java-singleton-design-pattern
http://java.sun.com/developer/technicalArticles/Programming/singletons/
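
For illustration, here is a minimal sketch of the singleton idea applied to the dictionary, using the lazy holder idiom. The class name `DictionaryHolder` and the hard-coded word list are assumptions made for the example; a real application would read the words from a file or database in the constructor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name; not taken from the linked articles.
final class DictionaryHolder {
    private final Set<String> words;

    private DictionaryHolder() {
        // Assumption: three hard-coded words stand in for the real word list.
        Set<String> loaded = new HashSet<>(Arrays.asList("apple", "banana", "cherry"));
        this.words = Collections.unmodifiableSet(loaded);
    }

    // The JVM initializes this nested class lazily and exactly once,
    // so the dictionary is loaded on first use without explicit locking.
    private static final class Lazy {
        static final DictionaryHolder INSTANCE = new DictionaryHolder();
    }

    static DictionaryHolder getInstance() {
        return Lazy.INSTANCE;
    }

    // Case-insensitive lookup; the stored words are assumed to be lower case.
    boolean contains(String word) {
        return word != null && words.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

Any code in the same JVM can then call `DictionaryHolder.getInstance().contains("apple")`. Note that this shares the dictionary only within one JVM, not across separate processes.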
 
CEHJCommented:
I would make the resource available via JNDI. It can then be used by Java apps and Java web apps alike.
 
wsyyAuthor Commented:
Thanks for the replies.

CEHJ, could you please be specific about how to make JNDI work in my case? My understanding is that JNDI is widely used for connection pools. How would it work for the dictionary, assuming a HashSet object is used?

mwochnick, I think the singleton is an alternative whose purpose is to make a single dictionary available for all accesses. However, as with a database connection, access to this singleton object may cause performance issues, because there will be so many concurrent accesses in my case.

Is it possible to combine the singleton solution and the JNDI solution?

 
CEHJCommented:
>>To my understanding JNDI is widely used in connection pool. Then how can this work in the dictionary which I assume that an hashset object is used?

You can make any type of object available with JNDI. See the tutorial. This example uses a Hashtable, but HashSet would be exactly the same:

http://download.oracle.com/javase/jndi/tutorial/getStarted/examples/naming.html
 
wsyyAuthor Commented:
CEHJ, can I combine JNDI and the singleton together? It seems like each has unique, useful features.
 
CEHJCommented:
If you use JNDI, it will automatically be a singleton - it's just ONE resource
 
wsyyAuthor Commented:
Is it loaded into memory at the very beginning?

If not, how is the performance of using JNDI compared with using something else that can be loaded into memory?

Sorry, these questions just came to mind.
 
gordon_vt02Commented:
You have to use something like JNDI, or wrap your dictionary in its own web service if you want multiple JVMs to have access to it.  The key there is that you have to have a centralized service the others can communicate with -- otherwise they will all have to load their own copy.
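
As a rough sketch of the "wrap it in its own web service" option, the example below uses the JDK's built-in `com.sun.net.httpserver.HttpServer` (Java 10+ is assumed for `Set.copyOf` and `URLEncoder.encode` with a charset). The class name `DictionaryHttpService`, the `/contains` path, the `word` query parameter and the `remoteContains` client helper are all assumptions made for illustration; a production service would more likely run in a servlet container or similar.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Set;

// Hypothetical service: one process holds the dictionary, others query it over HTTP.
final class DictionaryHttpService {

    // Starts a server that answers GET /contains?word=... with "true" or "false".
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(Set<String> words, int port) {
        Set<String> snapshot = Set.copyOf(words); // immutable copy, safe to share between threads
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/contains", exchange -> {
                String query = exchange.getRequestURI().getQuery(); // e.g. "word=apple"
                String word = (query != null && query.startsWith("word=")) ? query.substring(5) : "";
                byte[] body = String.valueOf(snapshot.contains(word)).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Client side: what another JVM would do to ask the service about a word.
    static boolean remoteContains(int port, String word) {
        try {
            String encoded = URLEncoder.encode(word, StandardCharsets.UTF_8);
            URI uri = URI.create("http://127.0.0.1:" + port + "/contains?word=" + encoded);
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            try (InputStream in = conn.getInputStream()) {
                return "true".equals(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The dictionary process would call `DictionaryHttpService.start(words, 8080)` once at startup; web applications and back-end programs would then call something like `remoteContains(8080, "apple")`. Each lookup costs a network round trip, so this trades lookup speed for having a single copy in memory.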

The singleton pattern usually refers to objects within a single JVM -- only one of that object is ever created and is shared by all other objects.  In the service you create, your dictionary should definitely be a singleton, especially if it is large and takes a lot of storage and/or initialization time.

As far as concurrency goes, you have several options.  If you know that the dictionary will never change and you will only be performing reads, you can encapsulate the HashSet within a service class that only provides read methods, making that class immutable and, therefore, thread safe.  This approach will give you the best performance because you won't need to deal with any synchronization issues beyond ensuring the dictionary is fully initialized before its first use.  Alternatively, if the dictionary may be updated at runtime, you can use a ConcurrentHashMap, which handles all of the synchronization for you and, in the majority of cases, is faster than wrapping a Map in Collections.synchronizedMap() or handling synchronization yourself.
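
The two options above might look roughly like this. The class names `ReadOnlyDictionary` and `MutableDictionary` are hypothetical, and the second class uses `ConcurrentHashMap.newKeySet()` (Java 8+) as one way to get a concurrent Set backed by a ConcurrentHashMap.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Option 1: immutable wrapper that exposes read methods only.
final class ReadOnlyDictionary {
    // Final field: safely published to other threads once the constructor returns.
    private final Set<String> words;

    ReadOnlyDictionary(Collection<String> source) {
        // Defensive copy, so later changes to 'source' cannot affect this object.
        this.words = Collections.unmodifiableSet(new HashSet<>(source));
    }

    boolean contains(String word) {
        return words.contains(word);
    }

    int size() {
        return words.size();
    }
}

// Option 2: for a dictionary that changes at runtime, a concurrent Set
// backed by a ConcurrentHashMap handles the synchronization internally.
final class MutableDictionary {
    private final Set<String> words = ConcurrentHashMap.newKeySet();

    // Returns true if the word was not already present.
    boolean add(String word) {
        return words.add(word);
    }

    boolean contains(String word) {
        return words.contains(word);
    }
}
```

With the first class, any number of threads can call `contains` at the same time without locking, because nothing is ever written after construction.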

Bottom line is, you need to wrap the dictionary in a service that can be used by your other applications.  How you expose that service -- JNDI, web service, RMI, etc. -- is up to you.  It sounds like the dictionary will be read-only, and therefore immutable, so a single instance can be shared by multiple threads without concurrency problems.  Hope that helps!
 
wsyyAuthor Commented:
gordon_vt02: thanks a lot for such detailed suggestions.

The dictionary is most likely read-only, and shared by both web services and back-end programs. In that case, would JNDI be the better way, since non-web programs are using it too?

I have one additional question, if you don't mind:

1) How can JNDI be accessible to web services and back-end programs at the same time? Do you know of any helpful posts?
 
CEHJCommented:
>>How JNDI be accessible to web services and back-end programs at the same time? Do you know some posts that are helpful?

Because it's a network-enabled service. The tutorial I posted a link to is all you need.
 
wsyyAuthor Commented:
Excellent
 
CEHJCommented:
:)