How to build a parallel search engine in java?

Posted on 2011-04-23
Last Modified: 2012-05-11

       I was trying to make a parallel search engine in java, using a parallelized depth first search algorithm, but I didn't know where to start... are there any example programs I could start off with?

Thanks in advance!
Question by:errang
    LVL 47

    Expert Comment


    Isn't this something related to what you are looking for:

    Author Comment

    Yea, I ran across that, but I was looking for something more basic, that would explain how your thread is going to move through a set of nodes, when the nodes don't contain any information.
    LVL 47

    Assisted Solution


    Well, if you google like that:

    parallelized depth first search algorithm java

    there seems to be a lot of information with more
    explanations and less explanations.
    Well, if you happen to encounter some specialists here,
    maybe you'd get some better explanation.
    I'm not sure that algorithms zone here is as active as java zone.
    Alternatively, perhaps some graph algorithms forum would be
    your better choice. After all, it would probably not  matter for you too much
    if this is in Java or C++

    LVL 30

    Accepted Solution

    In fact, a parallel search engine algorithm requires a good knowledge of artificial intelligence.

    To find an algorithm for a specific problem, the nature of the the search will depend on the nature of data being searched - i.e. numerical or text, the length of the data field or fields that are being searched, data base or flat file, and so forth.

    You have surely seen these already, here is a link to a scholarly paper that has a theoretical algorithm, A Scalable Distributed Parallel Breadth-First Search, which mentions:
    for distributed parallel BFS algorithms where the computation moves to the processor owning the data.  Obviously, the scalability of the distributed BFS algorithm for very large graphs becomes a critical issue, since the demand for local memory and inter-processor communication increases as the graph size increases.

    This article Parallel Search concerns parallel searches as they apply to chess programming and contains explainations for these algorithms:
    Shared Hash Table
    Parallel Alpha-Beta
    Principal Variation Splitting (PVS)

    LVL 26

    Assisted Solution

    Are you clear on how to write the single threaded depth first search algorithm?  You walk the nodes of the tree going steadily deeper until you either find the goal of the search or hit a leaf of the tree, then you back up.

    I'd make sure you know how to code that first before you move to parallelizing it.  Let's assume you have a method that implements single threads depth first search like this:

    public void implementDFS(Node root) { ... }

    To make it run in parallel, one approach would be to spawn a thread for each child of the top node of the tree.  For a binary tree this gives you 2 threads - not a huge increase in parallelism, but pretty easy to see how to do this I hope:

    public void simpleParallelSeach(Node root) {
         Node firstChild = root.getLeftChild() ;
         Node secondChild = root.getRightChild() ;

         // Not the best way to start a thread (not even sure it'll compile), but you get the idea...
         new Thread() { public void run() { implementDFS(firstChild) ; } }.start();
         new Thread() {  public void run() { implementDFS(secondChild) ; } }.start() ;

    If you can get that working now you've got 2 threads, each doing half of the work.  Depending on the work (e.g. looking up some data) you may need to combine the results.  It's probably fine to just do this combining step in a single thread at the end, as long as the search happens in parallel.

    Now imagine you wanted 4 threads instead of 2.  I hope you can see that you could use the same approach, but start the threads one level deeper in the tree (again still assuming it's binary).  You'd locate all 4 grandchildren of the root and start the DFS on each of those 4 nodes.  (If the tree isn't binary you can still use the same method but for n-children per node you'd get n^2 grandkids and n^2 threads).

    If you can get that all working, then you can probably start to see how you could pass in the target number of threads as a parameter and have it start that many threads.  You generally want about 1-2 per core of the machine you're running on (e.g. on a quad core CPU you want 4-8 threads generally) for maximum performance.

    I hope that gets your started.


    Write Comment

    Please enter a first name

    Please enter a last name

    We will never share this with anyone.

    Featured Post

    Looking for New Ways to Advertise?

    Engage with tech pros in our community with native advertising, as a Vendor Expert, and more.

    Suggested Solutions

    Article by: Nadia
    Linear search (searching each index in an array one by one) works almost everywhere but it is not optimal in many cases. Let's assume, we have a book which has 42949672960 pages. We also have a table of contents. Now we want to read the content on p…
    Prime numbers are natural numbers greater than 1 that have only two divisors (the number itself and 1). By “divisible” we mean dividend % divisor = 0 (% indicates MODULAR. It gives the reminder of a division operation). We’ll follow multiple approac…
    This video teaches viewers about errors in exception handling.
    The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.

    761 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    14 Experts available now in Live!

    Get 1:1 Help Now