Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 491
  • Last Modified:

Java Executor Framework: Best Way To Divide A List Equally Among n Threads

Looking for the best way to divide a List of objects equally among multiple instances of a class that processes those objects, using the Executor framework, and a fixed number of threads, say 10.  As an simplified example, assume we have a List of 10 million File objects that need to be parsed and those that contain some string of characters are to be deleted.  One simple way would be to divide the List into 10 new Lists, each containing 1 million objects; create an instance of the FileProcessor class, which implements Runnable, passing its allocation of the first 1 million File objects to it; then calling ThreadPoolExecutor's execute method, passing in that instance of FileProcessor; then create a new List of the next one million, etc.
My question is, is there a better way to do this that does not require creating those 10 new "sub-lists?"  For one thing, it gets more complicated when the division does not yield an even number; for another, it predetermines what each thread is responsible for, and if a thread happens to crash, its one million objects (or some portion thereof) don't get processed.      
0
whandley
Asked:
whandley
  • 9
  • 4
  • 3
  • +2
1 Solution
 
objectsCommented:
why split them at all. Be simpler have have each thread simply pulling objects off a single list and processing them.
0
 
objectsCommented:
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
whandleyAuthor Commented:
"why split them at all. Be simpler have have each thread simply pulling objects off a single list and processing them."

Exactly, I'm looking for a simpler method. However, I don't understand what you mean, can you expand on this?  If you have a list of 10 million objects and you give the same list to 10 threads, what's to keep all ten threads from processing the entire list?  Your examples unfortunately don't address the problem of dividing a fixed quantity of work among a number of threads.
0
 
objectsCommented:
because each thread will take the item off the list before processing. And by using a thread safe list we ensure the threads all interact safely with the list. Once finished they could store them on another list if you need to track work done.

> Your examples unfortunately don't address the problem of dividing a fixed quantity of work among a number of threads.

Thats actually exactly what they do :)
0
 
objectsCommented:
for example http://www.exampledepot.com/egs/java.lang/FixedWorkQueue.html

queue is the list of jobs to be processed
and workers is an array of threads processing items off the list
0
 
msk_apkCommented:
Another way would be having master/worker threads. When worker thread ask for the task, it returns the next available task. Instead of dividing by 1 million, let each thread comes and ask for the task, master returns next task i.e one file object.
if u do like this, there is no division problem and restoration problem will get minimized.
Runnable object can be created for each file object which can be passed to ThreadPoolExecutor.

for example,

public class FileRunnable implments Runnable
{
      public FileRunnable(int fileid)
      {
       
      }

       public void run()
       {
                   //process with that fileid
      }

      public static void main(String[] a)
      {
               ThreadPoolExecutor executor = new ThreadPoolExecutor();
                for(int i=0;i<100000;i++)
                {
                        executor.execute(new FileRunnable(i));
                }
       }
}

0
 
objectsCommented:
> let each thread comes and ask for the task, master returns next task i.e one file object.

thats basically what I suggested above
0
 
msk_apkCommented:
@objects, when i finished posting, i saw 7 replies. when i started typing my answer there is no reply at all. expecting some positive replies.
0
 
objectsCommented:
thats ok :)
0
 
CEHJCommented:
whandley, why are you using multiple threads at all btw? Do you have multiple processors? Unless the answer is yes or there are special reasons for doing so, it will almost certainly slow the process down.
0
 
phutsaCommented:
it's help me a lot. good
0
 
whandleyAuthor Commented:
Objects:  I see, so each thread removes the entry from the list after successfully doing its work. Now it makes sense.  Have to think about whether this is better, or msk's idea of master/worker threads. Thanks to all for your help.
CEJH: Yes, of course; dual Xeon 5680 processors with 6 cores each.
0
 
msk_apkCommented:
if something needs to be removed from the list, that too for a larger quanity, certainly it will cost you. its better to truncate/clear the list than removing it one by one. if 10 million fileids/file names cannot be loaded in the memory, then the master thread can load them in the memory lazily. i.e first 1 million file names in the list, then discard the list and then second 1 million file names in the list etc. Let the worker thread know the filename/fileid alone from the master thread. by this design, you are avoiding the synchronization completely i.e only one master thread is always looking the filenames in the list.  
Always have 10 file objects loaded in the memory by 10 worker threads.
0
 
whandleyAuthor Commented:
msk:  Thanks, good points.  
All:  I'm wondering this also: 10 workers are iterating over this shared list; worker 5 removes the object he is working on from the list; does that not invalidate the iterators the other 9 have?  I'm sure the answer is "no, if it's done in a thread-safe manner," but exactly how?  I don't think synchronizing the removal (possibly by using your own wrapper, or one of the built-in synchronized wrappers) is sufficient.
0
 
objectsCommented:
you wouldn't use an Iterator. You'd use a queue and each thread would just pull the top element off the queue
0
 
objectsCommented:
and you would use a thread safe queue, such as BlockingQueue
0
 
whandleyAuthor Commented:
Thank you, the solution you stated turned out to be exactly correct. I graded the answer a "B," however, in fairness to those who take the time in their answers to be more thorough and explicit.
For the sake of others who need this information for a similar solution, here's a little more detail:  the first series of posts for the accepted answer talk about "pulling items off a list...," which had me thoroughly confused, since I was pretty sure you couldn't do that in a threaded environment. Finally this was switched to a "queue." turns out what was meant was not a list at all, but a Java object introduced in 1.5 called a BlockingQueue that I was  not familiar with, but should have been.  This object greater simplifies the multithreaded consumer-producer pattern that you used to have to create a lot of classes for and implement very carefully. The BlockingQueue does all of the work of thread safety for you. This was an ideal solution to my problem.
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

  • 9
  • 4
  • 3
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now