Posted on 2002-03-15
1- A list of files and their sizes in bytes, and
2- a list of servers and their "processing power" in bytes per second, and
3- the fact that there are many more files than servers.
How can the files be processed in the least amount of time?
In other words what would be the best algorithm to divide the files into sub-lists, one list for each server so that the overall processing time would be minimal?
My solution so far is to sort the lists in descending order. Then to assign the biggest file to the fastest server, the second biggest file to the second fastest sever etc. When the end of the server list is reached the next file would be assigned to the fastest server, and the file after that to the second fastest server etc. This process would be continued until the file list was used up.
Seems to me that this problem as been faced many times before, and the might be a better solution.
This problem is not hypothetical. I am trying to use old computers and DCOM to process a huge number of large log files. I want to preconfigure the lists to save network overhead. The servers are all fully dedicated. The “processing power” of each server would be determined by a standard benchmark test. Each of the files is simply converted from binary to text, before further processing.