Solved

Linux cluster

Posted on 2012-03-18
Last Modified: 2012-09-27
Hi All

I have been reading online for hours trying to find out whether it would be possible - assuming one had the hardware, say 50 x 1U servers - to build a Linux cluster that could run the Windows 7 OS on the cluster. Most if not all the articles online state that you can only run cluster-aware applications on a cluster, which makes sense to me, but I also saw some people make reference to running, for example, VirtualBox on a cluster, with Windows inside it. At the same time, others say that VirtualBox doesn't support MPI, so it won't make use of the nodes. And no, I am not confusing this with people saying you can run a cluster in a virtual environment - I can see that for testing's sake one could run 4 VMs and then set up a cluster, etc.
Anyway, what I am getting at is that there is a lot of yes/no information out there, and a lot of it is outdated.
Is Windows 7 a parallel-processing OS? If it is, surely with PVM on a Linux cluster you could run Windows 7 on the cluster?
Some people state they have accomplished this using Windows applications like 3ds Max, but don't provide details.

Any advice on this would be appreciated.
Question by:basilthompson
10 Comments
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 37735695
Hi,

First of all, let's put it this way: HPC and MPC concepts are built around the "shared nothing" concept. HPC means several computer nodes with no shared components such as memory or CPU. So it is no surprise that they only work in parallel if the OS on each node is HPC-enabled and is running, in turn, HPC-enabled software.
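To make "shared nothing" concrete, here is a minimal MPI sketch in C (illustrative only, assuming an MPI implementation such as Open MPI is installed): every node runs its own copy of the program with its own private memory, and the copies cooperate only by passing messages.

/* each rank is a separate process, typically one per node, with private memory */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* join the cluster-wide job    */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which process am I?          */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processes in total? */
    printf("rank %d of %d: my memory is mine alone\n", rank, size);
    MPI_Finalize();
    return 0;
}

You would launch it across the nodes with something like mpirun -np 50 ./a.out; nothing is shared between ranks unless the program explicitly sends it.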

You should also understand that this is not an environment for virtualization. Virtualization software works on systems with shared CPU, memory, and disk I/O, allocating a fraction of the performance of those resources to each virtual host.

So in this sense Windows 7 is not an HPC OS: it is designed to run on a single piece of hardware, and as a desktop operating system it is missing other traits needed for HPC.

Cheers,
K.
 

Author Comment

by:basilthompson
ID: 37736510
Cool - I am starting to understand it better. I guess I somehow thought that HPC clusters were seen as one logical system, with the head unit basically being the PC and each compute node in a way an additional core for that PC, and with the cluster software, be it MPI or PVM, being the technology by which the head node sends computational requests to each 'core', if you could call it that. I had hoped there was some way to get a Windows OS running on the head node, but failing that, the application I wish to run also runs on Linux - in my case Fedora, though I could probably get it going on CentOS. The application is also a non-GUI application: it reads in a scene, does a render, then writes a file out. There is a GUI for the application, and a batch mode. So in this case, do you think it's possible?

With a render farm you need a license for each render node, and as we have a 200-node render farm, I am thinking that if we broke it into clusters - say 20 x 10-node clusters - we would not need the same number of licenses, assuming each cluster is treated as a single logical system. I see some guys saying they have built a Linux cluster for rendering, but when I read their methods it seems they are confusing a typical render farm with a cluster (unless I don't understand what a cluster is), because they still use a render manager and see each node of their "cluster" as a render node.

In an HPC cluster, do you run an HPC-aware program on each node, or just on the head node, with the head node using the CPU power of each compute node? I read that you can even have diskless nodes, which makes me think even more that I have the correct idea of how an HPC cluster works - but I could still be wrong. It just sounds like you don't need the application on each node, only on the head node - hence an actual cluster, and not the render farm approach where each node does a frame. In a cluster, each node would contribute calculations towards the same frame the head node is working on.

What do you think?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 37736702
I had hoped there was some way to get a Windows OS running on the head node, but failing that, the application I wish to run also runs on Linux - in my case Fedora, though I could probably get it going on CentOS. The application is also a non-GUI application: it reads in a scene, does a render, then writes a file out. There is a GUI for the application, and a batch mode. So in this case, do you think it's possible?

Yeah, this is the problem: HPC/MPC means you take a number of independent nodes, distribute the load among them, and collect the results back. So it is not an environment where you can host Windows. When it comes to Linux, your application must be able to use the Linux HPC libraries installed on the system.

There is also a book in the IBM Redbooks family which is a good source of know-how for Linux HPC.

Please ask the application developer about Linux HPC support. The important thing is not whether the application is GUI-based; the key point is whether it can distribute the computing load among several nodes, and which HPC/MPC features it supports.

I am sending you this document on Linux Parallel Processing and models:

http://tldp.org/HOWTO/Parallel-Processing-HOWTO.html


With a render farm you need a license for each render node, and as we have a 200-node render farm, I am thinking that if we broke it into clusters - say 20 x 10-node clusters - we would not need the same number of licenses, assuming each cluster is treated as a single logical system. I see some guys saying they have built a Linux cluster for rendering, but when I read their methods it seems they are confusing a typical render farm with a cluster (unless I don't understand what a cluster is), because they still use a render manager and see each node of their "cluster" as a render node.

I don't think it is all about reducing licensing costs; on the contrary, they would like to sell you extra licenses to run over a parallel cluster. A cluster is simply a number of systems running in parallel. In your PC there are multiple cores, but as you know they share memory and some caching mechanisms, which can at times be the bottleneck. This is why such systems don't scale up linearly. In a cluster, by contrast, there are a number of independent nodes, so they can render similar shapes without a bottleneck in their memory, caches, disks, etc., and they scale up well. But since the nodes are separate boxes, you have a network interconnect between them, which can itself be a bottleneck at times. Nowadays, though, 10 Gigabit Ethernet and 40 Gbps QDR InfiniBand are available interconnect options, so this should not be an issue for you.
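To put a rough number on that non-linear scaling: Amdahl's law gives the best-case speedup on n processors as 1 / ((1 - p) + p/n), where p is the fraction of the work that can run in parallel. A quick illustrative calculation (the p = 0.95 figure is an assumption, not a measurement):

/* Amdahl's law: the serial fraction caps the speedup no matter how many nodes */
#include <stdio.h>

int main(void)
{
    double p = 0.95;  /* assumed parallel fraction of the workload */
    for (int n = 1; n <= 64; n *= 2) {
        double speedup = 1.0 / ((1.0 - p) + p / n);
        printf("%2d nodes -> %4.1fx speedup\n", n, speedup);
    }
    return 0;
}

Even with 95% of the work parallelizable, 64 nodes yield only about a 15x speedup; the serial fraction, shared caches, or the interconnect always set the ceiling.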


In an HPC cluster, do you run an HPC-aware program on each node, or just on the head node, with the head node using the CPU power of each compute node? I read that you can even have diskless nodes, which makes me think even more that I have the correct idea of how an HPC cluster works - but I could still be wrong. It just sounds like you don't need the application on each node, only on the head node - hence an actual cluster, and not the render farm approach where each node does a frame. In a cluster, each node would contribute calculations towards the same frame the head node is working on.

Your program should be parallelism-aware: whether it communicates with a master node or distributes the load among different nodes itself, it requires software that is capable of parallel multiprocessing.

Just think of this operation:

A = 5 + B
C = A * 2
D = B / 2

You can't parallelize A and C, because C needs the result of A first, but you can parallelize C and D, which don't depend on each other. So your software has to decide what to parallelize and how to parallelize it.
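
A tiny OpenMP sketch in C of the same idea (illustrative only, compiled with -fopenmp): the dependency forces A and C into one serial section, while D can run alongside them.

/* data dependencies, not wishes, decide what can run in parallel */
#include <stdio.h>

int main(void)
{
    double B = 10.0, A, C, D;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            A = 5.0 + B;   /* C needs A's result, so these two stay serial */
            C = A * 2.0;
        }
        #pragma omp section
        {
            D = B / 2.0;   /* independent of A and C: can run in parallel  */
        }
    }
    printf("A=%g C=%g D=%g\n", A, C, D);
    return 0;
}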

So all of the software should be parallelism-aware; whether that happens on a single node or on multiple nodes is a matter of how the HPC is implemented.

Cheers,
K.
 

Author Comment

by:basilthompson
ID: 37737578
I don't think it is all about reducing licensing costs; on the contrary, they would like to sell you extra licenses to run over a parallel cluster. A cluster is simply a number of systems running in parallel. In your PC there are multiple cores, but as you know they share memory and some caching mechanisms, which can at times be the bottleneck. This is why such systems don't scale up linearly. In a cluster, by contrast, there are a number of independent nodes, so they can render similar shapes without a bottleneck in their memory, caches, disks, etc., and they scale up well. But since the nodes are separate boxes, you have a network interconnect between them, which can itself be a bottleneck at times. Nowadays, though, 10 Gigabit Ethernet and 40 Gbps QDR InfiniBand are available interconnect options, so this should not be an issue for you.

I will go over those links - I think I may have already come across them during my research, but thank you nonetheless. However, the licensing is one of the main reasons I am considering this approach, as we already have our setup in the farm configuration, with a render management application controlling which render node renders which frame. Each node needs to be licensed for the rendering application as well as the chosen renderer; those costs are R10K per box. So you can see that if we could combine the processing power of 10 systems into one system, the savings would be huge. Apparently our application supports parallel processing, so I would like to set up a test lab - unless an HPC cluster requires licensing for each compute node, though I don't know whether the application would even be aware of how many compute nodes exist in the cluster. Isn't that transparent to the application?

The program is Autodesk Softimage 2012 - some googling states that its engine supports parallel processing. I support a company that uses this app, yet I don't know much about the application itself - just how to make sure it runs - and the artists who drive the systems can't tell me anything technical about the program for sure.

I see a few people online trying to do the same thing. There is even a reference to the movie "Titanic" being rendered on a Beowulf cluster.

Given the ambiguity in the usage of terminology and the blurred boundaries between various technologies, we will stick to the definition of clusters given by Pfister:

A cluster is a type of parallel or distributed system that:

    consists of a collection of interconnected whole computers
    and is used as a single, unified computing resource.

The "whole computer" in the above definition can have one or more processors built into a single operating system image.


I found that in one article - so do you agree that it sounds like you would only need one license per cluster, assuming the application could run on an HPC cluster? Am I correct in reading the documents as saying that the head node presents itself as the "computer", with the compute nodes providing additional CPU processing power for that head node?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 37739943
I found that in one article - so do you agree that it sounds like you would only need one license per cluster, assuming the application could run on an HPC cluster? Am I correct in reading the documents as saying that the head node presents itself as the "computer", with the compute nodes providing additional CPU processing power for that head node?

In fact, the definition you've quoted says basically the same thing I did: a parallel system is a shared-nothing system (each node having one or more processors/cores), and the nodes act as a single computing resource. But here the "single, unified computing resource" refers to computing power, not necessarily to licensing costs.

Currently, Autodesk Softimage has a licensing scheme such that every node in an HPC cluster must be licensed and running a copy of Autodesk Softimage. Another requirement is that each node run Windows 2008 R2 HPC (so you would have to buy those OS licenses too). There is no way you can do it over a Linux HPC cluster. As I said earlier, "one big system" refers to computing power, not licensing.

Just think of it from their side: would you rather charge a single user or a cluster of 30 computers? That is why they have this scheme - to make more profit.

Cheers,
K.
 

Author Comment

by:basilthompson
ID: 37740939
Hmmm, I think we are missing each other slightly - my whole view is based on the idea that a cluster is supposed to be a single system, not multiple systems; otherwise it's just a collection of computers, each working independently of the others and thus licensed independently. I agree, obviously, that Autodesk would never want someone to purchase 1 license rather than 20 or 30 or even more - but that's why people come up with inventive ways to keep costs down. Hence I am looking into clustering - not in the sense of a bunch of machines, but rather one logical system made up of many machines.

I don't understand why people go the route of a Linux HPC cluster for rendering if it's easier and less complicated to go the route of a render farm. If the licensing costs are the same, what is the benefit of running an HPC cluster, and why do some post houses go that route? Something just doesn't make sense to me. The definition is incorrect in my mind if you need to run the application on each node, because then each node acts independently, rendering a specific frame, which is how we currently have it.

A single computer, no matter how many cores it has, only uses one license with the software we use. So if a cluster - a true cluster, not just a bunch of machines on a LAN all working on the same job but different frames - is supposed to be a single supercomputer, why license each 'core' (i.e. each compute node)?

I think I need to go over those documents you suggested, because I must be missing something. I thought I had read somewhere that in an HPC cluster you run the HPC-aware application on a single machine, referred to as the head node, which then, using high-speed interconnects like 40 Gbps InfiniBand or something crazy like that, sends data to the individual nodes as if they were simply additional cores in a processor. Surely in a single computer you run one instance of the application, not a copy on each core of a multicore CPU.

I guess I thought that because HPC is used to work out complex calculations - say a single sum that would take one normal computer 100 years to compute but a supercomputer 1 hour - it works like one single super-fast computer. So when it comes to rendering out frames, the cluster would deal with only one frame at a time.
 

Author Comment

by:basilthompson
ID: 37741008
In fact, the definition you've quoted says basically the same thing I did: a parallel system is a shared-nothing system (each node having one or more processors/cores), and the nodes act as a single computing resource. But here the "single, unified computing resource" refers to computing power, not necessarily to licensing costs.

But that goes against what the documents you mentioned say - there is shared memory, in either a distributed or a physically shared approach, when referring to parallel processing.


Distributed memory approach

It is useful to think of a master-slave model here:

The master node divides the work between several slave nodes.
Slave nodes work on their respective tasks.
Slave nodes intercommunicate among themselves if they have to.
Slave nodes return their results back to the master.
The master node assembles the results, further distributes work, and so on.
Obvious practical problems in this approach stem from the distributed-memory organization. Because each node has access to only its own memory, data structures must be duplicated and sent over the network if other nodes want to access them, leading to network overhead. Keep these shortcomings and the master-slave model in mind in order to write effective distributed-memory programs.
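
To make that master-slave flow concrete, here is a rough MPI sketch in C (my own illustration of the model quoted above, not anything Softimage-specific; the task * 2.0 'work' is a stand-in for a real computation such as rendering a tile):

/* master-slave over MPI: rank 0 hands out tasks, the rest compute */
#include <mpi.h>
#include <stdio.h>

#define NTASKS 8

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }   /* need at least one slave */

    if (rank == 0) {                        /* master: divide the work ...  */
        for (int t = 0; t < NTASKS; t++) {
            int slave = 1 + t % (size - 1);
            MPI_Send(&t, 1, MPI_INT, slave, 0, MPI_COMM_WORLD);
        }
        for (int t = 0; t < NTASKS; t++) {  /* ... then assemble the results */
            double result;
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("master collected %g\n", result);
        }
    } else {                                /* slave: data must arrive over */
        for (int t = 0; t < NTASKS; t++) {  /* the network, hence overhead  */
            if (1 + t % (size - 1) != rank) continue;   /* not my task */
            int task;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            double result = task * 2.0;     /* stand-in for real work */
            MPI_Send(&result, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}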


So I understand it better now - I guess that the HPC clusters you are referring to are SMP clusters, where the nodes do not share memory. What is the purpose of the very fast interconnect, though? I had assumed that was for sharing memory in a distributed-memory approach.
 
LVL 30

Accepted Solution

by:
Kerem ERSOY earned 500 total points
ID: 37741195
Nodes in HPC clusters may or may not be SMP systems themselves. Since they are separate systems, they share nothing between them. A fast interconnect is required to distribute the load among the nodes and to pick up the processed results and send them back to the requester; without it, the network becomes the bottleneck and slows down the whole operation.

But that goes against what the documents you mentioned say - there is shared memory, in either a distributed or a physically shared approach, when referring to parallel processing.

You are picking out the parts of theoretical work that best suit you, but things do not work that way. In reality, what you have is an implementation of an Autodesk product, with its licensing scheme and its shared-nothing implementation. We're not discussing the merits of HPC/MPC; this is the implementation the Autodesk guys picked, and unfortunately there's nothing to be done about it.

Cheers,
K.
 

Author Comment

by:basilthompson
ID: 37741918
Hey, hang on a sec - I am not picking out what I want, and I didn't realise it was theoretical. I thought that in practice it is possible to build a cluster with a shared-memory approach, and if so, you can get around Autodesk's licensing limitations in a completely legal way, albeit probably at the same cost if not more from a hardware point of view. I simply posted a question, and during the conversation I am learning more about what exists and how it all works. I do agree with you that in the kind of HPC cluster described in the documentation you referred to, you do not get shared memory - but as you wrote:


Nodes in HPC clusters may or may not be SMP systems themselves.

It's obvious that you do get clusters that can share memory, which according to online docs use a proprietary interconnect to achieve this. Again, that's not theoretical - it actually exists:


Shared memory is a model for interactions between processors within a parallel system. Systems like the multi-processor Pentium machines running Linux physically share a single memory among their processors, so that a value written to shared memory by one processor can be directly accessed by any processor. Alternatively, logically shared memory can be implemented for systems in which each processor has its own memory by converting each non-local memory reference into an appropriate inter-processor communication. Either implementation of shared memory is generally considered easier to use than message passing. Physically shared memory can have both high bandwidth and low latency, but only when multiple processors do not try to access the bus simultaneously; thus, data layout still can seriously impact performance, and cache effects, etc., can make it difficult to determine what the best layout is.
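
For contrast with the message-passing sketches above, a minimal physically-shared-memory example in C using OpenMP threads on one SMP box (illustrative only, compiled with -fopenmp): a value one thread writes is directly visible to the others, with no copies over a network.

/* shared memory: all threads see the same array directly */
#include <stdio.h>

#define N 8

int main(void)
{
    double data[N];               /* one array, visible to every thread */

    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        data[i] = i * i;          /* each thread fills its own slots    */

    double sum = 0.0;
    for (int i = 0; i < N; i++)   /* the main thread reads it all back  */
        sum += data[i];           /* directly - no messages, no network */
    printf("sum = %g\n", sum);
    return 0;
}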

So I think we can safely say that Autodesk will not work on a non-shared-memory HPC cluster, and I imagine they would never document the ability to work on a shared-memory cluster configured in a specific way - hence I am relying on professional advice from this forum. Please stop telling me what will not work and give some consideration to other possible solutions; if there are none, then say so, and I will no longer strive to achieve this.

I just found out there is such a thing as an SSI cluster, which is probably what I was originally thinking a cluster is:


In distributed computing, a single system image (SSI) cluster is a cluster of machines that appears to be one single system.[1][2] The concept is often considered synonymous with that of a distributed operating system,[3][4] but a single image may be presented for more limited purposes, just job scheduling for instance, which may be achieved by means of an additional layer of software over conventional operating system images running on each node.[5] The interest in SSI clusters is based on the perception that they may be simpler to use and administer than more specialized clusters.

Different SSI systems may provide a more or less complete illusion of a single system.

Do you think an SSI cluster might work?
 

Author Closing Comment

by:basilthompson
ID: 38440336
It didn't really answer the question, but it did provide useful information.
