wfaleiro
asked on
Sun Grid Installation
Hi,
I am trying to install Sun Grid on my Linux machines. I have installed the software on a server; it is configured as the qmaster, administrative host, submit host and execution host.
I am trying to set up another system as an execution host. When I run the install_execd script, it exits saying it cannot contact the queue master. Can anyone help me out with this?
Regards
Walter
Can you contact the other host by other means (telnet/ssh), and is the other host running any sort of firewall (are hosts.allow/hosts.deny correct, etc.)?
ASKER
Yes, I can contact the hosts, and there is no firewall running at the moment.
What do the logs say?
ASKER
03/12/2005 12:52:03|qmaster|lablin|I| read job database with 0 entries in 0 seconds
03/12/2005 12:52:03|qmaster|lablin|I| qmaster will use max. 1004 file descriptors for communication
03/12/2005 12:52:03|qmaster|lablin|I| qmaster will accept max. 99 dynamic event clients
03/12/2005 12:52:03|qmaster|lablin|E| no execd known on host lablin.marfic.local to send conf notification
03/12/2005 12:52:03|qmaster|lablin|I| starting up 6.0u3
03/12/2005 12:52:03|qmaster|lablin|I|
03/12/2005 12:52:03|qmaster|lablin|I|
03/12/2005 12:52:03|qmaster|lablin|E|
03/12/2005 12:52:03|qmaster|lablin|I|
Which machine is the error "03/12/2005 12:52:03|qmaster|lablin|E| no execd known on host lablin.marfic.local to send conf notification" on: the master or the execution host? It looks like it is from the master, and if so, that host is not running the necessary daemon. I am not familiar enough with Sun Grid installs to help in much detail, but I would try running the execd daemon on that host.
ASKER
[root@lablin etc]# cat /etc/services | grep sge
sge_qmaster 536/tcp
sge_execd 537/tcp
[root@lablin etc]#
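The two entries above are the service names the daemons are registered under. As a rough sketch, a lookup like the following could be run on the new execution host to confirm it resolves the same sge_qmaster port as the master. The function name sge_port and its optional file argument are made up for illustration; only the service names and /etc/services come from the thread.

```shell
#!/bin/sh
# sge_port SERVICE [FILE]: print the TCP port registered for SERVICE.
# Hypothetical helper; FILE defaults to /etc/services.
sge_port() {
    awk -v svc="$1" '
        $1 == svc && $2 ~ /\/tcp$/ {   # match lines of the form "name port/tcp"
            split($2, parts, "/")      # separate the port from the protocol
            print parts[1]
            exit
        }' "${2:-/etc/services}"
}
```

Running `sge_port sge_qmaster` on both machines should print the same number (536 in the output above). An empty or different result on the new host would be one possible reason for install_execd failing to reach the master, though not the only one.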
ASKER
[root@lablin named]# ps -ef | grep sge
sgeadmin 1470 1 0 14:09 ? 00:00:00 /opt/sge/bin/lx24-x86/sge_execd
sgeadmin 1846 1 0 14:12 ? 00:00:00 /opt/sge/bin/lx24-x86/sge_qmaster
sgeadmin 1865 1 0 14:13 ? 00:00:00 /opt/sge/bin/lx24-x86/sge_schedd
root 2650 1185 0 15:18 pts/1 00:00:00 grep sge
[root@lablin named]#
ASKER
Hi,
I was doing the installation incorrectly. The sge_root directory must be accessible to all the hosts that are supposed to be part of the grid. I shared sge_root over NFS and it worked. Now I am wondering how to mount it automatically whenever a host needs it. Is automount a good option?
Thanks
Walter
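For anyone hitting the same symptom, a quick check on the execution host before running install_execd is whether the shared SGE root is actually visible there. The sketch below assumes the standard layout in which $SGE_ROOT/<cell>/common/act_qmaster names the master host, with a cell called default. The function name check_sge_root is made up for illustration, and /opt/sge is only inferred from the ps output earlier in the thread.

```shell
#!/bin/sh
# check_sge_root ROOT [CELL]: succeed and print the recorded qmaster host
# if the shared SGE tree is readable from this machine.
# Hypothetical helper; CELL defaults to "default".
check_sge_root() {
    f="$1/${2:-default}/common/act_qmaster"
    if [ -r "$f" ]; then
        echo "qmaster: $(head -n 1 "$f")"
    else
        echo "cannot read $f (is the NFS share mounted?)" >&2
        return 1
    fi
}
```

On this setup the call would presumably be `check_sge_root /opt/sge`. If it only passes after a manual mount, an /etc/fstab entry or an autofs map for the share are the usual ways to make the mount persistent; which one suits better depends on the site.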