Sun Grid Installation

Asked by wfaleiro (India)

Hi,
I am trying to install Sun Grid on my Linux machines. I have installed the software on a server, which is configured as the qmaster, administrative host, submit host and execution host.
I am now trying to set up another system as an execution host. When I run the install_execd script, it exits saying it cannot contact the queue master. Can anyone help me with this?

Regards
Walter
David Piniella

Can you reach the other host by other means (telnet/ssh)? And is the other host running any sort of firewall (are hosts.allow/hosts.deny set correctly, etc.)?
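One way to answer the first question from the new execution host is to try opening a TCP connection to the qmaster's port. A minimal Python sketch, assuming the qmaster listens on the sge_qmaster port (536/tcp in the /etc/services listing later in this thread); `port_reachable` is a hypothetical helper, not part of SGE:

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds.

    Hypothetical helper for illustration; it only tests plain TCP reachability,
    not whether the listener is actually an SGE qmaster.
    """
    try:
        # create_connection resolves the name and tries each address it returns.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_reachable("lablin.marfic.local", 536)` run on the execution host should return `True`. A `False` suggests that nothing is listening on that port, or that name resolution, routing or a firewall is in the way.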
wfaleiro (asker)

Yes, I can contact the hosts, and there is no firewall running at the moment.
What do the logs say?
03/12/2005 12:52:03|qmaster|lablin|I|read job database with 0 entries in 0 seconds
03/12/2005 12:52:03|qmaster|lablin|I|qmaster will use max. 1004 file descriptors for communication
03/12/2005 12:52:03|qmaster|lablin|I|qmaster will accept max. 99 dynamic event clients
03/12/2005 12:52:03|qmaster|lablin|E|no execd known on host lablin.marfic.local to send conf notification
03/12/2005 12:52:03|qmaster|lablin|I|starting up 6.0u3
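The qmaster log above is pipe-delimited, which makes it easy to filter. A minimal Python sketch for pulling out only the error lines, assuming every line has the five fields seen here (timestamp, daemon, host, severity letter, message) and that `E` marks an error, as on the "no execd known" line; `LogEntry`, `parse_line` and `errors` are hypothetical names:

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    daemon: str
    host: str
    severity: str
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one log line into its five fields; any '|' inside the message is kept."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected log line: {line!r}")
    return LogEntry(*parts)


def errors(lines: Iterable[str]) -> List[LogEntry]:
    """Return the entries whose severity field is 'E', skipping blank lines."""
    entries = (parse_line(line) for line in lines if line.strip())
    return [entry for entry in entries if entry.severity == "E"]
```

Applied to the five lines above, `errors(...)` returns only the "no execd known on host lablin.marfic.local" entry.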
Which machine is the error "03/12/2005 12:52:03|qmaster|lablin|E|no execd known on host lablin.marfic.local to send conf notification" on, the master or the execution host? It looks like it's from the master, and if that's the case, the master isn't running the necessary daemon. I'm not familiar enough with Sun Grid installs to help you in much detail, but I would try running the execd daemon on that host.
[root@lablin etc]# cat /etc/services | grep sge
sge_qmaster     536/tcp
sge_execd       537/tcp
[root@lablin etc]#
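Since the execution host has to reach the qmaster on the same port, it is worth checking that the new host resolves these two entries identically. A minimal Python sketch that extracts the `sge_*` entries from /etc/services-style text so two hosts' listings can be compared; `sge_ports` is a hypothetical helper:

```python
from typing import Dict, Tuple


def sge_ports(services_text: str, prefix: str = "sge_") -> Dict[str, Tuple[int, str]]:
    """Map service name -> (port, protocol) for entries whose name starts with `prefix`."""
    ports: Dict[str, Tuple[int, str]] = {}
    for raw in services_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        fields = line.split()
        if len(fields) < 2 or not fields[0].startswith(prefix):
            continue
        port, _, proto = fields[1].partition("/")  # "536/tcp" -> ("536", "/", "tcp")
        if port.isdigit():
            ports[fields[0]] = (int(port), proto)
    return ports
```

Given the two lines shown above, `sge_ports` yields `{"sge_qmaster": (536, "tcp"), "sge_execd": (537, "tcp")}`; running it on each host's /etc/services and comparing the dictionaries shows any mismatch. On a single host, the standard library call `socket.getservbyname("sge_qmaster", "tcp")` performs the same lookup through the system resolver.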
[root@lablin named]# ps -ef | grep sge
sgeadmin  1470     1  0 14:09 ?        00:00:00 /opt/sge/bin/lx24-x86/sge_execd
sgeadmin  1846     1  0 14:12 ?        00:00:00 /opt/sge/bin/lx24-x86/sge_qmaster
sgeadmin  1865     1  0 14:13 ?        00:00:00 /opt/sge/bin/lx24-x86/sge_schedd
root      2650  1185  0 15:18 pts/1    00:00:00 grep sge
[root@lablin named]#
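In SGE 6.0 the master runs sge_qmaster and sge_schedd, while a pure execution host only needs sge_execd. A minimal Python sketch that reads `ps -ef` output and reports which `sge_*` daemons are running, assuming the standard eight-column layout (UID PID PPID C STIME TTY TIME CMD); `running_sge_daemons` is a hypothetical helper:

```python
import os
from typing import Iterable, Set


def running_sge_daemons(ps_lines: Iterable[str]) -> Set[str]:
    """Return the sge_* daemon names found in the CMD column of `ps -ef` output."""
    found: Set[str] = set()
    for line in ps_lines:
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        # Look at the executable name only, so "grep sge" itself is not counted.
        command = os.path.basename(fields[7].split()[0])
        if command.startswith("sge_"):
            found.add(command)
    return found
```

On the listing above this finds sge_execd, sge_qmaster and sge_schedd; on a working execution host, `{"sge_execd"} - running_sge_daemons(lines)` should be empty.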
Hi,
I was doing the installation incorrectly. The sge_root directory must be accessible to all the hosts that are supposed to be on the grid network, so I shared sge_root via NFS and it worked. Now I'm thinking about how to mount it automatically every time a host needs it. Is automount a good option?

thanks
Walter
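This matches how an execution host locates its master: the daemons and install scripts read the master's hostname from files under the shared sge_root, notably `<sge_root>/<cell>/common/act_qmaster`. A minimal Python sketch that checks, from an execution host, whether that file is visible; it assumes the default cell name `default` and falls back to the /opt/sge root seen in the `ps` output above, and `read_act_qmaster` is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import Optional


def read_act_qmaster(sge_root: str, cell: str = "default") -> Optional[str]:
    """Return the master hostname recorded under sge_root, or None if it is not visible."""
    path = Path(sge_root) / cell / "common" / "act_qmaster"
    try:
        return path.read_text().strip() or None
    except OSError:
        # Missing file, unmounted share or permission problem.
        return None


if __name__ == "__main__":
    # Assumed defaults: /opt/sge (from the ps output) and cell "default".
    root = os.environ.get("SGE_ROOT", "/opt/sge")
    cell = os.environ.get("SGE_CELL", "default")
    print(read_act_qmaster(root, cell))
```

A `None` on the new host suggests sge_root is not mounted there (or not at the same path), which fits the "cannot contact the queue master" symptom from the original question.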
Accepted solution (asker-certified): David Piniella. The solution text is available only to Experts Exchange members.