
Bash Shell Script to Echo and Write to a Text File

Posted on 2013-12-17
Last Modified: 2013-12-19
Hi, I am working with a ROCKS cluster on RHEL 5.9. It is running PBS as the batch scheduler. I want to make sure that all of the nodes of the cluster are responsive, so I need a script that will grab the hostname and a date/time stamp from each node and write that info back to a text file on the front-end node.

Can someone provide a quick example? Google hasn't been much help on the subject!
Question by:capperdog13

Assisted Solution

by:omarfarid
omarfarid earned 25 total points
ID: 39725921
You can run a cron job on each node, say every 5 minutes, that runs:

myhost=$(hostname)
/bin/hostname > /tmp/$myhost
/bin/date >> /tmp/$myhost

You could then schedule an ftp or sftp job to copy the files to the front-end node.
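A slightly fuller sketch of the same idea follows: a per-node stamp script plus the crontab line that would run it. The script path /usr/local/bin/node_stamp.sh, the STAMP_DIR and FRONTEND variables, and the /var/tmp/node_stamps/ destination on the front end are all hypothetical names chosen for illustration; the copy step also assumes key-based scp access from the node to the front end.

```shell
#!/bin/bash
# Sketch of a per-node stamp script (hypothetical path, e.g.
# /usr/local/bin/node_stamp.sh), run from cron on each compute node with:
#   */5 * * * * /usr/local/bin/node_stamp.sh
# STAMP_DIR and FRONTEND are hypothetical settings, not Rocks/PBS variables.

stamp_dir=${STAMP_DIR:-/tmp}
myhost=$(uname -n)          # the same node name that `hostname` prints
stamp_file="$stamp_dir/$myhost"

# Overwrite the stamp with the node name and the current date and time.
{
    echo "host: $myhost"
    echo "date: $(date '+%Y-%m-%d %H:%M:%S')"
} > "$stamp_file"

# Optionally push the stamp to the front end; skipped when FRONTEND is unset.
# BatchMode=yes makes scp fail instead of prompting for a password.
if [ -n "${FRONTEND:-}" ]; then
    scp -q -o BatchMode=yes "$stamp_file" "$FRONTEND:/var/tmp/node_stamps/" \
        || echo "copy to $FRONTEND failed" >&2
fi
```

Writing one fixed-format file per node keeps the front-end side simple: a stale date in a node's file is the sign that the node stopped responding.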

Assisted Solution

by:simon3270
simon3270 earned 25 total points
ID: 39725985
Or, if you have password-less logins to the nodes in the cluster (with ssh keys), you could run this as root on your central server.  Have a file with just a list of node names in it called hosts.lst, then:
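while read -r host; do
    [ -z "$host" ] && continue    # skip blank lines in hosts.lst
    # BatchMode=yes: fail instead of prompting for a password
    # ConnectTimeout=10: give up on a node that does not answer
    rsp=$(ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" \
          'echo "$(hostname): $(date)"' 2>/dev/null </dev/null)
    if [ -n "$rsp" ]; then
        echo "$rsp"
    else
        echo "Could not connect to $host at $(date)"
    fi
done < hosts.lst > hosts.log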


Author Comment

by:capperdog13
ID: 39726795
Great! Let me work with this and I will respond later today. The nodes do require a password, so I don't think the ssh script will apply.

Many thanks! Will get back with you.

Expert Comment

by:serialband
ID: 39728348
Do you have the Ganglia roll installed and enabled? You can just use that to track all the compute nodes.

Rocks also includes the tentakel command to query all the hosts more quickly, since it forks all the calls at once. It should be set up if you loaded all the compute nodes with the Rocks installer. If you want the results to come back in order, you can sort the results afterwards. The while loop could take quite a while if you have a lot of compute nodes.

It's much simpler to run this line to query all the hosts simultaneously.  Your results will likely come back out of order, but it'll be much faster than running the while loop and waiting for each node's network to respond.

tentakel "hostname; date; hostname" >> compute_nodes.txt

If I remember correctly, I think you actually just need

tentakel date >> compute_nodes.txt

since tentakel already outputs the hostname of the system with the command.
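If tentakel is not available, the same "query every node at once, then sort" idea can be sketched in plain bash by starting one background ssh per host and waiting for all of them. The helper names probe_host and probe_all are hypothetical, the host list file is the hosts.lst format from the while loop earlier in the thread, and key-based ssh from the head node is assumed.

```shell
#!/bin/bash
# Sketch only: probe every node listed in a file at the same time,
# in the spirit of tentakel, then sort the results by hostname.
# probe_host and probe_all are hypothetical helper names.

# probe_host HOST: print "HOST: date" from the node, or a failure line.
probe_host() {
    local host=$1 rsp
    # BatchMode=yes: fail instead of prompting for a password.
    # ConnectTimeout=10: give up on a node that does not answer.
    rsp=$(ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" \
          'echo "$(hostname): $(date)"' 2>/dev/null </dev/null)
    if [ -n "$rsp" ]; then
        echo "$rsp"
    else
        echo "$host: could not connect at $(date)"
    fi
}

# probe_all FILE: start one background probe per host, wait for all of
# them, then print the collected lines sorted by hostname.
probe_all() {
    local list=$1 tmpdir host
    tmpdir=$(mktemp -d) || return 1
    while read -r host; do
        [ -z "$host" ] && continue            # skip blank lines
        probe_host "$host" > "$tmpdir/$host" &
    done < "$list"
    wait                                      # block until every probe exits
    cat "$tmpdir"/* 2>/dev/null | sort
    rm -rf "$tmpdir"
}

# Usage on the head node: probe_all hosts.lst > hosts.log
```

Each probe writes to its own temporary file, so parallel output never interleaves, and failure lines start with the hostname so they sort next to the healthy ones. Plain sort is lexical (compute-0-10 sorts before compute-0-2).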

The head node should already have an ssh key installed automatically on each of the compute nodes. You shouldn't need a password when you run tentakel or ssh to the compute nodes, unless the installer messed up somehow or the system became corrupted by users' code crashing. That does happen frequently enough when you have hundreds of systems, but the compute nodes should be easy and quick to reinstall.

Rocks documentation: http://www.rocksclusters.org/roll-documentation/base/5.5/index.html  You can install other Linux distros with Rocks. Rocks 6 is out, and it supports Red Hat 6.

Author Comment

by:capperdog13
ID: 39729691
Hi, yes, we do have Ganglia installed, and from the web front end all looks fine. Thanks for the tentakel date >> compute_nodes.txt tip; it says all is fine as well.

I was just handed this old POS, so is it safe to say that, from a high level, this cluster is functioning as it should, relying on the tentakel command and the Ganglia front end?

Author Comment

by:capperdog13
ID: 39729706
Also, I notice one problem you may be able to help with. The nodes are not reloading when I tell them to on a hard reboot. PXE is enabled on the nodes and they do make contact with the front end, but the front end never sends a packet for the reload; a timeout occurs and the node boots back up to the old image.

Any suggestions here?

Accepted Solution

by:serialband
serialband earned 200 total points
ID: 39730033
From a high level, if Ganglia shows the system as functional and tentakel returns ok, then you should be good to go.

Your system is set to boot instead of install. You need to change the setting on the head node with the rocks command:

rocks set host boot compute-0-0 action=install

Once the system is up, the action should revert.  If not, you can set the action back to boot.  You can list the settings with:

rocks list host boot
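When several nodes need reinstalling, the two commands above can be wrapped in a small loop. This is a sketch: reinstall_nodes is a hypothetical helper that only calls the rocks subcommands quoted in this answer, it assumes it runs as root on the head node, and the node names in the usage line are examples.

```shell
#!/bin/bash
# Sketch: flag several compute nodes for reinstall, then show the result.
# reinstall_nodes is a hypothetical helper; it only wraps the
# "rocks set host boot ... action=install" and "rocks list host boot"
# commands and must be run as root on the head node.
reinstall_nodes() {
    local node
    for node in "$@"; do
        # Stop at the first node the rocks command rejects.
        rocks set host boot "$node" action=install || return 1
    done
    # Show the boot action of every host so the change can be checked.
    rocks list host boot
}

# Usage (node names are examples): reinstall_nodes compute-0-0 compute-0-1
```

Listing the boot actions at the end gives a quick check, before the hard reboot, that each node really is set to install.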

--

Some helpful hints:

The best place to ask Rocks questions is the Rocks mailing list. More experienced users, as well as the developers, check the list. You can sign up here: https://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

It's been a year since I touched a Rocks cluster, and the right fix depends on the error message. Problems happen frequently with Rocks when users run their computations on the head node, so keep your users from running their processes there.

The other thing to check is that the Rocks kickstart directories are working. They sometimes get corrupted and don't show up properly. Check that Apache is running correctly on the head node and that it's sharing the Rocks directories for the compute nodes to connect to; the compute nodes need them to install.

Sometimes the compute nodes get corrupted and you just need to boot a live distro on them to completely wipe the partition. Unfortunately, the old kickstart on Red Hat 5.x doesn't work with a GUID Partition Table (GPT), so anything 2 TB or larger needs to be tweaked. It's simplest, and quickest, to put a smaller drive in the system as the primary boot drive and configure kickstart to mount the secondary drive for processing space.
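The size threshold comes from the MBR (msdos) partition table, which stores sector addresses in 32 bits; with 512-byte sectors that caps out at 2^32 × 512 = 2,199,023,255,552 bytes (2 TiB, about 2.2 TB), and larger disks need GPT. A minimal sketch for checking a drive against that limit follows, assuming 512-byte logical sectors; needs_gpt is a hypothetical helper name.

```shell
#!/bin/bash
# MBR limit: 2^32 addressable sectors x 512-byte sectors = 2 TiB.
# Assumes 512-byte logical sectors, as on typical drives of this era.
readonly MBR_LIMIT=$(( (1 << 32) * 512 ))   # 2199023255552 bytes

# needs_gpt BYTES: succeed (exit 0) when a disk of BYTES is too big for MBR.
# needs_gpt is a hypothetical helper, not a Rocks or kickstart command.
needs_gpt() {
    [ "$1" -gt "$MBR_LIMIT" ]
}

# Usage on a node (device name is an example):
#   needs_gpt "$(blockdev --getsize64 /dev/sdb)" && echo "/dev/sdb needs GPT"
```

blockdev --getsize64 reports the device size in bytes, which is the unit the helper compares against.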

If all else fails, sometimes you just have to redo the head node installation. This will take some time, but once it's set up, the compute nodes are quick to install. If you turn them all on at once, they will install out of order relative to their positions in the rack. If you want them installed in order, turn them on one at a time starting with the first one, and watch until DHCP accepts each one. They'll automatically be numbered starting with rack 0, computer 0.

Author Comment

by:capperdog13
ID: 39730287
Hey thanks a bunch for all the info! I come from a Windows background and was literally tossed into the sea of Linux and told to fix that cluster...

I ran the commands on the head node and forced an install on one of the nodes. I checked it with rocks list host boot before I hard rebooted the node, but it still did not reload. Like I mentioned earlier, the nodes are not getting the info back from the server to reload.

Anyway, I am going to post this to the Rocks mailing list you gave me. You've been a big help!
Many thanks and have a happy holiday!!

Author Closing Comment

by:capperdog13
ID: 39730300
The original question was about a script to help me check a ROCKS cluster. Simon supplied me with a couple of great examples; thanks, Simon! I got the most help from serialband, who has ROCKS experience and went above and beyond with tips and links.