Bash Shell Script to Echo and write to text file.

Posted on 2013-12-17
Medium Priority
Last Modified: 2013-12-19
Hi, I am working with a ROCKS cluster on RHEL 5.9. It is running PBS as the grid. I want to make sure that all of the nodes of the cluster are responsive, so I need a script that will grab the name, date, and time stamp from each node and write that info back to a text file on the front-end node.

Can someone provide a quick example? Google has not been much help on the subject!
Question by:capperdog13
LVL 40

Assisted Solution

omarfarid earned 100 total points
ID: 39725921
You can run a crontab job on each node, say every 5 min, that will run

myhost=$(hostname)    # node name, used as the output file name
hostname > /tmp/$myhost
date >> /tmp/$myhost

You could then schedule an ftp or sftp job to copy the files to the front server
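As a rough sketch of the cron approach above, the per-node job could be wrapped in a small script. The function name write_heartbeat, the script path in the crontab comment, and the fallback to uname -n are illustrative assumptions, not something from this thread.

```shell
#!/bin/bash
# Hypothetical helper: write this node's name and a timestamp to DIR/<nodename>.
write_heartbeat() {
    local dir=${1:-/tmp}
    local myhost
    # Fall back to uname -n if the hostname command is not available.
    myhost=$(hostname 2>/dev/null || uname -n)
    {
        echo "$myhost"
        date
    } > "$dir/$myhost"
}

# Example crontab entry (every 5 minutes), assuming the script is saved at the
# hypothetical path /usr/local/bin/heartbeat.sh:
#   */5 * * * * /usr/local/bin/heartbeat.sh
write_heartbeat "${1:-/tmp}"
```

Each run overwrites the node's file, so the file on the front end always shows the most recent check-in once it has been copied over.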
LVL 20

Assisted Solution

simon3270 earned 100 total points
ID: 39725985
Or, if you have password-less logins to the nodes in the cluster (with ssh keys), you could run this as root on your central server.  Have a file with just a list of node names in it called hosts.lst, then:
while read -r host; do
    rsp=$(ssh "$host" 'echo $(hostname): $(date) 2>/dev/null' 2>/dev/null </dev/null)
    if [ "$rsp" != "" ]; then
        echo "$rsp"
    else
        echo "Could not connect to $host at $(date)"
    fi
done < hosts.lst > hosts.log
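A possible variation on the loop above, assuming key-based logins are in place: adding a connection timeout keeps one dead node from stalling the whole run. The function names probe and check_nodes are made up for this sketch; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/bash
# Sketch: poll every host in a list file, giving up quickly on dead nodes.
# probe() is a hypothetical wrapper so the remote call is easy to swap out.
probe() {
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 gives up after 5 seconds if the node does not answer.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" \
        'echo "$(hostname): $(date)"' 2>/dev/null </dev/null
}

check_nodes() {
    local list=$1 host rsp
    while read -r host; do
        [ -z "$host" ] && continue      # skip blank lines in the list
        if rsp=$(probe "$host") && [ -n "$rsp" ]; then
            echo "$rsp"
        else
            echo "Could not connect to $host at $(date)"
        fi
    done < "$list"
}
```

It could be called as check_nodes hosts.lst > hosts.log, matching the file names used above.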


Author Comment

ID: 39726795
Great! Let me work with this and I will respond later today. The nodes do require a password, so I don't think the ssh script will apply.

Many thanks! Will get back with you.
LVL 30

Expert Comment

ID: 39728348
Do you have the ganglia roll installed and enabled?  You can just use that to track all the compute nodes.

Rocks also includes the tentakel command to query all the hosts more quickly, since it forks all the calls at once.  It should be set up if you loaded all the compute nodes with the rocks installer.  If you want the results to come back in order, you can sort the results afterwards.  The while loop could take quite a while if you have a lot of compute nodes.

It's much simpler to run this line to query all the hosts simultaneously.  Your results will likely come back out of order, but it'll be much faster than running the while loop and waiting for each node's network to respond.

tentakel "hostname; date; hostname" >> compute_nodes.txt

If I remember correctly, I think you actually just need

tentakel date >> compute_nodes.txt

since tentakel already outputs the hostname of the system with the command.
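If the results do come back out of order, something like the following could put them back into rack order. This is only a sketch: it assumes one line per node, each starting with a Rocks-style name such as compute-0-10, which may not match tentakel's actual output format. The function name sort_nodes is made up here.

```shell
#!/bin/bash
# Sketch: sort per-node result lines by rack number, then by node number.
# Assumes each line begins with a name of the form compute-<rack>-<rank>.
sort_nodes() {
    # Split fields on "-" and compare fields 2 and 3 numerically, so that
    # compute-0-2 sorts before compute-0-10.
    LC_ALL=C sort -t- -k2,2n -k3,3n "$@"
}
```

For example, sort_nodes compute_nodes.txt > compute_nodes_sorted.txt would write an ordered copy and leave the original file alone.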

The head node should have an ssh key automatically installed on each of the compute nodes already.  You shouldn't need a password when you run tentakel or ssh to the compute nodes, unless the installer messed up somehow or the system becomes corrupted by the users' code crashing.  That does happen frequently enough when you have hundreds of systems, but the compute nodes should be easy and quick to reinstall.

http://www.rocksclusters.org/roll-documentation/base/5.5/index.html  You can install other Linux distros with Rocks.  Rocks 6 is out, and it supports Red Hat 6.

Author Comment

ID: 39729691
Hi, yes, we do have Ganglia installed, and from the web front end all looks fine. Thanks for the "tentakel date >> compute_nodes.txt" command. It says all is fine as well.

I was just handed this old POS, so is it safe to say, from a high level, that this cluster is functioning as it should, relying on the tentakel cmd and the Ganglia front end??

Author Comment

ID: 39729706
Also, I do notice one problem you may be able to help with. The nodes are not reloading when I tell them to on a hard reboot. PXE is enabled on the nodes and they do make contact with the front end, but the front end never sends a packet for the reload, so a timeout occurs and the node boots back up to the old image.

Any suggestions here?
LVL 30

Accepted Solution

serialband earned 800 total points
ID: 39730033
From a high level, if Ganglia shows the system as functional and tentakel returns ok, then you should be good to go.

Your system is set to boot instead of install.  You need to change the setting on the head node with the rocks command

rocks set host boot compute-0-0 action=install

Once the system is up, the action should revert.  If not, you can set the action back to boot.  You can list the settings with:

rocks list host boot
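If several nodes need to be flagged at once, the two commands above could be wrapped in a small loop. This is just a sketch: the set_install function and the ROCKS_CMD variable are hypothetical, with ROCKS_CMD there only so the loop can be dry-run with echo before touching the real cluster.

```shell
#!/bin/bash
# Sketch: flag each named node for reinstall, then list the boot actions.
# ROCKS_CMD is a hypothetical override; set ROCKS_CMD=echo for a dry run.
ROCKS_CMD=${ROCKS_CMD:-rocks}

set_install() {
    local node
    for node in "$@"; do
        # Same command as above, run once per node.
        "$ROCKS_CMD" set host boot "$node" action=install
    done
    # Show the resulting settings so they can be checked before rebooting.
    "$ROCKS_CMD" list host boot
}
```

Usage would look like set_install compute-0-0 compute-0-1, run on the head node.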


Some helpful hints:

The best place to ask rocks questions is through the rocks mailing list.  They have more experienced users as well as the developers checking the list.  You can sign up here.  https://lists.sdsc.edu/mailman/listinfo.cgi/npaci-rocks-discussion

It's been a year since I touched a Rocks cluster.  It depends on the error message.  Problems happen frequently with rocks when users run their computations on the head node.  Keep your users from running their processes on the head node.  

The other thing to check is that the rocks kickstart directories are working.  They sometimes get corrupted and don't show up properly.  You need to check that Apache is started correctly on the head node and that it's sharing the rocks directories for the compute nodes to connect to.  They need them to install.

Sometimes the compute nodes get corrupted and you just need to stick a live distro on them to completely wipe the partition.  Unfortunately, the old kickstart on Red Hat 5.x doesn't work on a GUID partition, so anything 2 TB or larger needs to be tweaked.  It's simplest, and quickest, to stick a smaller drive in the system as the primary boot and configure kickstart to mount the secondary drive for processing space.

If all else fails, sometimes you just have to redo the head node installation.  This will take some time, but once set up, the compute nodes are quick to install.  They will install very quickly, but out of order on the rack if you turn them on all at once.  If you want them installed in order, you'll have to turn them on one at a time starting with the first one.  You'll need to watch until DHCP accepts them.  They'll automatically be numbered starting with rack 0, computer 0.

Author Comment

ID: 39730287
Hey thanks a bunch for all the info! I come from a Windows background and was literally tossed into the sea of Linux and told to fix that cluster...

I did the commands on the head node and forced an install on one of the nodes. I checked it with "rocks list host boot" before I hard rebooted the node, but it still did not reload. The nodes are not getting the info back from the server to reload, like I mentioned earlier.

Anyway I am going to post this to the ROCKS site you gave me. You've been a big help!
Many thanks and have a happy holiday!!

Author Closing Comment

ID: 39730300
The original question was about a script to help me check a ROCKS cluster. Simon supplied me with a couple of great examples. Thanks, Simon! I did get the most help from serial, who has ROCKS experience and went over and above with tips and links to help out.
