• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 5512

Pre-testing ssh connection

I have to administer many RHEL4 Linux nodes from my desktop. I do this by passing commands from the desktop to remote nodes through a trusted ssh channel. It works fine except when a remote node is in a semi-dead state: the network is alive and the remote ssh server accepts the connection, but does nothing beyond that. As a result the ssh connection neither fails nor completes, i.e. my command passing hangs. I can interrupt it with ctrl-c and go on to the next node, but that only works when I am running the script interactively. How can I skip such a node in a cron job?

I tried pre-testing ssh connection by "ssh -o BatchMode=yes -o ConnectTimeout=2 nodexx /bin/true" but this does not timeout after 2 seconds.
1 Solution
There are perhaps a few ways to get around it.

In your ssh_config file (/etc/ssh/ssh_config in Slackware), you have a couple of options that might help:

     ConnectionAttempts
             Specifies the number of tries (one per second) to make before exiting.  The argument must be an integer.  This may be useful
             in scripts if the connection sometimes fails.  The default is 1.

     ConnectTimeout
             Specifies the timeout (in seconds) used when connecting to the SSH server, instead of using the default system TCP timeout.
             This value is used only when the target is down or really unreachable, not when it refuses the connection.

The 'ConnectTimeout' might help you out here - if it doesn't get a full connection in X amount of time, it should disconnect the session and drop back to shell.   In Slackware, the default ConnectTimeout is 0 - or disabled.  (Actually, 0 is even commented out.  So it's probably 0 by default).   I haven't logged into one of my RH boxes to check this one.
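As a sketch, setting these in the client configuration might look like the following (the 5-second value is an illustrative choice, not a recommendation from the thread):

```
# Hypothetical snippet for /etc/ssh/ssh_config (or ~/.ssh/config)
Host *
    ConnectionAttempts 1
    ConnectTimeout 5
```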

Another workaround would be to have a process 'watch' the SSH stream, and keep an eye out - if it doesn't see the shell prompt within X seconds, terminate and go to the next.  
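The watcher idea above can be sketched as a small shell wrapper: start the command in the background and kill it if it has not finished within a deadline. The function name `run_with_deadline`, the 1-second polling step, and the 5-second limit in the example are all illustrative, not from the original post.

```shell
#!/bin/sh
# Run a command with a deadline; return its exit status if it finishes
# in time, otherwise kill it and return 1.
run_with_deadline ()
{
    deadline=$1; shift
    "$@" &                 # start the command (e.g. ssh nodexx /bin/true)
    pid=$!
    while [ $deadline -gt 0 ]; do
        if ! kill -0 $pid 2>/dev/null; then
            wait $pid      # command finished: report its exit status
            return $?
        fi
        sleep 1            # poll once per second
        deadline=`expr $deadline - 1`
    done
    kill -9 $pid 2>/dev/null   # still running: assume hung, kill it
    wait $pid 2>/dev/null      # reap the killed child
    return 1
}

run_with_deadline 5 sleep 1 && echo "finished" || echo "timed out"
# prints "finished"
```

The same wrapper would run `ssh` instead of `sleep`, so a semi-dead node costs at most the deadline instead of hanging the whole batch.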

Hopefully the ConnectTimeout fixes the problem.  
vinod (Author) commented:
As I said in my original posting:

I tried pre-testing ssh connection by "ssh -o BatchMode=yes -o ConnectTimeout=2 nodexx /bin/true" but this does not timeout after 2 seconds.

ConnectTimeout given on command line should override the default, but it does not work:(
'watch' also works interactively. I need something that works in batch mode.
If you need it working in batch mode, I'd try the ConnectTimeout in the main configuration file, rather than command line.  It may be that it won't work correctly from the command line.

You _could_ run a loop that first attempts a telnet session to the ssh port - if the SSH isn't responding correctly, it won't give the right answer.  

You might test it with the next 'hung' server - do a telnet to the SSH port.

It should give you something like the following
Connected to localhost.
Escape character is '^]'.

Run the telnet, break the connection (run telnet, pipe the output to a file, capture the PID, wait five seconds, kill the PID), parse the output from telnet, and use the result to decide whether to run SSH or skip to the next machine.
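The parsing step can be sketched like this. A healthy sshd sends a version line starting with "SSH-" right after the connection opens, so checking for that line distinguishes a responsive daemon from one that merely accepts the TCP connection. The helper name `banner_ok` is hypothetical, and the sample below uses canned text; a live check would pipe the (time-limited) telnet output into it instead.

```shell
#!/bin/sh
# Check telnet output for the SSH version banner.
banner_ok ()
{
    # A responsive sshd sends a line beginning with "SSH-".
    grep -q '^SSH-'
}

# Simulated telnet output from a healthy node:
sample='Connected to localhost.
SSH-2.0-OpenSSH_3.9'

printf '%s\n' "$sample" | banner_ok \
    && echo "sshd answering" || echo "sshd hung or dead"
# prints "sshd answering"
```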

Additional options that may or may not help:


Also, I don't know if it helps, but here's a link to a suggestion to another person with the same issue.

As I don't know if my suggestion could help, I have no objections to either having it finalized, or simply removed.   I could see the information assisting someone else, but as it's incomplete, the assistance would be minor.
vinod (Author) commented:
I got around this problem by adding a host_alive shell function that tests the ssh connection in the background: it returns 0 on success; otherwise it cleans up the hung ssh and returns 1. The main loop passes the command via ssh only if host_alive tests OK.

Since this solution was inspired by suggestions from Bibliophage, moderator may award the points to him/her.


# Run a command on a remote host via ssh only if the remote sshd
# is actually responding to ssh connections. ssh keys are assumed
# to be already set up.

host_alive ()
{
    host=$1;
    ping -c 1 -q -w 5 $host >/dev/null 2>&1;
    if [ $? -ne 0 ]; then
        echo $host does not ping;
        return 1;
    fi
    ssh root@$host /bin/true >/dev/null 2>&1 &
    timeo=50;  # run the test in bg with a timeout of 5 secs (50 x 0.1 s)
    while [ $timeo -gt 0 ]; do
        pid=`/bin/ps auwx | grep "ssh root@$host /bin/true" | grep -v grep | awk '{print $2}'`;
        if [ -z "$pid" ]; then
            return 0;  # ssh finished, so sshd is responding
        fi
        usleep 100000;  # wait 0.1 s before checking again
        timeo=`expr $timeo - 1`;
    done
    if [ "$pid" ]; then
        kill -9 $pid >/dev/null 2>&1;  # still hanging after 5 s: kill it
        echo $host pings but does not ssh;
        return 1;
    fi
    return 0;
}

# The main loop

while read h; do
  host_alive $h && ssh $h my-command;
done < hosts.lis
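Since the original goal was an unattended cron job, a script built around this loop could be scheduled directly; the pre-test keeps one hung node from stalling the rest of the run. An illustrative crontab entry (script name and paths are hypothetical):

```
# m h dom mon dow  command
0 * * * *  /usr/local/bin/run-on-nodes.sh >> /var/log/run-on-nodes.log 2>&1
```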
No real objection.  I may have pointed him the right way, but he came up with his own solution.
PAQed with points refunded (125)

EE Admin