kapshure
asked on
Nagios "No Route to Host" error on CentOS
I've got a Nagios server (on CentOS 5), and a monitored node (also on CentOS 5). I initially had a problem with SSH key-exchange, but that has been solved, and I'm still receiving a No Route to Host.
Nagios server: 10.0.100.130
monitored node: 10.0.100.143
Yet, I can do the following from Nagios Server:
/usr/local/nagios/libexec/check_tcp -H 10.0.100.143 -p 5666
TCP OK - 0.000 second response time on port 5666|time=0.000361s;0.000000;0.000000;0.000000;10.000000
also can do this from the Nagios Server:
ssh 10.0.100.143 /usr/local/nagios/libexec/check_procs
PROCS OK: 603 processes
I can successfully ping 10.0.100.143 from Nagios server as well.
grep for the monitored node in /var/log/messages pulls this up:
Nov 10 00:00:00 nagiosbox nagios: CURRENT HOST STATE: monitorednode;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.21 ms
Nov 10 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host
am a bit confused here. any help is much appreciated
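The two log lines above come from two separate checks: the host check (ping) and the "Home Page" service check, which is why one can be UP while the other is CRITICAL. A minimal Python sketch for pulling the fields apart, assuming the semicolon-separated layout seen in these lines; the name `parse_state_line` is hypothetical and not part of Nagios:

```python
import re

# Matches the "CURRENT HOST STATE" / "CURRENT SERVICE STATE" lines quoted above.
STATE_RE = re.compile(r"nagios: CURRENT (HOST|SERVICE) STATE: (.*)$")

def parse_state_line(line):
    """Split one state line into a dict; returns None for unrelated lines."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    kind, payload = match.group(1), match.group(2)
    if kind == "HOST":
        # host;state;state_type;attempt;plugin output
        host, state, state_type, attempt, output = payload.split(";", 4)
        service = None
    else:
        # host;service;state;state_type;attempt;plugin output
        host, service, state, state_type, attempt, output = payload.split(";", 5)
    return {
        "kind": kind,
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,
        "attempt": int(attempt),
        "output": output,
    }

# The two lines from /var/log/messages quoted in the question.
LINES = [
    "Nov 10 00:00:00 nagiosbox nagios: CURRENT HOST STATE: "
    "monitorednode;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.21 ms",
    "Nov 10 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: "
    "monitorednode;Home Page;CRITICAL;HARD;1;No route to host",
]
PARSED = [parse_state_line(line) for line in LINES]
```

Read this way, the "No route to host" text is the output of the plugin behind the "Home Page" service, not of the ping-based host check.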
ASKER
from monitored node:
ping 10.0.100.130
PING 10.0.100.130 (10.0.100.130) 56(84) bytes of data.
64 bytes from 10.0.100.130: icmp_seq=1 ttl=64 time=0.897 ms
monitored node ifconfig:
ifconfig
eth0    Link encap:Ethernet  HWaddr 00:1D:09:2C:C3:2A
     inet addr:10.0.100.143  Bcast:10.0.100.255  Mask:255.255.255.0
     inet6 addr: fe80::21d:9ff:fe2c:c32a/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     RX packets:151840310 errors:0 dropped:0 overruns:0 frame:0
     TX packets:20026487 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:145578488128 (135.5 GiB)  TX bytes:2364444581 (2.2 GiB)
     Interrupt:169 Memory:f8000000-f8012800
eth0:1   Link encap:Ethernet  HWaddr 00:1D:09:2C:C3:2A
     inet addr:10.0.100.144  Bcast:10.0.100.255  Mask:255.255.255.0
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     Interrupt:169 Memory:f8000000-f8012800
"route" from monitored node:
 route
Kernel IP routing table
Destination   Gateway     Genmask     Flags Metric Ref   Use Iface
10.0.100.0    *           255.255.255.0 U    0    0     0 eth0
169.254.0.0   *           255.255.0.0   U    0    0     0 eth0
default     10.0.100.1    0.0.0.0     UG   0    0     0 eth0
from Nagios box, ifconfig:
/sbin/ifconfig
eth0    Link encap:Ethernet  HWaddr 00:1C:23:C8:96:AE
     inet addr:10.0.100.130  Bcast:10.0.100.255  Mask:255.255.255.0
     inet6 addr: fe80::21c:23ff:fec8:96ae/64 Scope:Link
     UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
     RX packets:1968825668 errors:0 dropped:0 overruns:0 frame:0
     TX packets:2112609296 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:1000
     RX bytes:708043528943 (659.4 GiB)  TX bytes:995965269105 (927.5 GiB)
     Interrupt:169 Memory:f8000000-f8011100
"route" from nagios box:
 /sbin/route
Kernel IP routing table
Destination   Gateway     Genmask     Flags Metric Ref   Use Iface
10.0.101.0    *           255.255.255.0 U    0    0     0 eth1
10.0.100.0    *           255.255.255.0 U    0    0     0 eth0
169.254.0.0   *           255.255.0.0   U    0    0     0 eth0
default     10.0.100.1    0.0.0.0     UG   0    0     0 eth0
Well that all looks fine to me.
In the Nagios server setup - are you calling the remote server by IP or by name ?
Where exactly are you seeing this error message?
ASKER
@edster9999:
i have a bucket container:
[code]/usr/local/nagios/etc/servers/monitorednode.cfg:
define host{
      use linux-server ; Inherit default values from a template
    host_name monitorednode ; The name we're giving to this server
    alias monitorednode ; A longer name for the server
    address 10.0.100.143 ; IP address of the server
}
define service{
    use generic-service
    host_name            monitorednode
    service_description       Home Page
    check_command          check_http!ww2
}[/code]
is that what you mean?
@sangamc:
if you click on Tactical Overview, then under the Services section, you see Critical, Warning, Unknown, OK, Pending.
Under Critical, that's where it is. You can also see Service Status Totals from Service Detail; it's there under the status information that says: No Route to Host.
On the Host status details main page, it shows the system as UP.
question though.....
I have active checks disabled right now; is this error message because of that?
Hi,
The thing is, I guess "monitorednode" is not resolving to 10.0.100.143. Please try this from the Nagios server:
ping monitoredhost
I guess it resolves to another address.
If this is the case, edit your DNS if you have one, or edit your /etc/hosts. Please make sure that:
- Your host name is not assigned to 127.0.0.1. If it is, correct it and map your hostname to your real IP instead.
- Then add an entry for the monitored host such as:
10.0.100.143  monitoredhost.domain.com  monitoredhost
- Check your /etc/resolv.conf for your search domain (appended after monitoredhost to create an FQDN), such as:
nameserver  x.x.x.x
search domain.com
Save and exit, then make sure you can now ping the host with these commands:
ping monitoredhost
ping monitoredhost.domain.com
Please replace domain.com with your domain.
Cheers,
K.
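The /etc/hosts checks described above can be sketched in Python. This is only an illustration: the sample text, including the `nagiosbox` entry on the loopback line, is made up, and `parse_hosts` and `loopback_names` are hypothetical helper names.

```python
import ipaddress

def parse_hosts(text):
    """Map each name in /etc/hosts-style text to the addresses it is listed under."""
    names = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        address, hostnames = fields[0], fields[1:]
        for name in hostnames:
            names.setdefault(name, []).append(address)
    return names

def loopback_names(names):
    """Return names (other than the usual localhost aliases) mapped to a loopback address."""
    ignore = {"localhost", "localhost.localdomain"}
    flagged = []
    for name, addresses in names.items():
        if name in ignore:
            continue
        if any(ipaddress.ip_address(a).is_loopback for a in addresses):
            flagged.append(name)
    return sorted(flagged)

# Made-up sample: the machine's own name sits on the 127.0.0.1 line,
# which is the situation the first bullet above warns about.
SAMPLE = """\
127.0.0.1     localhost.localdomain localhost nagiosbox
10.0.100.143  monitoredhost.domain.com  monitoredhost
"""

NAMES = parse_hosts(SAMPLE)
FLAGGED = loopback_names(NAMES)
```

With the sample above, `NAMES["monitoredhost"]` holds the single address 10.0.100.143 and `FLAGGED` contains only `nagiosbox`.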
ASKER
@KeremE
if you look above for the ifconfig on the monitorednode, you can see there is a 10.0.100.143 on eth0, and then 10.0.100.144 on eth0:1 --- I know this is an alias on the interface, but I am not sure how it is/if affecting this scenario:
I changed the .cfg file for monitorednode on the Nagios server to .143, and then to .144, and tested each.
I also tested the /etc/hosts entry with .144 and with .143.
If I ping monitorednode (with or without domain.com), I get successful ICMP replies back for both IP addresses.
If I do a ./check_http -H 10.0.100.143, I get a connection refused, Unable to open TCP socket. I can't telnet to 80 on that box either.
If I do a ./check_http -H 10.0.100.144, I get:
OK - HTTP/1.1 301 Moved Permanently - 0.003 second response time |time=0.002535s;;;0.000000 size=434B;;;0
I can telnet successfully to 80 on .144
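The telnet and check_http tests above boil down to a TCP connect against port 80 on each address. A minimal Python sketch of the same probe, which also separates "connection refused" from "no route to host"; the names `classify` and `probe` are hypothetical, and the 3-second timeout is an arbitrary choice:

```python
import errno
import socket

def classify(err_no):
    """Translate a connect() errno into the wording the plugins report."""
    if err_no == errno.ECONNREFUSED:
        # The host answered, but nothing is listening on that address/port.
        return "connection refused"
    if err_no == errno.EHOSTUNREACH:
        # No usable route, or an ICMP "host unreachable/prohibited" reply
        # (a firewall REJECT rule can produce this even on the same subnet).
        return "no route to host"
    if err_no == errno.ETIMEDOUT:
        return "timed out"
    return "other error"

def probe(address, port, timeout=3.0):
    """Attempt one TCP connection and describe the outcome."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        return classify(exc.errno)
```

Run from the Nagios server, `probe("10.0.100.143", 80)` and `probe("10.0.100.144", 80)` would be expected to mirror the results above: a refusal on .143 and an open port on .144, which suggests the web server is bound only to the alias address.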
Someone mentioned that this error isn't Nagios, but with the OS. specifically stating that the "Home Page" check isn't looking at a valid host name or address vs the check_ping plugin. Problem is... I can't find any reference to "Home Page" anywhere.
I got these from /usr/local/nagios/etc/objects/commands.cfg
# 'check-host-alive' command definition
define command{
    command_name   check-host-alive
    command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
    }
# 'check_ping' command definition
define command{
    command_name   check_ping
    command_line   $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
    }
# 'check_http' command definition
define command{
    command_name   check_http
    command_line   $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
    }
Under /etc/rc.d/init.d/nagios I can see that I've got the paths right:
prefix="/usr/local/nagios"
exec_prefix="/usr/local/nagios"
exec="/usr/local/nagios/bin/nagios"
config="/usr/local/nagios/etc/nagios.cfg"
thoughts?
ASKER
Can I get a bump on this? I raised the points to 500. I'm really struggling with this. I can supplement this:
Nov 9 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host
Nov 10 00:00:00 nagiosbox nagios: CURRENT HOST STATE: monitorednode;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.21 ms
Nov 10 00:00:00 nagiosbox nagios: CURRENT SERVICE STATE: monitorednode;Home Page;CRITICAL;HARD;1;No route to host
so it's the Home Page alert?
see Home Page in relevant .cfg file below
define host{
use linux-server ; Inherit default values from a template
host_name monitorednode ; The name we're giving to this server
alias monitorednode ; A longer name for the server
address 10.0.100.143 ; IP address of the server
}
define service{
use generic-service
host_name monitorednode
service_description Home Page
check_command check_http!ww2
}
the IP listed above is correct for the host. But again, no reference to an IP defined in the command file found here: /usr/local/nagios/etc/objects/commands.cfg
Thoughts????
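On the missing IP in commands.cfg: the command definitions only contain macros. Nagios fills `$HOSTADDRESS$` from the host's `address` directive and `$ARG1$` from whatever follows the `!` in `check_command`. A rough Python sketch of that substitution, assuming `$USER1$` points at /usr/local/nagios/libexec (the plugin path used earlier in the thread); `expand_command` is a hypothetical helper, not Nagios code:

```python
import re

def expand_command(check_command, command_lines, macros):
    """Expand e.g. 'check_http!ww2' into the command line that would be run."""
    name, *args = check_command.split("!")
    line = command_lines[name]
    values = dict(macros)
    for index, arg in enumerate(args, start=1):
        values["$ARG%d$" % index] = arg
    for macro, value in values.items():
        line = line.replace(macro, value)
    # Any $ARGn$ that was not supplied expands to nothing.
    line = re.sub(r"\$ARG\d+\$", "", line)
    return line.rstrip()

# command_line taken from the check_http definition in commands.cfg.
COMMAND_LINES = {"check_http": "$USER1$/check_http -I $HOSTADDRESS$ $ARG1$"}

# Assumed values: $USER1$ is normally set in resource.cfg;
# $HOSTADDRESS$ comes from "address 10.0.100.143" in the host definition.
MACROS = {
    "$USER1$": "/usr/local/nagios/libexec",
    "$HOSTADDRESS$": "10.0.100.143",
}

WITH_ARG = expand_command("check_http!ww2", COMMAND_LINES, MACROS)
WITHOUT_ARG = expand_command("check_http", COMMAND_LINES, MACROS)
```

Under these assumptions the "Home Page" service runs `check_http -I 10.0.100.143 ww2`, so the check targets whatever address the host definition carries, with `ww2` passed through as a bare argument.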
what happens if you take away the !ww2 arg?
ASKER
I edited the .cfg file, so now it should just refresh, or do I need to do a nagios reload?
i actually tried that anyways, but I can't get it to execute
/etc/rc.d/init.d/nagios reload
nagios dead but subsys locked
nagios is still running though, and monitoring
subsys locked usually indicates the lock file still exists. Reboot your server and see if you still get the host not reachable error message and let us know.
PS: if you are on CentOS you should be able to use "service nagios reload" and "service nagios restart" to reload or restart the Nagios service.
ASKER
I cant reboot this box. its our primary monitoring solution for the datacenter.
/sbin/service nagios status
nagios (pid 20266) is running...
then i tried reload:
bash-3.1# /sbin/service nagios reload
nagios (pid 20266) is running...
Reloading nagios:                     [FAILED]
but its still running
Try this instead from: http://nagios.sourceforge.net/docs/2_0/stoprestart.html
ps axu | grep nagios
The output should look something like this:
nagios  6808  0.0  0.7  840  352  p3 S   13:44  0:00 grep nagios
nagios 11149  0.2  1.0  868  488  ?  S  Feb 27  6:33 /usr/local/nagios/bin/nagios nagios.cfg
From the program output, you will notice that Nagios was started by user nagios and is running as process id 11149.
Manually Stopping Nagios
In order to stop Nagios, use the kill command as follows...
kill 11149
Then do service nagios start
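Picking the daemon's PID out of the `ps axu | grep nagios` output can be sketched in Python as below. The helper name `nagios_daemon_pids` is hypothetical, and the match on a field ending in `/nagios` is an assumption that the daemon was started by its full path, as in the sample output above. The sketch only reports PIDs; it does not kill anything.

```python
def nagios_daemon_pids(ps_output):
    """Return PIDs of lines whose command is a path ending in '/nagios'."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # header line or something unexpected
        # Skip the USER and PID columns so the 'nagios' user name cannot match,
        # and require a path so the 'grep nagios' line is ignored.
        if any(field.endswith("/nagios") for field in fields[2:]):
            pids.append(int(fields[1]))
    return pids

# Sample output quoted from the post above.
SAMPLE = (
    "nagios  6808  0.0  0.7  840  352  p3 S   13:44  0:00 grep nagios\n"
    "nagios 11149  0.2  1.0  868  488  ?  S  Feb 27  6:33 "
    "/usr/local/nagios/bin/nagios nagios.cfg\n"
)

PIDS = nagios_daemon_pids(SAMPLE)
```

For the sample, `PIDS` holds just 11149, the process the post says to kill before running `service nagios start`.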
ASKER
I got nagios to reload. Still see the service alert though. I need to roll out a cleaned up box, but for now, having 2 sets of binaries on here is throwing me off.
not sure what now
ASKER
well the previous admin did an upgrade, and I can see two different versions, and two different program paths. he didn't rpm anything, so I suspect something was done incorrectly on the upgrade.
/usr/bin/nagios -v
3.2.1
usage: /usr/bin/nagios
/usr/local/nagios/bin/nagios -v
3.0b7
usage: /usr/local/nagios/bin/nagios
so I killed the process this time, and restarted nagios.
service nagios restart
i tail /var/log/messages, and  you can see that Nagios restarted, but look at the version #.. I need to find out how to make 3.2.1 restart, but that may not be the issue.
Nov 18 10:03:46 sacdcdev01 nagios: Successfully shutdown... (PID=20266)
Nov 18 10:03:54 sacdcdev01 nagios: Nagios 3.0b7 starting... (PID=10255)
ASKER
forgot to add this:
nagios  10256   1  0 10:03 ?     00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
3.0b7 definitely running. Maybe when I get better at Nagios config/setup, I will just deploy another Nagios roll-out. This is ridics! :)
ASKER
Bueller? anyone? Bueller?
ASKER
OK, I believe I have found a possible lead on this.
I changed the monitorednode.cfg to this:
define service{
use generic-service
host_name sacdcweb03
service_description HTTP
check_command check_http
}
took out the "Home Page" and the "check_http!ww2"
service_description Home Page
check_command check_http!ww2
so what I get now in /var/log/messages is:
nagios: CURRENT SERVICE STATE: sacdcweb03;HTTP;CRITICAL;HARD;3;Connection refused
so now connection refused troubleshooting talks about checking version differences on the Nagios server, and the monitored node where NRPE daemon is running.. sooo.. I found that the monitored node has 2.12, and the Nagios server has 2.8
I ran a "make clean" in the original directory on the monitored node, but I can still execute check_nrpe plugin and see V 2.12 status returned.
How do I correctly remove v2.12 NRPE from the monitored node? I'm suspecting that re-installing the NRPE daemon with 2.8 will possibly clean this up!
anyone?
ASKER
OK, so I went back and I have now installed NRPE 2.8 on the monitored node, and I can verify that 2.8 is replying
/usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.8
but I am still getting connection refused in Nagios. Can anyone shed any light on this?
ASKER
I went back and changed the monitorednode.cfg on Nagios server, to reflect back to the "Home Page" check; even though this seemed to be incorrect previously:
define service{
use generic-service
host_name monitorednode
service_description Home Page
check_command check_http!ww1
}
then i bounced nagios, and now the critical error message has cleared.
the only thing I get now, that I'm not quite sure on:
OK - HTTP/1.1 301 Moved Permanently
HTTP/1.1 301 Moved Permanently
is a problem with the config on the webserver; that's where you need to troubleshoot the message
ASKER
alrighty! i'll look into it. but for all intents and purposes, would you say that Nagios could at least be reliable on monitoring this host, even though this message is popping up? it's not being classified as Warning or Critical
thanks again for your help on this sangamc
Yes it is ... I had a similar situation with a third-party webserver. The site designer didn't think it a priority to fix the 301 redirect, so we monitored from Nagios and took that into account. When the site went down due to a network outage, Nagios would show the site as down, and when it came back up, the status would return to OK, which is what we were looking for.
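Why a 301 still shows as OK can be illustrated with a simplified model of how check_http is generally understood to map HTTP status codes to Nagios states. This is an assumption-laden sketch, not the plugin's code: the `onredirect` parameter stands in for the plugin's redirect option, assumed here to default to OK, and your plugin version's `--help` output is the authority.

```python
# Nagios plugin return states.
OK, WARNING, CRITICAL = 0, 1, 2

def http_status_to_state(status_code, onredirect=OK):
    """Simplified model: 5xx is critical, 4xx is a warning, 3xx follows the
    redirect policy (assumed to default to OK), everything else is OK."""
    if status_code >= 500:
        return CRITICAL
    if status_code >= 400:
        return WARNING
    if status_code >= 300:
        return onredirect
    return OK

STATE_FOR_301 = http_status_to_state(301)
```

Under this model the 301 reply lands in the OK bucket, while an unreachable server never returns a status code at all and is reported as a failure, which matches the behaviour described above.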
ASKER
I found the binary difference in NRPE between the 2 systems. These guys just help make the problem and resolution manifest
can you do an ifconfig for both machines and show that
and maybe include a 'route' for both machines too so we see the route setup