I have Nagios running on Linux.
The person who set it up seems to have it running check_ping over too short a time period.
On the network where this box and the machines it monitors live, there is sometimes a "glitch", and when that happens I get a batch of ping errors from Nagios.
I think check_ping is responsible, but I don't know much about it.
Here is a sample of the command from the minimal.cfg file:
========================================================
define service{
use generic-service ; Name of service template to use
host_name prod
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 4
normal_check_interval 5
retry_check_interval 1
contact_groups admins,dbadmins
notification_options w,u,c,r
notification_interval 960
notification_period 24x7
check_command check_ping!100.0,20%!500.0,60%
}
========================================================
I think this is the definition that triggers the PING error and recovery messages.
I don't really know what it does, but is there a way to lengthen the time period it checks over?
Here is a sample of the emails I get:
***** Nagios *****
Notification Type: PROBLEM
Service: PING
Host: prod
Address:
State: WARNING
Date/Time: Wed May 21 09:42:49 CDT 2008
Additional Info:
PING WARNING - Packet loss = 0%, RTA = 109.30 ms
========================================================
***** Nagios *****
Notification Type: RECOVERY
Service: PING
Host: prod
Address:
State: OK
Date/Time: Wed May 21 09:47:49 CDT 2008
Additional Info:
PING OK - Packet loss = 0%, RTA = 85.93 ms
========================================================
As you can see, the recovery message always follows the problem message within minutes (five minutes apart in this sample).
I simply want it to check at, say, five-minute intervals and send a message only if a ping doesn't come back.
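To make the goal concrete, here is an untested sketch of the kind of change I have in mind. It rests on several assumptions: that check_ping is defined in the usual way, so the first argument is the WARNING pair and the second the CRITICAL pair; that the intervals are in minutes (the default interval_length of 60 seconds); and that 250.0 ms is a sensible new warning threshold for this network, which is only a guess on my part.

```
# Sketch only, not a tested configuration.
# Assumes the check_ping command is defined along the lines of:
#   command_line  $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
# so "100.0,20%" means WARNING at RTA >= 100 ms or 20% packet loss,
# and "500.0,60%" means CRITICAL at RTA >= 500 ms or 60% packet loss.
define service{
        use                     generic-service
        host_name               prod
        service_description     PING
        is_volatile             0
        check_period            24x7
        max_check_attempts      5       ; was 4: require 5 consecutive bad checks before notifying
        normal_check_interval   5       ; check every 5 minutes while the service is OK
        retry_check_interval    1       ; recheck every minute after a bad result
        contact_groups          admins,dbadmins
        notification_options    u,c,r   ; "w" removed so WARNING states do not send email
        notification_interval   960
        notification_period     24x7
        check_command           check_ping!250.0,20%!500.0,60%  ; warning RTA raised from 100.0 (assumed value)
        }
```

If I read the sample email correctly, the WARNING was raised because the RTA of 109.30 ms was just over the 100.0 ms warning threshold, not because any packets were lost, so raising that threshold or dropping "w" from notification_options may be enough on its own.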
I hope that all makes sense.
Thanks