nagiosxi strange issue

sword12 asked
Hi Experts,

In our environment we run Nagios XI, the commercial version of Nagios.

I am monitoring file systems on a Red Hat box, and everything was fine with no issues.

Yesterday I extended the file system from 1.3 TB to 2.9 TB, and a day later I started getting a critical message from Nagios XI:


[root@nagios ~]# /usr/local/nagios/libexec/check_nrpe -H 192.168.42.11 -t 30 -c check_disk -a '-w 50% -c 80% -p /oracle/RAN/oraarch'
DISK CRITICAL - free space: /oracle/RAN/oraarch 2291546 MB (77.93% inode=100%);| /oracle/RAN/oraarch=648895MB;1471509;588603;0;2943018


But this is not true. The file system status looks like this:


/dev/mapper/oraarch-oraarch                         2.9T  632G  2.2T  22% /oracle/RAN/oraarch



So only 22% of this file system is used.
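The numbers in the plugin output can be cross-checked with a little arithmetic. The sketch below assumes that the check_disk percentages given with -w and -c refer to free space, not used space, which is how the plugin's help text describes them. The helper name `used_limit` is made up for illustration; the figures are copied from the perfdata above.

```shell
#!/bin/sh
# Numbers copied from the perfdata in the plugin output above (MB).
total=2943018
used=648895

# Hypothetical helper: used-space level (MB) that corresponds to
# "less than $2 percent free" on a filesystem of $1 MB.
used_limit() {
  echo $(( $1 * (100 - $2) / 100 ))
}

warn_limit=$(used_limit "$total" 50)   # from -w 50%
crit_limit=$(used_limit "$total" 80)   # from -c 80%
echo "warn when used > $warn_limit MB, crit when used > $crit_limit MB"

# Same ordering as the plugin: the critical test comes first.
if [ "$used" -gt "$crit_limit" ]; then
  echo "CRITICAL"
elif [ "$used" -gt "$warn_limit" ]; then
  echo "WARNING"
else
  echo "OK"
fi
```

The two computed limits (1471509 and 588603) match the warning and critical fields in the perfdata, and 648895 MB used is above the critical limit. Under that reading, alerting at 50% and 80% used would be written as `-w 50% -c 20%`.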

I am monitoring other file systems on the same box and everything is OK there; only this file system is affected, so it cannot be an agent issue on the Red Hat box.


Any ideas?



Thanks,
Sword

Hi, 


Either reload the Nagios agent on the box where you resized the volume or, if possible, reboot the box at the earliest convenience.


Cheers

Author

Commented:
Thank you for the update.

This is a critical system; we will not be able to reboot it for at least 2 weeks.

Can you help me and give me the command that restarts the Nagios agent service?

https://assets.nagios.com/downloads/nagiosxi/docs/Restarting-Linux-Services-With-NRPE.pdf


I can't find the right command there.

Do you think rebooting the Nagios server will help?


Or should I restart just the agent service on the Linux box?

I need the command, please.

Hi, 


It's the Linux agent that is reporting back faulty information, so my guess is that rebooting the Nagios server won't help in this case.


If you log on to the Linux box, look up the NRPE process:


# ps -ef | grep -i nrpe


You should get output similar to:

     UID   PID  PPID  C    STIME TTY      TIME CMD
  nagios 10813514        1   0   Apr 26      -  7:38 /usr/local/nagios/nrpe -c /usr/local/nagios/nrpe.cfg -d


In this example you would kill PID 10813514, restart the NRPE client, and show info about the newly started process:


# kill -9 10813514;/usr/local/nagios/nrpe -c /usr/local/nagios/nrpe.cfg -d;ps -ef | grep -i nrpe


As output you should get the process info of the newly started NRPE.


Then run the drive check manually to see if the values turn out okay.
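One practical wrinkle: `ps -ef | grep -i nrpe` usually lists the grep command itself as well. A minimal sketch that pulls only the matching PIDs out of `ps -ef` output is shown here; the function name `nrpe_pids` is hypothetical, and it assumes the PID is the second column, as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID
# (second column) of every line that mentions nrpe, skipping the
# grep process itself. The header line never matches, so it is dropped too.
nrpe_pids() {
  awk 'tolower($0) ~ /nrpe/ && $0 !~ /grep/ { print $2 }'
}

# Usage on the monitored host:
#   ps -ef | nrpe_pids
```

If this prints nothing, no resident NRPE process was found, and the agent may be started some other way.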


Cheers

Author

Commented:
Sorry, but this is what I got.

Please check the attached file:
nagios.jpg

Hi,


Are you sure you're even using NRPE? Try and look for 'nagios' instead and post the output here. 


Cheers

Author

Commented:
Hi,

When I configure a new host in Nagios XI, I use this document to install the agent:


https://assets.nagios.com/downloads/nagiosxi/docs/Installing_The_XI_Linux_Agent.pdf

Here are some screenshots of our Nagios XI:
0.jpg
1.jpg
2.jpg
3.jpg
4.jpg

Author

Commented:
I found it:

[root@svrran nagios]# ps -ef | grep inetd
root      8742     1  0 13:26 ?        00:00:00 /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid
root     10827  7478  0 13:32 pts/2    00:00:00 grep --color=auto inetd
[root@svrran nagios]#

Hi,


Maybe your system runs NRPE as a service:


systemctl status nrpe


Then to restart:


systemctl restart nrpe


and re-run the check.


Or, if you're dealing with an NFS mount, you can try remounting the NFS share and rechecking after that.


Cheers




BTW, restarting inetd might also interrupt other services you may have running under inetd.
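Since the `ps` output posted earlier shows xinetd rather than a standalone nrpe process, it may help to work out how NRPE is started before picking a restart command. The following is a rough sketch that classifies `ps -ef` output; the function name `nrpe_mode` and its three labels are invented for illustration, and it assumes a standalone daemon shows a path ending in `/nrpe` in its command line.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and guess how NRPE is run.
#   "daemon"  - a resident nrpe process is present
#   "xinetd"  - no nrpe process, but xinetd is running
#               (NRPE may then be started per connection)
#   "unknown" - neither was seen
nrpe_mode() {
  awk '
    /grep/        { next }         # ignore the grep process itself
    /\/nrpe( |$)/ { daemon = 1 }   # e.g. /usr/local/nagios/nrpe -c ... -d
    /xinetd/      { xinetd = 1 }
    END {
      if (daemon)      print "daemon"
      else if (xinetd) print "xinetd"
      else             print "unknown"
    }'
}

# Usage on the monitored host:
#   ps -ef | nrpe_mode
```

A result of "xinetd" would suggest there is no long-running NRPE process to restart, in which case checking the xinetd configuration for an nrpe entry is a reasonable next step.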

Duncan Roe, Software Developer

Commented:
After restarting nrpe, what numbers are you seeing?