Xetroximyn (United States of America) asked:

Any good free monitoring tools for RHEL/CentOS? (e.g. email me if a disk volume is getting near full, or if CPU/load average is spiking)

SOLUTION (Sam P, United States of America)
[Available to Experts Exchange members only]
Ertan SENOYAR:

The best software for this is Nagios. It is open source.
SOLUTION (Nick Upson, United Kingdom)
[Available to Experts Exchange members only]
Xetroximyn (ASKER):

Thanks! For Nagios, is there a free edition? If so, is it worth using, or is it trimmed to the point of not being useful?

Mostly I just want emails when:
a. a disk is near full
b. CPU is spiking
c. the load average is spiking

on RHEL/CentOS systems.

Which of these is the easiest to install and use?
SOLUTION
[Available to Experts Exchange members only]
I would say monit is the easiest to install and set up; 10 minutes should do it.
SOLUTION
[Available to Experts Exchange members only]
Does Nagios have a steep learning curve? Which, if any, of these can I install with yum?
Try Cacti. It will be easy. I posted a link with the steps in my first reply; it has the yum command and all the necessary steps.

Hope that helps.
Red Hat support recommended Performance Co-Pilot. Does anyone have experience with that? Thoughts?
I've been crazy busy, so I've been bouncing around between things. I just spent a few minutes skimming some websites about Cacti, Nagios, etc.

A few questions before I get too deep into any of these. Cacti seems to be really big on being a graphing tool. It would be great to see graphs of system performance, but that's not my main goal. I'm not sure "monitoring" is really the right word for what I'm looking for.

I'm mainly looking for a very simple alerting tool. Right now I just use a couple of scripts that look for disks getting full and email me. I was thinking it might be better to "upgrade" to an actual tool that could also alert me about other issues.

Another question: these look like you set up a central server. Do you then install clients on the other machines? Or do you install the full thing on every machine? Or does the server reach out and collect information remotely from the other machines?
How many systems do you need to monitor and alert on?

Yes, there is commonly one system that handles the querying. How the data is collected depends on the tool you pick: Nagios commonly has a client component installed on each monitored system, which the server checks when it needs disk space, memory use, etc.
Other tools use SNMP polling, which means only snmpd needs to be configured and enabled on each system. The central system then polls each one for the OIDs (defined in MIBs) that interest you and makes determinations based on the returned values.
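To make the SNMP-polling option concrete, here is a minimal sketch of what a central poller might run for disk space, using Net-SNMP's `snmpget` against the standard HOST-RESOURCES-MIB storage table. It assumes net-snmp-utils on the poller, a v2c community string, and a target whose snmpd view exposes hrStorage (the stock RHEL snmpd.conf may need its view widened for this). The storage index differs per host, so it would be looked up first with `snmpwalk` on `HOST-RESOURCES-MIB::hrStorageDescr`. The helper names `storage_pct_used` and `poll_storage` are hypothetical.

```shell
#!/bin/bash
# Sketch of SNMP disk polling from a central server (hypothetical helpers).
# Assumptions: net-snmp-utils installed here, snmpd on the target exposes
# HOST-RESOURCES-MIB, and HOST / COMMUNITY / INDEX are placeholders you supply.

# hrStorageSize and hrStorageUsed are both counted in hrStorageAllocationUnits,
# so the units cancel when computing a percentage.
storage_pct_used() {
  local size=$1 used=$2
  if [ "$size" -le 0 ]; then
    echo 0
  else
    echo $(( used * 100 / size ))
  fi
}

# Usage: poll_storage HOST COMMUNITY INDEX   (prints percent used)
# -Oqv makes snmpget print only the value, without the OID name or type.
poll_storage() {
  local host=$1 community=$2 idx=$3 size used
  size=$(snmpget -v2c -c "$community" -Oqv "$host" \
           "HOST-RESOURCES-MIB::hrStorageSize.$idx") || return 1
  used=$(snmpget -v2c -c "$community" -Oqv "$host" \
           "HOST-RESOURCES-MIB::hrStorageUsed.$idx") || return 1
  storage_pct_used "$size" "$used"
}
```

A cron job on the poller could call `poll_storage` for each host/index pair and compare the result against a threshold; only snmpd runs on the monitored machines.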

There are many different tools, and some people use several in conjunction, depending on the need.
Cacti is a visualization tool. I haven't looked at whether Cacti can trigger an event when a threshold is reached.
You could (and should) look at the available plugins for Cacti, http://docs.cacti.net/plugins, to see whether the threshold alert monitor is what you are looking for.
Does Nagios have a steep learning curve if I just want some simple alerts? Unless it's really hard to install or use, I think I'd like to use it. I've seen it used at another place I work (not on the administrator side, but I was copied on some of the alert emails), and it seems to be a very popular tool. Also, I tend to like tools with clients, because in my experience that's usually easier: you just install the client, point it at the server, and you're done. With agentless tools it seems like there's a lot more configuration on the remote machines.

I have about 5 RHEL/CentOS servers I want to monitor. (I also have three Windows servers.)
I think it is fairly straightforward. You could monitor all your systems with it.
I searched for "nagios quick start" and found this: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart-fedora.html. I followed it somewhat blindly on a fresh VM while doing other stuff, before realizing it's for an old version of Nagios. Should I upgrade, or scrap it and start over?
Upgrade. The user and group do not change; the new binary will install over the older one.
After some warnings that Nagios would take weeks to install and set up, I tried OMD, which was supposed to be easier, and that didn't even work. So I've given up on the whole monitoring-software thing for now (too busy) and will revisit it later. For now, CPU or memory problems will cause performance degradation and people will complain. A disk getting full is really the only type of alert I NEED in my environment; for everything else I can wait for people to complain.

So I just set up a script that cron runs every minute. If a disk is found to be near full, it sends an email to a service that blows up my phone (and other people's, if desired).

The script relies on df working. I recall that at some point in the past either df or fuser would lock up and never return if there was a network filesystem in a weird state; I can't remember which. Anyway, what do people think? Is df reliable about returning? Is it any less reliable than monitoring software?

#!/bin/csh
# Run from cron; page when any filesystem's Use% crosses a threshold.
# df -P keeps one line per filesystem; " 9[0-4]% " matches only the Use% column.
set host = `hostname`

set count = `df -P | grep -c " 9[0-4]% "`
if ( $count != 0 ) then
   df -P | mutt -s "disk almost full on $host" alert@myloc.pagerduty.com
endif

set count2 = `df -P | grep -cE " (9[5-9]|100)% "`
if ( $count2 != 0 ) then
   df -P | mutt -s "disk DANGEROUSLY full on $host" alert@myloc.pagerduty.com
endif


df will hang if you have an NFS mount that is not set to fail (i.e. a hard mount rather than a soft one) and the mount point runs into trouble.
You can avoid most of this by looping over the partitions whose free space you care about and running df -k $mountpoint for each one, so df never touches unrelated NFS mounts.
Wrapping each run in an alarm (a timeout) limits how long a hang can last if one of the partitions you do care about is an NFS share that runs into issues.

Doing it this way, you also learn which partition has the issue.
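A minimal sketch of the per-mount-point loop described above. It swaps in GNU coreutils `timeout` (assumed to support `-k`) for the `alarm` mentioned in the thread, and assumes POSIX-format `df -kP` output. `MOUNTPOINTS`, `DF_TIMEOUT`, and `mount_usage` are hypothetical names. The timeout limits the wait in the usual case, but a df stuck in uninterruptible I/O on a hard NFS mount may still not exit promptly.

```shell
#!/bin/bash
# Sketch: check each mount point separately, so only the listed mounts are
# touched and a stuck one can only delay its own check.
# Assumptions: GNU coreutils `timeout` with -k, POSIX `df -kP` output.

MOUNTPOINTS=(/)     # hypothetical list, e.g. (/ /var /home)
DF_TIMEOUT=10       # hypothetical: seconds to wait for df before giving up

# Print the Use% of one mount point as a bare integer (e.g. 93).
# Returns non-zero if df fails or does not finish within DF_TIMEOUT seconds.
mount_usage() {
  local mp=$1 out
  # -k 5: send KILL if df is still alive 5 s after the TERM from timeout.
  out=$(timeout -k 5 "$DF_TIMEOUT" df -kP "$mp" 2>/dev/null) || return 1
  # Line 2 is the filesystem line; field 5 is the Capacity column, e.g. "93%".
  printf '%s\n' "$out" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

for mp in "${MOUNTPOINTS[@]}"; do
  if pct=$(mount_usage "$mp"); then
    echo "$mp ${pct}%"
  else
    echo "$mp: df failed or timed out after ${DF_TIMEOUT}s" >&2
  fi
done
```

Because each result is tied to a named mount point, the alert text can say exactly which partition is full or unresponsive.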

The problem with percentage thresholds is that 95% utilization is a concern only once you also take into account the rate of space consumption and the absolute space available: 95% utilization on a 1TB drive still leaves 50GB free.
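One way to act on that point is to alert only when a volume is both above a percentage and below an absolute amount of free space. This is a sketch under assumptions: `PCT_LIMIT`, `MIN_FREE_KB`, and `should_alert` are hypothetical names and example values, not recommendations, and it reads the Available (KiB) and Capacity columns from `df -kP`. It does not model the rate of consumption.

```shell
#!/bin/bash
# Sketch: combine Use% with absolute free space so a large volume at 95%
# with plenty of room left does not page anyone.
# PCT_LIMIT and MIN_FREE_KB are hypothetical example thresholds.

PCT_LIMIT=90                          # only consider volumes at least this full...
MIN_FREE_KB=$(( 20 * 1024 * 1024 ))   # ...that also have under 20 GiB available

# Usage: should_alert USE_PCT AVAIL_KB
# Returns 0 (alert) when both conditions hold, non-zero otherwise.
should_alert() {
  local pct=$1 avail_kb=$2
  [ "$pct" -ge "$PCT_LIMIT" ] && [ "$avail_kb" -lt "$MIN_FREE_KB" ]
}

# Example: Available is field 4 (KiB) and Capacity is field 5 in df -kP output.
read -r avail pct < <(df -kP / | awk 'NR == 2 { sub(/%/, "", $5); print $4, $5 }')
if should_alert "$pct" "$avail"; then
  echo "/ is ${pct}% full with only ${avail} KiB available"
fi
```

Worked through: a 1 TB (10^12 byte) volume is about 976,562,500 KiB, so at 95% roughly 48,828,125 KiB (about 46.6 GiB) is still available, which is above the 20 GiB floor and does not alert. A 20 GB volume at 95% has about 976,562 KiB left and does alert.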


When generating email notifications, I prefer to submit directly to /usr/sbin/sendmail rather than use an email client as you have.

You can easily create a function in shell/bash to which your processing script passes the information; the function then
pipes the data to | /usr/sbin/sendmail -oi -t

The data has to be formatted as an email:
To:
From:
Subject:

Message contents

....
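A minimal sketch of that sendmail helper, assuming a bash script (bash sets `$HOSTNAME` automatically). `compose_alert`, `send_alert`, and the `ALERT_FROM` default sender are hypothetical names. The message is built separately from the sending step so it can be previewed without mailing anything.

```shell
#!/bin/bash
# Sketch of direct submission to sendmail (hypothetical helper names).
# ALERT_FROM is a hypothetical default sender; override it in the environment.
ALERT_FROM=${ALERT_FROM:-root@$HOSTNAME}

# Usage: compose_alert TO SUBJECT < body
# Prints the full message: headers, a blank line, then the body from stdin.
compose_alert() {
  local to=$1 subject=$2
  printf 'To: %s\n' "$to"
  printf 'From: %s\n' "$ALERT_FROM"
  printf 'Subject: %s\n' "$subject"
  printf '\n'   # blank line separates the headers from the message contents
  cat           # message contents
}

# Usage: send_alert TO SUBJECT < body
# -t: take recipients from the To: header.
# -oi: a line containing only "." does not end the message.
send_alert() {
  compose_alert "$1" "$2" | /usr/sbin/sendmail -oi -t
}
```

Usage in the disk check would then look like `df -kP | send_alert alert@myloc.pagerduty.com "disk almost full on $HOSTNAME"`.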
ASKER CERTIFIED SOLUTION
[Available to Experts Exchange members only]
Thanks! Does it have agents, or does it rely on SNMP setup?
> Thanks! Does it have agents, or does it rely on SNMP setup?

No, no agents. It relies on SNMP plus other standard protocols (e.g. SSH for Linux). The Linux-specific sensors are listed here (you'll notice a lot of SNMP and SSH), and many of the regular SNMP sensors will work for Linux too.