Xetroximyn
asked on
Any good free monitoring tools for RHEL/CentOS? (e.g. email me if a disk volume is getting near full, or if CPU/load average is spiking)
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
The best software for this is Nagios. It is open source.
ASKER
Thanks! For Nagios... is there a free edition? If so is it worth using? Or is it trimmed to the point of not being useful.
Mostly I just want emails when
a. Disk near full
b. CPU spiking
c. load average is spiking
On RHEL/CentOS systems
what is the easiest one of these to install and use?
I would say monit is the easiest to install and set up; 10 minutes should do it.
ASKER
Does Nagios have a steep learning curve? Which, if any, of these can I install with yum?
Try Cacti. It will be easy. I posted a link to the steps in my first reply.
It includes the yum commands and all the necessary steps.
Hope that helps.
ASKER
Red Hat support recommended Performance Co-Pilot. Does anyone have experience with it? Thoughts?
ASKER
I've been crazy busy so I've been sort of bouncing around between things. I just spent a few minutes quickly skimming over some websites regarding cacti/nagios/etc.
A few questions before I get too deep into any of these. Cacti seems to be really big on being a graphing tool. It would be great to see graphs of system performance and all that, but that's really not my main goal. I'm not sure if "monitoring" is really the right word for what I'm looking for.
I'm mainly looking for a very simple alerting tool. Right now I just use a couple of scripts to look for disks getting full and to email myself. I was just thinking it might be better to upgrade, so to speak, to an actual tool that might also alert me about other issues.
Another question: these look like you set up a central server. Do you then install clients? Or do you install the full thing on all the machines? Or does the server reach out and get information remotely from the other machines?
How many systems do you have to monitor/alert?
Yes, there is commonly one system that handles the querying. Whether you also need a client depends on the tool you pick: Nagios commonly has a client component installed on each monitored system, which it queries for disk space, memory use, etc.
Other tools use SNMP polling, which means only snmpd needs to be configured and enabled on each system. You then poll the system for the OID/MIB of interest to you and make determinations based on that.
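As a rough illustration of the SNMP polling idea above, here is a minimal sketch. It assumes the net-snmp command-line tools are installed on the polling machine and that snmpd on the target exposes the HOST-RESOURCES-MIB storage table. The host, community string and storage index are placeholders, and the function names are made up for this example.

```shell
# Percentage used, given hrStorageSize and hrStorageUsed
# (both are reported in allocation units, so the ratio is unit-free).
storage_pct() {
    local size=$1 used=$2
    [ "$size" -gt 0 ] || return 1
    echo $(( used * 100 / size ))
}

# Poll one storage entry over SNMP v2c and print its usage percentage.
# host, community and index are placeholders; the index of "/" differs per
# system (an snmpwalk of HOST-RESOURCES-MIB::hrStorageDescr would list them).
poll_storage_pct() {
    local host=$1 community=$2 index=$3 size used
    size=$(snmpget -v2c -c "$community" -Oqv "$host" \
        "HOST-RESOURCES-MIB::hrStorageSize.$index") || return 1
    used=$(snmpget -v2c -c "$community" -Oqv "$host" \
        "HOST-RESOURCES-MIB::hrStorageUsed.$index") || return 1
    storage_pct "$size" "$used"
}
```

Something like `poll_storage_pct server1 public 31` would then print a number you can compare against a threshold; the values shown are hypothetical.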
There are many different tools; some people use several in conjunction with one another, depending on the need.
Cacti is a visual representation tool. I've not looked at whether Cacti can trigger an event when a threshold is reached.
You could/should look at the available plugins for Cacti at http://docs.cacti.net/plugins and see if the threshold alert monitor is what you are looking for.
ASKER
Does Nagios have a steep learning curve if I just want some simple alerts? Unless it's really hard to install or use, I think I might like to use that. I've seen it used at another place I work (not on the administrator side, but I was copied on some of the alert emails), and it seems to be a very popular tool as well. I tend to like things with clients because in my experience it's usually easier: you just install the client, point it at the server, and you're done. It seems like agentless tools need a lot more configuration on the remote machines.
I have about 5 RHEL/CentOS servers I want to monitor. (I actually have three Windows servers as well)
I think it is fairly straightforward. You could monitor all your systems.
ASKER
I searched "nagios quick start" and found this: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart-fedora.html. I sort of followed it blindly on a fresh VM while doing other stuff before realizing it's an old version of Nagios. Should I upgrade, or scrap it and start over?
Upgrade. User/group do not change, the binary will install over the older binary.
ASKER
After some warnings that Nagios would take weeks to install and set up, and after trying OMD, which was supposed to be easier and didn't even work, I just gave up on the whole monitoring software thing for now. Too busy; will revisit later. For now, CPU or memory problems will cause performance degradation and people will complain. Disk getting full is really the only type of alert I NEED to get in my environment. For everything else I can wait for people to complain.
So I just set up a script to cron every minute and if disk is found near full send an email to a service that blows up my phone (and/or other people if desired)
the script does rely on df to work.... I recall sometime in the past either df or fuser would lock up and never return if there was some network filesystem in a weird state... can't remember which.... anyway... what do people think? is df pretty reliable to return? Any less reliable than any monitoring software?
#!/bin/csh
# Warn when any filesystem reports 90-94% usage.
set count = `df | grep -c "\(90\|91\|92\|93\|94\)%"`
if ( $count != 0 ) then
    df | mutt -s "disk almost full on $HOSTNAME" alert@myloc.pagerduty.com
endif
# Escalate when any filesystem reports 95-100% usage.
set count2 = `df | grep -c "\(95\|96\|97\|98\|99\|100\)%"`
if ( $count2 != 0 ) then
    df | mutt -s "disk DANGEROUSLY full on $HOSTNAME" alert@myloc.pagerduty.com
endif
df will hang if you have an NFS mount without the option to hard fail and the mount point runs into trouble.
You would avoid this issue if you set up a loop so that what you test is df -k $mountpoint, where $mountpoint represents each partition whose space availability you are interested in.
Using an alarm (timeout) around each run can limit the hang duration if one of those partitions is an NFS share that runs into issues.
Doing it this way, you will also have information on which partition has the issue.
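A minimal sketch of that per-mount-point loop, assuming a shell with `local` (bash works) and the coreutils `timeout` command standing in for the "alarm". The function names and the 10-second limit are illustrative. SIGKILL is used because a process blocked on NFS may ignore gentler signals; on some setups even that may not free it.

```shell
# Print the "Use%" figure (digits only) from `df -P` output on stdin.
pct_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the usage percentage for one mount point. Returns non-zero if df
# fails or does not answer within 10 seconds (an arbitrary example limit).
mount_pct() {
    local out
    out=$(timeout -s KILL 10 df -kP "$1" 2>/dev/null) || return 1
    printf '%s\n' "$out" | pct_from_df
}

# Usage: check_mounts LIMIT MOUNTPOINT...
# Prints one line per mount point that is at or over LIMIT percent,
# or that could not be checked at all.
check_mounts() {
    local limit=$1 mp pct
    shift
    for mp in "$@"; do
        if ! pct=$(mount_pct "$mp"); then
            echo "$mp: df failed or timed out"
        elif [ "$pct" -ge "$limit" ]; then
            echo "$mp: ${pct}% used"
        fi
    done
}
```

Calling `check_mounts 90 / /var /home` would print only the problem mount points, so empty output means there is nothing to report and the result can be piped straight into a mail command when it is non-empty.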
The problem with percentage thresholds is that 95% utilization is a concern only when you also take into account the rate of space consumption and the available space, i.e. 95% utilization on a 1TB drive still leaves about 50GB free.
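One possible way to act on this is to alert only when the percentage is high and the free space has also dropped below an absolute floor. The sketch below is an assumption about how you might combine the two, not a standard rule; the function name and the 10GB floor in the usage note are made up for illustration.

```shell
# Usage: should_alert PCT_USED AVAIL_KB MAX_PCT MIN_FREE_KB
# Succeeds (exit status 0) only when usage is at or above MAX_PCT
# AND fewer than MIN_FREE_KB kilobytes remain available.
should_alert() {
    local pct=$1 avail_kb=$2 max_pct=$3 min_free_kb=$4
    [ "$pct" -ge "$max_pct" ] && [ "$avail_kb" -lt "$min_free_kb" ]
}
```

With a 95% limit and a 10GB floor (10485760 KB), a 1TB volume at 95% with roughly 50GB free (52428800 KB) does not alert, while a 20GB volume at 95% with 1GB free (1048576 KB) does. The percentage and available-space figures correspond to the Capacity and Available columns of `df -kP`.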
When generating email notifications, I often prefer direct submission to /usr/sbin/sendmail over using an email client as you have.
You can easily create a shell/bash function to which your processing script passes the information, and which then
pipes the data to /usr/sbin/sendmail -oi -t
The data has to be formatted as an email:
To:
From:
Subject:
Message contents
....
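A minimal sketch of such a function, assuming sendmail (or a compatible replacement such as the one shipped with postfix) lives at /usr/sbin/sendmail. The function names and addresses are placeholders.

```shell
# Usage: format_mail TO FROM SUBJECT   (message body is read from stdin)
# Prints the headers, the blank line that separates headers from body,
# and then the body unchanged.
format_mail() {
    local to=$1 from=$2 subject=$3
    printf 'To: %s\nFrom: %s\nSubject: %s\n\n' "$to" "$from" "$subject"
    cat
}

# Same arguments as format_mail, but hands the result to sendmail.
# -t takes the recipients from the headers; -oi stops a line containing
# only "." from ending the message early.
send_alert() {
    format_mail "$@" | /usr/sbin/sendmail -oi -t
}
```

For example, `df -k /var | send_alert alerts@example.com root@example.com "disk almost full"` would mail the df output for /var. Keeping the formatting separate from the sending makes it easy to inspect the message on stdout before anything is actually mailed.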
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks! Does it have agents? or rely on SNMP setup?
> Thanks! Does it have agents? or rely on SNMP setup?
No, no agents. It relies on SNMP, plus other standard protocols (e.g. SSH for Linux). The Linux-specific sensors are listed here (and you'll notice a lot of SNMP and SSH), plus many of the regular SNMP sensors will work for Linux too.