Thanks! For Nagios... is there a free edition? If so, is it worth using? Or is it trimmed to the point of not being useful?

Mostly I just want emails when
a. Disk near full
b. CPU spiking
c. load average is spiking

On RHEL/CentOS systems

what is the easiest one of these to install and use?

SOLUTION

arnold


Nick Upson

I would say monit is the easiest to install & setup, 10 mins should do it

SOLUTION

Ertan SENOYAR


Xetroximyn

ASKER

Does Nagios have a steep learning curve? Which, if any, of these can I install with yum?

Sam P

Try Cacti. It will be easy. I posted a link to the steps in the first reply.
It covers the yum commands and all the necessary steps.

Hope that helps.

Xetroximyn

ASKER

Red Hat support recommended Performance Co-Pilot... does anyone have experience with that? Thoughts?

Xetroximyn

ASKER

I've been crazy busy so I've been sort of bouncing around between things. I just spent a few minutes quickly skimming over some websites regarding cacti/nagios/etc.

A few questions before I get too deep into any of these. Cacti seems to be really big on being a graphing tool. It would be great to see graphs of system performance and all that, but that's really not my main goal. I'm not sure if "monitoring" is really the right word for what I'm looking for....

I'm mainly looking for a very simple alerting tool. Right now I just use a couple of scripts to look for disks getting full and to email myself. I was just thinking it might be better to upgrade, so to speak, to an actual tool that might also alert me about other issues.

Another question is that these looked sort of like you set up a central server. Do you then install clients? Or do you install the full thing on all the machines? Or does the server reach out and get information remotely from the other machines?

arnold

How many systems do you have to monitor/alert?

Yes, there is commonly one system that handles the querying. Depending on the tool you pick, there may also be a client: Nagios commonly has a client component installed on each monitored system, which it queries for disk space, memory use, etc.
Other tools use SNMP polling, which means only snmpd needs to be configured and enabled on each system. You then poll the system for the OID/MIB of interest to you and make determinations based on that.

There are many different tools. Some people use several in conjunction with one another, depending on the need.
Cacti is a visual representation tool. I've not looked at whether Cacti can trigger an event when a threshold is reached.

arnold

You could/should look at the available plugins for Cacti, http://docs.cacti.net/plugins, and see if the threshold alert monitor is what you are looking for.

Xetroximyn

ASKER

Does Nagios have a steep learning curve if I just want some simple alerts? Unless it's really hard to install or use, I think I might like to use that. I've seen it used at another place I work (not the administrator side, but I was copied on some of the alert emails), and it seems to be a very popular tool as well. I tend to like things with clients because in my experience it's usually easier: you just install the client, point it at the server, and you're done. It seems like agentless stuff needs a lot more configuration on the remote machines.

I have about 5 RHEL/CentOS servers I want to monitor. (I actually have three Windows servers as well)

arnold

I think it is fairly straightforward. You could monitor all your systems.

Xetroximyn

ASKER

I searched "nagios quick start" and found this: https://assets.nagios.com/downloads/nagioscore/docs/nagioscore/4/en/quickstart-fedora.html
I sort of followed it blindly on a fresh VM while doing other stuff before realizing it's an old version of Nagios.... Should I upgrade, or scrap it and start over?

arnold

Upgrade. The user and group do not change; the new binary will install over the older one.

Xetroximyn

ASKER

After some warnings that Nagios would take weeks to install and set up, and after trying OMD, which was supposed to be easier and didn't even work, I just gave up on the whole monitoring software thing for now. Too busy; will revisit later. For now: CPU or memory problems will cause performance degradation and people will complain. Disk getting full is really the only type of alert I NEED to get in my env. For everything else I can wait for people to complain.

So I just set up a script to run from cron every minute and, if a disk is found near full, send an email to a service that blows up my phone (and/or other people's if desired).

The script does rely on df to work.... I recall that sometime in the past either df or fuser would lock up and never return if there was some network filesystem in a weird state... can't remember which.... Anyway, what do people think? Is df pretty reliable to return? Any less reliable than any monitoring software?

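#!/bin/csh

# $HOSTNAME is a bash variable and may be unset under csh/cron, so ask hostname(1).
set host = `hostname`

# Warn when any filesystem is 90-94% full.
set count = `df | grep -c "\(90\|91\|92\|93\|94\)%"`
if ( $count != 0 ) then
   df | mutt -s "disk almost full on $host" alert@myloc.pagerduty.com
endif

# Escalate when any filesystem is 95-100% full.
set count2 = `df | grep -c "\(95\|96\|97\|98\|99\|100\)%"`
if ( $count2 != 0 ) then
   df | mutt -s "disk DANGEROUSLY full on $host" alert@myloc.pagerduty.com
endif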


arnold

df will hang if you have an NFS mount that is not set up to fail on error (a hard mount, as opposed to one mounted with the soft option) and the mount point runs into trouble.
You can avoid this issue by looping so that df -k $mountpoint
is what you test, where $mountpoint represents each of the partitions whose space availability you are interested in.
Using an alarm (timeout) around each run can limit the hang duration if one of the partitions you are interested in is an NFS share and runs into issues.

Doing it this way, you will also know which partition has the issue.
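A minimal bash sketch of the per-mount-point loop described above. The mount point list, the 90% limit, and the 10-second timeout are illustrative assumptions, and `usage_pct` and `check_mounts` are hypothetical helper names. It uses `timeout` from coreutils as the "alarm"; note that a df blocked in uninterruptible I/O on a dead hard mount may not respond to the signal, so this limits hangs in the common case rather than guaranteeing a return.

```shell
#!/bin/bash
# Sketch: check each mount point on its own so the report names the full partition.
# MOUNTPOINTS, LIMIT and TIMEOUT_SECS are illustrative defaults, not from the thread.
MOUNTPOINTS="${MOUNTPOINTS:-/ /var /home}"
LIMIT="${LIMIT:-90}"
TIMEOUT_SECS="${TIMEOUT_SECS:-10}"

# Print the usage percentage (without the % sign) for one mount point.
# df -P keeps each filesystem on a single line; timeout stops df if it stalls.
usage_pct() {
    timeout "$TIMEOUT_SECS" df -P "$1" 2>/dev/null |
        awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report every mount point that is at or above LIMIT, or that could not be read.
check_mounts() {
    local mp pct
    for mp in $MOUNTPOINTS; do
        pct=$(usage_pct "$mp")
        if [ -z "$pct" ]; then
            echo "UNKNOWN $mp (df failed or timed out)"
        elif [ "$pct" -ge "$LIMIT" ]; then
            echo "FULL $mp ${pct}%"
        fi
    done
}

check_mounts
```

The output of `check_mounts` could be mailed only when it is non-empty, which replaces the two grep passes over the whole df listing.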

The problem with percentage thresholds is that 95% utilization is a concern only when you also take into account the rate of space consumption and the space actually available, e.g. 95% utilization on a 1TB drive still leaves 50GB free.
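One way to act on this, sketched in bash, is to alert only when the percentage is high and the remaining space is also below an absolute floor. The function name `should_alert` and both default thresholds (95% and 10 GiB) are assumptions chosen for illustration. The two inputs correspond to the Capacity and Available columns of `df -Pk`.

```shell
#!/bin/bash
# Sketch: combine a percentage threshold with an absolute free-space floor.
# MAX_PCT and MIN_FREE_KB are illustrative defaults, not values from the thread.

# should_alert PCT_USED AVAIL_KB
# Succeeds (exit 0) only if usage is at or above MAX_PCT
# and the available space is below MIN_FREE_KB.
should_alert() {
    local pct_used=$1 avail_kb=$2
    local max_pct=${MAX_PCT:-95}
    local min_free_kb=${MIN_FREE_KB:-10485760}   # 10 GiB expressed in KiB
    [ "$pct_used" -ge "$max_pct" ] && [ "$avail_kb" -lt "$min_free_kb" ]
}

# Example from the post: a 1TB drive at 95% still has about 50GB free
# (roughly 50000000 KB), so no alert is raised with the defaults above.
if should_alert 95 50000000; then
    echo "alert"
else
    echo "ok: plenty of space left despite the high percentage"
fi
```

The rate of consumption mentioned above is not covered here; that would need a stored history of readings.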

When generating email notifications, I often prefer to submit directly to /usr/sbin/sendmail rather than use an email client as you have.

You can easily create a shell/bash function to which your processing script passes the information, and which then
pipes the data to /usr/sbin/sendmail -oi -t

The data has to be formatted as an email:
To:
From:
Subject:

Message contents

....
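A minimal bash sketch of such a function, assuming sendmail is installed at /usr/sbin/sendmail as in the post. The function names `build_mail` and `send_alert` and the example.com addresses are hypothetical. The message is built separately from the sending step so the formatting can be checked without delivering anything.

```shell
#!/bin/bash
# Sketch: format headers plus a body read from stdin, then hand it to sendmail.

# build_mail TO FROM SUBJECT
# Writes the headers, a blank separator line, then whatever arrives on stdin.
build_mail() {
    printf 'To: %s\nFrom: %s\nSubject: %s\n\n' "$1" "$2" "$3"
    cat
}

# send_alert TO FROM SUBJECT
# -t makes sendmail take the recipients from the headers;
# -oi stops a line containing a single dot from ending the message early.
send_alert() {
    build_mail "$@" | /usr/sbin/sendmail -oi -t
}

# Example call (placeholder addresses), left commented out so nothing is sent:
# df -P | send_alert alert@example.com root@example.com "disk almost full on $(hostname)"
```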

ASKER CERTIFIED SOLUTION

Kimberley from Paessler


Xetroximyn

ASKER

Thanks! Does it have agents? Or does it rely on SNMP setup?

Kimberley from Paessler

> Thanks! Does it have agents? or rely on SNMP setup?

No, no agents. It relies on SNMP, plus other standard protocols (e.g. SSH for Linux). The Linux-specific sensors are listed here (and you'll notice a lot of SNMP and SSH), plus many of the regular SNMP sensors will work for Linux too.