
CentOS:  monitor services

Posted on 2011-05-11
Last Modified: 2012-05-11
Hi All,

I'm running a CentOS server and want to ensure the following services are running all the time.

mysqld
named
postfix

What is the best way to monitor them, restart them (or reboot the server) if they stop running, and email me?
Question by:detox1978
 
LVL 17

Expert Comment

by:sweetfa2
ID: 35741436
Nagios


Otherwise, you can have a cron job that runs every minute, checks their status, and does the emailing.
 
LVL 31

Expert Comment

by:farzanj
ID: 35741449
You can write a simple script that checks the services every minute, or you can use a monitoring tool that does everything for you but is harder to set up. If you have just one server, a small script makes sense. If you have many servers, consider a monitoring tool like Nagios, or something much simpler like Xymon.

http://www.xymon.com/xymon/help/about.html
 
LVL 17

Expert Comment

by:sweetfa2
ID: 35741493
#!/bin/bash
#
#  This Nagios plugin checks the status of a SysV init service
#

PROGNAME=$(basename "$0")
PROGPATH=$(dirname "$0")
REVISION="1.0.0"

# utils.sh ships with the Nagios plugins and defines STATE_OK, STATE_WARNING, etc.
. "$PROGPATH/utils.sh"

usage()
{
        echo "Usage: ${PROGNAME} service"
        exit $STATE_UNKNOWN
}

if [ $# -ne 1 ];
then
        usage
fi
service=$1

# Use the init script's exit status instead of parsing the last word of its
# output (CentOS prints e.g. "is running...", which the old sed reduced to "").
# 0 = running, 1/2 = dead but pid/lock file remains, 3 = stopped
sudo -u root /sbin/service "$service" status >/dev/null 2>>/tmp/errors
rc=$?

case $rc in
        0)
                echo "OK : Service is running"
                exit $STATE_OK
                ;;
        1|2)
                echo "CRITICAL : Service is dead"
                exit $STATE_CRITICAL
                ;;
        3)
                echo "CRITICAL : Service is stopped"
                exit $STATE_CRITICAL
                ;;
        *)
                echo "UNKNOWN : status returned $rc"
                exit $STATE_UNKNOWN
                ;;
esac

LVL 17

Expert Comment

by:sweetfa2
ID: 35741510
The script above works in Nagios.  It is simple enough to modify it to work straight out of cron.
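As a minimal sketch of that cron-friendly modification, assuming the CentOS init scripts return the usual LSB status codes (0 running, 1/2 dead, 3 stopped): the check prints nothing when a service is up, so cron only mails output when something is wrong. `SERVICE_CMD` and `check_service` are hypothetical names introduced here so the logic can be tried with a stand-in command; `SERVICE_CMD` defaults to the real `/sbin/service`.

```shell
#!/bin/bash
# Cron-friendly variant of the plugin's check: silent when the service runs,
# one line of output otherwise (cron mails any output to the job's owner).
# Assumes LSB-style init-script exit codes: 0 running, 1/2 dead, 3 stopped.
# SERVICE_CMD is a hypothetical override; it defaults to the real command.
SERVICE_CMD=${SERVICE_CMD:-/sbin/service}

check_service() {
    local svc=$1 rc=0
    "$SERVICE_CMD" "$svc" status >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   ;;                                   # running: stay silent
        1|2) echo "CRITICAL: $svc is dead (status exit code $rc)" ;;
        3)   echo "CRITICAL: $svc is stopped" ;;
        *)   echo "UNKNOWN: $svc status returned $rc" ;;
    esac
    return $rc
}
```

In a cron script this would be followed by a loop such as `for s in mysqld named postfix; do check_service "$s"; done`.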
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35741878
Nagios or a similar program will be your answer. Nagios checks services through plugins, and it has plugins for checking MySQL, Postfix and named. It updates a web interface so that you can monitor service status over the web, and it can notify you when a service stops and again when it has been restored.
 
LVL 2

Author Comment

by:detox1978
ID: 35742191
I'd prefer not to install additional software.

Can someone help me write a cron script that runs the following commands:

service mysqld status
service named status
service postfix status

searches the output for the word "running", and, if it isn't found, restarts the service and sends me an email?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35742323
Hi,

This script could be run from a cron task. It prints nothing if all services are running, but if a service is stopped, cron will email its name to the user who owns the cron task.

#!/bin/bash
for i in  "mysqld" "named" "postfix"; do
  /sbin/service $i status | grep "stop"
done



Cheers,
K.
 
LVL 31

Expert Comment

by:farzanj
ID: 35742336
Try this script:

#!/bin/bash

SERVICES="mysqld named postfix"
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"
FAILED_SERVICES=""

for service in $SERVICES
do
   if (( $(netstat -npl | grep -c $service) == 0 ))
   then
        FAILED_SERVICES=$FAILED_SERVICES" $services"
    fi
done

if (( $#FAILED_SERVICES !=0 ))
then
     echo "Services failed: $FAILED_SERVICES" | mail -s "Services NOT running" $EMAIL_ADDR
fi

 
LVL 31

Expert Comment

by:farzanj
ID: 35742426
Sorry, there's a typo on line 11; it should be:
   FAILED_SERVICES=$FAILED_SERVICES" $service"
 
LVL 2

Author Comment

by:detox1978
ID: 35742520
That is what I'm looking for, farzanj.

What do I save it as, and how do I get it to restart them automatically?
 
LVL 31

Assisted Solution

by:farzanj
farzanj earned 600 total points
ID: 35742584
Here is the modified version:

Save it as /root/monitor.sh

 
#!/bin/bash

SERVICES="mysqld named postfix"
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"
FAILED_SERVICES=""

#Finding services that are not running
for service in $SERVICES
do
   if (( $(netstat -npl | grep -c $service) == 0 ))
   then
        FAILED_SERVICES=$FAILED_SERVICES" $service"
        #Attempt to restart the service
        service $service restart
    fi
done

#Emailing about failure
if (( $#FAILED_SERVICES !=0 ))
then
     echo "Services failed: $FAILED_SERVICES" | mail -s "Services NOT running" $EMAIL_ADDR
fi



Then make the script executable (chmod +x /root/monitor.sh) and enable it in root's crontab:

crontab -e
 
*/2 * * * * /root/monitor.sh



You can keep the script wherever you think is reasonable.


Do you need any further modifications?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35742618
How about this?

#!/bin/bash
RECIPIENTS="rec1@domain.com rec2@domain.com"

for i in  "mysqld" "named" "postfix"; do
  if /sbin/service $i status 2>/dev/null | grep "stop" > /dev/null
  then
     echo -n "$i stopped, attempting restart: "
     if /sbin/service $i start 2>&1 > /dev/null
     then
        echo $i restarted
     else
        echo $i could not restart!!
     fi
  fi
done > /tmp/mail

( echo "Problems"; cat /tmp/mail ) | mail -s "Service problems detected !!!" $RECIPIENTS

 
LVL 2

Author Comment

by:detox1978
ID: 35742732
For some reason it keeps restarting postfix, and it also returns an error:


Shutting down postfix:                                     [  OK  ]
Starting postfix:                                          [  OK  ]
/root/monitorservices.sh: line 19: ((: 0FAILED_SERVICES: value too great for base (error token is "0FAILED_SERVICES")
 
LVL 31

Expert Comment

by:farzanj
ID: 35742764
Ok.

On line 19, please replace (( )) with [[ ]]

So you should have

if [[ $#FAILED_SERVICES !=0 ]]

Second, please show me the output of this command:

netstat -antpl | grep 25
 
LVL 2

Author Comment

by:detox1978
ID: 35742787
Same issue:

#netstat -antpl | grep 25
tcp        0      0 0.0.0.0:25                  0.0.0.0:*                   LISTEN      29728/master


KeremE, yours sends an email every time. I need it to email only when a service is not running.
 
LVL 2

Author Comment

by:detox1978
ID: 35742816
I think it restarts the postfix service because its process is listed as "master" in the netstat output.
 
LVL 31

Expert Comment

by:farzanj
ID: 35742830
Sorry about that. I didn't realize that Postfix's process doesn't show up under its own name. Try this one. Sorry for the inconvenience.
#!/bin/bash

SERVICES="mysqld named postfix"
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"
FAILED_SERVICES=""

#Finding services that are not running
for service in $SERVICES
do
   if (( $(service $service status | grep -c running) == 0 ))
   then
        FAILED_SERVICES=$FAILED_SERVICES" $service"
        #Attempt to restart the service
        service $service restart
    fi
done

#Emailing about failure
if [[ ${#FAILED_SERVICES} !=0 ]]
then
     echo "Services failed: $FAILED_SERVICES" | mail -s "Services NOT running" $EMAIL_ADDR
fi

 
LVL 2

Author Comment

by:detox1978
ID: 35742860
Thanks, that has fixed the postfix restarting. But it still errors on line 19:


/root/monitorservices.sh: line 19: conditional binary operator expected
/root/monitorservices.sh: line 19: syntax error near `!=0'
/root/monitorservices.sh: line 19: `if [[ ${#FAILED_SERVICES} !=0 ]]'
 
LVL 2

Author Comment

by:detox1978
ID: 35742890
I've moved the line that sends the email, and it works great. Thanks. :-)
#!/bin/bash

SERVICES="mysqld named postfix"
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"
FAILED_SERVICES=""

#Finding services that are not running
for service in $SERVICES
do
   if (( $(service $service status | grep -c running) == 0 ))
   then
        FAILED_SERVICES=$FAILED_SERVICES" $service"
        #Attempt to restart the service
        service $service restart
        echo "Services failed: $FAILED_SERVICES" | mail -s "Services NOT running" $EMAIL_ADDR
    fi
done

 
LVL 31

Expert Comment

by:farzanj
ID: 35742909
You're welcome :) Glad it worked for you.
 
LVL 31

Expert Comment

by:farzanj
ID: 35742928
Sorry again. The reason it was failing is that it should have been
if [[ ${#FAILED_SERVICES} != 0 ]]

instead of
if [[ ${#FAILED_SERVICES} !=0 ]]

Yes, there needs to be a space before the 0.

You can use any of this code.  Anything else?
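Both line-19 errors come from how bash parses the test, not from the monitoring logic. A minimal sketch (the function names are illustrative only) of what `$#FAILED_SERVICES` really expands to, and two equivalent ways of asking "did anything fail?":

```shell
#!/bin/bash
# "$#FAILED_SERVICES" is $# (the number of positional parameters, here 0)
# followed by the literal text FAILED_SERVICES. Inside (( )) the leading 0
# makes bash read it as an octal number, hence "value too great for base".
# The length of a variable is written ${#VAR}.
show_wrong_expansion() {
    local FAILED_SERVICES=" mysqld"
    echo "$#FAILED_SERVICES"
}

# Inside [[ ]], "!=0" without a space is one word, not an operator, which is
# why bash reported "conditional binary operator expected".
has_failures_arith() {
    local FAILED_SERVICES=$1
    (( ${#FAILED_SERVICES} != 0 ))   # arithmetic comparison on the length
}

has_failures_string() {
    local FAILED_SERVICES=$1
    [[ -n $FAILED_SERVICES ]]        # true when the string is non-empty
}
```

The `-n` form is the shortest way to express the check and sidesteps the spacing issue entirely.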
 
LVL 2

Author Comment

by:detox1978
ID: 35743048
When I run it manually, it works perfectly.

But when I run it via the cron job, it seems to pass the service names incorrectly, and I get emails like this:

Services failed:  mysqld named postfix


even though the services are running. I guess this is because it is looking for a service called "mysqld named postfix" rather than parsing them one at a time?
 
LVL 2

Author Comment

by:detox1978
ID: 35743058
It is sending three emails, with these bodies:

1# Services failed:  named mysqld postfix
2# Services failed:  named mysqld
3# Services failed:  named

I'm guessing it's not splitting the service names correctly.
 
LVL 31

Expert Comment

by:farzanj
ID: 35743132
Here are the problems.

First, email:  The reason I had a separate section for emails is because I wanted to finalize the status of all the services and then send one consolidated email.

Second:
>  I guess this is because it is looking for a service called "mysqld named postfix" rather than parsing them one at a time?
No. If it runs correctly without cron, it should run with cron too, except perhaps for PATH and permission issues. So let me try one more time. See how it goes.
 
#!/bin/bash
PATH=$PATH:/sbin
SERVICES="mysqld named postfix"
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"
FAILED_SERVICES=""

#Finding services that are not running
for service in $SERVICES
do
   if (( $(service $service status | grep -c running) == 0 ))
   then
        #Attempt to restart the service
        service $service restart
        if [[ $? != 0 ]]
        then
            FAILED_SERVICES=$FAILED_SERVICES" $service"
        fi
    fi
done

#Emailing about failure
if [[ ${#FAILED_SERVICES} != 0 ]]
then
     echo "Services failed: $FAILED_SERVICES" | mail -s "Services NOT running" $EMAIL_ADDR
fi



Did you set up the cron job as the root user? Please add it to root's crontab, as follows:
 
*/3 * * * * /root/monitorservices.sh > /root/error.txt 2>&1



If it doesn't work out, I want to see the contents of the error file.
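Pulling both points together (an explicit PATH for cron, and a single consolidated email), here is a minimal sketch that also uses the init script's exit status instead of grepping for "running". `SERVICE_CMD`, `NOTIFY_CMD` and `monitor` are hypothetical names added so the logic can be exercised with stand-ins; they default to `/sbin/service` and `mail`. Unlike the latest version above, it reports every service it found down, even when the restart succeeds.

```shell
#!/bin/bash
# cron typically runs jobs with a minimal PATH (often /usr/bin:/bin), which
# lacks /sbin, where CentOS keeps `service`; add the sbin directories.
PATH=$PATH:/sbin:/usr/sbin

# Hypothetical hooks so the logic can be tried without real services/mail.
SERVICE_CMD=${SERVICE_CMD:-/sbin/service}
NOTIFY_CMD=${NOTIFY_CMD:-mail}
EMAIL_ADDR="admin@mydomain.com second@yahoo.com"

monitor() {
    local svc failed=()
    for svc in "$@"; do
        # Exit status 0 means running; ignore the script's text output
        if ! "$SERVICE_CMD" "$svc" status >/dev/null 2>&1; then
            "$SERVICE_CMD" "$svc" restart >/dev/null 2>&1
            failed+=("$svc")
        fi
    done
    # One consolidated message per run, and only if something was down.
    # EMAIL_ADDR is unquoted on purpose so each address is its own argument.
    if (( ${#failed[@]} > 0 )); then
        echo "Services failed: ${failed[*]}" |
            "$NOTIFY_CMD" -s "Services NOT running" $EMAIL_ADDR
    fi
}
```

The script run by cron would end with `monitor mysqld named postfix`.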
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743174
Why work this hard to correct a non-working script? I've already sent you a simpler script that works 100%!
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743175
If you don't want a working script, what are you trying to accomplish?
 
LVL 2

Author Comment

by:detox1978
ID: 35743176
You didn't reply to this:

"KeremE, yours sends an email every time.  I need it to only email when a service is not running."
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743191
Here's your code, emailing only when there's a service failure:

#!/bin/bash
RECIPIENTS="kerem@sibernet.com.tr"

for i in  "mysqld" "named" "postfix"; do
  if /sbin/service $i status 2>/dev/null | grep "stop" > /dev/null  
  then      
     echo -n "$i stopped, attempting restart: "
     if /sbin/service $i start 2>&1 > /dev/null
     then 
        echo $i restarted
     else
        echo $i could not restart!!
     fi
  fi   
done > /tmp/mail

if [ -s /tmp/mail ]
then 
( echo "Problems:"; cat /tmp/mail ) | mail -s "Service problems detected !!!" $RECIPIENTS
fi

 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743217
A shorter version would be:

#!/bin/bash
RECIPIENTS="rec1@example.com rec2@example.com"

for i in  "mysqld" "named" "postfix"; do
  if /sbin/service $i status 2>/dev/null | grep "stop" > /dev/null
  then
     echo -n "$i stopped, attempting restart: "
     if /sbin/service $i start 2>&1 > /dev/null
     then 
        echo $i restarted
     else
        echo $i failed to restart!!
     fi
  fi   
done > /tmp/mail

test  -s /tmp/mail && ( echo "Problems:"; cat /tmp/mail ) | mail -s "Service problems detected !!!" $RECIPIENTS

 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743335
We can eliminate the grep as well by relying on the exit status:

#!/bin/bash
RECIPIENTS="rec1@example.com rec2@example.com"

for i in  "mysqld" "named" "postfix"; do
  if ! /sbin/service $i status 2>&1 >/dev/null  
  then      
     echo -n "$i service stopped, attempting restart: "
     if /sbin/service $i start 2>&1 >/dev/null 
     then 
        echo $i restarted
     else
        echo $i failed to restart!!
     fi
  fi   
done  > /tmp/mail 

test -s /tmp/mail && ( echo "Problems:"; /bin/cat /tmp/mail ) | /bin/mail -s "Service problems detected !!!" $RECIPIENTS

 
LVL 2

Author Comment

by:detox1978
ID: 35743388
That works. However, the named service always returns the following message:

rndc: no server specified and no default

So it will always email me. Is there a way around this?
 
LVL 31

Expert Comment

by:farzanj
ID: 35743399
Is it my code or KeremE's?

Did you try out my last code in cron?

Did you try KeremE's code in cron?
 
LVL 2

Author Comment

by:detox1978
ID: 35743407
That was for KeremE.

Yours errored again with the same issue.
 
LVL 31

Expert Comment

by:farzanj
ID: 35743418
Do you have output of the error file?  Did it error out in cron or without cron?
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743428
> That works.  however the named service always returns the following message;

> rndc: no server specified and no default

> so it will alway email me.  Is there a way around this?

It shouldn't. Are you sure you didn't omit " 2>&1 >/dev/null " after each service command?
Can you copy the last version again and retry? Those redirections are there to suppress this rndc error.
 
LVL 2

Author Comment

by:detox1978
ID: 35743444
KeremE, I recopied it and still get an email every time it runs. The content of the email is "Problems: rndc: no server specified and no default"

farzanj, I'll rerun it and check the error file.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743471
In fact, the rndc error goes to stderr and the command's normal output to stdout. That is why I redirect both to /dev/null and rely only on the exit status instead of the text, so this should not happen. Please try my latest version above.
 
LVL 2

Author Comment

by:detox1978
ID: 35743482
That code sends an email every time it is run.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743494
Are you running the script as root? If you first ran it as root from the command line and then started it from cron as another user, that user might not be able to overwrite the file. Please try removing /tmp/mail manually before the cron job runs.
 
LVL 2

Author Comment

by:detox1978
ID: 35743514
KeremE, yes, I am running it as root. I haven't tested it as a cron job because it emails every time.

farzanj, yours now doesn't send any emails.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743533
OK, but are you sure your named server can be started from the command line? It seems that the named configuration is missing the rndc file, which should be in /etc/rndc.conf.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743551
I mean your named.conf is missing the rndc information. Please check your named.conf: it may be referencing a non-existent rndc key file (/etc/rndc.key by default). It seems that you've reconfigured your named.conf.

Will you replace the

#!/bin/bash

on the first line with

#!/bin/bash -x

and post the output here?
 
LVL 2

Author Comment

by:detox1978
ID: 35743567
+ RECIPIENTS=detox1978@yahoo.co.uk
+ for i in '"mysqld"' '"named"' '"postfix"'
+ /sbin/service mysqld status
+ for i in '"mysqld"' '"named"' '"postfix"'
+ /sbin/service named status
+ for i in '"mysqld"' '"named"' '"postfix"'
+ /sbin/service postfix status
+ test -s /tmp/mail
+ /bin/mail -s 'Service problems detected !!!' detox1978@yahoo.co.uk
+ echo Problems:
+ /bin/cat /tmp/mail
 
LVL 30

Accepted Solution

by:
Kerem ERSOY earned 1400 total points
ID: 35743571
OK, I've got it: there was a problem with the redirection. I redirected stderr before redirecting stdout, so the order was wrong. Please use this code instead:

#!/bin/bash
RECIPIENTS="rec1@example.com rec2@example.com"

for i in "mysqld" "named" "postfix"; do
  # Rely only on the exit status; send both stdout and stderr to /dev/null
  if ! /sbin/service "$i" status >/dev/null 2>&1
  then
     echo -n "$i service stopped, attempting restart: "
     if /sbin/service "$i" start >/dev/null 2>&1
     then
        echo "$i restarted"
     else
        echo "$i failed to restart!!"
     fi
  fi
done > /tmp/mail

# Mail the report only if it is non-empty; RECIPIENTS is split into addresses
test -s /tmp/mail && ( echo "Problems:"; /bin/cat /tmp/mail ) | /bin/mail -s "Service problems detected !!!" $RECIPIENTS
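The only difference between this version and the previous one is the order of the redirections, which bash applies left to right. A small sketch that reproduces the effect (`noisy`, `wrong_order` and `right_order` are illustrative names only):

```shell
#!/bin/bash
# A command that writes one line to stdout and one line to stderr
noisy() { echo "to stdout"; echo "to stderr" >&2; }

# 2>&1 first: stderr is pointed at wherever stdout goes right now (the
# caller's stdout), and only afterwards is stdout sent to /dev/null.
wrong_order() { noisy 2>&1 >/dev/null; }

# >/dev/null first: stdout goes to /dev/null, then stderr follows it there.
right_order() { noisy >/dev/null 2>&1; }
```

With the wrong order, the stderr line still reaches the caller's stdout, which inside the loop was `/tmp/mail`. That is how the rndc message from named's status check ended up in the email even though no service was reported as stopped.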

 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743584
But you've still got the rndc error. Please fix it using the advice here:

http://serverfault.com/questions/231749/dns-on-redhat-rdnc-no-server-specified-and-no-default

 
LVL 2

Author Comment

by:detox1978
ID: 35743630
The service monitor now works.


I tried to fix the rndc error, but it now says "rndc: decode base64 secret: bad base64 encoding"
 
LVL 2

Author Comment

by:detox1978
ID: 35743639
It's OK, I sorted it out; I'd made a typo.
 
LVL 30

Expert Comment

by:Kerem ERSOY
ID: 35743644
:)

You know this is kind of off-topic but:

http://forum.parallels.com/showthread.php?t=87083

If you need further assistance, I strongly suggest you close this question and start another thread.

Cheers,
K.
 