Help with Postfix+Nagios Setup - CentOS

I'm currently rolling out Nagios for an internal business unit, and I've got the lion's share of the setup completed, except for getting outbound notifications working. I've installed postfix with yum and run through the setup steps over at server-world.info/en. I also modified the commands.cfg file per this URL:

http://www.infosecprojects.net/en/linuxtutorials/nagios-sendmail.html

Setup info:

postfix-2.3.3-2.1.el5_2
2.6.18-194.26.1.el5
CentOS 5.5

Here is the output from postconf -n:

alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
body_checks = regexp:/etc/postfix/body_checks
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
debug_peer_level = 2
header_checks = regexp:/etc/postfix/header_checks
html_directory = no
inet_interfaces = all
mail_owner = postfix
mailq_path = /usr/bin/mailq.postfix
manpage_directory = /usr/share/man
mydestination = $myhostname, localhost.$mydomain, localhost, $mydomain
mydomain = example.com
myhostname = nagios.example.com
mynetworks = 10.0.101.0/24, 127.0.0.0/8
myorigin = $mydomain
newaliases_path = /usr/bin/newaliases.postfix
queue_directory = /var/spool/postfix
readme_directory = /usr/share/doc/postfix-2.3.3/README_FILES
sample_directory = /usr/share/doc/postfix-2.3.3/samples
sendmail_path = /usr/sbin/sendmail.postfix
setgid_group = postdrop
unknown_local_recipient_reject_code = 550

Tail of /var/log/maillog:

Nov 19 04:20:46 pov postfix/smtpd[22095]: fatal: open database /etc/aliases.db: No such file or directory
Nov 19 04:20:47 pov postfix/master[21874]: warning: process /usr/libexec/postfix/smtpd pid 22095 exit status 1
Nov 19 04:20:47 pov postfix/master[21874]: warning: /usr/libexec/postfix/smtpd: bad command startup -- throttling

Tail of /var/log/messages:

nagios: Warning: Attempting to execute the command "/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: monitoredbox\nState: DOWN\nAddress: 10.0.101.221\nInfo: CRITICAL - Host Unreachable (10.0.101.221)\n\nDate/Time: Fri Nov 19 04:24:44 PST 2010\n" | /bin/mail -s "** PROBLEM Host Alert: zimbra is DOWN **" 5555555555@tmomail.net" resulted in a return code of 127.  Make sure the script or binary you are trying to execute actually exists...
Nov 19 04:41:04 pov nagios: Auto-save of retention data completed successfully.

Postfix is running:

ps -ef | grep postfix
root     21874     1  0 04:06 ?        00:00:00 /usr/libexec/postfix/master
postfix  21876 21874  0 04:06 ?        00:00:00 pickup -l -t fifo -u
postfix  21877 21874  0 04:06 ?        00:00:00 qmgr -l -t fifo -u
root     22172 21934  0 04:26 pts/0    00:00:00 grep postfix

I can also telnet to localhost on port 25, and to the public IP from my workstation. Each time, telnet reports that it is connected, but EHLO and HELO commands get no response from the server. I'm focusing on researching the maillog errors right now; if anyone could lend a hand, that'd be great.

LunarNRG

The newaliases command creates /etc/aliases.db from the contents of /etc/aliases. By the way, without /etc/aliases.db present, postfix will also exhibit the other behavior you mentioned (no response to HELO or EHLO), because smtpd exits as soon as it fails to open the alias database.
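If you want to check this before poking at smtpd again, a small shell sketch like the one below could help. It assumes the alias_maps = hash:/etc/aliases setting shown in your postconf output; the function name aliases_db_status is just something made up for illustration. It only reports the state and prints the fix rather than running it.

```shell
#!/bin/sh
# Sketch: check the hashed alias map that Postfix opens.
# With alias_maps = hash:/etc/aliases, Postfix reads /etc/aliases.db,
# which only exists after newaliases has been run.
# aliases_db_status is a hypothetical helper name.
aliases_db_status() {
    src=${1:-/etc/aliases}
    db="$src.db"
    if [ ! -e "$db" ]; then
        echo missing    # smtpd: "fatal: open database ... No such file or directory"
    elif [ "$src" -nt "$db" ]; then
        echo stale      # /etc/aliases edited since newaliases last ran
    else
        echo ok
    fi
}

status=$(aliases_db_status /etc/aliases)
echo "/etc/aliases.db: $status"
if [ "$status" != ok ]; then
    echo "as root, run: newaliases && service postfix reload"
fi
```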

kapshure

ASKER

@LunarNRG

You're right, mailx wasn't installed, so I've installed it.

I had already run the newaliases command, but I ran it again.

Now I've got this in maillog:

lost connection after EHLO from firewall.hostcompany.com[12.34.56.78] < our office IP
postfix/smtpd[23150]: disconnect from firewall.hostcompany.com[12.34.56.78] < our office IP

LunarNRG

Are you running your telnet test from localhost? For example,

[user@host ~] telnet localhost 25

kapshure

ASKER

It seems that installing mailx and running newaliases fixed the problem. I just got an alert from Nagios on my phone :)

Now I want to see a few more alerts come through, but it looks like my notification setup may be complete!

One question:

I had sendmail.postfix configured as the only MTA, so why was mailx required? Could I have changed nagios.cfg to use postfix directly instead of mailx?

Thanks again.

LunarNRG

No problem, glad to hear it.

You're right, you don't really need mailx, but the Nagios default commands for host-notify-by-email, service-notify-by-email, etc. all pipe into /bin/mail, I believe. You could use /usr/sbin/sendmail for the same purpose, but you'd have to write your own command definitions.
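As a rough idea of what "your own command" could look like, here is a sketch of a notification script that builds the same message body as the /bin/mail command in your warning but pipes it into Postfix's sendmail instead. The script name, the argument order, and the MAILER override are all my own assumptions, not part of Nagios or the tutorial; -oi and -t are standard sendmail-compatible options that Postfix's sendmail accepts.

```shell
#!/bin/sh
# Hypothetical notify-host-by-sendmail.sh: same body as the /bin/mail-based
# command, but handed straight to Postfix's sendmail so mailx isn't needed.
# Arguments (an assumed order; map them to Nagios macros in command_line):
#   1 notification type  2 host name  3 host state  4 address
#   5 check output        6 date/time  7 recipient
# MAILER can be overridden for testing; it is deliberately left unquoted
# below so that its options are split into separate words.
MAILER=${MAILER:-/usr/sbin/sendmail -oi -t}

notify_host_by_sendmail() {
    ntype=$1 host=$2 state=$3 addr=$4 output=$5 when=$6 rcpt=$7
    {
        # -t makes sendmail take the recipient from the To: header
        printf 'To: %s\n' "$rcpt"
        printf 'Subject: ** %s Host Alert: %s is %s **\n\n' "$ntype" "$host" "$state"
        printf '***** Nagios *****\n\n'
        printf 'Notification Type: %s\nHost: %s\nState: %s\n' "$ntype" "$host" "$state"
        printf 'Address: %s\nInfo: %s\n\n' "$addr" "$output"
        printf 'Date/Time: %s\n' "$when"
    } | $MAILER
}

# Only send when called with the full set of arguments.
if [ "$#" -eq 7 ]; then
    notify_host_by_sendmail "$@"
fi
```

A matching command_line (script path hypothetical) could pass the standard Nagios macros in that order: /usr/local/bin/notify-host-by-sendmail.sh "$NOTIFICATIONTYPE$" "$HOSTNAME$" "$HOSTSTATE$" "$HOSTADDRESS$" "$HOSTOUTPUT$" "$LONGDATETIME$" "$CONTACTEMAIL$". Quoting each macro keeps values with spaces, such as the check output, in a single argument.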

Nagios just calls the command you specify in its config, and in your case /bin/mail was used, as in (from the previous warning):

/usr/bin/printf "%b" "***** Nagios *****
<snip>
Date/Time: Fri Nov 19 04:24:44 PST 2010\n" | /bin/mail -s "** PROBLEM Host Alert: zimbra is DOWN **" 5555555555@tmomail.net" resulted in a return code of 127

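The return code of 127 in that warning is the shell's "command not found" status: the pipeline's exit status is that of its last command, and /bin/mail didn't exist until mailx was installed. A quick sketch to see which mailers are present and to reproduce the 127 (the list of paths is only illustrative):

```shell
#!/bin/sh
# Sketch: show which candidate mailers exist on this box.
for bin in /bin/mail /usr/bin/mail /usr/sbin/sendmail; do
    if [ -x "$bin" ]; then
        echo "$bin: present"
    else
        echo "$bin: MISSING"
    fi
done

# Reproduce the status code with a path that certainly does not exist;
# the shell reports "not found" and exits with 127.
sh -c '/no/such/mailer -s test nobody' 2>/dev/null
echo "exit status: $?"
```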

I just noticed that you mentioned following these instructions to change the Nagios default:
http://www.infosecprojects.net/en/linuxtutorials/nagios-sendmail.html

... so it would seem your modification did not take effect. You may wish to review your settings and make sure they match the tutorial. Perhaps you need to restart the nagios service? Not sure. If you convince Nagios to use /usr/sbin/sendmail, then you can remove the mailx package.
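One way to check whether the edit "took" is to look for command_line entries that still pipe into mail. The sketch below assumes an EPEL-style layout (/etc/nagios/objects/commands.cfg and /etc/nagios/nagios.cfg); adjust CFG to whatever commands.cfg your nagios.cfg actually references. nagios -v only verifies the configuration; the restart is what makes Nagios load the change.

```shell
#!/bin/sh
# Sketch: check whether the notification commands Nagios loads still pipe
# into mail(1). The default path is an assumption (EPEL-style on CentOS 5).
CFG=${CFG:-/etc/nagios/objects/commands.cfg}

# Print (with line numbers) every command_line that still calls
# /bin/mail or /usr/bin/mail; grep exits 1 if there are none.
mail_based_commands() {
    grep -n 'command_line.*/bin/mail' "$1"
}

if [ -r "$CFG" ] && mail_based_commands "$CFG"; then
    echo "The commands above still use mail(1). After editing them:"
    echo "  nagios -v /etc/nagios/nagios.cfg && service nagios restart"
fi
```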

HTH,
Marty

kapshure

ASKER

I'm getting the alerts now, just not as promptly as they should be. It seems that the UP alert comes back much faster than the DOWN, or sometimes vice versa. I may need to tweak some time thresholds in Nagios.

For now, though, the mail part seems to be working: we're getting service alerts on two phones, and at an email address on a different mail server in a different domain.

Thanks again!
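On the DOWN alerts lagging behind the UP alerts: assuming Nagios 3 host-check behavior, a DOWN notification only goes out once the host reaches a HARD state, which takes max_check_attempts checks (the retries run at retry_interval), while a recovery from a hard DOWN is sent on the first successful check. The sketch below is only a rough back-of-the-envelope model under those assumptions; it ignores check execution time, on-demand host checks, first_notification_delay and notification_interval, and the example settings are hypothetical.

```shell
#!/bin/sh
# Rough model (an assumption, not taken from Nagios source) of how long a
# Nagios 3 host alert can lag the real event. Intervals are in units of
# interval_length (60 seconds by default).

# DOWN: up to one check_interval until the next scheduled check, then
# (max_check_attempts - 1) soft retries at retry_interval before the
# HARD DOWN state that triggers the notification.
worst_case_down_delay() {
    check_interval=$1 retry_interval=$2 max_attempts=$3 unit=${4:-60}
    echo $(( (check_interval + (max_attempts - 1) * retry_interval) * unit ))
}

# UP: a host in a hard DOWN state is checked at check_interval, and the
# first successful check sends the recovery.
worst_case_up_delay() {
    check_interval=$1 unit=${2:-60}
    echo $(( check_interval * unit ))
}

# Hypothetical host settings: check_interval 5, retry_interval 1,
# max_check_attempts 10.
echo "DOWN alert within ~$(worst_case_down_delay 5 1 10)s"
echo "UP alert within ~$(worst_case_up_delay 5)s"
```

With those example settings the model gives roughly 840 seconds for DOWN versus 300 seconds for UP, so lowering max_check_attempts or retry_interval would be the first knobs to try.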