Postfix problems after moving box behind firewall

Postfix version 2.1.0

I set up a postfix machine as an SMTP-AUTH mail relay server. It does not handle any local delivery, nor does it accept mail for local users. Its only purpose is to send out emails composed locally and to give SMTP relay access to one domain and a few individual users (like road warriors and clients I have who do not have an outgoing smtp server on their network for whatever reason).

Everything worked fine when the box was visible to the world with its own IP. I recently replaced the SDSL line the domain used to run on with a T1. When I made the switch, I decided it would be a good time to tidy up the domain and move all the servers behind firewalls. My DMZ is now the "green" side of a red-green IPCop box, and all of the port forwarding seems to be configured correctly and working.

At first, no mail was working on the outgoing or the incoming mail servers. After changing some DNS records and the IP addresses in configuration files, and adding a proxy_interfaces= line to /etc/postfix/main.cf pointing at the world-visible IP on the firewall, I can send fine locally. My relay users are still broken, however, and I don't know where to look: according to the logs, the relay users connect through the firewall just fine, establish a TLS session, authenticate fine, and seem to have their messages sent. postqueue -p shows an empty queue after the message seems to get sent.

That's as far as it goes. None of my road warrior test messages ever went through. I sent several to an incoming mail server I have access to, and there were never any connections from the postfix machine in question. No firewall hits, no errors, no bounces, nothing in the logs that indicates an error or problem...it's as if the machine in question sends the mail to /dev/null and never tries to send it to the proper destination.

If I compose a mail from the command line via an ssh session, it sends fine. Nothing different appears in the logs comparing the successful local test sends and the unsuccessful road warrior test sends. They all show a log entry similar to: (nothing has been censored, brian@savebabies.org is one of mine and on a domain I can access all logs on, and mail5.fixnix.net is the machine I'm having trouble with)

Oct 18 13:53:47 mail5 postfix/pipe[30345]: 96651D8121: to=<brian@savebabies.org>, relay=spamfilter, delay=1, status=sent (mail5.fixnix.net)
Oct 18 13:53:47 mail5 postfix/qmgr[30334]: 96651D8121: removed
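When comparing a working local send against a failing relayed send, it can help to pull the delivery agent and the relay= field out of each log line rather than eyeballing them. Below is a minimal Python sketch of that idea. The regular expression is only fitted to the two log lines quoted above, and the function names (parse_delivery, handed_to_filter) are made up for illustration. The underlying point is that a status=sent line written by postfix/pipe records a hand-off to a local command, which is not the same thing as an SMTP delivery to the destination.

```python
import re

# Fitted to delivery lines shaped like the one quoted in this post, e.g.
# "... postfix/pipe[30345]: 96651D8121: to=<...>, relay=spamfilter, ..., status=sent (...)"
DELIVERY_RE = re.compile(
    r"postfix/(?P<agent>\w+)\[\d+\]: (?P<queue_id>[0-9A-F]+): "
    r"to=<(?P<rcpt>[^>]*)>, relay=(?P<relay>[^,]+),.*status=(?P<status>\w+)"
)


def parse_delivery(line):
    """Return the delivery fields as a dict, or None if the line is not a delivery record."""
    match = DELIVERY_RE.search(line)
    return match.groupdict() if match else None


def handed_to_filter(record):
    """True when 'sent' means 'handed to a local pipe command', not 'sent over SMTP'."""
    return record["agent"] == "pipe" and record["status"] == "sent"


line = ("Oct 18 13:53:47 mail5 postfix/pipe[30345]: 96651D8121: "
        "to=<brian@savebabies.org>, relay=spamfilter, delay=1, "
        "status=sent (mail5.fixnix.net)")
rec = parse_delivery(line)
print(rec["agent"], rec["relay"], handed_to_filter(rec))  # prints: pipe spamfilter True
```

Running this over the log entries for a local send and a relayed send side by side would show whether both go through the same delivery agent.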

I'm using cyrus-sasl2 for the road warrior authentication, and since this is a small, low-traffic domain with only about 10 or so road warrior users, I am not using a MySQL or similar database for users. The road warrior users are maintained in /etc/passwd as normal users are, except with a shell of /bin/false. Authentication is working fine.

My next step will be running tcpdump to see if any outbound connection is attempted to hit the destination mail server, but I suspect it is not or I should have gotten some bounces back.

I'm hoping I just had a small oversight or incorrect interpretation of a config file entry. Any ideas?

rid

Is the IP that the Postfix server used to have "taken over" by the firewall, or are you using a new IP for that? Have DNS records been updated accordingly?
/RID

fixnix

ASKER

The postfix server in question, "mail5", used to have an IP of 216.184.199.37 on the old SDSL line. When I switched to the T1, I also changed ISPs and received a new block of IPs, so mail5 now has a new IP of 209.137.176.165 (the firewall address, with port 25 forwarded to the internal address of 192.168.10.165).

This box is also the master nameserver for a couple of domains, and I thought that might be the problem. I am not running any internal nameserver for the DMZ. Instead, since there are only 5 IP addresses in use, I made a hosts file for them. When mail5 tries to look up the MX record for savebabies.org, it gets the external address for mail.savebabies.org...which is on the outside of the firewall. I have a mail.savebabies.org entry in /etc/hosts with its matching 192.168.10.166 address, but I think the hosts file is basically just A records, so an MX lookup may still hit the nameserver, which will give an address it cannot reach. Sounded like a good theory, but even if that's the case, mail should still go out when I send to any other external domain (like my work account or yahoo, etc), and mail to those seems to go nowhere as well.

The old ip from the SDSL line is actually still live, and I have an old ultra sitting on that line taking every old IP address and forwarding the appropriate IP:port hits to the correct new address:port in case anyone is still resolving the old addresses. The port forwarding works fine.

I did a tcpdump while I sent a test message as a road warrior (email sent from a remote location, tcpdump running locally on mail5 while shelled in) and I see the connection come in from my road warrior to send the test message and all looks normal. I do not, however, see any attempt at a connection to the destination mailserver (test message was Cc'd to both savebabies.org which may be broken from being hit on the LAN as per 2 paragraphs up, and also to a domain that is not located on any of my networks.) I would have at least expected to see an MX lookup for malvernconsulting.com and an smtp connection to it for final delivery of a test message sent to a valid address in that domain. There was nothing.

I just did another tcpdump session and sent a test mail locally from the command line to the same 2 addresses I tried with the road warrior client, and they went through as expected, with tcpdump showing an immediate MX request, response, smtp connect, etc....the way it should be. Apparently, there is not an issue with not running a separate internal nameserver, either, as the test message cc'd to a domain on the same private network segment has been delivered.

The symptoms of my problem are fairly straightforward:

Mail sent locally from mail5.fixnix.net to anywhere works as expected.
Mail sent from outside mail5's local network using mail5 as a smtp relay does not work to anywhere.

Postfix is configured to allow relaying from authenticated users (SMTP-AUTH/TLS) and the authentication mechanism works flawlessly. My guess is a configuration option set wrong in /etc/postfix/main.cf or master.cf but what option(s) are wrong has been eluding me still :(

rid

OK. Now I've got a headache... You're probably right, though, in assuming a config issue.

What happens if you're on the local network and do a telnet into the postfix machine, telling it you are from a domain that would possibly crop up as sender domain (used by one of your road warriors) and try to send to a possible recipient outside of your domain (if I understand your post correctly, that is what the machine is doing), will it accept delivery?

To me it sounds like you need to configure the Postfix machine as an open relay but with SMTP-AUTH to prevent abuse. I think Postfix needs some tinkering to get into that state.

Sorry if I'm not very helpful... it's getting late over here...
regards
/RID

fixnix

ASKER

Rid: you are correct with "To me it sounds like you need to configure the Postfix machine as an open relay but with SMTP-AUTH to prevent abuse."

That is exactly how it is set up...it will only send a mail if the user logged in via SMTP-AUTH (with TLS also). Everything worked fine when mail5 was sitting out on a real-world IP. Nobody could relay without entering a username and password, and everyone that could supply proper credentials could relay. Worked like a champ. I hadn't touched it in 148 days of uptime and nobody had any email sending problems.

Now that the server is behind the firewall, on a private subnet, with the firewall having the public IP and forwarding port 25 traffic, relayed mail just seems to disappear. It doesn't bounce and no errors are reported in any logs on the mail5 machine. The postqueue is empty. According to the logs, postfix appears to send the messages, with entries like the ones I pasted earlier:

Oct 18 13:53:47 mail5 postfix/pipe[30345]: 96651D8121: to=<brian@savebabies.org>, relay=spamfilter, delay=1, status=sent (mail5.fixnix.net)
Oct 18 13:53:47 mail5 postfix/qmgr[30334]: 96651D8121: removed

"status=sent" and "96651D8121: removed" sure looks to me like the message was sent, however tcpdump shows absolutely no attempt to look up the destination MX record or any attempt to connect to the destination mail server via smtp on dest-port 25 and of course the messages never show up at their destination. I'm about to rip the thing off the LAN and throw it out in the open world again...I just wanted to get it working behind the firewall box as it did in the open...which *shouldn't* be difficult.

When a message is composed locally, the send is immediate and can be seen in the tcpdump output.
When a message is sent by a legit and authenticated user, it *seems* to be sent but never leaves the postfix box.

Oh, by the way....thanks a lot for taking the time to try to follow the problem. I'm usually on the answering side myself but this one has me stumped so I figured I'd put those unlimited asking points to use.

rid

Can the authentication process be viewed in tcpdump as well? I'm thinking rejection, but without failure notification. Does the SMTP-auth run through port 25 as well? What is the Postfix machine supposed to do when an authentication attempt fails?

/RID

fixnix

ASKER

Auth works fine and yes, smtp-auth runs on port 25. Tcpdump shows the road warrior connecting. Syslog shows entries from the mail daemon verifying that the TLS connection is established, then authenticated. I do get auth failure entries if I enter the wrong password, and everything did work fine prior to moving the box behind the firewall, so I really don't think it's an auth issue or a TLS issue. The only things that changed from when it worked outside the firewall to being broken inside were its IP address, the networks considered local, the nameservers, and the addition of the proxy_interfaces=[external ip address here] line in /etc/postfix/main.cf. (The proxy_interfaces line is what fixed local sending...and in my opinion should also have fixed relay sending, but it didn't.)

Here's another oddity that could be related: At work, I'm using Outlook 2000 for a client (since we rely on calendar and contact sharing throughout the office and do not have an Exchange server). I have an account on that box set up to use my mail5 machine. From a command prompt, an nslookup of mail5.fixnix.net resolves to the correct new IP address on the T1. However, according to tcpdump and syslog, the connection is coming in via the old IP address on the SDSL line (remember I mentioned I have an old Sun box forwarding everything from the old IP:port combinations to the new IP:ports). I've done an ipconfig /flushdns on the work windows2k box, but it still connects to the mail server via the old IP address. I suppose Outlook caches the entry, but for how long, and how can it be cleared? I think I'll reboot the office machine today when I get in and see if Outlook hits the right address. Maybe somehow everything will "just work" after a windows reboot.

I won't get in to the office this morning until later, as I'm starting the day at a client, and I will be moving my office up a couple of floors between today and tomorrow, but I still need to squeeze in a fix for this mail problem ASAP. Any more posts/ideas from anyone would be appreciated.

fixnix

ASKER

Well, I didn't make it in to the office today, but I did try to send from a different machine, with the same results. I'm not sure what else to do except create a file following a timeline, with the output from tcpdump and syslog for both a successful local send using /bin/mail and an unsuccessful send from a road warrior, in case anyone wants to take a look and pick up something I may have missed.

If anyone is willing to eyeball such a file, let me know what tcpdump options you'd prefer. I usually use a snapshot length of 96 bytes and have it display numeric IP info instead of resolving names. 96 won't capture the full packet in many cases, but it has been enough to see what's going on (and it's larger than the default, which I think is 68).
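A rough Python sketch of how such a combined timeline could be built is below. It assumes both captures come from the same day, that syslog lines carry an HH:MM:SS time and tcpdump lines start with its usual HH:MM:SS.fraction timestamp, and it simply sorts on the first time found on each line. The function names and the sample lines are placeholders invented for illustration, not real capture output.

```python
import re

# First HH:MM:SS on the line, with an optional fractional part (tcpdump style).
TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?")


def time_key(line):
    """Return a sortable (hour, minute, second, fraction) tuple, or None if no time is found."""
    match = TIME_RE.search(line)
    if not match:
        return None
    hour, minute, second, frac = match.groups()
    return (int(hour), int(minute), int(second), float("0." + frac) if frac else 0.0)


def merge_timeline(syslog_lines, tcpdump_lines):
    """Interleave both sources by time of day, tagging each line with its origin."""
    tagged = []
    for source, lines in (("syslog", syslog_lines), ("tcpdump", tcpdump_lines)):
        for line in lines:
            key = time_key(line)
            if key is not None:
                tagged.append((key, source, line.rstrip("\n")))
    # Syslog has one-second resolution, so its lines sort at the start of their second.
    tagged.sort(key=lambda item: item[0])
    return [f"{source:8} {line}" for _, source, line in tagged]


# Placeholder input lines, invented for this example.
syslog_sample = ["Oct 18 13:53:47 mail5 postfix/qmgr[30334]: 96651D8121: removed"]
tcpdump_sample = [
    "13:53:46.500000 IP 192.0.2.10.40000 > 192.0.2.20.25: placeholder",
    "13:53:47.250000 IP 192.0.2.20.25 > 192.0.2.10.40000: placeholder",
]
for row in merge_timeline(syslog_sample, tcpdump_sample):
    print(row)
```

Doing this once for a local send and once for a road warrior send gives two timelines that can be compared line by line.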

If anyone would like any other info or tests run, just ask. I've really been running out of ideas here. I don't understand what is different between a locally originated mail and a relayed mail as far as sending to the next hop. Once a message qualifies as being okay to relay (which is the case according to the mail log entries), it should be treated the same as a locally originated mail, shouldn't it? Locally generated mail goes out fine. Relayed mail reportedly gets sent but never arrives at the destination, and in fact there is never an outbound connection attempt made...unlike locally composed mail, which immediately induces an MX lookup for the destination and an established connection according to tcpdump...further evidenced by the fact that the mail makes it to its destination!

fixnix

ASKER

UPDATE:

When I woke up this morning, all the test mails I had sent over the past 2 days suddenly arrived. It looks like the problem has "fixed itself". I can only suspect that somehow it was a DNS issue, and whatever old records were still floating around in DNS caches around the world have finally propagated to their deserved oblivion, so the proper records are available throughout the domain name system now...although that doesn't fully make sense to me, since I have done digs on my local nameservers as well as a few others scattered around the net, and all of them had accurate A and MX records for my machine while I was having the problem. RDNS is still broken because the ISP missed a step in delegating reverse lookups to my nameservers, but I'll get that resolved today or tomorrow.

I'll still give points if anyone has a decent theory or can explain what caused my problem. Saying "it was a DNS issue" isn't going to help anyone finding this thread if they have the same problem since I don't know exactly what the magic change was that fixed it. I'm sure glad it works now though....been a real headache.

rid

Interesting. I think you have the solution in the time it takes for a change to settle on all involved parties, which you point out yourself. I didn't think of this at all.
Regards
/RID

fixnix

ASKER

Well, bad news. The test sends I did were from home, on a windows box that has its own IP to the world, on the same subnet as the firewall box that the postfix machine sits behind. I still can't send from work, and I had a call today from a client who also uses my postfix box to relay mail out of his office. He's mooching off a T1 to his building, but part of his "free use of the T1" is that he can't run any external services (mail, www, nntp, etc), and their ISP does not offer an outbound smtp server. So the landlord claimed, anyway...but the T1 is with Cavalier, and I have a couple of Phonom T1's (Cavalier owns Phonom) and they do offer an smtp relay, smtp.cavtel.net, so I may have that client try smtp.cavtel.net and see if it works.

I'm still baffled.

If it helps clarify my network and what works from what box, here's some ASCII art:

|-----------------|       |--------|
| Internet router |=======| switch |
|-----------------|       |--------|
                            |    |
|----------------|          |    |          |---------------------|
| IPCop Firewall |__________/    \__________| Windows Workstation |
|----------------|                          |---------------------|
       ||          (every IP on this segment is world-visible)
       ||
       ||  private IP on the Postfix server
       ||
|----------------|
| Postfix Server |
|----------------|

The Windows workstation can relay mail just fine, using SMTP-AUTH over a TLS connection on port 25.
Mail sent from a shell on the Postfix server itself works fine.

Any user trying to connect from anywhere else seems to connect, and the message hits the Postfix queue. The queue then gets cleared and a log entry says the message was sent, but no connection attempt is made by postfix to send the mail. No MX lookup for the destination address(es), no outbound TCP connection attempted (and therefore nothing blocked by the firewall), no errors in any logs, nothing.

It is not a matter of the windows workstation being on the same 32 IP subnet, either, because it just so happens that our T1 at my office is an adjacent block of IP's to my home T1, so in postfix's config file I told it the whole class C is to be considered local.

I'll mention again that the Postfix box relayed mail just fine when it sat outside the firewall (in the picture above, replace the IPCop Firewall box with the Postfix Server) and that the port forwarding is working properly (otherwise the mail to send would never be accepted by postfix, nor would there be log entries showing a successful TLS session established and the SMTP-AUTH user being authenticated successfully).

If I have time tonight, I'll move the postfix server back outside the firewall just to make sure it still works sitting out naked to the world as it did before, but I'd obviously rather have it sitting cozily protected in the DMZ among the inbound mail server, DNS, and www server boxes.

Any ideas?

rid

Curiouser and curiouser...
Not many ideas, no... I just sit back and read your reports of tests you have performed...learning....

I was thinking...did you reboot the postfix machine at any point or run the postfix refresh (or similar) command to make it register any possible alterations, about from whom mail should be accepted for relaying... but if that were the problem, messages wouldn't end up in the queue, would they, even less leave it. What rules have you set up for rejected messages (if any)?
/RID

fixnix

ASKER

I haven't rebooted it but have done either '/etc/init.d/postfix stop' and '/etc/init.d/postfix start' or 'postfix reload' after every change.

You are correct that unauthorized relay attempts never put a message in the queue. I get hundreds of attempts per day to spew out spam, and they show up in the logs as being given a "You are not allowed to relay" error.

I stuck with the default postfix config files and added/changed the appropriate lines to allow relaying from authenticated users, to use TLS, and to force SMTP-AUTH. I also get spam relay attempts daily that establish the TLS connection successfully, then fail the authentication, as should happen.

I didn't add any extra rules, and no messages are configured to be dropped silently (as far as I know....that will be something else to check on I suppose).

One user is my sister, and she is on comcast at her home. I never added her IP as trusted because I didn't know if it was static and didn't want to open up a wide range of IP's that she might get. Instead, she is treated like a road warrior and is authenticated by SMTP-AUTH over TLS. She had no problems sending mail when the Postfix box was outside the firewall, and she can't send now that it's behind one. Her messages appear to be sent both from her view in Outlook (message moves from the outbox to 'Sent Items') and from the postfix logs (TLS established, AUTH accepted, message accepted, queued, then supposedly sent...yet it never leaves the postfix box).

rid

How.... odd... gotta get some shuteye now. I'll be tuning in tomorrow.
/RID

fixnix

ASKER

I believe I finally figured it out!

Quite a while back, I had been playing around on the postfix box to make it handle incoming mail, POP and IMAP over TLS, and run SpamAssassin. After creating a spreadsheet with syslog and tcpdump output log entries combined, sorted by time, and filtered down to port 25 activity and port 53 (to catch MX lookups), I noticed that the road warrior outbound messages were being sent through SpamAssassin, while the locally originated messages were not filtered by SpamAssassin. I had noticed the "relay=spamfilter" entry in the postfix logs but did not consider it important, as I had never finished setting up SpamAssassin and everything worked before. I have not made any changes to SpamAssassin in about 6 months, and up until moving the postfix box behind the firewall 2 weeks ago, everything worked fine. Apparently, only checking the things that were changed wasn't enough, haha. Here's what my fix was:

I changed one line in master.cf from:

smtp      inet  n       -       n       -       -       smtpd
  -o content_filter=spamfilter

to:

smtp      inet  n       -       n       -       -       smtpd
# -o content_filter=spamfilter

and all is working again....finally!!
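For anyone who wants to check their own master.cf for a leftover override like this, here is a minimal Python sketch. It relies on the documented master.cf layout (a service entry starts in the first column, lines beginning with whitespace continue the previous entry, and lines whose first non-blank character is # are comments). The function name and the sample text are assumptions made up for the example, and the pickup line is only there to stand in for a second service.

```python
def find_content_filters(master_cf_text):
    """Return (service, type, filter_name) for each '-o content_filter=...' override found."""
    logical = []
    for raw in master_cf_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        if raw[0].isspace() and logical:
            logical[-1] += " " + stripped  # continuation of the previous service entry
        else:
            logical.append(stripped)

    found = []
    for entry in logical:
        fields = entry.split()
        if len(fields) < 2:
            continue
        # Look for the token pair: "-o" followed by "content_filter=<name>".
        for flag, arg in zip(fields, fields[1:]):
            if flag == "-o" and arg.startswith("content_filter="):
                found.append((fields[0], fields[1], arg.split("=", 1)[1]))
    return found


# Illustrative sample only, not a copy of the real file on mail5.
SAMPLE = """\
smtp      inet  n       -       n       -       -       smtpd
  -o content_filter=spamfilter
pickup    fifo  n       -       n       60      1       pickup
"""
print(find_content_filters(SAMPLE))  # prints: [('smtp', 'inet', 'spamfilter')]
```

An override attached only to the smtp inet service would apply to mail arriving over SMTP but not to mail submitted with the local sendmail command, which enters through pickup. That would be consistent with relayed mail going to the filter while locally composed mail went straight out.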

I suppose for a complete analysis of the problem, I should go through SpamAssassin's configuration to see what broke relayed mail when I moved the server from a world-visible address to a private subnet behind the firewall. But since I am not using this box as an incoming filter, and never intended to run outgoing mail through a spam filter (I can trust that my small userbase isn't spewing out spam), I'll just leave the extra detective work alone and bask in the feeling of having this headache removed.

Thanks, rid, for participating in this question. Even though you did not come up with a solution, just having someone active on the thread helped me keep my sanity. Is it appropriate to still dish out the points? Your moral support did help me stick with it long enough to solve the problem, but I'm pretty new to EE and don't want to go against their guidelines...but don't have time to read said guidelines right now so I figured I'd ask first.

ASKER CERTIFIED SOLUTION

rid

fixnix

ASKER

These points don't cost me anything...freebies from maintaining the 2000 or 3000 or however many it is expert points per month, so I'll award them to ya anyway. Seriously, just having someone to post back and forth with did help the thought process that eventually led to figuring out the solution, and since you're the only one who took the time to read, post, and ponder possible causes, there's no point in letting the points go to waste. Getting a refund wouldn't benefit me at all since I didn't pay for them and have unlimited asking points.

rid

I won't argue with that. Thanks.
/RID